DM.Vectors.Transformations
1.0.1
.NET CLI:
dotnet add package DM.Vectors.Transformations --version 1.0.1
Package Manager:
NuGet\Install-Package DM.Vectors.Transformations -Version 1.0.1
PackageReference:
<PackageReference Include="DM.Vectors.Transformations" Version="1.0.1" />
Paket CLI:
paket add DM.Vectors.Transformations --version 1.0.1
Script & Interactive:
#r "nuget: DM.Vectors.Transformations, 1.0.1"
Cake:
// Install DM.Vectors.Transformations as a Cake Addin
#addin nuget:?package=DM.Vectors.Transformations&version=1.0.1
// Install DM.Vectors.Transformations as a Cake Tool
#tool nuget:?package=DM.Vectors.Transformations&version=1.0.1
Useful transformations on vectors for data-mining purposes.
Note: after updating a package, refresh, reload, or close and reopen your project or solution.
- Transformations are implemented as extension methods for vectors.
- See DM.Vectors for examples on vectors.
- See DM.Parsers.Csv to parse CSV files into vectors.
This project is under continuous development, so more transformations will become available over time.
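To illustrate what "implemented as extension methods" means in practice, here is a minimal sketch that does not use the package at all. It works on plain `string[]` arrays instead of `IGVector<T>`, and the names `ArrayTransformationsSketch`, `DistinctValsSketch`, and `BoolArraysSketch` are hypothetical, chosen only for this illustration. It shows the general pattern of a static method with a `this` parameter being called as if it were an instance method, and should not be read as the package's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: plain arrays are used instead of IGVector<T>.
public static class ArrayTransformationsSketch
{
    // Distinct non-null values found in the array.
    public static ISet<string> DistinctValsSketch(this string[] rows)
    {
        return new HashSet<string>(rows.Where(r => r != null));
    }

    // One bool array per value: true at each row that equals that value.
    public static IDictionary<string, bool[]> BoolArraysSketch(
        this string[] rows, ISet<string> vals)
    {
        return vals.ToDictionary(v => v, v => rows.Select(r => r == v).ToArray());
    }
}

public static class ExtensionSketchDemo
{
    public static void Main()
    {
        string[] rows = { "dog", "cat", "mouse", null, "dog" };

        // Extension methods are called with instance-method syntax.
        ISet<string> vals = rows.DistinctValsSketch();

        foreach (KeyValuePair<string, bool[]> pair in rows.BoolArraysSketch(vals))
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value));
    }
}
```

Because the methods live in a static class, they become available on any `string[]` as soon as the containing namespace is imported, which is the same mechanism that `using DM.Vectors.Transformations;` relies on for vectors.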
Getting Started
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using DM.Vectors;
using DM.Vectors.Transformations;
using DM.Lemmatizers;
using DM.Lemmatizers.Wrappers;
namespace Test
{
class TestVectorsTransformations
{
public static void MainTest()
{
// vector to store string data for test
IGVector<string> strVect = null;
// vector to store datetime data for test
IGVector<DateTime?> nullableDateTimeVect = null;
// A wrapper around LemmaSharpLemmatizer for use with vectors
// Get LemmaSharp dlls from http://lemmatise.ijs.si/Software/Version3
// Or use any other lemmatizer and implement the ILemmatizer interface
// see DM.Lemmatizers.Wrappers for the implementation of LemmaSharpMultiLangWrapper
// Licence of any code used in DM.Lemmatizers.Wrappers is inherited from the wrapped package
// It's GPL or MIT for most packages
ILemmatizer lemmatizer = new LemmaSharpMultiLang();
// first parameter: one word count dictionary per row in strVect
// second parameter: the global word count dictionary for the whole of strVect
// result: the set of words considered relevant, used to create count vectors
// a count vector stores the count of one specific word at each row of strVect
// lemmatization is performed before counting
Func<IDictionary<string, int>[], IDictionary<string, int>, ISet<string>> wordSelector = null;
// Item1 is the array of lemmatized strings from strVect
// Item2 is the set of words kept by wordSelector
Tuple<string[], ISet<string>> lemsSelectedWords = null;
// set of word count vectors extracted using selected words
IEnumerable<Tuple<string, IGVector<int>>> wordVectCountSet = null;
// set of values in a categorical vector
ISet<string> vals = null;
// Item1 is the categorical value
// Item2 is the vector indicating whether that value is present at each row
IEnumerable<Tuple<string, IGVector<bool>>> valVectExistsSet = null;
// Create test strVect
strVect = new GDenseSparseVector<string>(10);
strVect.SetAt(0, "dogs books allowing chiens livres autorisant");
strVect.SetAt(5, "(:*dogs--books (allowing) [chiens livres] autorisant");
strVect.SetAt(2, "(:*cats- :D .books, ??! [écoles livres] allez going going");
// Keep only words that occur at least twice overall
wordSelector = (mapWordCounts, mergedMapWordCount) =>
{
ISet<string> selectedWords =
mergedMapWordCount
.Where(wc => wc.Value >= 2)
.Select(wc => wc.Key)
.ToHashSet();
return selectedWords;
};
// Lemmatize strVect,
// count words at each row,
// merge the per-row counts into a global word count,
// select words using wordSelector,
// and return the lemmatized array and the selected words
lemsSelectedWords = strVect.LemmatizeMapWordCountMergeSelectWords
(0, strVect.Length, lemmatizer, wordSelector);
// Create a string Vector from the lemmatized array
strVect = GVectorCreator.CreateFromDense
(lemsSelectedWords.Item1, 0, 0, lemsSelectedWords.Item1.Length);
// Extract count vectors from strVect and selected words
wordVectCountSet = strVect.ExtractMapWordCountVectSet(0, strVect.Length, lemsSelectedWords.Item2);
// Show result
foreach (Tuple<string, IGVector<int>> wordVectCount in wordVectCountSet)
{
IGVector<int> intVect = wordVectCount.Item2;
Console.WriteLine("\n\n");
Console.WriteLine
(
"vect capacity: " + intVect.Capacity +
" vect length: " + intVect.Length +
" vect dense length: " + intVect.DenseLength
);
Console.WriteLine("word: " + wordVectCount.Item1);
Console.WriteLine("word count at each row: ");
for (int i = 0; i < intVect.Length; ++i)
Console.WriteLine(intVect.GetAt(i));
}
// Create a categorical vector
strVect.SetAt(0, "dog");
strVect.SetAt(1, "cat");
strVect.SetAt(2, "mouse");
strVect.SetAt(4, "dog");
strVect.SetAt(5, "dog");
strVect.SetAt(7, "mouse");
// Extract distinct values
// Enumerable.Range(0, strVect.Length).Select(i => strVect.GetAt(i)).ToHashSet();
vals = strVect.ExtractDistinctVals(0, strVect.Length);
// Extract bool vectors representing whether a value exists at row or not
valVectExistsSet = strVect.ExtractBoolVects(0, strVect.Length, vals);
// Show result
foreach (Tuple<string, IGVector<bool>> valVectExists in valVectExistsSet)
{
IGVector<bool> boolVect = valVectExists.Item2;
Console.WriteLine("\n\n");
Console.WriteLine
(
"vect capacity: " + boolVect.Capacity +
" vect length: " + boolVect.Length +
" vect dense length: " + boolVect.DenseLength
);
Console.WriteLine("val: " + valVectExists.Item1);
Console.WriteLine("val presence in each row: ");
for (int i = 0; i < boolVect.Length; ++i)
Console.WriteLine(boolVect.GetAt(i));
}
}
}
}
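For readers without the DM packages installed, the word-count pipeline used above (count words per row, merge the counts, select words, build one count vector per selected word) can be sketched with plain LINQ. This is an assumption-laden illustration, not the package's code: `WordCountSketch`, `CountWords`, and `Merge` are hypothetical names, no lemmatization is performed, and the tokenizer simply lowercases and splits on any character that is not a letter or digit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    // Hypothetical tokenizer: lowercase, then split on non-alphanumeric characters.
    public static IDictionary<string, int> CountWords(string row)
    {
        var counts = new Dictionary<string, int>();
        if (row == null) return counts; // an empty row has no words

        char[] cleaned = row
            .Select(c => char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : ' ')
            .ToArray();
        string[] words = new string(cleaned)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string w in words)
        {
            counts.TryGetValue(w, out int n); // n stays 0 when w is not yet present
            counts[w] = n + 1;
        }
        return counts;
    }

    // Sum the per-row dictionaries into one global word count.
    public static IDictionary<string, int> Merge(IEnumerable<IDictionary<string, int>> perRow)
    {
        var merged = new Dictionary<string, int>();
        foreach (IDictionary<string, int> rowCounts in perRow)
            foreach (KeyValuePair<string, int> wc in rowCounts)
            {
                merged.TryGetValue(wc.Key, out int n);
                merged[wc.Key] = n + wc.Value;
            }
        return merged;
    }

    public static void Main()
    {
        string[] rows = { "dogs books allowing", null, "(:*cats- .books, going going" };

        IDictionary<string, int>[] perRow = rows.Select(r => CountWords(r)).ToArray();
        IDictionary<string, int> merged = Merge(perRow);

        // Same selection rule as wordSelector above: keep words seen at least twice.
        var selected = new HashSet<string>(
            merged.Where(wc => wc.Value >= 2).Select(wc => wc.Key));

        // One count array per selected word, with one entry per row.
        foreach (string word in selected)
        {
            int[] countVect = perRow
                .Select(m => m.TryGetValue(word, out int n) ? n : 0)
                .ToArray();
            Console.WriteLine(word + ": " + string.Join(", ", countVect));
        }
    }
}
```

Keeping the selection rule as a separate step mirrors the `wordSelector` delegate in the example above: the threshold can be swapped for any other criterion without touching the counting code.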
Authors
DataminingMasters
License
This project is licensed under the MIT License - see the LICENSE.md file for details
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 is compatible. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
Dependencies
- .NETCoreApp 2.0
  - DM.Lemmatizers (>= 1.0.0)
  - DM.Serializers (>= 1.0.3)
  - DM.Vectors (>= 1.0.6)