RecordLinkageNet 1.0.0

.NET CLI:
dotnet add package RecordLinkageNet --version 1.0.0

Package Manager (run this in the Package Manager Console in Visual Studio, because it uses the NuGet module's version of Install-Package):
NuGet\Install-Package RecordLinkageNet -Version 1.0.0

PackageReference (for projects that support it, copy this XML node into the project file):
<PackageReference Include="RecordLinkageNet" Version="1.0.0" />

Paket CLI:
paket add RecordLinkageNet --version 1.0.0

Script & Interactive (the #r directive works in F# Interactive and Polyglot Notebooks; copy it into the interactive tool or the script source):
#r "nuget: RecordLinkageNet, 1.0.0"

Cake:
// Install RecordLinkageNet as a Cake Addin
#addin nuget:?package=RecordLinkageNet&version=1.0.0

// Install RecordLinkageNet as a Cake Tool
#tool nuget:?package=RecordLinkageNet&version=1.0.0


Overview

aim: an open-source library that helps compare datasets (CSV files, database tables, classes) in a memory-limited environment

license: BSD 2-Clause

This project is a pure C# port of the very useful Python package recordlinkage. In addition, it tries to make use of the strengths of C# and .NET (e.g. LINQ, Dataflow).

features

  • string comparison with multiple string metrics
  • uses a scoring method to calculate the overall similarity
  • uses its own data table structure to reduce the memory footprint (in comparison to System.Data.DataTable)
  • uses Dataflow to reduce the memory footprint
  • uses parallelism to reduce runtime
  • limitation: right now every data cell is stored as a string
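
The scoring idea in the list above can be sketched roughly as follows. This is a minimal illustration of a weighted sum over per-field similarities, not the library's implementation: the class WeightedScoreSketch, the method WeightedSum, and the example weights are all hypothetical names and values chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: none of these names are part of the RecordLinkageNet API.
public static class WeightedScoreSketch
{
    // Combines per-field similarities (each expected in [0, 1]) into one score
    // by weighting each field and normalizing by the total weight.
    public static double WeightedSum(IReadOnlyList<(double Similarity, double Weight)> fields)
    {
        double totalWeight = fields.Sum(f => f.Weight);
        if (totalWeight <= 0)
            throw new ArgumentException("At least one positive weight is required.", nameof(fields));

        return fields.Sum(f => f.Similarity * f.Weight) / totalWeight;
    }

    public static void Main()
    {
        var fields = new List<(double Similarity, double Weight)>
        {
            (1.0, 2.0), // e.g. postal code matches exactly and counts double
            (0.5, 1.0), // e.g. street name matches partially
            (0.0, 1.0), // e.g. first name does not match
        };

        // (1.0 * 2 + 0.5 * 1 + 0.0 * 1) / 4 = 0.625
        Console.WriteLine(WeightedSum(fields));
    }
}
```

Normalizing by the total weight keeps the result in the same range as the individual similarities, which makes scores comparable across condition lists of different length.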

platforms:

all platforms that support .NET 6.0, that is:

  • Linux
  • macOS
  • Windows

minimal examples

Using this project should look and feel like using the Python equivalent:

//we create some test data
//see UnitTest.TestDataPerson
List<TestDataPerson> testDataPeopleA = new List<TestDataPerson>
{
    new TestDataPerson("Thomas", "Mueller", "Lindetrasse", "Testhausen", "12345"),
    new TestDataPerson("Thomas", "Mueller", "Lindenstrasse", "Testcity", "012345"),
    new TestDataPerson("Thomas", "Müller", "Lindenstrasse", "Testcity", "012345"),
    new TestDataPerson("Tomas", "Müller", "Lindenstroad", "Testhausen", "012342"),
    new TestDataPerson("Tomas", "Müller", "Lindenstroad", "Dorf", "012342")
};
DataTableFeather tabA = TableConverter.CreateTableFeatherFromDataObjectList(testDataPeopleA);

//we load some data from sqlite file
DataTableFeather tabB = RecordLinkageNet.Util.SqliteReader.ReadTableFromSqliteFile("filenameof.sqlite","testtablename");

ConditionList conList = new ConditionList();
Condition.StringMethod testMethod = Condition.StringMethod.JaroWinklerSimilarity;
conList.String("NameFirst", "NameFirst", testMethod);
conList.String("Street", "Street", testMethod);
conList.String("PostalCode", "PostalCode", Condition.StringMethod.Exact);
conList.String("NameLast", "NameLast", testMethod);

//configure comparison
Configuration config = Configuration.Instance;
config.AddIndex(new IndexFeather().Create(tabB, tabA));
config.AddConditionList(conList);
config.SetStrategy(Configuration.CalculationStrategy.WeightedConditionSum);
config.SetNumberTransposeModus(NumberTransposeHelper.TransposeModus.LOG10);

//we init a worker
WorkScheduler workScheduler = new WorkScheduler();
var pipeLineCancellation = new CancellationTokenSource();//for optional cancellation
var resultTask = workScheduler.Compare(pipeLineCancellation.Token);

//we wait for the comparison to finish
var result = await resultTask;

//number of entries in the result
int amount = result.Count();
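
The cancellation token in the example above is optional. As a rough sketch of how such a token is typically used in .NET, the following self-contained program cancels a long-running task after a timeout. LongRunningCompareAsync is a hypothetical stand-in for the comparison, not the library's code, and it is an assumption that a cancelled comparison surfaces as an OperationCanceledException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Hypothetical stand-in for a long-running comparison; not the library's code.
    public static async Task<int> LongRunningCompareAsync(CancellationToken token)
    {
        int processed = 0;
        for (int i = 0; i < 1000; i++)
        {
            token.ThrowIfCancellationRequested(); // stop early once cancellation is requested
            await Task.Delay(10, token);          // simulated unit of work
            processed++;
        }
        return processed;
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(TimeSpan.FromMilliseconds(100)); // request cancellation after 100 ms

        try
        {
            int count = await LongRunningCompareAsync(cts.Token);
            Console.WriteLine($"Finished, processed {count} items.");
        }
        catch (OperationCanceledException)
        {
            // Task.Delay throws TaskCanceledException, which derives from OperationCanceledException.
            Console.WriteLine("Comparison was cancelled.");
        }
    }
}
```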

The project implements multiple metrics for string comparison as extension methods:

  • HammingDistance
  • DamerauLevenshteinDistance
  • JaroDistance
  • JaroWinklerSimilarity
  • ShannonEntropyDistance

using RecordLinkageNet.Core.Distance;

var result1 = "foo".HammingDistance("bar");//3
var result2 = "foo".DamerauLevenshteinDistance("bar");//3
var result3 = "foo".JaroWinklerSimilarity("bar");//0

The distance metrics are tested against results from the Python library jellyfish.
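
To illustrate how a string metric can be exposed as an extension method, here is a minimal, self-contained sketch of a Hamming distance. It is not the library's implementation: the names HammingSketch and HammingDistanceSketch are hypothetical, and counting each surplus character of the longer string as one difference is an assumption of this sketch.

```csharp
using System;

// Illustrative only: not the library's implementation.
public static class HammingSketch
{
    // Counts the positions at which the two strings differ. Characters beyond
    // the length of the shorter string each count as one difference (assumption).
    public static int HammingDistanceSketch(this string source, string target)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (target is null) throw new ArgumentNullException(nameof(target));

        int shared = Math.Min(source.Length, target.Length);
        int distance = Math.Abs(source.Length - target.Length);

        for (int i = 0; i < shared; i++)
        {
            if (source[i] != target[i]) distance++;
        }
        return distance;
    }

    public static void Main()
    {
        Console.WriteLine("foo".HammingDistanceSketch("bar"));  // 3
        Console.WriteLine("foo".HammingDistanceSketch("fob"));  // 1
        Console.WriteLine("foo".HammingDistanceSketch("fooo")); // 1
    }
}
```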

structure:

folder            description
RecordLinkageNet  C# library code
UnitTest          tests for the library

thanks to

Product Compatible and additional computed target framework versions.
.NET net6.0 is compatible.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 

Version Downloads Last updated
1.0.0 128 9/21/2023