FastBatchLevenshtein 1.0.0

dotnet add package FastBatchLevenshtein --version 1.0.0
                    
NuGet\Install-Package FastBatchLevenshtein -Version 1.0.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="FastBatchLevenshtein" Version="1.0.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="FastBatchLevenshtein" Version="1.0.0" />
                    
Directory.Packages.props
<PackageReference Include="FastBatchLevenshtein" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add FastBatchLevenshtein --version 1.0.0
                    
#r "nuget: FastBatchLevenshtein, 1.0.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package FastBatchLevenshtein@1.0.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=FastBatchLevenshtein&version=1.0.0
                    
Install as a Cake Addin
#tool nuget:?package=FastBatchLevenshtein&version=1.0.0
                    
Install as a Cake Tool

FastBatchLevenshtein

FastBatchLevenshtein is a small, allocation-free fuzzy matching engine optimized for batch searching over many short keys using a bounded Levenshtein distance.

It is designed around a common workflow:

  1. Normalize many names/keys once (to a compact ASCII representation)
  2. Build an in-memory search engine over those normalized keys
  3. Repeatedly search with normalized queries using a maximum edit distance and a minimum similarity threshold

The API is intentionally low-level to avoid allocations and to be fast.

Targets

This library targets modern .NET runtimes (e.g. .NET 8 / .NET 9 / .NET 10).

Core concepts

Normalized keys

The engine searches over normalized keys represented as bytes (ReadOnlyMemory<byte> / ReadOnlySpan<byte>). The library includes helpers:

  • FBLHelper.Normalize(ReadOnlySpan<char> input, Span<byte> destination)
  • FBLHelper.NormalizeQuery(string input)
  • FBLHelper.NormalizeStringes(string[] names)

The normalizer emits only:

  • ASCII letters a-z (lowercased)
  • digits 0-9

Everything else is skipped.

For non-ASCII input it uses Unicode FormD decomposition and drops non-spacing marks (accents/diacritics), then applies the same ASCII-only filter.

Buckets

Internally, the engine stores keys in three fixed-width buckets by length:

  • bucket width 8 (keys of length 1..8)
  • bucket width 16 (keys of length 9..16)
  • bucket width 32 (keys of length 17..32)

This keeps candidate access fast and predictable.

Similarity score

Each match includes both:

  • Levenshtein edit distance (EditDistance)
  • a similarity score (Score) in range 0..1

Score is computed as:

1 - (distance / max(queryLength, candidateLength))

The MinRatio option filters results by this score.

Quick start

1) Normalize keys and build an engine

Below is an allocation-friendly pattern: use a stack buffer to normalize, then persist exactly the written bytes for storage.

using FastBatchLevenshtein;

string[] names =
[
    "John Smith",
    "Jon Smyth",
    "Jane Doe",
];

ReadOnlyMemory<byte>[] keys = new ReadOnlyMemory<byte>[names.Length];
Span<byte> tmp = stackalloc byte[FBLEngine.MaxKeyLength];

for (int i = 0; i < names.Length; i++)
{
    int written = FBLHelper.Normalize(names[i].AsSpan(), tmp);
    keys[i] = tmp[..written].ToArray();
}

// Build engine with implicit ids (0..N-1).
var engine = new FBLEngine(keys);

If you prefer convenience (allocates), you can normalize the whole dataset in one call:

ReadOnlyMemory<byte>[] keys = FBLHelper.NormalizeStringes(names);
var engine = new FBLEngine(keys);

Normalize the query into a temporary buffer, then search into a caller-provided results buffer:

Span<byte> qTmp = stackalloc byte[FBLEngine.MaxKeyLength];
int qLen = FBLHelper.Normalize("john smit".AsSpan(), qTmp);
ReadOnlySpan<byte> query = qTmp[..qLen];

Span<MatchResult> results = stackalloc MatchResult[16];
int count = engine.Search(query, results, new SearchOptions(MaxEdits: 2, MinRatio: 0.75f));

foreach (MatchResult r in results[..count])
    Console.WriteLine($"Id={r.CandidateId} dist={r.EditDistance} score={r.Score:0.000}");

If you prefer convenience for queries, you can use:

byte[] query = FBLHelper.NormalizeQuery("john smit");
int count = engine.Search(query, results);

Working with ids

If you want candidate ids to map to your own identifiers, pass them when constructing the engine:

int[] ids = [ 1001, 1002, 1003 ];
var engine = new FBLEngine(keys, ids);

Or build from NormalizedRecord:

NormalizedRecord[] records =
[
    new(1001, keys[0]),
    new(1002, keys[1]),
    new(1003, keys[2]),
];

var engine = new FBLEngine(records);

Convenience helpers

FBLHelper includes allocating convenience helpers for common workflows:

NormalizeQuery

Use when you want a normalized query as a freshly allocated byte array:

byte[] query = FBLHelper.NormalizeQuery("john smit");

NormalizeStringes

Use when you want to normalize a whole dataset of strings into keys suitable for constructing an engine:

string[] names = [ "John Smith", "Jon Smyth", "Jane Doe" ];
ReadOnlyMemory<byte>[] keys = FBLHelper.NormalizeStringes(names);
var engine = new FBLEngine(keys);

Important assumptions and constraints

This library makes several assumptions to stay fast and allocation-free.

1) Keys are already normalized

FBLEngine does not normalize strings. You must provide normalized byte keys.

You can use FBLNameNormalizer or implement your own normalization, but the engine assumes:

  • the data you search is the same representation as the query (same normalization rules)

2) Maximum key length is 32 bytes

  • FBLEngine.MaxKeyLength is 32.
  • Empty keys are ignored.
  • Keys longer than 32 bytes are ignored.
  • Queries longer than 32 bytes return 0 matches.

This limit is also baked into the internal bounded Levenshtein implementation (it uses stackalloc buffers sized to MaxKeyLength + 1).

The provided FBLNameNormalizer emits only ASCII letters and digits.

Non-ASCII is handled by stripping diacritics and discarding non-ASCII characters. This means some scripts/characters may be removed entirely.

If your domain requires non-ASCII matching, you likely need a different normalization strategy (and potentially a different engine design).

4) Output buffers limit results

Search writes into a caller-provided Span<MatchResult>.

  • If the span is too small, search stops early when it fills.
  • Results are returned in the engine's internal iteration order (not sorted).

If you need sorting (e.g., by score or distance), sort the returned slice yourself:

var slice = results[..count];
slice.Sort((a, b) => b.Score.CompareTo(a.Score));

5) Thread safety

FBLEngine is immutable after construction. Concurrent calls to Search are safe.

6) Memory / lifetime

When you build an engine from ReadOnlyMemory<byte>[], it copies keys into internal arrays. Your input arrays do not need to remain alive after construction.

Tuning search behavior

SearchOptions:

  • MaxEdits (default: 2)

    • maximum allowed edit distance
    • larger values increase work and may reduce relevance
  • MinRatio (default: 0.75)

    • filters by similarity score (0..1)
    • raise to reduce results, lower to include more

FAQ

Why bytes instead of strings?

Using bytes allows normalization to produce a compact representation and lets the engine avoid per-search allocations. Span<T> APIs also make it easy to integrate with high-performance code.

Does it handle transpositions (Damerau-Levenshtein)?

No. The distance is standard Levenshtein (insert/delete/replace).

License

See repository license information.

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 is compatible.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
  • net10.0

    • No dependencies.
  • net8.0

    • No dependencies.
  • net9.0

    • No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.0.0 96 5/11/2026