FastBatchLevenshtein 1.0.0

.NET 8.0

dotnet add package FastBatchLevenshtein --version 1.0.0

NuGet\Install-Package FastBatchLevenshtein -Version 1.0.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="FastBatchLevenshtein" Version="1.0.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

<PackageVersion Include="FastBatchLevenshtein" Version="1.0.0" />
                    

                            Directory.Packages.props

<PackageReference Include="FastBatchLevenshtein" />
                    

                            Project file

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

paket add FastBatchLevenshtein --version 1.0.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: FastBatchLevenshtein, 1.0.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package FastBatchLevenshtein@1.0.0

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=FastBatchLevenshtein&version=1.0.0
                    

                            Install as a Cake Addin

#tool nuget:?package=FastBatchLevenshtein&version=1.0.0
                    

                            Install as a Cake Tool

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

FastBatchLevenshtein

FastBatchLevenshtein is a small, allocation-free fuzzy matching engine optimized for batch searching over many short keys using a bounded Levenshtein distance.

It is designed around a common workflow:

Normalize many names/keys once (to a compact ASCII representation)
Build an in-memory search engine over those normalized keys
Repeatedly search with normalized queries using a maximum edit distance and a minimum similarity threshold

The API is intentionally low-level to avoid allocations and to be fast.

Targets

This library targets modern .NET runtimes (e.g. .NET 8 / .NET 9 / .NET 10).

Core concepts

Normalized keys

The engine searches over normalized keys represented as bytes (ReadOnlyMemory<byte> / ReadOnlySpan<byte>). The library includes helpers:

FBLHelper.Normalize(ReadOnlySpan<char> input, Span<byte> destination)
FBLHelper.NormalizeQuery(string input)
FBLHelper.NormalizeStringes(string[] names)

The normalizer emits only:

ASCII letters a-z (lowercased)
digits 0-9

Everything else is skipped.

For non-ASCII input it uses Unicode FormD decomposition and drops non-spacing marks (accents/diacritics), then applies the same ASCII-only filter.

Buckets

Internally, the engine stores keys in three fixed-width buckets by length:

bucket width 8 (keys of length 1..8)
bucket width 16 (keys of length 9..16)
bucket width 32 (keys of length 17..32)

This keeps candidate access fast and predictable.

Similarity score

Each match includes both:

Levenshtein edit distance (EditDistance)
a similarity score (Score) in range 0..1

Score is computed as:

1 - (distance / max(queryLength, candidateLength))

The MinRatio option filters results by this score.

Quick start

1) Normalize keys and build an engine

Below is an allocation-friendly pattern: use a stack buffer to normalize, then persist exactly the written bytes for storage.

using FastBatchLevenshtein;

string[] names =
[
    "John Smith",
    "Jon Smyth",
    "Jane Doe",
];

ReadOnlyMemory<byte>[] keys = new ReadOnlyMemory<byte>[names.Length];
Span<byte> tmp = stackalloc byte[FBLEngine.MaxKeyLength];

for (int i = 0; i < names.Length; i++)
{
    int written = FBLHelper.Normalize(names[i].AsSpan(), tmp);
    keys[i] = tmp[..written].ToArray();
}

// Build engine with implicit ids (0..N-1).
var engine = new FBLEngine(keys);

If you prefer convenience (allocates), you can normalize the whole dataset in one call:

ReadOnlyMemory<byte>[] keys = FBLHelper.NormalizeStringes(names);
var engine = new FBLEngine(keys);

2) Search

Normalize the query into a temporary buffer, then search into a caller-provided results buffer:

Span<byte> qTmp = stackalloc byte[FBLEngine.MaxKeyLength];
int qLen = FBLHelper.Normalize("john smit".AsSpan(), qTmp);
ReadOnlySpan<byte> query = qTmp[..qLen];

Span<MatchResult> results = stackalloc MatchResult[16];
int count = engine.Search(query, results, new SearchOptions(MaxEdits: 2, MinRatio: 0.75f));

foreach (MatchResult r in results[..count])
    Console.WriteLine($"Id={r.CandidateId} dist={r.EditDistance} score={r.Score:0.000}");

If you prefer convenience for queries, you can use:

byte[] query = FBLHelper.NormalizeQuery("john smit");
int count = engine.Search(query, results);

Working with ids

If you want candidate ids to map to your own identifiers, pass them when constructing the engine:

int[] ids = [ 1001, 1002, 1003 ];
var engine = new FBLEngine(keys, ids);

Or build from NormalizedRecord:

NormalizedRecord[] records =
[
    new(1001, keys[0]),
    new(1002, keys[1]),
    new(1003, keys[2]),
];

var engine = new FBLEngine(records);

Convenience helpers

FBLHelper includes allocating convenience helpers for common workflows:

NormalizeQuery

Use when you want a normalized query as a freshly allocated byte array:

byte[] query = FBLHelper.NormalizeQuery("john smit");

NormalizeStringes

Use when you want to normalize a whole dataset of strings into keys suitable for constructing an engine:

string[] names = [ "John Smith", "Jon Smyth", "Jane Doe" ];
ReadOnlyMemory<byte>[] keys = FBLHelper.NormalizeStringes(names);
var engine = new FBLEngine(keys);

Important assumptions and constraints

This library makes several assumptions to stay fast and allocation-free.

1) Keys are already normalized

FBLEngine does not normalize strings. You must provide normalized byte keys.

You can use FBLNameNormalizer or implement your own normalization, but the engine assumes:

the data you search is the same representation as the query (same normalization rules)

2) Maximum key length is 32 bytes

FBLEngine.MaxKeyLength is 32.
Empty keys are ignored.
Keys longer than 32 bytes are ignored.
Queries longer than 32 bytes return 0 matches.

This limit is also baked into the internal bounded Levenshtein implementation (it uses stackalloc buffers sized to MaxKeyLength + 1).

3) ASCII letters/digits only (recommended)

The provided FBLNameNormalizer emits only ASCII letters and digits.

Non-ASCII is handled by stripping diacritics and discarding non-ASCII characters. This means some scripts/characters may be removed entirely.

If your domain requires non-ASCII matching, you likely need a different normalization strategy (and potentially a different engine design).

4) Output buffers limit results

Search writes into a caller-provided Span<MatchResult>.

If the span is too small, search stops early when it fills.
Results are returned in the engine's internal iteration order (not sorted).

If you need sorting (e.g., by score or distance), sort the returned slice yourself:

var slice = results[..count];
slice.Sort((a, b) => b.Score.CompareTo(a.Score));

5) Thread safety

FBLEngine is immutable after construction. Concurrent calls to Search are safe.

6) Memory / lifetime

When you build an engine from ReadOnlyMemory<byte>[], it copies keys into internal arrays. Your input arrays do not need to remain alive after construction.

Tuning search behavior

SearchOptions:

MaxEdits (default: 2)
- maximum allowed edit distance
- larger values increase work and may reduce relevance
MinRatio (default: 0.75)
- filters by similarity score (0..1)
- raise to reduce results, lower to include more

FAQ

Why bytes instead of strings?

Using bytes allows normalization to produce a compact representation and lets the engine avoid per-search allocations. Span<T> APIs also make it easy to integrate with high-performance code.

Does it handle transpositions (Damerau-Levenshtein)?

No. The distance is standard Levenshtein (insert/delete/replace).

License

See repository license information.

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

Product

.NET

Compatible target framework(s)

Included target framework(s) (in package)

Learn more about Target Frameworks and .NET Standard.

net10.0
- No dependencies.
net8.0
- No dependencies.
net9.0
- No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.0.0	96	5/11/2026