SemanticKernel.Reranker.BM25 1.0.0

.NET 8.0

dotnet add package SemanticKernel.Reranker.BM25 --version 1.0.0

NuGet\Install-Package SemanticKernel.Reranker.BM25 -Version 1.0.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="SemanticKernel.Reranker.BM25" Version="1.0.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package:

<PackageVersion Include="SemanticKernel.Reranker.BM25" Version="1.0.0" />

Then reference the package, without a version, in the project file:

<PackageReference Include="SemanticKernel.Reranker.BM25" />

paket add SemanticKernel.Reranker.BM25 --version 1.0.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: SemanticKernel.Reranker.BM25, 1.0.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package SemanticKernel.Reranker.BM25@1.0.0

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=SemanticKernel.Reranker.BM25&version=1.0.0

Install as a Cake Tool:

#tool nuget:?package=SemanticKernel.Reranker.BM25&version=1.0.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

BM25 Reranker

A robust C# library for reranking search results using the classic BM25 algorithm with advanced natural language processing, leveraging the Catalyst NLP library.

Introduction
Why BM25 with NLP?
Features
Getting Started
Usage Example
How It Works
Customization
License

Introduction

This project provides a flexible C# implementation of BM25, a ranking function widely used by search engines, enhanced with natural language processing.
With this library, you can rerank search results or candidate passages using sophisticated tokenization, lemmatization, stop word removal, and multi-language support through the Catalyst NLP library.

Why BM25 with NLP?

Traditional BM25 relies on exact token overlap between query and document. However, raw text processing can be noisy:

Text contains punctuation, stop words, and mixed case.
Word forms vary: "running" vs "run", "cars" vs "car".
Different languages require different processing approaches.

By incorporating advanced NLP preprocessing:

The reranker uses lemmatization to normalize word forms (running → run).
Automatic language detection ensures proper processing for multilingual content.
Stop words are filtered out to focus on meaningful terms.
Part-of-speech tagging helps identify important content words.

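Because queries and documents are normalized the same way, terms that differ only in inflection, case, or punctuation can still match, which makes BM25 scoring more effective on noisy text.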

Features

BM25 core algorithm: Highly tunable (k1, b parameters).
Advanced NLP processing: Powered by the Catalyst library for tokenization and linguistic analysis.
Multi-language support: Automatic language detection with support for English, French, German, and more.
Intelligent preprocessing: Lemmatization, stop word removal, and part-of-speech filtering.
Asynchronous processing: Async tokenization and scoring for high performance.
Easy to extend: Customizable parameters and configurable language models.

Getting Started

Prerequisites

.NET 8.0+

Installation

Install the package via the NuGet Package Manager or the .NET CLI, using the commands listed at the top of this page.

Usage Example

using SemanticKernel.Reranker.BM25;

// Sample documents to index
var documents = new List<string>
{
    "The quick brown fox jumps over the lazy dog.",
    "A brown dog jumps over another dog.",
    "The quick brown fox.",
    "Machine learning is a subset of artificial intelligence.",
    "Natural language processing helps computers understand human language."
};

// Create BM25 reranker with default parameters (k1=1.5, b=0.75)
var bm25 = new BM25Reranker(documents);

// Rank documents for a query
var results = await bm25.RankAsync("quick brown fox", topN: 3);

// Display results
foreach (var (documentIndex, score) in results)
{
    Console.WriteLine($"Document #{documentIndex}: Score = {score:F4}");
    Console.WriteLine($"Content: {documents[documentIndex]}");
    Console.WriteLine();
}

How It Works

1. Document Preprocessing: Each document is processed through the Catalyst NLP pipeline:
   - Automatic language detection
   - Tokenization into individual words
   - Lemmatization to normalize word forms
   - Stop word removal
   - Part-of-speech filtering (removes punctuation and symbols)
2. Index Building: The system builds an inverted index with:
   - Document frequency (DF) for each term
   - Document lengths and average document length
   - Preprocessed token lists for efficient scoring
3. Query Processing: The query text undergoes the same NLP preprocessing as the documents.
4. BM25 Scoring: For each document, the reranker calculates the BM25 score using:
   - Term frequency (TF) in the document
   - Inverse document frequency (IDF)
   - Document length normalization
   - Tunable parameters k1 and b
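
For reference, the quantities listed above combine as in the commonly cited Okapi BM25 formula. This README does not state which IDF variant the library uses, so the smoothed, non-negative form shown here is an assumption rather than a description of the actual implementation. Here f(q, D) is the frequency of term q in document D, |D| is the document length in tokens, avgdl is the average document length, N is the number of documents, and n(q) is the document frequency of q.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1 \left(1 - b + b \,\frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q) = \ln\!\left(\frac{N - n(q) + 0.5}{n(q) + 0.5} + 1\right)
```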

Customization

BM25 Parameters

You can customize the BM25 algorithm behavior:

// Custom k1 and b parameters
var bm25 = new BM25Reranker(documents, k1: 2.0, b: 0.5);

k1 (default: 1.5): Controls term frequency saturation. Higher values give more weight to repeated terms.
b (default: 0.75): Controls document length normalization. 0 = no normalization, 1 = full normalization.
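
To make the effect of k1 and b more concrete, here is a minimal sketch of the per-term component of the standard BM25 formula. `TfComponent` is a hypothetical helper written for illustration only; it is not part of this library's API, and the library's internal formula may differ in detail.

```csharp
using System;

// Hypothetical helper (not part of SemanticKernel.Reranker.BM25):
// the term-frequency component of the standard BM25 formula.
static double TfComponent(double tf, double docLen, double avgDocLen, double k1, double b)
{
    // Length normalization factor: 1 when b = 0 or when the document has average length.
    double lengthNorm = 1 - b + b * (docLen / avgDocLen);
    return tf * (k1 + 1) / (tf + k1 * lengthNorm);
}

// Saturation: for an average-length document, each extra occurrence of a term
// adds less than the previous one, and the value stays below k1 + 1.
foreach (var tf in new[] { 1, 2, 5, 10 })
{
    double low = TfComponent(tf, 100, 100, k1: 1.5, b: 0.75);
    double high = TfComponent(tf, 100, 100, k1: 2.0, b: 0.75);
    Console.WriteLine($"tf={tf}: k1=1.5 -> {low:F3}, k1=2.0 -> {high:F3}");
}

// Length normalization: a document twice the average length is penalized
// more strongly as b moves from 0 (no normalization) to 1 (full normalization).
foreach (var b in new[] { 0.0, 0.75, 1.0 })
{
    Console.WriteLine($"b={b}: {TfComponent(2, 200, 100, k1: 1.5, b: b):F3}");
}
```

In this sketch a higher k1 lets repeated terms keep contributing for longer before the score flattens out, while a higher b lowers the contribution of terms found in longer-than-average documents.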

Language Support

The library automatically detects document language and applies appropriate NLP models. Supported languages include:

English
French
German
Additional languages supported by Catalyst

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net8.0
- Catalyst (>= 1.0.54164)
- Catalyst.Models.English (>= 1.0.30952)
- Catalyst.Models.French (>= 1.0.30952)
- Catalyst.Models.German (>= 1.0.30952)
- Microsoft.SemanticKernel.Core (>= 1.61.0)
- System.Numerics.Tensors (>= 9.0.7)


Version	Downloads	Last Updated
1.0.0	12	8/19/2025
0.0.2	8	8/20/2025
0.0.1	12	8/20/2025
0.0.1-alpha01	10	8/19/2025