TurboQuant 0.1.1


TurboQuant

Near-optimal vector quantization for .NET 8+

Compress embeddings to 2-4 bits with <0.5% quality loss. Zero dependencies. AOT-ready.



Implements Algorithm 1 from TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (Zandieh et al., ICLR 2026): a randomized rotation followed by per-coordinate Lloyd-Max quantization on the exact Beta(d/2, 1/2) distribution.
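Lloyd-Max itself is the classic alternation between nearest-centroid assignment and conditional means. A rough, self-contained sketch on empirical samples (the library works on the analytic Beta density, so everything here is illustrative, not its implementation):

```csharp
using System;
using System.Linq;

// Discrete Lloyd-Max: repeatedly assign each sample to its nearest centroid,
// then move each centroid to the mean of its cell.
float[] LloydMax(float[] samples, int levels, int iters = 50)
{
    Array.Sort(samples);
    var c = new float[levels];
    for (int k = 0; k < levels; k++)                   // init at spread quantiles
        c[k] = samples[(int)((k + 0.5) * samples.Length / levels)];
    for (int it = 0; it < iters; it++)
    {
        var sum = new double[levels];
        var cnt = new int[levels];
        foreach (var x in samples)
        {
            int best = 0;                              // nearest centroid
            for (int k = 1; k < levels; k++)
                if (Math.Abs(x - c[k]) < Math.Abs(x - c[best])) best = k;
            sum[best] += x; cnt[best]++;
        }
        for (int k = 0; k < levels; k++)
            if (cnt[k] > 0) c[k] = (float)(sum[k] / cnt[k]);
    }
    return c;
}

var centroids = LloydMax(Enumerable.Range(0, 1000).Select(i => i / 1000f).ToArray(), levels: 4);
Console.WriteLine(string.Join(", ", centroids));
```

For the uniform distribution on [0, 1] and 4 levels this converges to centroids near 0.125, 0.375, 0.625, 0.875; on the Beta density the cells concentrate where the probability mass is.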

768-dim float32 embedding  ──>  TurboQuant 4-bit  ──>  512 bytes
       3072 bytes                                       6x smaller
                                                        cosine > 0.995
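The 512-byte figure corresponds to padding 768 up to the next power of 2 (1024) before 4-bit packing, as the Hadamard path does; the default rotation needs no padding and would pack to ~384 bytes (8x). A back-of-envelope sketch (the 4-byte stored norm is an assumed stand-in for PackedVector's real header):

```csharp
using System;

// Back-of-envelope packed size: d coordinates at b bits each, optionally
// padded to the next power of 2, plus a 4-byte norm (assumed overhead).
int PackedBytes(int dim, int bits, bool padToPow2)
{
    int d = padToPow2 ? NextPow2(dim) : dim;
    return (d * bits + 7) / 8 + sizeof(float);   // data + stored norm
}

int NextPow2(int n)
{
    int p = 1;
    while (p < n) p <<= 1;
    return p;
}

Console.WriteLine(PackedBytes(768, 4, padToPow2: true));   // Hadamard path: 516
Console.WriteLine(PackedBytes(768, 4, padToPow2: false));  // default rotation: 388
```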

Install

dotnet add package TurboQuant

30-second Example

using TurboQuant;

var quantizer = TurboQuantBuilder.Create(dim: 768).WithBits(4).BuildMSE();

var packed = quantizer.Quantize(embedding);        // float[768] -> PackedVector
var restored = quantizer.Dequantize(packed);       // PackedVector -> float[768]
var score = quantizer.ApproxSimilarity(a, b);      // LUT-based, zero allocation

// Zero-alloc dequantize into existing buffer
Span<float> buffer = stackalloc float[768];
quantizer.Dequantize(packed, buffer);

// Serialize for storage
var bytes = packed.ToBytes();
var loaded = PackedVector.FromBytes(bytes);

Why TurboQuant?

                      TurboQuant            Product Quantization      Scalar Quantization
Quality (4-bit)       cosine > 0.995        cosine ~ 0.96             cosine ~ 0.98
Quantization speed    ~44 us/vec            ~500 ms/vec               ~1 us/vec
Similarity search     LUT, zero alloc       asymmetric distance       full decompress
Needs training data?  No                    Yes (codebook training)   No
Online / streaming?   Yes                   No (batch only)           Yes
Theory-backed MSE?    Yes (proven optimal)  No                        No

Performance

Benchmarked on Apple Silicon (ARM64), .NET 10, Release mode:

Operation            dim=256    dim=768    Heap alloc
Quantize             44 us      44 us      152 B
Dequantize           42 us      42 us      1 KB
ApproxSimilarity     1.2 us     0.8 us     0 B
DotProduct (SIMD)    136 ns     505 ns     0 B

ApproxSimilarity uses a precomputed 16x16 lookup table for 4-bit. No unpacking, no centroid lookup. Direct byte-level scan.
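The idea can be sketched with an illustrative codebook (the actual Beta centroids, nibble order, and LUT layout are internal to the library):

```csharp
using System;

// LUT-based dot product over packed 4-bit codes: each byte carries two
// codes, and lut[i, j] holds the precomputed product of centroids i and j,
// so similarity is a straight byte scan with no unpacking into floats.
float LutDot(byte[] a, byte[] b, float[,] lut)
{
    float sum = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        sum += lut[a[i] & 0xF, b[i] & 0xF];   // low nibbles
        sum += lut[a[i] >> 4, b[i] >> 4];     // high nibbles
    }
    return sum;
}

// Illustrative 16-entry codebook (evenly spaced; the real one is Lloyd-Max).
var c = new float[16];
for (int i = 0; i < 16; i++) c[i] = -1f + 2f * i / 15f;
var lut = new float[16, 16];
for (int i = 0; i < 16; i++)
    for (int j = 0; j < 16; j++)
        lut[i, j] = c[i] * c[j];

Console.WriteLine(LutDot(new byte[] { 0xFF }, new byte[] { 0xFF }, lut)); // 2
```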

Compression Quality

Validated against the paper's theoretical bounds:

Bits     Ratio      D_mse       Cosine     Paper upper bound
4-bit    ~6-8x      <= 0.0095   > 0.995    0.011
3-bit    ~8-10x     <= 0.035    > 0.98     0.043
2-bit    ~12-16x    <= 0.118    > 0.93     0.170

Ratios depend on whether the dimension is a power of 2 (no padding needed) or not.


Use Cases

Embedding Compression (OpenAI, Ollama, Azure AI)

var embedding = await embeddingModel.GenerateAsync(["your text"]);
var vector = embedding[0].Vector.ToArray();

var quantizer = TurboQuantBuilder.Create(dim: 768).WithBits(4).BuildMSE();
var packed = quantizer.Quantize(vector);

// Store packed vectors in your DB, cache, or memory
// Search directly on compressed data (zero allocation):
var score = quantizer.ApproxSimilarity(queryPacked, docPacked);

100K embeddings (768d): 292 MB → 37 MB

RAG Pipeline: Compressed Vector Store

var quantizer = TurboQuantBuilder.Create(dim: 768).WithBits(4).BuildMSE();
var store = new Dictionary<string, PackedVector>();

// Index
foreach (var doc in documents)
{
    var emb = await generator.GenerateAsync([doc.Text]);
    store[doc.Id] = quantizer.Quantize(emb[0].Vector.ToArray());
}

// Search (LUT-based, zero allocation)
var queryPacked = quantizer.Quantize(queryEmbedding);
var topK = store
    .Select(kv => (kv.Key, Score: quantizer.ApproxSimilarity(queryPacked, kv.Value)))
    .OrderByDescending(x => x.Score)
    .Take(10);

LLM KV Cache: Extend Context Length

var (cache, window) = TurboQuantBuilder
    .Create(dim: 128)             // head dimension
    .WithKeyBits(4)               // keys: 4-bit (attention precision)
    .WithValueBits(2)             // values: 2-bit (more aggressive)
    .WithResidualWindow(tokens: 128)
    .BuildKVCache();

// Append during generation. Recent tokens stay in float32,
// older tokens are automatically compressed
window.Append(keyVector, valueVector);

// Retrieve for attention computation
var allKeys = window.GetAllKeys();     // recent: exact, older: decompressed
var allValues = window.GetAllValues();

32 layers x 32 heads x 4K tokens x 128d: 32 GB → ~6 GB

Semantic Kernel: Compressed Memory

public class CompressedMemoryStore : IMemoryStore
{
    private readonly TurboQuantMSE _quantizer;
    private readonly ConcurrentDictionary<string, (PackedVector Vec, MemoryRecord Rec)> _data = new();

    public CompressedMemoryStore(int dim) =>
        _quantizer = TurboQuantBuilder.Create(dim).WithBits(4).BuildMSE();

    public Task<string> UpsertAsync(string collection, MemoryRecord record, CancellationToken ct)
    {
        var packed = _quantizer.Quantize(record.Embedding.Vector.ToArray());
        _data[record.Metadata.Id] = (packed, record);
        return Task.FromResult(record.Metadata.Id);
    }

    public Task<MemoryRecord?> GetNearestMatchAsync(
        string collection, ReadOnlyMemory<float> query, double minScore, CancellationToken ct)
    {
        var qPacked = _quantizer.Quantize(query.Span);
        return Task.FromResult(_data.Values
            .Select(d => (d.Rec, Score: (double)_quantizer.ApproxSimilarity(qPacked, d.Vec)))
            .Where(x => x.Score >= minScore)
            .MaxBy(x => x.Score).Rec);
    }
}

ML.NET Pipeline

var quantizer = TurboQuantBuilder.Create(dim: 384).WithBits(4).BuildMSE();

var pipeline = mlContext.Transforms.CustomMapping<Input, Output>(
    (input, output) =>
    {
        var packed = quantizer.Quantize(input.Features);
        output.Compressed = quantizer.Dequantize(packed);
    },
    contractName: "TurboQuant");

ONNX Runtime: Post-Inference Compression

using var session = new InferenceSession("embedding_model.onnx");
var quantizer = TurboQuantBuilder.Create(dim: 768).WithBits(4).BuildMSE();

using var results = session.Run(inputs);
var outputTensor = results.First().AsEnumerable<float>().ToArray();

var compressed = new PackedVector[batchSize];
quantizer.QuantizeBatch(outputTensor, compressed);

Serialization

// Save
var bytes = packed.ToBytes();
File.WriteAllBytes("vector.bin", bytes);

// Load
var loaded = PackedVector.FromBytes(File.ReadAllBytes("vector.bin"));
var restored = quantizer.Dequantize(loaded);
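For storing many vectors in one file, a simple length-prefixed container works on top of ToBytes(). This layout is our own sketch, not a library feature; each payload is treated as an opaque byte[]:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Length-prefixed container: [int32 length][payload] repeated.
byte[] WriteAll(IEnumerable<byte[]> vectors)
{
    using var ms = new MemoryStream();
    using var w = new BinaryWriter(ms);
    foreach (var v in vectors) { w.Write(v.Length); w.Write(v); }
    w.Flush();
    return ms.ToArray();
}

List<byte[]> ReadAll(byte[] blob)
{
    var result = new List<byte[]>();
    using var r = new BinaryReader(new MemoryStream(blob));
    while (r.BaseStream.Position < r.BaseStream.Length)
        result.Add(r.ReadBytes(r.ReadInt32()));
    return result;
}
```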

Rotation Strategies

TurboQuant defaults to random orthogonal rotation (QR decomposition of a Gaussian matrix), which matches the paper exactly and works with any dimension without overhead.

For power-of-2 dimensions (128, 256, 512, 1024), the Hadamard rotation is an excellent alternative: identical quality, O(d log d) time, and O(d) memory instead of O(d^2).

For non-power-of-2 dimensions (384, 768, 1536), Hadamard pads internally to the next power of 2, which increases packed data size by up to 33%. A Trace.TraceWarning is emitted when this happens.

// Default: random orthogonal (paper-correct, any dimension)
var quantizer = TurboQuantBuilder.Create(dim: 768).WithBits(4).BuildMSE();

// Fast path for power-of-2 dimensions (recommended for KV cache with head dim 64/128)
var fast = TurboQuantBuilder.Create(dim: 1024).WithBits(4).WithHadamardRotation().BuildMSE();

                     Random orthogonal (default)    Hadamard
Time                 O(d^2)                         O(d log d)
Memory               O(d^2) per quantizer           O(d) per quantizer
Packed size          exact (d coordinates)          exact if power-of-2, padded otherwise
dim=768 quantize     ~400 us                        ~44 us (but 33% more storage)
dim=1024 quantize    ~700 us                        ~50 us (no padding)
D_mse                = theoretical                  = theoretical (power-of-2), better (non-power-of-2)
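The O(d log d) Hadamard path is the standard fast Walsh-Hadamard butterfly. A minimal sketch of that structure (the library presumably also applies random sign flips and 1/sqrt(d) scaling to make it a rotation; that part is an assumption):

```csharp
using System;

// In-place fast Walsh-Hadamard transform. Length must be a power of 2;
// each pass combines pairs at stride h with a sum/difference butterfly.
void Fwht(Span<float> x)
{
    for (int h = 1; h < x.Length; h <<= 1)
        for (int i = 0; i < x.Length; i += h << 1)
            for (int j = i; j < i + h; j++)
            {
                float u = x[j], v = x[j + h];
                x[j] = u + v;
                x[j + h] = u - v;
            }
}
```

Applying the transform twice scales a length-n vector by n, which is a quick sanity check that the butterfly is correct.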

Best practice: if your embedding model supports configurable dimensions (e.g. OpenAI, nomic), choose a power-of-2 dimension (512, 1024) to get the best of both strategies.

Note on seed correlation: the rotation matrix is generated from a random seed. If your test vectors are generated from the same seed (e.g. new Random(42) for both), the rotation and data become correlated, causing degraded quality. This does not happen in practice (embeddings come from a model, not from Random), but use distinct seeds in synthetic benchmarks.

How It Works

                    ┌──────────────┐
   float[768]  ──>  │  Normalize   │ store ||x||
                    └──────┬───────┘
                           v
                    ┌──────────────┐
                    │   Rotate     │ random orthogonal (default) or Hadamard
                    └──────┬───────┘
                           v
                    ┌──────────────┐
                    │  Quantize    │ per-coordinate Lloyd-Max on Beta(d/2, 1/2)
                    └──────┬───────┘
                           v
                    ┌──────────────┐
                    │  Bit-Pack    │ 4-bit: 2 values/byte, 3-bit: 8 values/3 bytes
                    └──────┬───────┘
                           v
                    PackedVector (data + norm)
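The "8 values/3 bytes" packing in the last step is plain bit-slicing; a sketch with an assumed LSB-first bit order (the library's actual layout may differ):

```csharp
using System;

// Pack 3-bit codes (each 0..7) into bytes, LSB-first: 8 codes -> 3 bytes.
byte[] Pack3(ReadOnlySpan<byte> codes)
{
    var outBytes = new byte[(codes.Length * 3 + 7) / 8];
    int bitPos = 0;
    foreach (var c in codes)
    {
        int byteIdx = bitPos >> 3, shift = bitPos & 7;
        outBytes[byteIdx] |= (byte)(c << shift);
        if (shift > 5)                                 // code straddles a byte boundary
            outBytes[byteIdx + 1] |= (byte)(c >> (8 - shift));
        bitPos += 3;
    }
    return outBytes;
}

// Read back the i-th 3-bit code.
byte Unpack3(byte[] packed, int i)
{
    int bitPos = i * 3, byteIdx = bitPos >> 3, shift = bitPos & 7;
    int v = packed[byteIdx] >> shift;
    if (shift > 5) v |= packed[byteIdx + 1] << (8 - shift);
    return (byte)(v & 0b111);
}

var packed = Pack3(new byte[] { 7, 0, 5, 2, 1, 6, 3, 4 });
Console.WriteLine(packed.Length);   // 3 bytes for 8 codes
```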

The rotation ensures each coordinate is approximately Beta-distributed regardless of input, enabling provably near-optimal scalar quantization. D_mse is within a factor of sqrt(3)*pi/2 ≈ 2.72 of the information-theoretic lower bound (Theorem 3 in the paper).

ApproxSimilarity computes cosine similarity directly on packed bytes via a precomputed 16x16 centroid dot-product lookup table (4-bit). No unpacking, no centroid lookup, zero heap allocation.

Project Layout

src/TurboQuant/
  Core/
    Rotation/       RandomRotation (default, paper-correct), HadamardRotation (fast, power-of-2)
    Codebook/       BetaCodebook (Lloyd-Max on exact Beta distribution, with LUT)
    Packing/        BitPacker (2/3/4-bit), PackedVector (serializable, immutable)
    Quantizers/     TurboQuantMSE, TurboQuantAsymmetric
    Simd/           DotProductSimd (AVX2/SSE2/NEON via .NET intrinsics)
  Cache/            KVCache, ResidualWindow
  Diagnostics/      CompressionStats (thread-safe), QuantizationBenchmark
  TurboQuantBuilder.cs

Build & Test

dotnet build TurboQuantNet.slnx
dotnet test
dotnet run --project samples/TurboQuant.Demo
dotnet run -c Release --project benchmarks/TurboQuant.Benchmarks

License

MIT. Based on the paper by Zandieh, Daliri, Hadian & Mirrokni (Google Research / NYU / Google DeepMind).
