Codec.Net 0.1.0


Codec.Net

Isomorphic tokenizer + detokenizer for the Codec binary transport protocol — for .NET.

Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path. Pure managed code with no native dependencies; the only package dependency is MessagePack.

The functional twin of @codecai/web (browser/Node) and codecai (Python). Same tokenizer dialect maps work everywhere.

Install

dotnet add package Codec.Net

Targets net8.0. Works in any .NET 8+ host: ASP.NET Core, Blazor, MAUI, console, Unity 2023+, Function Apps.

Quick start — decode a stream

using Codec;

// 1. Load and pin the dialect map by sha256.
var map = await MapLoader.LoadAsync(new LoadOptions
{
    Url  = "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
    Hash = "sha256:c73972f7a580…",
});

// 2. Stream from a Codec-compliant server.
using var http = new HttpClient();
var requestBody = """
    { "model": "Qwen/Qwen2.5-7B-Instruct",
      "prompt": "Explain entropy.",
      "stream_format": "msgpack",
      "max_tokens": 256 }
    """;
using var req = new HttpRequestMessage(HttpMethod.Post, "http://localhost:8000/v1/completions");
req.Content = new StringContent(requestBody, System.Text.Encoding.UTF8, "application/json");
using var resp = await http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead);
resp.EnsureSuccessStatusCode();

// 3. Detokenize lazily — only when rendering for a human.
var detok = new Detokenizer(map);
await using var body = await resp.Content.ReadAsStreamAsync();
await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(body))
{
    var text = detok.Render(frame.Ids, new DetokenizeOptions { Partial = !frame.Done });
    Console.Write(text);
}
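For a non-streaming response, the one-shot helper listed in the API section replaces the stateful loop entirely. A minimal sketch (the IDs here are placeholders for illustration, not real Qwen tokens):

```csharp
// ids: the complete completion, e.g. collected from a non-streaming JSON response.
int[] ids = { 1, 2, 3 };                          // placeholder IDs, illustration only
string text = Detokenizer.Detokenize(map, ids);   // no Partial/Reset bookkeeping needed
Console.WriteLine(text);
```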

Quick start — encode text (bidirectional path)

When you want zero text on the wire in either direction — agent A's output IDs feeding straight into agent B's input — encode text to IDs locally before sending:

using System.Text.Json;

var tok = new BPETokenizer(map);
var promptIds = tok.Encode("Explain entropy.");   // pure C# BPE, exact

// Send IDs as a normal OpenAI prompt: int[] (no special endpoint needed).
var body = JsonSerializer.Serialize(new
{
    prompt = promptIds,
    stream_format = "msgpack",
    max_tokens = 256,
});

For huge prompts (>50K tokens, e.g. RAG with long context), the dedicated /v1/completions/codec endpoint accepts a binary msgpack request body with the same effect. See PROTOCOL.md for both paths.
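A sketch of that binary path, assuming the msgpack body carries the same field names as the JSON request above (PROTOCOL.md is authoritative for the actual schema) and using the MessagePack-CSharp serializer the package already depends on:

```csharp
using MessagePack;

// Hypothetical request shape; field names mirror the JSON path above.
byte[] payload = MessagePackSerializer.Serialize(new Dictionary<string, object>
{
    ["model"]         = "Qwen/Qwen2.5-7B-Instruct",
    ["prompt"]        = promptIds,        // int[] from the encoder above
    ["stream_format"] = "msgpack",
    ["max_tokens"]    = 256,
});

using var req = new HttpRequestMessage(HttpMethod.Post, "http://localhost:8000/v1/completions/codec")
{
    Content = new ByteArrayContent(payload),
};
req.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/msgpack");
```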

API

| Type | Purpose |
| --- | --- |
| MapLoader.LoadAsync(opts) | Fetch + sha256-verify + cache a dialect map |
| MemoryMapCache | Default in-memory IMapCache; implement it for IDB / KV stores |
| TokenizerMap.FromJson(...) / Validate(...) | Parse + schema check |
| Detokenizer | Stateful detokenizer: byte_level + metaspace + byte fallback + partial UTF-8 |
| Detokenizer.Detokenize(map, ids) | One-shot for non-streaming use |
| BPETokenizer | Pure C# BPE: byte_level + metaspace |
| LongestMatchTokenizer | Vocab-only fallback for canonical-IR maps |
| Tokenize.Pick(map) | Build the right tokenizer for the loaded map |
| Tokenize.Encode(map, text) | One-shot helper |
| StreamDecoder.DecodeMsgpackStreamAsync(stream) | Decode a msgpack stream into IAsyncEnumerable<CodecFrame> |
| StreamDecoder.DecodeProtobufStreamAsync(stream) | Same for length-prefixed protobuf |
| StreamDecoder.DecodeProtobufFrame(span) | One-shot frame decoder (no length prefix) |
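A short sketch of how these pieces compose when you don't want to hard-code a tokenizer class (assuming the picked tokenizer exposes the same Encode(text) shape as BPETokenizer):

```csharp
// Let the map decide: BPE when merge data is present, longest-match otherwise.
var tokenizer = Tokenize.Pick(map);
var ids       = tokenizer.Encode("hello world");

// Or skip the intermediate object entirely with the one-shot helper.
var sameIds   = Tokenize.Encode(map, "hello world");
```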

Correctness

  • Byte-level decode: every vocab token is a sequence of GPT-2-encoded bytes. The Detokenizer reverses the byte→unicode table and accumulates bytes across tokens until a complete UTF-8 sequence forms. Tested with 3-byte and 4-byte (🚀) sequences.
  • Metaspace decode: the metaspace marker ▁ becomes a space; SentencePiece byte-fallback IDs (<0x00>–<0xFF>) are decoded through the same UTF-8 buffer.
  • Partial sequences across frames: Detokenizer is stateful — call Render(ids, new DetokenizeOptions { Partial = true }) while frames stream, then Partial = false (or default) on the last frame so the buffer flushes. Reset() between conversations.
  • BPE merge ordering: greedy by priority, not left-to-right. Matches HuggingFace tokenizers reference behavior. Test fixture verifies this explicitly.
  • HuggingFace round-trip: real Qwen-2 (152K vocab, byte_level) round-trips ASCII, code, emoji, multi-script CJK / Latin diacritics. Bit-identical with HF's Rust tokenizers library.
  • Hash verification uses System.Security.Cryptography.SHA256. Mismatch throws TokenizerMapHashMismatchException.
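The pinned-load path and its failure mode can be exercised directly; a minimal sketch (reusing the truncated example hash from the quick start, so pin the full digest in real code):

```csharp
// Load a pinned map and surface tampering or corruption as a hard failure.
try
{
    var map = await MapLoader.LoadAsync(new LoadOptions
    {
        Url  = "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        Hash = "sha256:c73972f7a580…",
    });
}
catch (TokenizerMapHashMismatchException ex)
{
    // The fetched bytes did not hash to the pinned sha256; do not tokenize with them.
    Console.Error.WriteLine($"Dialect map rejected: {ex.Message}");
}
```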

Map sources

MapLoader.LoadAsync accepts any URL — the sha256 hash is what matters. For curated pre-generated maps:

https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/<family>.json

14 families covering 70+ aliases — see codec-maps for the index.

To generate from a HuggingFace tokenizer.json:

npx @codecai/maps-cli build my-org/my-model --id=my-org/my-model

Compression

MapLoader enables AutomaticDecompression for gzip and brotli on its HttpClient, so jsDelivr's Content-Encoding: br (3.4× smaller transfers) works transparently. For Codec streaming responses, the server negotiates Content-Encoding from the request's Accept-Encoding header: advertise Accept-Encoding: zstd, br, gzip and the handler decompresses the response stream before DecodeMsgpackStreamAsync ever sees it (built-in automatic decompression covers gzip, deflate, and brotli; zstd requires handler support).
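If you build the HttpClient yourself (as in the decode quick start), opt in explicitly. A sketch using only built-in handler options:

```csharp
// gzip/deflate/brotli are handled by the runtime via DecompressionMethods.All.
// zstd is not in the DecompressionMethods enum (as of .NET 8), so a zstd
// response stream would need to be decompressed by other means.
var handler = new HttpClientHandler
{
    AutomaticDecompression = System.Net.DecompressionMethods.All,
};
using var http = new HttpClient(handler);
// The handler advertises the enabled encodings and hands
// DecodeMsgpackStreamAsync an already-decompressed stream.
```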

License

MIT. See LICENSE.
