TokenKit 1.0.0



🧠 TokenKit

TokenKit - a lightweight .NET 8.0 library and CLI tool for unified tokenization, validation, and model registry management across multiple LLM providers (OpenAI, Anthropic, Gemini, etc.).


✨ Features

| Category | Description |
|----------|-------------|
| 🔢 Tokenization | Analyze text or files and count tokens using provider-specific encodings |
| 💰 Cost Estimation | Automatically calculate estimated API cost based on token usage |
| ✅ Prompt Validation | Validate that prompts fit within model context limits |
| 🔄 Model Registry | Maintain up-to-date model metadata (maxTokens, pricing, encodings, etc.) |
| 🧩 CLI & SDK | Use TokenKit as a .NET library or a standalone global CLI |
| 📦 Self-contained | All data stored in Registry/models.data.json, auto-updated via command |
| 🌐 Optional Live Scraper | Fetch the latest OpenAI model data using an API key, or use trusted fallback data |

โš™๏ธ Installation

📦 NuGet (Library Use)

dotnet add package TokenKit

🧰 Global CLI Tool

dotnet tool install -g TokenKit

🚀 Quick Start (CLI)

Analyze, validate, or update model data directly from your terminal.

1๏ธโƒฃ Analyze Inline Text

tokenkit analyze "Hello from TokenKit!" --model gpt-4o

Output:

{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 4,
  "EstimatedCost": 0.00002,
  "Valid": true
}
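
The EstimatedCost above is simple arithmetic over the registry pricing: tokens / 1000 × InputPricePer1K. A minimal sketch of that derivation in plain C# (no TokenKit dependency; the 0.005 figure is the gpt-4o input price from the Model Registry section of this README):

```csharp
using System;

// Re-derivation of the EstimatedCost field above (a sketch, not TokenKit's code).
// Pricing taken from the gpt-4o registry entry shown later in this README:
// InputPricePer1K = 0.005 USD per 1,000 input tokens.
decimal inputPricePer1K = 0.005m;
int tokenCount = 4;

decimal estimatedCost = tokenCount / 1000m * inputPricePer1K;

Console.WriteLine($"EstimatedCost: {estimatedCost}");
```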

2๏ธโƒฃ Analyze File Input

tokenkit analyze prompt.txt --model gpt-4o

3๏ธโƒฃ Pipe Input (stdin)

echo "This is piped text input" | tokenkit analyze --model gpt-4o

4๏ธโƒฃ Validate Prompt

tokenkit validate "A very long prompt to validate" --model gpt-4o

Output:

{
  "IsValid": true,
  "Message": "OK"
}

5๏ธโƒฃ Update Model Data

Default Update (No API Key)

Fetch the latest built-in model metadata:

tokenkit update-models

Update Using OpenAI API Key

Use your OpenAI key to fetch live model data from the /v1/models endpoint:

tokenkit update-models --openai-key sk-xxxx

Update from JSON (stdin)

Pipe a JSON file with model specs:

cat newmodels.json | tokenkit update-models

Example newmodels.json:

[
  {
    "Id": "gpt-4o-mini",
    "Provider": "OpenAI",
    "MaxTokens": 64000,
    "InputPricePer1K": 0.002,
    "OutputPricePer1K": 0.01,
    "Encoding": "cl100k_base"
  }
]
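
To sanity-check a specs file before piping it in, you can parse it with System.Text.Json and verify the fields shown above are present (a sketch; this mirrors the README's example shape, not TokenKit's internal types):

```csharp
using System;
using System.Text.Json;

// Validate the shape of a model-specs file before piping it to update-models.
// Field names are taken from the example in this README.
var json = """
[
  {
    "Id": "gpt-4o-mini",
    "Provider": "OpenAI",
    "MaxTokens": 64000,
    "InputPricePer1K": 0.002,
    "OutputPricePer1K": 0.01,
    "Encoding": "cl100k_base"
  }
]
""";

using var doc = JsonDocument.Parse(json);
foreach (var model in doc.RootElement.EnumerateArray())
{
    Console.WriteLine($"{model.GetProperty("Id").GetString()} " +
                      $"({model.GetProperty("Provider").GetString()}): " +
                      $"{model.GetProperty("MaxTokens").GetInt32()} max tokens");
}
```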

6๏ธโƒฃ Scrape Model Data (Preview Only)

Fetch latest OpenAI model data (does not overwrite your registry):

tokenkit scrape-models --openai-key sk-xxxx

If no key is provided, TokenKit falls back to its offline model list.

Example Output:

๐Ÿ” Fetching latest OpenAI model data...
โœ… Retrieved 3 models:
  - OpenAI: gpt-4o (128000 tokens)
  - OpenAI: gpt-4o-mini (64000 tokens)
  - OpenAI: gpt-3.5-turbo (4096 tokens)

🧩 Model Registry

TokenKit stores all known model information in:

src/TokenKit/Registry/models.data.json

Each entry contains:

{
  "Id": "gpt-4o",
  "Provider": "OpenAI",
  "MaxTokens": 128000,
  "InputPricePer1K": 0.005,
  "OutputPricePer1K": 0.015,
  "Encoding": "cl100k_base"
}
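
A registry entry carries both an input and an output price, so a full round-trip estimate combines the two. A sketch using the gpt-4o figures above (the token counts are illustrative example values, not TokenKit output):

```csharp
using System;

// Round-trip cost from the gpt-4o registry entry above (a sketch).
// inputTokens / outputTokens are example values for illustration.
decimal inputPricePer1K = 0.005m, outputPricePer1K = 0.015m;
int inputTokens = 1200, outputTokens = 300;

decimal cost = inputTokens / 1000m * inputPricePer1K
             + outputTokens / 1000m * outputPricePer1K;

Console.WriteLine($"Estimated round-trip cost: ${cost}");
```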

🧮 CLI Command Reference

| Command | Description |
|---------|-------------|
| `tokenkit analyze "<text \| path>" --model <model-id>` | Analyze and count tokens for inline text, file, or stdin input |
| `tokenkit validate "<text \| path>" --model <model-id>` | Validate a prompt against model token limits |
| `tokenkit update-models` | Update the local registry using default fallback data |
| `tokenkit update-models --openai-key <key>` | Update the registry using the OpenAI API (requires a valid key) |
| `cat newmodels.json \| tokenkit update-models` | Update the registry from piped JSON input |
| `tokenkit scrape-models [--openai-key <key>]` | Fetch and preview OpenAI model data without saving |
| `tokenkit --help` | Display the CLI usage guide |

🧠 Programmatic Use (SDK)

using TokenKit.Registry;
using TokenKit.Services;

// Look up model metadata from the bundled registry.
var model = ModelRegistry.Get("gpt-4o");
var tokenizer = new TokenizerService();

// Count tokens, then estimate cost from the model's pricing.
var result = tokenizer.Analyze("Hello from TokenKit!", model!.Id);
var cost = CostEstimator.Estimate(model, result.TokenCount);

Console.WriteLine($"Tokens: {result.TokenCount}, Cost: ${cost}");

🧪 Testing

TokenKit includes an xUnit test suite with coverage for tokenization, cost estimation, and registry loading.

dotnet test

🛠 Project Structure

TokenKit/
├── src/
│   └── TokenKit/
│       ├── Models/
│       ├── Services/
│       ├── Registry/
│       ├── CLI/
│       └── Program.cs
└── tests/
    └── TokenKit.Tests/


โš™๏ธ Phase 6 Additions (Advanced Tokenization & CLI UX)

| Feature | Description |
|---------|-------------|
| 🧩 Multi-Encoder Support | TokenKit now supports multiple tokenization engines via the --engine flag (simple, sharptoken, mltokenizers). |
| ⚙️ CLI Runtime Switching | Analyze or validate text using any supported encoder on demand. |
| 📦 models list Command | View all registered models (provider, ID, token limits, pricing) in a clean tabular view. |
| 🔍 Provider Filtering | Use tokenkit models list --provider OpenAI to filter models by provider (case-insensitive). |
| 🧪 Multi-Engine Tests | Added xUnit tests verifying token-count consistency across encoders. |
| ⚠️ Disclaimer | Token counts and cost estimates are based on the model data available to TokenKit; the author is not responsible for discrepancies caused by outdated data or provider-side changes. |

🧠 Example Usage

List all models:

tokenkit models list

Filter by provider (case-insensitive):

tokenkit models list --provider openai
tokenkit models list --provider Anthropic

Analyze with a specific encoder:

tokenkit analyze "Hello from TokenKit" --model gpt-4o --engine sharptoken

๐Ÿ—บ๏ธ Roadmap

  • Tokenization, cost, and validation services
  • CLI for analyze, validate, and update-models
  • Stdin + file + inline input support
  • Model registry auto-load and safe paths
  • Live model scraping from OpenAI (optional API key)
  • Add tokenkit models list command
  • Optional SharpToken / Microsoft.ML.Tokenizers integration
  • Publish stable v1.0.0 to NuGet + dotnet tool feed

💡 License

Licensed under the MIT License.
© 2025 Andrew Clements


🎨 Phase 8 - CLI Polish, Logging & Automation Support

| Feature | Description |
|---------|-------------|
| 🧾 Colorized Output | All CLI commands now use ConsoleStyler for clear, color-coded feedback (green ✅, yellow ⚠️, red ❌). |
| 🤫 Quiet Mode (--quiet) | Suppresses console output while still writing structured logs to tokenkit.log. Ideal for CI/CD pipelines. |
| ⚙️ Structured Logging | Every operation is logged with timestamps and severity in tokenkit.log (auto-rotating, max 1 MB). |
| 🧩 JSON Mode (--json) | Outputs raw JSON (no colors or emojis) for automation and machine-readable workflows. |
| 🧠 ASCII Banner | TokenKit now includes a startup banner and version-info header for professional CLI presentation. |
| 🧪 Enhanced Tests | Coverage expanded to include encoders, CLI output modes, and logging behavior. |

🧪 Extended CLI Examples

🔹 Standard Analysis

tokenkit analyze "Hello from TokenKit!" --model gpt-4o

✅ Produces a colorized JSON summary plus a log entry.

🔹 JSON Mode (Automation / CI)

tokenkit analyze "Hello world" --model gpt-4o --json

Outputs pure JSON only, suppressing banner and emojis:

{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 7,
  "EstimatedCost": 0.000105,
  "Engine": "simple",
  "Valid": true
}
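
In automation, the captured output can be parsed and acted on directly. A sketch of a CI step consuming the JSON above (plain System.Text.Json, no TokenKit dependency; in a real pipeline the string would come from the tokenkit process's stdout):

```csharp
using System;
using System.Text.Json;

// Acting on tokenkit's --json output in a CI step (a sketch).
// Here the sample output from above stands in for captured stdout.
var stdout = """
{
  "Model": "gpt-4o",
  "Provider": "OpenAI",
  "TokenCount": 7,
  "EstimatedCost": 0.000105,
  "Engine": "simple",
  "Valid": true
}
""";

using var doc = JsonDocument.Parse(stdout);
var root = doc.RootElement;

// Fail the build if the prompt no longer fits the model.
if (!root.GetProperty("Valid").GetBoolean())
    throw new Exception($"Prompt invalid for {root.GetProperty("Model").GetString()}");

Console.WriteLine($"Tokens: {root.GetProperty("TokenCount").GetInt32()}");
```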

🔹 Quiet Mode (Log Only)

tokenkit analyze "Silent test" --model gpt-4o --quiet

No console output. Log file receives entries like:

2025-10-17 22:43:15 [INFO] Analyze started with model=gpt-4o
2025-10-17 22:43:15 [SUCCESS] Analyzed 7 tokens using simple (gpt-4o)
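
Because the log format is line-oriented (timestamp, [LEVEL], message), it is easy to grep or parse. A sketch, assuming the format shown above:

```csharp
using System;
using System.Text.RegularExpressions;

// Parse a tokenkit.log line of the "timestamp [LEVEL] message" form shown
// above (a sketch; the format is assumed from this README's sample).
var line = "2025-10-17 22:43:15 [SUCCESS] Analyzed 7 tokens using simple (gpt-4o)";
var match = Regex.Match(line,
    @"^(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>\w+)\] (?<msg>.+)$");

Console.WriteLine($"{match.Groups["level"].Value}: {match.Groups["msg"].Value}");
```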

🔹 Model Listing with JSON

tokenkit models list --json

📜 Logs

All CLI runs write to tokenkit.log (auto-rotated at 1 MB).
You can find it under your TokenKit working directory, e.g.:

src/TokenKit/bin/Debug/net8.0/tokenkit.log

📈 Code Coverage

TokenKit targets 100% test coverage with xUnit and Codecov integration.
Run coverage locally:

dotnet test --collect:"XPlat Code Coverage"

View detailed results on Codecov.


๐Ÿ—บ๏ธ Updated Roadmap (as of 2025-10-17)

| Phase | Feature | Status |
|-------|---------|--------|
| 1 | Core tokenization + cost estimation | ✅ Done |
| 2 | Validation logic | ✅ Done |
| 3 | Model registry (JSON-based) | ✅ Done |
| 4 | CLI commands (analyze, validate, update-models) | ✅ Done |
| 5 | Scraper service (OpenAI API optional) | ✅ Done |
| 6 | Advanced encoders (SharpToken, ML.Tokenizers) | ✅ Done |
| 7 | Tests + Codecov integration | ✅ Done |
| 8 | CLI polish (--json, --quiet, logging, banner) | ✅ Done |
| 9 | NuGet + global CLI release (v1.0.0) | 🔄 Pending release |

© 2025 Andrew Clements - MIT License
Flow Labs / TokenKit - https://github.com/AndrewClements84/TokenKit
