TokenKit
TokenKit is a lightweight .NET 8.0 library and CLI tool for unified tokenization, validation, and model registry management across multiple LLM providers (OpenAI, Anthropic, Gemini, etc.).
Features
| Category | Description |
|---|---|
| Tokenization | Analyze text or files and count tokens using provider-specific encodings |
| Cost Estimation | Automatically calculate estimated API cost based on token usage |
| Prompt Validation | Validate that prompts fit within model context limits |
| Model Registry | Maintain up-to-date model metadata (maxTokens, pricing, encodings, etc.) |
| CLI & SDK | Use TokenKit as a .NET library or a standalone global CLI |
| Self-contained | All data stored in Registry/models.data.json, auto-updated via command |
| Optional Live Scraper | Fetch the latest OpenAI model data using an API key, or use trusted fallback data |
Installation
NuGet (Library use)
dotnet add package TokenKit
Global CLI Tool
dotnet tool install -g TokenKit
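Once installed, confirm the tool is available on your PATH:
tokenkit --help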
Quick Start (CLI)
Analyze, validate, or update model data directly from your terminal.
1. Analyze Inline Text
tokenkit analyze "Hello from TokenKit!" --model gpt-4o
Output:
{
"Model": "gpt-4o",
"Provider": "OpenAI",
"TokenCount": 4,
"EstimatedCost": 0.00002,
"Valid": true
}
2. Analyze File Input
tokenkit analyze prompt.txt --model gpt-4o
3. Pipe Input (stdin)
echo "This is piped text input" | tokenkit analyze --model gpt-4o
4. Validate Prompt
tokenkit validate "A very long prompt to validate" --model gpt-4o
Output:
{
"IsValid": true,
"Message": "OK"
}
5. Update Model Data
Default Update (No API Key)
Fetch the latest built-in model metadata:
tokenkit update-models
Update Using OpenAI API Key
Use your OpenAI key to fetch live model data from the /v1/models endpoint:
tokenkit update-models --openai-key sk-xxxx
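If you keep the key in an environment variable (OPENAI_API_KEY here is just a common convention, not a variable TokenKit reads on its own), pass it through the same flag:
tokenkit update-models --openai-key "$OPENAI_API_KEY"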
Update from JSON (stdin)
Pipe a JSON file with model specs:
cat newmodels.json | tokenkit update-models
Example newmodels.json:
[
{
"Id": "gpt-4o-mini",
"Provider": "OpenAI",
"MaxTokens": 64000,
"InputPricePer1K": 0.002,
"OutputPricePer1K": 0.01,
"Encoding": "cl100k_base"
}
]
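If you build the spec list in code, the following minimal C# sketch writes a file in the same shape; the ModelSpec record is illustrative and not part of the TokenKit SDK:

using System.IO;
using System.Text.Json;

// Build the spec list and write newmodels.json, then pipe it:
//   cat newmodels.json | tokenkit update-models
var specs = new[]
{
    new ModelSpec("gpt-4o-mini", "OpenAI", 64000, 0.002, 0.01, "cl100k_base")
};

var json = JsonSerializer.Serialize(specs, new JsonSerializerOptions { WriteIndented = true });
File.WriteAllText("newmodels.json", json);

// Illustrative record mirroring the fields shown above; not a TokenKit type.
record ModelSpec(string Id, string Provider, int MaxTokens,
                 double InputPricePer1K, double OutputPricePer1K, string Encoding);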
6. Scrape Model Data (Preview Only)
Fetch latest OpenAI model data (does not overwrite your registry):
tokenkit scrape-models --openai-key sk-xxxx
If no key is provided, TokenKit falls back to its offline model list.
Example Output:
Fetching latest OpenAI model data...
Retrieved 3 models:
- OpenAI: gpt-4o (128000 tokens)
- OpenAI: gpt-4o-mini (64000 tokens)
- OpenAI: gpt-3.5-turbo (4096 tokens)
Model Registry
TokenKit stores all known model information in:
src/TokenKit/Registry/models.data.json
Each entry contains:
{
"Id": "gpt-4o",
"Provider": "OpenAI",
"MaxTokens": 128000,
"InputPricePer1K": 0.005,
"OutputPricePer1K": 0.015,
"Encoding": "cl100k_base"
}
CLI Command Reference
| Command | Description |
|---|---|
| tokenkit analyze "<text \| path>" --model <model-id> | Analyze and count tokens for inline text, file, or stdin input |
| tokenkit validate "<text \| path>" --model <model-id> | Validate prompt against model token limits |
| tokenkit update-models | Update local registry using default fallback data |
| tokenkit update-models --openai-key <key> | Update registry using OpenAI API (requires valid key) |
| cat newmodels.json \| tokenkit update-models | Update registry from piped JSON input |
| tokenkit scrape-models [--openai-key <key>] | Fetch and preview OpenAI model data without saving |
| tokenkit --help | Display CLI usage guide |
Programmatic Use (SDK)
using TokenKit.Registry;
using TokenKit.Services;

// Look up model metadata (pricing, encoding, limits) from the local registry.
var model = ModelRegistry.Get("gpt-4o");

// Count tokens for the given text using the model's encoding.
var tokenizer = new TokenizerService();
var result = tokenizer.Analyze("Hello from TokenKit!", model!.Id);

// Estimate the API cost from the token count and the registry pricing.
var cost = CostEstimator.Estimate(model, result.TokenCount);
Console.WriteLine($"Tokens: {result.TokenCount}, Cost: ${cost}");
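ModelRegistry.Get appears to return null for unknown ids (the snippet above uses the null-forgiving operator), so a guard is safer in production code. The sketch below also assumes the returned model exposes the MaxTokens value from the registry schema; treat that property name as an assumption:

using TokenKit.Registry;
using TokenKit.Services;

var model = ModelRegistry.Get("gpt-4o");
if (model is null)
{
    // Unknown model id: refresh the registry or fix the id.
    Console.Error.WriteLine("Model not found in the local registry; try 'tokenkit update-models'.");
    return;
}

// Assumption: the registry model exposes MaxTokens as in models.data.json.
var result = new TokenizerService().Analyze("A very long prompt to validate", model.Id);
Console.WriteLine(result.TokenCount <= model.MaxTokens
    ? "Prompt fits within the model's context window."
    : "Prompt exceeds the model's context window.");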
Testing
TokenKit includes an xUnit test suite with coverage for tokenization, cost estimation, and registry loading.
dotnet test
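As a rough illustration of the style of these tests (the class and assertions below are a sketch based on the SDK calls shown earlier, not copied from the repository):

using TokenKit.Registry;
using TokenKit.Services;
using Xunit;

public class TokenizerServiceSketchTests
{
    [Fact]
    public void Analyze_ReturnsPositiveTokenCount_ForNonEmptyText()
    {
        var model = ModelRegistry.Get("gpt-4o");
        Assert.NotNull(model);

        var result = new TokenizerService().Analyze("Hello from TokenKit!", model!.Id);

        // Any non-empty prompt should produce at least one token.
        Assert.True(result.TokenCount > 0);
    }
}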
Project Structure
TokenKit/
├── src/
│   └── TokenKit/
│       ├── Models/
│       ├── Services/
│       ├── Registry/
│       ├── CLI/
│       └── Program.cs
└── tests/
    └── TokenKit.Tests/
Phase 6 Additions (Advanced Tokenization & CLI UX)
| Feature | Description |
|---|---|
| Multi-Encoder Support | TokenKit now supports multiple tokenization engines via the --engine flag (simple, sharptoken, mltokenizers). |
| CLI Runtime Switching | Analyze or validate text using any supported encoder on demand. |
| models list Command | View all registered models (provider, ID, token limits, pricing) in a clean tabular view. |
| Provider Filtering | Use tokenkit models list --provider OpenAI to filter models by provider (case-insensitive). |
| Multi-Engine Tests | Added xUnit tests verifying token count consistency across encoders. |
| Disclaimer | Cost estimates and token counts are based on the model data available to TokenKit; the author is not responsible for discrepancies caused by legacy models, outdated data, or provider-side changes. |
Example Usage
List all models
tokenkit models list
Filter by provider (case-insensitive)
tokenkit models list --provider openai
tokenkit models list --provider Anthropic
Analyze with a specific encoder
tokenkit analyze "Hello from TokenKit" --model gpt-4o --engine sharptoken
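The --json output shown in the Phase 8 examples reports "Engine": "simple", which suggests simple is the default when --engine is omitted. The other documented engines are selected the same way:
tokenkit analyze "Hello from TokenKit" --model gpt-4o --engine mltokenizers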
Roadmap
- Tokenization, cost, and validation services
- CLI for analyze, validate, and update-models
- Stdin + file + inline input support
- Model registry auto-load and safe paths
- Live model scraping from OpenAI (optional API key)
- Add tokenkit models list command
- Optional SharpToken / Microsoft.ML.Tokenizers integration
- Publish stable v1.0.0 to NuGet + dotnet tool feed
License
Licensed under the MIT License.
© 2025 Andrew Clements
Phase 8: CLI Polish, Logging & Automation Support
| Feature | Description |
|---|---|
| Colorized Output | All CLI commands now use ConsoleStyler for clear, color-coded feedback (green for success, yellow for warnings, red for errors). |
| Quiet Mode (--quiet) | Suppresses console output while still writing structured logs to tokenkit.log. Ideal for CI/CD pipelines. |
| Structured Logging | Every operation is logged with timestamps and severity in tokenkit.log (auto-rotating, max 1 MB). |
| JSON Mode (--json) | Outputs raw JSON (no colors or emojis) for automation and machine-readable workflows. |
| ASCII Banner | TokenKit now includes a startup banner and version info header for professional CLI presentation. |
| Enhanced Tests | Coverage expanded to include encoders, CLI output modes, and logging behavior. |
Extended CLI Examples
Standard Analysis
tokenkit analyze "Hello from TokenKit!" --model gpt-4o
Produces a colorized JSON summary and a log entry.
JSON Mode (Automation / CI)
tokenkit analyze "Hello world" --model gpt-4o --json
Outputs pure JSON only, suppressing banner and emojis:
{
"Model": "gpt-4o",
"Provider": "OpenAI",
"TokenCount": 7,
"EstimatedCost": 0.000105,
"Engine": "simple",
"Valid": true
}
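For .NET automation, that JSON can be deserialized into a small record. The AnalyzeResult record below is illustrative, defined to match the fields shown above rather than exported by TokenKit:

using System.Text.Json;

// Capture the CLI output however your pipeline runs tokenkit; a literal is used here for brevity.
string json = """
{ "Model": "gpt-4o", "Provider": "OpenAI", "TokenCount": 7,
  "EstimatedCost": 0.000105, "Engine": "simple", "Valid": true }
""";

var result = JsonSerializer.Deserialize<AnalyzeResult>(json)!;
if (!result.Valid || result.EstimatedCost > 0.01)
{
    // Fail a CI step when the prompt is invalid or unexpectedly expensive.
    Environment.Exit(1);
}

// Illustrative shape matching the --json output above; not a TokenKit type.
record AnalyzeResult(string Model, string Provider, int TokenCount,
                     double EstimatedCost, string Engine, bool Valid);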
Quiet Mode (Log Only)
tokenkit analyze "Silent test" --model gpt-4o --quiet
No console output. Log file receives entries like:
2025-10-17 22:43:15 [INFO] Analyze started with model=gpt-4o
2025-10-17 22:43:15 [SUCCESS] Analyzed 7 tokens using simple (gpt-4o)
Model Listing with JSON
tokenkit models list --json
Logs
All CLI runs write to tokenkit.log (auto-rotated at 1 MB).
You can find it under your TokenKit working directory, e.g.:
src/TokenKit/bin/Debug/net8.0/tokenkit.log
Code Coverage
TokenKit targets 100% test coverage with xUnit and Codecov integration.
Run coverage locally:
dotnet test --collect:"XPlat Code Coverage"
View detailed results in Codecov.
Updated Roadmap (as of 2025-10-17)
| Phase | Feature | Status |
|---|---|---|
| 1 | Core tokenization + cost estimation | Done |
| 2 | Validation logic | Done |
| 3 | Model registry (JSON-based) | Done |
| 4 | CLI commands (analyze, validate, update-models) | Done |
| 5 | Scraper service (OpenAI API optional) | Done |
| 6 | Advanced encoders (SharpToken, ML.Tokenizers) | Done |
| 7 | Tests + Codecov integration | Done |
| 8 | CLI polish (--json, --quiet, logging, banner) | Done |
| 9 | NuGet + global CLI release (v1.0.0) | Pending Release |
© 2025 Andrew Clements, MIT License
Flow Labs / TokenKit: https://github.com/AndrewClements84/TokenKit
| Product | Compatible frameworks |
|---|---|
| .NET | net8.0 (net9.0, net10.0, and platform-specific targets are computed as compatible) |

This package has no dependencies.