Compendium.Adapters.HuggingFace
1.0.0-preview.0
dotnet add package Compendium.Adapters.HuggingFace --version 1.0.0-preview.0
NuGet\Install-Package Compendium.Adapters.HuggingFace -Version 1.0.0-preview.0
<PackageReference Include="Compendium.Adapters.HuggingFace" Version="1.0.0-preview.0" />
<PackageVersion Include="Compendium.Adapters.HuggingFace" Version="1.0.0-preview.0" />
<PackageReference Include="Compendium.Adapters.HuggingFace" />
paket add Compendium.Adapters.HuggingFace --version 1.0.0-preview.0
#r "nuget: Compendium.Adapters.HuggingFace, 1.0.0-preview.0"
#:package Compendium.Adapters.HuggingFace@1.0.0-preview.0
#addin nuget:?package=Compendium.Adapters.HuggingFace&version=1.0.0-preview.0&prerelease
#tool nuget:?package=Compendium.Adapters.HuggingFace&version=1.0.0-preview.0&prerelease
compendium-adapter-huggingface
Hugging Face AI provider adapter for the Compendium event-sourcing framework. Implements IAIProvider from Compendium.Abstractions.AI against the two main Hugging Face inference surfaces — dedicated Inference Endpoints (paid, managed) and the shared Serverless Inference API — through the OpenAI-compatible Messages API and the feature-extraction task.
Built from template-compendium-adapter-dotnet per ADR-0006 (multi-repo adapter split).
Install
dotnet add package Compendium.Adapters.HuggingFace
Quick start — dedicated Inference Endpoint (production)
services.AddCompendiumHuggingFace(opt =>
{
opt.Mode = HuggingFaceMode.InferenceEndpoint;
opt.HfToken = Environment.GetEnvironmentVariable("HF_TOKEN")!;
opt.EndpointUrl = "https://<your-endpoint>.endpoints.huggingface.cloud";
});
var ai = sp.GetRequiredService<IAIProvider>();
var result = await ai.CompleteAsync(new CompletionRequest
{
Model = string.Empty, // endpoint is bound to one model at deploy time
Messages = new() { Message.User("What is event sourcing?") }
});
Quick start — Serverless Inference (prototyping)
services.AddCompendiumHuggingFace(opt =>
{
opt.Mode = HuggingFaceMode.ServerlessInference;
opt.HfToken = Environment.GetEnvironmentVariable("HF_TOKEN")!;
opt.DefaultChatModel = "meta-llama/Meta-Llama-3.1-8B-Instruct";
opt.DefaultEmbeddingModel = "sentence-transformers/all-MiniLM-L6-v2";
});
Mode comparison
| Aspect | InferenceEndpoint |
ServerlessInference |
|---|---|---|
| URL shape | https://<name>.endpoints.huggingface.cloud (one per endpoint) |
https://api-inference.huggingface.co/models/<id> |
| Cost model | Per-hour compute (CPU / GPU / accelerator) | Free tier + Pro (per-token, shared org quota) |
| Rate limits | Your endpoint's compute capacity only | Aggressive shared limits; HTTP 429 common |
| Cold starts | None (warm) or autoscale-to-zero opt-in (~30 s warm-up) | Frequent — first call to a model often loads it |
| Latency | Predictable | Variable |
| Model selection | One model per endpoint, fixed at deploy time | Any model in the Hugging Face Hub |
| Production fit | Yes (with autoscaling, regions, EU residency) | No — prototyping / spike work only |
Options
| Option | Required | Default | Notes |
|---|---|---|---|
Mode |
yes | InferenceEndpoint |
One of InferenceEndpoint, ServerlessInference |
HfToken |
yes | — | A hf_* user access token from https://huggingface.co/settings/tokens |
EndpointUrl |
endpoint mode | — | Full URL of your dedicated endpoint |
ServerlessBaseUrl |
serverless mode | https://api-inference.huggingface.co/models |
Override if HF moves the API |
DefaultChatModel |
— | meta-llama/Meta-Llama-3.1-8B-Instruct |
Used as fallback when request has no Model. In endpoint mode it's sent informationally; the server uses what the endpoint is bound to |
DefaultEmbeddingModel |
— | sentence-transformers/all-MiniLM-L6-v2 |
Same caveat |
DefaultMaxTokens |
— | 1024 |
Floor for chat completions |
TimeoutSeconds |
— | 120 |
Per-request HTTP timeout |
RetryAttempts |
— | 3 |
Applied by Microsoft.Extensions.Http.Resilience |
EnableLogging |
— | false |
Logs request/response bodies at Debug |
Bind from configuration with services.AddCompendiumHuggingFace(configuration) (binds section HuggingFace).
Model recommendations
For dedicated Inference Endpoints, the deploy-time model choice matters more than anything in this adapter. Good starting points:
- Llama-3.1-8B-Instruct — broad chat, tool-calling support via TGI.
- Qwen-2.5-7B-Instruct — strong English + Chinese, fast on a single L4.
- Mistral-7B-Instruct-v0.3 — small footprint, lower cost-to-serve.
- DeepSeek-V3 / R1 — when you need reasoning depth and can afford the compute.
For embeddings, BAAI/bge-large-en-v1.5 (general English) and sentence-transformers/all-MiniLM-L6-v2 (fast, small) cover most needs.
Tool / function calling — caveat
Tool calling is best-effort. The OpenAI-compatible tools + tool_choice body fields are passed through unchanged, and modern TGI-backed endpoints (Llama-3.1+, Qwen-2.5+, Mistral instruct families) generally honor them. Non-TGI endpoints (vision, audio, classification, older runtimes) silently ignore the fields and return a plain assistant message. There is no advance signal of capability — defensive consumers should check response.GetToolCalls().Count > 0 rather than assume.
Production checklist
- Pin to a dedicated Inference Endpoint (not serverless) for any user-facing workload.
- Enable autoscale-to-zero if traffic is bursty — cuts idle bill to near-zero in exchange for a one-time 30-second warm-up on the first request after idle.
- Pick the EU region (
region: eu-west-1at deploy time) if you're subject to GDPR data-residency requirements. - Set a Pro account budget alert — serverless can burst into the paid tier silently.
- Cache the typed
HttpClientvia DI (AddCompendiumHuggingFacedoes this automatically). - Test the fallback path — your code should still function (degraded) if the endpoint is autoscaled-to-zero during cold start (HF returns HTTP 503 with
{"error":"model is currently loading"}). - Watch for HTTP 429 on serverless and back off;
AIErrors.RateLimitExceededis surfaced as a typedResult.Failurefor clean retry logic. - Don't ship the HF token in source. Use environment variables, secret managers, or Compendium's secret store.
Repository conventions
| Aspect | Choice |
|---|---|
| Target | .NET 9, C# 13 |
| Test framework | xUnit 2.9.3 + FluentAssertions 6.12.1 + NSubstitute 5.1.0 |
| Coverage | 98.4 % line / 94.3 % branch (102 tests) — gate at 90 % |
| HTTP mocking | RichardSzalay.MockHttp 7.0.0 |
| Result pattern | Result<T> from Compendium.Core |
| Test naming | {SUT}Tests / {Method}_{Scenario}_{Expected} + AAA explicit |
Build & test locally
dotnet restore
dotnet build -c Release
dotnet test -c Release --filter "FullyQualifiedName!~IntegrationTests"
Releasing
Tag with a v prefix on main to publish to nuget.org + GitHub Packages:
git tag v1.0.0-preview.0
git push origin v1.0.0-preview.0
(Tagging is done by the orchestrator after merge; do not push tags from feature branches.)
License
MIT — Copyright © 2026 Sassy Solutions.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net9.0
- Compendium.Abstractions.AI (>= 1.0.1)
- Compendium.Core (>= 1.0.1)
- Microsoft.Extensions.Configuration.Abstractions (>= 9.0.16)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.16)
- Microsoft.Extensions.Http (>= 9.0.16)
- Microsoft.Extensions.Http.Resilience (>= 9.0.0)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.16)
- Microsoft.Extensions.Options (>= 9.0.16)
- Microsoft.Extensions.Options.ConfigurationExtensions (>= 9.0.16)
- Microsoft.Extensions.Options.DataAnnotations (>= 9.0.16)
- System.Text.Json (>= 9.0.16)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0-preview.0 | 50 | 5/21/2026 |