ElBruno.LocalLLMs
Run local LLMs in .NET through IChatClient, the same interface you'd use for Azure OpenAI, Ollama, or any other provider. Powered by ONNX Runtime GenAI.
Features
- `IChatClient` implementation – seamless integration with Microsoft.Extensions.AI
- Automatic model download – models are fetched from HuggingFace on first use
- Zero friction – works out of the box with sensible defaults (Phi-3.5 mini)
- Multi-hardware – CPU, CUDA, and DirectML execution providers
- DI-friendly – register with `AddLocalLLMs()` in ASP.NET Core
- Streaming – token-by-token streaming via `GetStreamingResponseAsync`
- Multi-model – switch between Phi-3.5, Phi-4, Qwen2.5, Llama 3.2, and more
Installation
dotnet add package ElBruno.LocalLLMs
This works everywhere (CPU). To enable GPU acceleration, add one extra package:
# NVIDIA GPU (CUDA):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.Cuda

# Any Windows GPU - AMD, Intel, NVIDIA (DirectML):
dotnet add package Microsoft.ML.OnnxRuntimeGenAI.DirectML
The library defaults to `ExecutionProvider.Auto`: it tries GPU first and falls back to CPU automatically. No code changes needed.
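If you want to pin a specific execution provider instead of relying on the automatic fallback, you can set it on the options object. A minimal sketch, assuming the same `ExecutionProvider` option shown in the Dependency Injection section below is also honored when creating a client directly:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// Explicitly request DirectML instead of ExecutionProvider.Auto.
// Only Auto and DirectML appear elsewhere in this README; other enum
// members (e.g. a CUDA value) are assumptions and may be named differently.
using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct,
    ExecutionProvider = ExecutionProvider.DirectML
});

var response = await client.GetResponseAsync([
    new(ChatRole.User, "Say hello from a GPU-accelerated local model!")
]);
Console.WriteLine(response.Text);
```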
Quick Start
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;
// Create a local chat client (downloads Phi-3.5 mini on first run)
using var client = await LocalChatClient.CreateAsync();
var response = await client.GetResponseAsync([
    new(ChatRole.User, "What is the capital of France?")
]);
Console.WriteLine(response.Text);
Streaming
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;
using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct
});

await foreach (var update in client.GetStreamingResponseAsync([
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "Explain quantum computing in simple terms.")
]))
{
    Console.Write(update.Text);
}
Dependency Injection
builder.Services.AddLocalLLMs(options =>
{
    options.Model = KnownModels.Phi35MiniInstruct;
    options.ExecutionProvider = ExecutionProvider.DirectML;
});

// Inject IChatClient anywhere
public class MyService(IChatClient chatClient) { ... }
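Once registered, the injected `IChatClient` works like any other Microsoft.Extensions.AI chat client. A minimal sketch of a hypothetical consuming service (the `SummaryService` class and its method are illustrative, not part of the library):

```csharp
using Microsoft.Extensions.AI;

// Hypothetical service that consumes the IChatClient registered by AddLocalLLMs().
public class SummaryService(IChatClient chatClient)
{
    public async Task<string> SummarizeAsync(string text, CancellationToken ct = default)
    {
        var response = await chatClient.GetResponseAsync(
        [
            new(ChatRole.System, "Summarize the user's text in one sentence."),
            new(ChatRole.User, text)
        ], cancellationToken: ct);

        return response.Text;
    }
}
```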
Supported Models
| Tier | Model | Parameters | ONNX | ID |
|---|---|---|---|---|
| Tiny | TinyLlama-1.1B-Chat | 1.1B | Native | tinyllama-1.1b-chat |
| Tiny | SmolLM2-1.7B-Instruct | 1.7B | Native | smollm2-1.7b-instruct |
| Tiny | Qwen2.5-0.5B-Instruct | 0.5B | Native | qwen2.5-0.5b-instruct |
| Tiny | Qwen2.5-1.5B-Instruct | 1.5B | Native | qwen2.5-1.5b-instruct |
| Tiny | Gemma-2B-IT | 2B | Native | gemma-2b-it |
| Tiny | StableLM-2-1.6B-Chat | 1.6B | Convert | stablelm-2-1.6b-chat |
| Small | Phi-3.5 mini instruct | 3.8B | Native | phi-3.5-mini-instruct |
| Small | Qwen2.5-3B-Instruct | 3B | Native | qwen2.5-3b-instruct |
| Small | Llama-3.2-3B-Instruct | 3B | Native | llama-3.2-3b-instruct |
| Small | Gemma-2-2B-IT | 2B | Native | gemma-2-2b-it |
| Medium | Qwen2.5-7B-Instruct | 7B | Native | qwen2.5-7b-instruct |
| Medium | Llama-3.1-8B-Instruct | 8B | Native | llama-3.1-8b-instruct |
| Medium | Mistral-7B-Instruct-v0.3 | 7B | Native | mistral-7b-instruct-v0.3 |
| Medium | Gemma-2-9B-IT | 9B | Native | gemma-2-9b-it |
| Medium | Phi-4 | 14B | Native | phi-4 |
| Medium | DeepSeek-R1-Distill-Qwen-14B | 14B | Native | deepseek-r1-distill-qwen-14b |
| Medium | Mistral-Small-24B-Instruct | 24B | Native | mistral-small-24b-instruct |
| Large | Qwen2.5-14B-Instruct | 14B | Native | qwen2.5-14b-instruct |
| Large | Qwen2.5-32B-Instruct | 32B | Native | qwen2.5-32b-instruct |
| Large | Llama-3.3-70B-Instruct | 70B | ONNX | llama-3.3-70b-instruct |
| Large | Mixtral-8x7B-Instruct-v0.1 | 8x7B | Convert | mixtral-8x7b-instruct-v0.1 |
| Large | DeepSeek-R1-Distill-Llama-70B | 70B | Convert | deepseek-r1-distill-llama-70b |
| Large | Command-R (35B) | 35B | Convert | command-r-35b |
See the Supported Models Guide for detailed model cards, performance benchmarks, and selection guidance.
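To run one of the other models from the table, pass a different model when creating the client (or in `AddLocalLLMs`). A minimal sketch, assuming `KnownModels` exposes a member per table entry; only `Phi35MiniInstruct` appears elsewhere in this README, so the `Qwen25_3BInstruct` name below is an assumption and may differ from the library's actual identifier:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// Assumed KnownModels member for the qwen2.5-3b-instruct row; check the
// Supported Models Guide for the exact name exposed by the library.
using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Qwen25_3BInstruct
});

var response = await client.GetResponseAsync([
    new(ChatRole.User, "Give me three facts about ONNX Runtime.")
]);
Console.WriteLine(response.Text);
```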
Samples
| Sample | Description |
|---|---|
| HelloChat | Minimal console chat |
| StreamingChat | Token-by-token streaming |
| MultiModelChat | Switch models at runtime |
| DependencyInjection | ASP.NET Core DI registration |
Requirements
- .NET 8.0 or .NET 10.0
- CPU (default), NVIDIA GPU (CUDA), or Windows GPU (DirectML)
- ~2-8 GB disk space per model (depending on size and quantization)
Documentation
- Getting Started – installation, first steps, configuration
- Supported Models – full model reference with tiers, specs, decision tree
- Architecture – design decisions and internal structure
- Samples Guide – walkthrough of each sample application
- Benchmarks – how to run and interpret performance benchmarks
- ONNX Conversion – converting HuggingFace models to ONNX format
- Publishing – NuGet package publishing with OIDC
- Contributing – how to contribute
- Changelog – version history
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License; see the LICENSE file for details.
About the Author
Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.
Made with ❤️ by ElBruno
If you like this project, consider following my work across platforms:
- Podcast: No Tienen Nombre – Spanish-language episodes on AI, development, and tech culture
- Blog: ElBruno.com – Deep dives on embeddings, RAG, .NET, and local AI
- YouTube: youtube.com/elbruno – Demos, tutorials, and live coding
- LinkedIn: @elbruno – Professional updates and insights
- Twitter: @elbruno – Quick tips, releases, and tech news
Frameworks

| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies

net10.0
- ElBruno.HuggingFace.Downloader (>= 0.6.0)
- Microsoft.Extensions.AI.Abstractions (>= 10.4.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.5)
- Microsoft.ML.OnnxRuntimeGenAI (>= 0.12.2)

net8.0
- ElBruno.HuggingFace.Downloader (>= 0.6.0)
- Microsoft.Extensions.AI.Abstractions (>= 10.4.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.5)
- Microsoft.ML.OnnxRuntimeGenAI (>= 0.12.2)