OllamaFlow.Core 1.1.3

dotnet add package OllamaFlow.Core --version 1.1.3
                    
NuGet\Install-Package OllamaFlow.Core -Version 1.1.3
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="OllamaFlow.Core" Version="1.1.3" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
Directory.Packages.props
<PackageVersion Include="OllamaFlow.Core" Version="1.1.3" />

Project file
<PackageReference Include="OllamaFlow.Core" />

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add OllamaFlow.Core --version 1.1.3
                    
#r "nuget: OllamaFlow.Core, 1.1.3"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package OllamaFlow.Core@1.1.3
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=OllamaFlow.Core&version=1.1.3
                    
Install as a Cake Addin
#tool nuget:?package=OllamaFlow.Core&version=1.1.3
                    
Install as a Cake Tool

OllamaFlow

<div align="center"> <img src="https://github.com/jchristn/ollamaflow/blob/main/assets/icon.png?raw=true" width="200" height="184" alt="OllamaFlow">

Intelligent Load Balancing and Model Orchestration for Ollama and OpenAI Platforms

</div>

🚀 Scale Your AI Infrastructure

OllamaFlow is a lightweight, intelligent orchestration layer that unifies multiple AI backend instances into a high-availability inference cluster. Supporting both the Ollama and OpenAI API formats on the frontend, with native transformation between them, OllamaFlow delivers scalability, high availability, and security control: you can scale AI workloads across multiple backends while ensuring zero-downtime model serving and keeping fine-grained control over inference and embeddings deployments.

📖 Complete Documentation | 🎨 Web UI Dashboard

Why OllamaFlow?

  • 🎯 Multiple Virtual Endpoints: Create multiple frontend endpoints, each mapping to its own set of AI backends
  • 🔄 Universal API Support: Frontend supports both Ollama and OpenAI API formats
  • 🌐 Multi-Backend Support: Connect to Ollama, OpenAI, vLLM, SharpAI, and other OpenAI-compatible backends
  • ⚖️ Smart Load Balancing: Distribute requests intelligently across healthy backends
  • 🔒 Security and Control: Fine-grained control over request types and parameter enforcement for secure inference and embeddings deployments
  • 🔧 Automatic Model Sync: Ensure all backends have the required models (Ollama-compatible backends only)
  • ❤️ Health Monitoring: Real-time health checks with configurable thresholds
  • 📊 Zero Downtime: High availability that mitigates the effects of backend failures
  • 🛠️ RESTful Admin API: Full control through a comprehensive management API
  • 🎨 Web Dashboard: Optional web UI for visual cluster management and monitoring

🎨 Key Features

Load Balancing

  • Round-robin and random distribution strategies
  • Request routing based on backend health and capacity
  • Automatic failover for unhealthy backends
  • Configurable rate limiting per backend
  • Sticky sessions based on custom headers or IP address

Model Management

  • Automatic model discovery across all Ollama backends
  • Intelligent synchronization - pulls missing models automatically on Ollama-compatible backends
  • Dynamic model requirements - update required models on Ollama-compatible backends
  • Parallel downloads with configurable concurrency

High Availability

  • Real-time health monitoring with customizable check intervals
  • Automatic failover for unhealthy backends
  • Request queuing during high load
  • Connection pooling for optimal performance

Security and Control

  • Request type restrictions - Control embeddings and completions access at frontend and backend levels
  • Pinned request properties - Enforce or override parameters for compliance (models, context size, temperature, etc.)
  • Bearer token authentication for admin APIs
  • Multi-tenant isolation through separate virtual frontends

Enterprise Ready

  • Comprehensive logging with syslog support
  • Docker and Docker Compose ready
  • SQLite database for configuration persistence
  • Production-tested for scalability and high availability

🏃 Quick Start

Using Docker

# Pull the image
docker pull jchristn/ollamaflow:v1.1.0

# Run with default configuration
docker run -d \
  -p 43411:43411 \
  -v $(pwd)/ollamaflow.json:/app/ollamaflow.json \
  -v $(pwd)/ollamaflow.db:/app/ollamaflow.db \
  jchristn/ollamaflow:v1.1.0
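
Once the container is running, you can sanity-check the frontend (a minimal verification, assuming the default port and at least one healthy Ollama backend configured):

# List models through the frontend
curl http://localhost:43411/api/tags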

Using .NET

# Clone the repository
git clone https://github.com/jchristn/ollamaflow.git
cd ollamaflow/src

# Build and run
dotnet build
cd OllamaFlow.Server/bin/Debug/net8.0
dotnet OllamaFlow.Server.dll

⚙️ Configuration

OllamaFlow uses a simple JSON configuration file named ollamaflow.json. Here's a minimal example:

{
  "Webserver": {
    "Hostname": "*",
    "Port": 43411
  },
  "Logging": {
    "MinimumSeverity": 6,
    "ConsoleLogging": true
  },
  "Frontends": ["..."],
  "Backends": ["..."]
}

Frontend Configuration

Frontends define your virtual inference endpoints:

{
  "Identifier": "main-frontend",
  "Name": "Production Ollama Frontend",
  "Hostname": "*",
  "LoadBalancing": "RoundRobin",
  "Backends": ["gpu-1", "gpu-2", "gpu-3"],
  "RequiredModels": ["llama3", "all-minilm"],
  "AllowEmbeddings": true,
  "AllowCompletions": true,
  "PinnedEmbeddingsProperties": {
    "model": "all-minilm"
  },
  "PinnedCompletionsProperties": {
    "model": "llama3",
    "options": {
      "num_ctx": 4096,
      "temperature": 0.3
    }
  }
}

Backend Configuration

Backends represent your actual AI inference instances (Ollama, OpenAI, vLLM, SharpAI, etc.):

{
  "Identifier": "gpu-1",
  "Name": "GPU Server 1",
  "Hostname": "192.168.1.100",
  "Port": 11434,
  "MaxParallelRequests": 4,
  "HealthCheckMethod": "HEAD",
  "HealthCheckUrl": "/",
  "UnhealthyThreshold": 2,
  "ApiFormat": "Ollama",
  "AllowEmbeddings": true,
  "AllowCompletions": true,
  "PinnedEmbeddingsProperties": {
    "model": "all-minilm"
  },
  "PinnedCompletionsProperties": {
    "model": "llama3",
    "options": {
      "num_ctx": 4096,
      "temperature": 0.3
    }
  }
}

📡 API Compatibility

OllamaFlow provides universal API compatibility with native transformation between formats:

Frontend API Support

  • Ollama API - Complete compatibility with all Ollama endpoints
  • OpenAI API - Chat completions, embeddings, and model management

Supported Endpoints

Ollama API:

  • /api/generate - Text generation
  • /api/chat - Chat completions
  • /api/pull - Model pulling
  • /api/push - Model pushing
  • /api/show - Model information
  • /api/tags - List models
  • /api/ps - Running models
  • /api/embed - Embeddings
  • /api/delete - Model deletion

OpenAI API:

  • /v1/chat/completions - Chat completions
  • /v1/completions - Text completions
  • /v1/embeddings - Text embeddings
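
Because OllamaFlow transforms requests natively, the same frontend can serve both formats. A minimal sketch, assuming a frontend on the default port with llama3 available on its backends:

# Ollama-format chat request
curl http://localhost:43411/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'

# OpenAI-format chat request against the same frontend
curl http://localhost:43411/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'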

Supported Backends

  • Ollama - Local AI runtime
  • OpenAI - OpenAI API services
  • vLLM - High-performance LLM inference
  • SharpAI - .NET-based AI inference server
  • Any OpenAI-compatible API - Universal backend support

🔧 Advanced Features

Request Control & Security

OllamaFlow provides fine-grained control over request types and parameters at both the frontend and backend levels:

Request Type Restrictions

Control which types of requests are allowed using AllowEmbeddings and AllowCompletions boolean properties:

  • Set on frontends to control which request types clients can use on that endpoint
  • Set on backends to control which request types can be routed to that backend instance
  • Both must be true for a request to succeed - if either the frontend or backend disallows a request type, it will fail

Example use cases:

  • Dedicate specific frontends for embeddings-only workloads (see the sketch after this list)
  • Reserve high-performance backends for completions only
  • Create security boundaries between different request types
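
For example, with a frontend configured for embeddings only (AllowEmbeddings true, AllowCompletions false), requests behave roughly as follows. This is a hedged sketch: the endpoint paths come from the supported-endpoints list in the API compatibility section above, and the exact error responses may differ.

# Embeddings are permitted and routed to a healthy backend
curl http://localhost:43411/api/embed \
  -H "Content-Type: application/json" \
  -d '{"model": "all-minilm", "input": "hello world"}'

# Completions are rejected at the frontend before reaching any backend
curl http://localhost:43411/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "prompt": "hello"}'
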
Pinned Request Properties

Force specific properties into requests using PinnedEmbeddingsProperties and PinnedCompletionsProperties dictionaries:

  • Properties are automatically added to requests that omit them
  • Existing values in the request are overwritten, enforcing compliance
  • Apply to both frontends and backends independently
  • Support any valid request property (model, options, temperature, context size, stop tokens, etc.)
  • Structure must mirror the API request format - for Ollama API, generation parameters go inside options object

Example use cases:

  • Model enforcement: Ensure specific models are always used regardless of client request
  • Resource control: Lock context sizes to prevent memory exhaustion
  • Quality assurance: Standardize temperature and other generation parameters
  • Security compliance: Override user-specified parameters to meet organizational policies

Property precedence (highest to lowest):

  1. Backend pinned properties
  2. Frontend pinned properties
  3. Original user request properties

Merge behavior:

  • Uses recursive JSON merging via JsonMerge
  • Nested objects are merged intelligently (new properties added, existing properties overwritten)
  • Arrays are completely replaced, not merged

For example, a frontend that pins the model and key generation options:

{
  "Identifier": "secured-frontend",
  "PinnedCompletionsProperties": {
    "model": "llama3",
    "options": {
      "temperature": 0.3,
      "num_ctx": 4096,
      "stop": ["[DONE]", "\n\n"]
    }
  }
}
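
To make the merge concrete, here is a hypothetical client call to the secured frontend above, with the effective merged request shown in comments (a sketch of the documented precedence and merge rules, not captured output):

# Client asks for a different model and temperature
curl http://localhost:43411/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral", "prompt": "Summarize this.", "options": {"temperature": 0.9, "top_p": 0.8}}'

# Effective request after the pinned properties are merged in: pinned values
# overwrite the client's model and temperature, untouched fields (prompt, top_p)
# pass through, and arrays like "stop" are replaced wholesale.
# {
#   "model": "llama3",
#   "prompt": "Summarize this.",
#   "options": { "temperature": 0.3, "top_p": 0.8, "num_ctx": 4096, "stop": ["[DONE]", "\n\n"] }
# }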

Multi-Backend Testing

Test with multiple AI backend instances using Docker Compose:

cd Docker
docker compose -f compose-ollama.yaml up -d

This spins up 4 Ollama instances on ports 11435-11438 for testing load balancing and transformation capabilities.
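
Before wiring these instances up as backends, you can confirm each one is answering (a quick check, assuming the compose file's default port mappings):

# Each Ollama instance answers on its own port
for port in 11435 11436 11437 11438; do
  curl -s http://localhost:$port/api/tags
done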

Admin API

Manage your cluster programmatically:

# List all backends
curl -H "Authorization: Bearer your-token" \
  http://localhost:43411/v1.0/backends

# Add a new backend
curl -X PUT \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "gpu-4", "Hostname": "192.168.1.104", "Port": 11434}' \
  http://localhost:43411/v1.0/backends
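
The exact admin routes are documented in the bundled Postman collection; assuming the API follows the same REST conventions as the calls above, removing a backend might look like this hypothetical request:

# Hypothetical: delete a backend by identifier
curl -X DELETE \
  -H "Authorization: Bearer your-token" \
  http://localhost:43411/v1.0/backends/gpu-4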

A complete Postman collection (OllamaFlow.postman_collection.json) is included in the repository root with examples for all API endpoints, including Ollama API, OpenAI API, and administrative APIs with native transformation examples.

For interactive API testing and experimentation, the OllamaFlow API Explorer provides a web-based dashboard for exploring and testing all OllamaFlow endpoints.

For a visual interface, check out the OllamaFlow Web UI which provides a dashboard for cluster management and monitoring.

🤝 Contributing

We welcome contributions! Whether it's:

  • 🐛 Bug fixes
  • ✨ New features
  • 📚 Documentation improvements
  • 💡 Feature requests

Please check out our Contributing Guidelines and feel free to:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • The Ollama and vLLM teams for creating amazing local AI tools and model runners
  • All our contributors and users who make this project possible

<div align="center"> <b>Ready to scale your AI infrastructure?</b><br> Get started with OllamaFlow today!<br><br> 📖 <a href="https://ollamaflow.readme.io/"><b>Documentation</b></a> | 🎨 <a href="https://github.com/ollamaflow/ui"><b>Web Dashboard</b></a> | 🔬 <a href="https://github.com/ollamaflow/apiexplorer"><b>API Explorer</b></a> </div>

Compatible and additional computed target framework versions:

.NET: net8.0 is compatible. The net8.0 platform variants (android, browser, ios, maccatalyst, macos, tvos, windows), net9.0 and its platform variants, and net10.0 and its platform variants were computed.

Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version  Downloads  Last Updated
1.1.3    156        10/8/2025
1.1.0    142        10/3/2025
1.0.5    166        9/30/2025
1.0.3    275        9/19/2025
1.0.2    157        9/5/2025
1.0.1    180        9/4/2025

Release notes: Initial release.