OllamaFlow.Core 1.1.3

.NET CLI:
dotnet add package OllamaFlow.Core --version 1.1.3

Package Manager:
NuGet\Install-Package OllamaFlow.Core -Version 1.1.3

PackageReference (in .csproj):
<PackageReference Include="OllamaFlow.Core" Version="1.1.3" />

Central Package Management (Directory.Packages.props, then .csproj):
<PackageVersion Include="OllamaFlow.Core" Version="1.1.3" />
<PackageReference Include="OllamaFlow.Core" />

Paket CLI:
paket add OllamaFlow.Core --version 1.1.3

Script & Interactive:
#r "nuget: OllamaFlow.Core, 1.1.3"

File-based apps:
#:package OllamaFlow.Core@1.1.3

Cake addin:
#addin nuget:?package=OllamaFlow.Core&version=1.1.3

Cake tool:
#tool nuget:?package=OllamaFlow.Core&version=1.1.3
OllamaFlow
<div align="center"> <img src="https://github.com/jchristn/ollamaflow/blob/main/assets/icon.png?raw=true" width="200" height="184" alt="OllamaFlow"> </div>
Intelligent Load Balancing and Model Orchestration for Ollama and OpenAI Platforms
🚀 Scale Your AI Infrastructure
OllamaFlow is a lightweight, intelligent orchestration layer that unifies multiple AI backend instances into a high-availability inference cluster. Supporting both Ollama and OpenAI API formats on the frontend with native transformation capabilities, OllamaFlow delivers scalability, high availability, and security control - enabling you to scale AI workloads across multiple backends while ensuring zero-downtime model serving and fine-grained control over inference and embeddings deployments.
Why OllamaFlow?
- 🎯 Multiple Virtual Endpoints: Create multiple frontend endpoints, each mapping to its own set of AI backends
- 🔄 Universal API Support: Frontend supports both Ollama and OpenAI API formats
- 🌐 Multi-Backend Support: Connect to Ollama, OpenAI, vLLM, SharpAI, and other OpenAI-compatible backends
- ⚖️ Smart Load Balancing: Distribute requests intelligently across healthy backends
- 🔒 Security and Control: Fine-grained control over request types and parameter enforcement for secure inference and embeddings deployments
- 🔧 Automatic Model Sync: Ensure all backends have the required models (Ollama-compatible backends only)
- ❤️ Health Monitoring: Real-time health checks with configurable thresholds
- 📊 Zero Downtime: High availability mitigates the impact of backend failures
- 🛠️ RESTful Admin API: Full control through a comprehensive management API
- 🎨 Web Dashboard: Optional web UI for visual cluster management and monitoring
🎨 Key Features
Load Balancing
- Round-robin and random distribution strategies
- Request routing based on backend health and capacity
- Automatic failover for unhealthy backends
- Configurable rate limiting per backend
- Sticky sessions based on custom headers or IP address
Model Management
- Automatic model discovery across all Ollama backends
- Intelligent synchronization - pulls missing models automatically on Ollama-compatible backends
- Dynamic model requirements - update required models on Ollama-compatible backends
- Parallel downloads with configurable concurrency
High Availability
- Real-time health monitoring with customizable check intervals
- Automatic failover for unhealthy backends
- Request queuing during high load
- Connection pooling for optimal performance
Security and Control
- Request type restrictions - Control embeddings and completions access at frontend and backend levels
- Pinned request properties - Enforce or override parameters for compliance (models, context size, temperature, etc.)
- Bearer token authentication for admin APIs
- Multi-tenant isolation through separate virtual frontends
Enterprise Ready
- Comprehensive logging with syslog support
- Docker and Docker Compose ready
- SQLite database for configuration persistence
- Production-tested for scalability and high availability
🏃 Quick Start
Using Docker (Recommended)
# Pull the image
docker pull jchristn/ollamaflow:v1.1.0
# Run with default configuration
docker run -d \
-p 43411:43411 \
-v $(pwd)/ollamaflow.json:/app/ollamaflow.json \
-v $(pwd)/ollamaflow.db:/app/ollamaflow.db \
jchristn/ollamaflow:v1.1.0
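Once the container is up, you can sanity-check it with an Ollama-format request (this assumes at least one frontend and backend have been configured; /api/tags is one of the supported endpoints listed below):
# List models available through the frontend
curl http://localhost:43411/api/tags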
Using .NET
# Clone the repository
git clone https://github.com/jchristn/ollamaflow.git
cd ollamaflow/src
# Build and run
dotnet build
cd OllamaFlow.Server/bin/Debug/net8.0
dotnet OllamaFlow.Server.dll
⚙️ Configuration
OllamaFlow uses a simple JSON configuration file named ollamaflow.json. Here's a minimal example:
{
"Webserver": {
"Hostname": "*",
"Port": 43411
},
"Logging": {
"MinimumSeverity": 6,
"ConsoleLogging": true
},
"Frontends": ["..."],
"Backends": ["..."]
}
Frontend Configuration
Frontends define your virtual endpoints:
{
"Identifier": "main-frontend",
"Name": "Production Ollama Frontend",
"Hostname": "*",
"LoadBalancing": "RoundRobin",
"Backends": ["gpu-1", "gpu-2", "gpu-3"],
"RequiredModels": ["llama3", "all-minilm"],
"AllowEmbeddings": true,
"AllowCompletions": true,
"PinnedEmbeddingsProperties": {
"model": "all-minilm"
},
"PinnedCompletionsProperties": {
"model": "llama3",
"options": {
"num_ctx": 4096,
"temperature": 0.3
}
}
}
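With this frontend in place, client requests are rewritten before routing. For example, a completions request that asks for a different model is forced onto llama3 (a sketch; assumes this frontend is reachable on localhost:43411):
# The pinned properties force model llama3, num_ctx 4096, and
# temperature 0.3, regardless of what the client specifies
curl http://localhost:43411/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'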
Backend Configuration
Backends represent your actual AI inference instances (Ollama, OpenAI, vLLM, SharpAI, etc.):
{
"Identifier": "gpu-1",
"Name": "GPU Server 1",
"Hostname": "192.168.1.100",
"Port": 11434,
"MaxParallelRequests": 4,
"HealthCheckMethod": "HEAD",
"HealthCheckUrl": "/",
"UnhealthyThreshold": 2,
"ApiFormat": "Ollama",
"AllowEmbeddings": true,
"AllowCompletions": true,
"PinnedEmbeddingsProperties": {
"model": "all-minilm"
},
"PinnedCompletionsProperties": {
"model": "llama3",
"options": {
"num_ctx": 4096,
"temperature": 0.3
}
}
}
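An OpenAI-compatible backend (vLLM, SharpAI, or OpenAI itself) uses the same shape. The sketch below reuses the fields shown above; the OpenAI value for ApiFormat, the port, and the health check path are assumptions based on a typical vLLM deployment:
{
  "Identifier": "vllm-1",
  "Name": "vLLM Server 1",
  "Hostname": "192.168.1.101",
  "Port": 8000,
  "MaxParallelRequests": 8,
  "HealthCheckMethod": "GET",
  "HealthCheckUrl": "/health",
  "UnhealthyThreshold": 2,
  "ApiFormat": "OpenAI",
  "AllowEmbeddings": false,
  "AllowCompletions": true
}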
📡 API Compatibility
OllamaFlow provides universal API compatibility with native transformation between formats:
Frontend API Support
- ✅ Ollama API - Complete compatibility with all Ollama endpoints
- ✅ OpenAI API - Chat completions, embeddings, and model management
Supported Endpoints
Ollama API:
- ✅ /api/generate - Text generation
- ✅ /api/chat - Chat completions
- ✅ /api/pull - Model pulling
- ✅ /api/push - Model pushing
- ✅ /api/show - Model information
- ✅ /api/tags - List models
- ✅ /api/ps - Running models
- ✅ /api/embed - Embeddings
- ✅ /api/delete - Model deletion
OpenAI API:
- ✅ /v1/chat/completions - Chat completions
- ✅ /v1/completions - Text completions
- ✅ /v1/embeddings - Text embeddings
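Because transformation is native, the same frontend can serve both formats. For example, an OpenAI-format chat completion routed through a frontend (assuming the frontend from the configuration section is listening on localhost:43411):
curl http://localhost:43411/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'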
Supported Backends
- Ollama - Local AI runtime
- OpenAI - OpenAI API services
- vLLM - High-performance LLM inference
- SharpAI - .NET-based AI inference server
- Any OpenAI-compatible API - Universal backend support
🔧 Advanced Features
Request Control & Security
OllamaFlow provides fine-grained control over request types and parameters at both the frontend and backend levels:
Request Type Restrictions
Control which types of requests are allowed using the AllowEmbeddings and AllowCompletions boolean properties:
- Set on frontends to control which request types clients can send to that endpoint
- Set on backends to control which request types can be routed to that backend instance
- Both must be true for a request to succeed - if either the frontend or backend disallows a request type, it will fail
Example use cases:
- Dedicate specific frontends to embeddings-only workloads (see the example after this list)
- Reserve high-performance backends for completions only
- Create security boundaries between different request types
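For instance, an embeddings-only frontend is simply a frontend with completions disabled (a minimal sketch reusing fields from the frontend configuration above):
{
  "Identifier": "embeddings-frontend",
  "Name": "Embeddings-Only Frontend",
  "Hostname": "*",
  "LoadBalancing": "RoundRobin",
  "Backends": ["gpu-1", "gpu-2"],
  "RequiredModels": ["all-minilm"],
  "AllowEmbeddings": true,
  "AllowCompletions": false
}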
Pinned Request Properties
Force specific properties into requests using the PinnedEmbeddingsProperties and PinnedCompletionsProperties dictionaries:
- Properties are automatically appended to requests that don't include them
- Properties overwrite existing values in the request for compliance enforcement
- Apply to both frontends and backends independently
- Support any valid request property (model, options, temperature, context size, stop tokens, etc.)
- Structure must mirror the API request format - for the Ollama API, generation parameters go inside the options object
Example use cases:
- Model enforcement: Ensure specific models are always used regardless of client request
- Resource control: Lock context sizes to prevent memory exhaustion
- Quality assurance: Standardize temperature and other generation parameters
- Security compliance: Override user-specified parameters to meet organizational policies
Property precedence (highest to lowest):
- Backend pinned properties
- Frontend pinned properties
- Original user request properties
Merge behavior:
- Uses recursive JSON merging via JsonMerge
- Nested objects are merged intelligently (new properties added, existing properties overwritten)
- Arrays are completely replaced, not merged
{
"Identifier": "secured-frontend",
"PinnedCompletionsProperties": {
"model": "llama3",
"options": {
"temperature": 0.3,
"num_ctx": 4096,
"stop": ["[DONE]", "\n\n"]
}
}
}
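To illustrate the merge rules, suppose a client sends this completions request to the secured frontend above:
{
  "model": "mistral",
  "prompt": "Hello",
  "options": {"temperature": 0.9, "top_p": 0.8}
}
After the pinned properties are merged in, the request routed to the backend becomes:
{
  "model": "llama3",
  "prompt": "Hello",
  "options": {
    "temperature": 0.3,
    "num_ctx": 4096,
    "stop": ["[DONE]", "\n\n"],
    "top_p": 0.8
  }
}
model and temperature are overwritten, num_ctx and stop are added, and the unpinned top_p survives the merge.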
Multi-Backend Testing
Test with multiple AI backend instances using Docker Compose:
cd Docker
docker compose -f compose-ollama.yaml up -d
This spins up 4 Ollama instances on ports 11435-11438 for testing load balancing and transformation capabilities.
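These instances can then be registered as backends through the admin API described below (a sketch; the identifiers and bearer token are placeholders):
# Register the four test instances as backends
for PORT in 11435 11436 11437 11438; do
  curl -X PUT \
    -H "Authorization: Bearer your-token" \
    -H "Content-Type: application/json" \
    -d "{\"Identifier\": \"ollama-$PORT\", \"Hostname\": \"localhost\", \"Port\": $PORT}" \
    http://localhost:43411/v1.0/backends
done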
Admin API
Manage your cluster programmatically:
# List all backends
curl -H "Authorization: Bearer your-token" \
http://localhost:43411/v1.0/backends
# Add a new backend
curl -X PUT \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{"Identifier": "gpu-4", "Hostname": "192.168.1.104", "Port": 11434}' \
http://localhost:43411/v1.0/backends
A complete Postman collection (OllamaFlow.postman_collection.json) is included in the repository root with examples for all API endpoints, including Ollama API, OpenAI API, and administrative APIs with native transformation examples.
For interactive API testing and experimentation, the OllamaFlow API Explorer provides a web-based dashboard for exploring and testing all OllamaFlow endpoints.
For a visual interface, check out the OllamaFlow Web UI which provides a dashboard for cluster management and monitoring.
🤝 Contributing
We welcome contributions! Whether it's:
- 🐛 Bug fixes
- ✨ New features
- 📚 Documentation improvements
- 💡 Feature requests
Please check out our Contributing Guidelines and feel free to:
1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request
📚 Documentation & Resources
- Complete Documentation - Comprehensive guides, API reference, and tutorials
- Web UI Dashboard - Visual cluster management interface
- API Explorer - Interactive web-based API testing and experimentation
- Postman Collection - API testing and examples
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- The Ollama and vLLM teams for creating amazing local AI tools and model runners
- All our contributors and users who make this project possible
<div align="center"> <b>Ready to scale your AI infrastructure?</b><br> Get started with OllamaFlow today!<br><br> 📖 <a href="https://ollamaflow.readme.io/"><b>Documentation</b></a> | 🎨 <a href="https://github.com/ollamaflow/ui"><b>Web Dashboard</b></a> | 🔬 <a href="https://github.com/ollamaflow/apiexplorer"><b>API Explorer</b></a> </div>
Product | Compatible and additional computed target framework versions
---|---
.NET | net8.0 is compatible. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.
Dependencies (net8.0):
- ExpressionTree (>= 1.1.2)
- JsonMerge (>= 1.0.0)
- RestWrapper (>= 3.1.8)
- SyslogLogging (>= 2.0.10)
- UrlMatcher (>= 3.0.2)
- Watson (>= 6.3.15)
- WatsonORM.Sqlite (>= 3.0.14)
Release Notes
Initial release.