
CopilotLocalRouter

Route simple AI coding tasks to a local Ollama model — saving cloud tokens without interrupting your workflow.


What is this?

GitHub Copilot, Claude Code, and Cursor send every request to an expensive cloud model — even trivial ones like "write a method to add two integers" or "explain what this loop does." CopilotLocalRouter sits between your AI assistant and the cloud, intercepting simple tasks and handling them locally with Ollama.

Complex tasks (multi-file refactors, architecture decisions) are passed through to the cloud unchanged.

Your AI Assistant (Copilot / Claude / Cursor)
         │
         ▼  MCP stdio transport
  ┌─────────────────────┐
  │  CopilotLocalRouter │
  │  ┌───────────────┐  │
  │  │  Classifier   │  │  ← scores prompt complexity
  │  └──────┬────────┘  │
  └─────────┼───────────┘
            │
     Simple / Medium ──────────► Ollama (local LLM, ~50ms)
            │
          Complex
            │
            ▼
       [skip] signal ──────────► Cloud model (unchanged)

Features

  • 5 MCP tools — generate, explain, refactor, review, and test generation
  • Automatic routing — heuristic classifier scores every prompt; no manual tagging required
  • Transparent fallback — returns [skip] so your AI assistant silently falls back to the cloud
  • Circuit breaker — if Ollama goes down, requests fall through to cloud immediately
  • Prompt cache — identical prompts served from an LRU cache (SHA256-keyed, 60 min TTL)
  • Cost tracking — logs estimated token savings every N requests
  • OTel metrics — System.Diagnostics.Metrics compatible; connect any OTel collector
  • Zero config default — works out of the box with qwen2.5-coder on localhost:11434


Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| .NET SDK | 10.0+ | Includes dnx — no separate tool install needed |
| Ollama | 0.2.0+ | Must be running locally or accessible on the network |
| AI Assistant | Any | GitHub Copilot, Claude Code, Cursor, or any MCP-compatible client |

Installation

There is nothing to install. dnx is a tool execution script included with the .NET 10 SDK that works like npx — it downloads and runs a .NET tool on demand. Add the config block for your AI client (see Quick Start) and dnx handles the rest automatically.

Pinning a version: Use CopilotLocalRouter@0.1.0 in the args array to lock to a specific release. Omitting the version always uses the latest.
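
For example, pinned in the args array of any of the configs shown under Quick Start:

"args": ["CopilotLocalRouter@0.1.0"]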

Build from Source

git clone https://github.com/michaelstonis/CopilotLocalRouter.git
cd CopilotLocalRouter
dotnet build
dotnet test

Quick Start

1. Start Ollama and pull a model:

ollama pull qwen2.5-coder

2. Add to your AI client's MCP config (no prior install needed — dnx downloads the tool on first run):

<details> <summary>VS Code / GitHub Copilot — <code>.vscode/mcp.json</code></summary>

{
  "servers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}

</details>

<details> <summary>Claude Code — <code>.claude/mcp.json</code></summary>

{
  "mcpServers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}

Or via CLI:

claude mcp add copilot-local-router dnx -- CopilotLocalRouter

</details>

<details> <summary>Cursor — <code>.cursor/mcp.json</code></summary>

{
  "mcpServers": {
    "copilot-local-router": {
      "command": "dnx",
      "args": ["CopilotLocalRouter"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5-coder"
      }
    }
  }
}

</details>

3. Restart your AI client and verify:

Ask your assistant: "What MCP tools are available?"

You should see: local_code_generate, local_code_explain, local_code_refactor, local_code_review, local_test_generate.


Supported AI Clients

| Client | Support | Config file |
|---|---|---|
| VS Code + GitHub Copilot | ✅ Full | .vscode/mcp.json |
| Claude Code | ✅ Full | .claude/mcp.json |
| Cursor | ✅ Full | .cursor/mcp.json |
| Any MCP stdio client | ✅ Full | Client-specific |

Available Tools

| Tool name | When it's used | Skips to cloud when |
|---|---|---|
| local_code_generate | Write a function, class, method, or boilerplate | Task spans multiple files or requires architectural decisions |
| local_code_explain | Explain what code does, how an algorithm works | Explanation requires deep multi-file context |
| local_code_refactor | Clean up, rename, extract, simplify within one file | Refactor spans multiple files or involves breaking changes |
| local_code_review | Review code for bugs, smells, and naming issues | Full security audit or codebase-wide review is requested |
| local_test_generate | Write unit tests for a function or class | Integration tests or complex mocking is required |
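
For example, "write a method to add two integers" (the prompt from the introduction) classifies as simple and is served by local_code_generate, while a request to refactor across the entire codebase trips the complexity indicators and returns [skip].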

Recommended Models

| Model | Size | Best for | Command |
|---|---|---|---|
| qwen2.5-coder | 4.7 GB | Code generation, refactoring, tests | ollama pull qwen2.5-coder |
| gemma3:4b | 3.3 GB | Explanations, code review, low-RAM machines | ollama pull gemma3:4b |
| deepseek-coder-v2 | 8.9 GB | Highest-quality code tasks | ollama pull deepseek-coder-v2 |
| codellama | 3.8 GB | General-purpose coding (older baseline) | ollama pull codellama |

Tip: qwen2.5-coder is the recommended default — it offers the best balance of speed, quality, and memory footprint for code tasks.
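
To switch models, pull the one you want and point OLLAMA_MODEL at it in your MCP config's env block (gemma3:4b here, per the table above):

ollama pull gemma3:4b

"env": {
  "OLLAMA_BASE_URL": "http://localhost:11434",
  "OLLAMA_MODEL": "gemma3:4b"
}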


Configuration

All configuration is done via environment variables in your MCP client config, or as appsettings.json overrides when building from source.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | qwen2.5-coder | Default model for all tools |
| ROUTER_ENABLED | true | Set to false to disable routing (all tasks go to cloud) |
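
For example, to send everything to the cloud temporarily without removing the server entry, flip ROUTER_ENABLED in the env block of your MCP config:

"env": {
  "ROUTER_ENABLED": "false"
}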

Routing Thresholds

Control how aggressively tasks are routed locally by setting these in appsettings.json when building from source (env-var overrides for these settings are not yet available; contributions welcome):

| Setting | Default | Effect |
|---|---|---|
| Router:SimpleConfidenceThreshold | 0.75 | Minimum score to route a "simple" task locally |
| Router:MediumConfidenceThreshold | 0.50 | Minimum score to route a "medium" task locally |
| Router:MaxTokensSimple | 500 | Token count upper bound for "simple" classification |
| Router:MaxTokensMedium | 1500 | Token count upper bound for "medium" classification |

More aggressive local routing (accept more tasks locally, at lower quality confidence):

"SimpleConfidenceThreshold": 0.60,
"MediumConfidenceThreshold": 0.40

More conservative (only high-confidence simple tasks handled locally):

"SimpleConfidenceThreshold": 0.90,
"MediumConfidenceThreshold": 0.70

Cost Tracking

"CostTracking": {
  "Enabled": true,
  "CloudInputTokenRate": 0.003,
  "CloudOutputTokenRate": 0.015,
  "Currency": "USD",
  "LogSummaryEveryNRequests": 100
}

Rates are per 1,000 tokens. The estimator uses conservative sizing based on average Ollama response sizes.
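
As a worked example at the default rates above (illustrative token counts): a locally handled request with 200 input tokens and 500 output tokens logs an estimated saving of (200 / 1,000 × $0.003) + (500 / 1,000 × $0.015) ≈ $0.0081.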


How Routing Works

Every prompt is scored by a heuristic classifier across four signals:

  1. Keyword analysis — task verbs (generate, explain, refactor) and complexity indicators (across all, entire codebase, architecture)
  2. Token count — prompts over 500 tokens are unlikely to be simple tasks
  3. Structural signals — multi-file references, import counts, line counts
  4. Task type match — each tool has a baseline complexity expectation

Scores above SimpleConfidenceThreshold → local. Scores above MediumConfidenceThreshold → local (medium tier). Everything else → [skip].
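
In code, the tiered decision reduces to something like this minimal sketch, assuming a RouterOptions shape that mirrors the Router:* defaults above (the project's actual types may differ):

// Hypothetical options shape; mirrors the documented Router:* defaults.
public sealed record RouterOptions(
    double SimpleConfidenceThreshold = 0.75,
    double MediumConfidenceThreshold = 0.50);

public static class TierRouter
{
    // Scores at or above a threshold route locally; everything else skips.
    public static string Route(double score, RouterOptions options) =>
        score >= options.SimpleConfidenceThreshold ? "local (simple tier)"
        : score >= options.MediumConfidenceThreshold ? "local (medium tier)"
        : "[skip]"; // the assistant falls back to the cloud unchanged
}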


Architecture

src/
  CopilotLocalRouter.Core/        # Domain logic
    Classification/               # Heuristic task classifier
    Routing/                      # Tiered request router
    Agents/                       # AgentManager — Ollama conversation executor
    Resilience/                   # Circuit breaker, LRU prompt cache, normalizer
    Telemetry/                    # Metrics, cost estimator, quality scorer
    Configuration/                # RouterOptions, ModelProfile
    Interfaces/                   # IRequestRouter, IAgentManager, ITaskClassifier
  CopilotLocalRouter.Ollama/      # OllamaSharp IChatClient integration
  CopilotLocalRouter.McpTools/    # MCP tool definitions (5 tools)
  CopilotLocalRouter.Host/        # Startup, DI, appsettings, health check

tests/
  CopilotLocalRouter.Core.Tests/         # Unit tests — classifier, resilience, telemetry
  CopilotLocalRouter.McpTools.Tests/     # Integration tests — MCP tool end-to-end

benchmarks/
  CopilotLocalRouter.Benchmarks/         # BenchmarkDotNet — classification + cache key perf

Key dependencies:

| Package | Version | Role |
|---|---|---|
| ModelContextProtocol | 1.2.0 | MCP stdio server |
| OllamaSharp | 5.4.25 | Ollama IChatClient implementation |
| Microsoft.Extensions.AI | 10.5.0 | Middleware pipeline, IChatClient abstractions |
| Microsoft.Extensions.Hosting | 10.0.7 | DI, configuration, lifetime management |

Resilience

| Feature | Behaviour |
|---|---|
| Circuit breaker | Opens after 3 consecutive Ollama failures; auto-resets after 30 seconds |
| Prompt cache | LRU cache, 500 entries, 60-minute TTL, SHA256-keyed |
| Retry | Up to 2 retries with 1 s / 3 s exponential backoff |
| Graceful degradation | Any failure returns [skip] — the user's request is never blocked |

Telemetry

OTel-compatible metrics emitted via System.Diagnostics.Metrics (meter name: CopilotLocalRouter):

| Metric | Type | Description |
|---|---|---|
| router.requests.total | Counter | All requests received |
| router.requests.local | Counter | Requests handled by Ollama |
| router.requests.skipped | Counter | Requests returned to cloud |
| router.cache.hits | Counter | Prompt cache hits |
| router.cache.misses | Counter | Prompt cache misses |
| router.circuit.state_changes | Counter | Circuit breaker state transitions |
| router.classification.duration_ms | Histogram | Time to classify a prompt |
| router.agent.duration_ms | Histogram | Time for Ollama to respond |
| router.agent.response_tokens | Histogram | Tokens in Ollama response |
| router.agents.active | Gauge | Concurrent Ollama calls in flight |

Connect any OpenTelemetry collector to the standard OTel endpoint and forward the metrics to Prometheus, Jaeger, Grafana, or similar backends.
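
For in-process collection, a minimal sketch assuming the standard OpenTelemetry .NET SDK packages (OpenTelemetry and OpenTelemetry.Exporter.Console, which are not dependencies of this project) run inside the host process:

using OpenTelemetry;
using OpenTelemetry.Metrics;

// Subscribes to the "CopilotLocalRouter" meter named above and writes
// its counters, histograms, and gauges to the console.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("CopilotLocalRouter")
    .AddConsoleExporter()
    .Build();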


Contributing

Contributions are welcome. Please open an issue first if you're planning a significant change.

git clone https://github.com/michaelstonis/CopilotLocalRouter.git
cd CopilotLocalRouter
dotnet restore
dotnet build
dotnet test

Guidelines:

  • All new routing logic should have classifier unit tests
  • MCP tool changes require integration tests in CopilotLocalRouter.McpTools.Tests
  • Keep tool descriptions tightly tuned — they directly affect AI agent tool selection

Troubleshooting

See docs/troubleshooting.md for common issues including:

  • Ollama not reachable / tools not appearing in assistant
  • All tasks returning [skip] (threshold tuning)
  • Circuit breaker stuck open
  • Model not found errors

Changelog

See CHANGELOG.md for release history.


License

MIT — see LICENSE for details.
