MCP.Evals 0.0.3

dotnet tool install --global MCP.Evals --version 0.0.3

This package contains a .NET tool you can call from the shell/command line. To install it as a local tool instead, create a tool manifest first (only needed once, when setting up the repository):

dotnet new tool-manifest
dotnet tool install --local MCP.Evals --version 0.0.3

MCP.Evals


An evaluation framework for testing Model Context Protocol (MCP) servers with language models. MCP.Evals provides automated testing capabilities to validate MCP server implementations across different scenarios and ensure reliable tool execution.

Features

  • 🔍 Automated MCP Server Testing - Run evaluations against MCP servers

  • 🤖 Multiple Language Model Providers - Azure OpenAI (verified), OpenAI (needs verification), Anthropic (not implemented)*

  • 📊 Flexible Configuration - YAML configuration support

  • 🚀 Multiple Transport Methods - Support for stdio and HTTP transports

  • Validation Framework - Configuration and request validation

  • 🛠️ CLI Tool - Easy-to-use command-line interface

Installation

As a Global Tool

dotnet tool install -g MCP.Evals

As a NuGet Package

dotnet add package MCP.Evals

Quick Start

1. Create an Evaluation Configuration

Create a YAML configuration file (e.g., my-server-eval.yaml):

# Language model configuration
# Status: azure-openai (✅ verified), openai (⚠️ needs verification), anthropic (❌ not implemented)
model:
  provider: azure-openai  # Recommended: use azure-openai (verified working)
  name: gpt-4o

# MCP server configuration
server:
  transport: stdio
  path: "./my-mcp-server.exe"

# Evaluation test cases
evals:
  - name: basic_math_test
    description: Test basic math operations
    prompt: "Use the calculator tool to add 5 + 3"
    expectedResult: "Should return 8"
    
  - name: echo_test
    description: Test echo functionality
    prompt: "Use the echo tool to repeat 'Hello World'"
    expectedResult: "Should echo back 'Hello World'"

2. Run Evaluations

# Run evaluations with API key
McpEval evaluate my-server-eval.yaml --api-key "your-api-key"

# Azure OpenAI with endpoint
McpEval evaluate my-server-eval.yaml --api-key "your-key" --endpoint "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview"

# Save results to file
McpEval evaluate my-server-eval.yaml --output results.json --api-key "your-key"

# Validate configuration without running
McpEval validate my-server-eval.yaml

3. View Results

Results include detailed execution logs, scoring, and performance metrics. The sample below is abridged: only the first of the two evaluations is shown.

{
  "configurationName": "my-server-eval",
  "totalEvaluations": 2,
  "successfulEvaluations": 2,
  "failedEvaluations": 0,
  "averageScore": 0.95,
  "evaluations": [
    {
      "name": "basic_math_test",
      "success": true,
      "score": 0.9,
      "executionTimeMs": 1234,
      "toolsUsed": ["calculator"]
    }
  ]
}
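If you want to use the results file as a CI gate, a small script can read the JSON written by `--output` and report anything that failed. This is a minimal sketch, assuming the field names shown in the sample above (`failedEvaluations`, `averageScore`, `evaluations[].success`). The `check_results` and `main` helpers and the `min_average` threshold are hypothetical and are not part of MCP.Evals.

```python
import json


def check_results(results: dict, min_average: float = 0.8) -> list[str]:
    """Return a list of problems found in an MCP.Evals results object.

    Assumes the field names from the README sample; min_average is a
    hypothetical threshold chosen for illustration only.
    """
    problems = []
    failed = results.get("failedEvaluations", 0)
    if failed > 0:
        problems.append(f"{failed} evaluation(s) failed")
    average = results.get("averageScore", 0.0)
    if average < min_average:
        problems.append(f"average score {average:.2f} is below {min_average:.2f}")
    # Name each individual evaluation that reported success == false.
    for evaluation in results.get("evaluations", []):
        if not evaluation.get("success", False):
            problems.append(f"evaluation '{evaluation.get('name')}' did not succeed")
    return problems


def main(argv: list[str]) -> int:
    """Load the results file named in argv[0] and return a process exit code."""
    path = argv[0] if argv else "results.json"
    with open(path, encoding="utf-8") as f:
        problems = check_results(json.load(f))
    for problem in problems:
        print(f"FAIL: {problem}")
    return 1 if problems else 0
```

To use it, wire `main` to a script entry point, for example `raise SystemExit(main(sys.argv[1:]))`. It can then run as a CI step right after `McpEval evaluate my-server-eval.yaml --output results.json --api-key ...`.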

Configuration Options

Language Model Providers

Azure OpenAI ✅ VERIFIED WORKING

model:
  provider: azure-openai
  name: gpt-4o

OpenAI ⚠️ NEEDS VERIFICATION

model:
  provider: openai
  name: gpt-4o

Anthropic ❌ NOT IMPLEMENTED

model:
  provider: anthropic
  name: claude-3-5-sonnet-20241022

Transport Types

Standard I/O (stdio)

server:
  transport: stdio
  path: "./my-server.exe"
  args: ["arg1", "arg2"]  # Optional command line arguments

HTTP

server:
  transport: http
  url: "http://localhost:3000/mcp"
  path: "./server.js"  # Optional: auto-start server if not running

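With the `stdio` transport, the MCP specification sends each JSON-RPC 2.0 message as one line of JSON on the server's stdin/stdout. Messages are newline-delimited and must not contain embedded newlines. The sketch below shows what a `tools/call` request for the calculator example might look like on the wire. It illustrates the protocol only, not MCP.Evals' internal transport code. The tool name `calculator` and its `a`/`b` arguments are hypothetical.

```python
import json


def frame_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a newline-terminated line for stdio."""
    # json.dumps escapes newlines inside strings, so the payload never
    # contains a raw line break; the trailing "\n" is the message delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def tools_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def read_messages(buffer: bytes) -> list[dict]:
    """Split a buffer of newline-delimited messages back into dicts."""
    return [json.loads(line) for line in buffer.split(b"\n") if line.strip()]
```

In a real session, the client first sends an `initialize` request and then the `notifications/initialized` notification. Only after that does it call `tools/list` or `tools/call`.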
Examples

The repository includes example configurations for different scenarios:

CLI Commands

evaluate

Run evaluations from a configuration file:

McpEval evaluate <config-path> [options]

Options:
  --api-key       API key for language model provider
  --endpoint      Endpoint URL (required for Azure OpenAI)
  --output, -o    Output file path for results (JSON format)
  --format, -f    Output format (json, summary, detailed, clean)
  --verbose, -v   Enable verbose logging

validate

Validate configuration without running evaluations:

McpEval validate <config-path>
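`McpEval validate` runs the tool's own configuration validation (FluentValidation rules, per the Architecture section). If you generate configurations in code, a lightweight pre-check on the data before it is written to YAML can catch obvious mistakes earlier. The sketch below is an approximation based only on the fields shown in this README. Its rules are inferred from the examples and are not the tool's actual validation rules, for instance that `stdio` needs `path` and `http` needs `url`. `precheck_config` is a hypothetical name.

```python
KNOWN_PROVIDERS = {"azure-openai", "openai", "anthropic"}


def precheck_config(config: dict) -> list[str]:
    """Return human-readable problems found in a config dict (empty if none)."""
    errors = []
    model = config.get("model") or {}
    provider = model.get("provider")
    if provider not in KNOWN_PROVIDERS:
        errors.append(f"model.provider must be one of {sorted(KNOWN_PROVIDERS)}")
    elif provider == "anthropic":
        # Per the implementation status, Anthropic only returns placeholders.
        errors.append("anthropic is not implemented (placeholder responses only)")
    if not model.get("name"):
        errors.append("model.name is required")

    server = config.get("server") or {}
    transport = server.get("transport")
    if transport == "stdio":
        if not server.get("path"):
            errors.append("server.path is required for the stdio transport")
    elif transport == "http":
        if not server.get("url"):
            errors.append("server.url is required for the http transport")
    else:
        errors.append("server.transport must be 'stdio' or 'http'")

    evals = config.get("evals") or []
    if not evals:
        errors.append("at least one entry under evals is required")
    seen = set()
    for i, case in enumerate(evals):
        name = case.get("name")
        if not name:
            errors.append(f"evals[{i}].name is required")
        elif name in seen:
            errors.append(f"duplicate eval name '{name}'")
        else:
            seen.add(name)
        if not case.get("prompt"):
            errors.append(f"evals[{i}].prompt is required")
    return errors
```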

Runtime Configuration

API Key Configuration

API keys must be provided as a command-line argument:

  • Use the --api-key option when running commands

Azure OpenAI Configuration

For Azure OpenAI, you also need to provide the endpoint:

  • Use --endpoint command line option with your Azure OpenAI deployment URL (e.g., https://your-resource.openai.azure.com/openai/deployments/model-name/chat/completions?api-version=2025-01-01-preview)
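The endpoint URL follows the pattern in the example above: the resource host, the deployment name, the `chat/completions` path, and an `api-version` query parameter. A small helper that assembles it from those parts can help avoid typos. `azure_chat_endpoint` is a hypothetical name. The default `api-version` is simply the value used in this README, not a recommendation.

```python
from urllib.parse import quote, urlencode


def azure_chat_endpoint(resource: str, deployment: str,
                        api_version: str = "2025-01-01-preview") -> str:
    """Build an Azure OpenAI chat-completions deployment URL for --endpoint."""
    base = f"https://{resource}.openai.azure.com"
    # Percent-encode the deployment name in case it contains unsafe characters.
    path = f"/openai/deployments/{quote(deployment, safe='')}/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"
```

For example, `azure_chat_endpoint("your-resource", "gpt-4o")` returns the same URL as the `--endpoint` example in Quick Start.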

Logging Configuration

Control logging verbosity with the --verbose command line flag:

  • Default: Information level logging
  • With --verbose: Debug level logging

Requirements

  • .NET 8.0 or later
  • A valid API key for the chosen language model provider
  • An MCP server to evaluate

Architecture

MCP.Evals follows SOLID principles with a clean architecture:

  • Commands - CLI command implementations
  • Services - Core business logic (orchestration, scoring, transport management)
  • Models - Data transfer objects and configuration models
  • Abstractions - Interfaces for dependency injection
  • Validation - FluentValidation rules for configuration validation
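Separating Abstractions from Services means the orchestration logic depends only on interfaces. A transport or model provider can then be swapped, or faked in tests, without touching that logic. The Python analogue below illustrates the idea with `typing.Protocol`. Every name in it (`ModelClient`, `ToolTransport`, `Orchestrator`, `EvalResult`) is hypothetical and is not MCP.Evals' actual C# API. The substring-match scoring is purely illustrative and is not how MCP.Evals scores results.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Abstraction over a language model provider (hypothetical)."""

    def complete(self, prompt: str, tools: list[str]) -> str: ...


class ToolTransport(Protocol):
    """Abstraction over an MCP transport such as stdio or HTTP (hypothetical)."""

    def list_tools(self) -> list[str]: ...


@dataclass
class EvalResult:
    name: str
    success: bool
    answer: str


class Orchestrator:
    """Service that depends only on the abstractions, injected via the constructor."""

    def __init__(self, model: ModelClient, transport: ToolTransport) -> None:
        self._model = model
        self._transport = transport

    def run(self, name: str, prompt: str, expected: str) -> EvalResult:
        tools = self._transport.list_tools()
        answer = self._model.complete(prompt, tools)
        # Illustrative scoring only: success if the expected text appears.
        return EvalResult(name, expected in answer, answer)
```

In tests, simple fake classes that implement `complete` and `list_tools` can be passed to `Orchestrator` directly. No real model or server is needed, which is the main benefit of injecting the dependencies.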

Support

Acknowledgments

This project was inspired by mcp-evals by mclenhard.


Implementation Status:

  • Azure OpenAI: ✅ Verified working and recommended for production use
  • OpenAI: ⚠️ Implementation exists but needs verification testing
  • Anthropic: ❌ Not implemented - returns placeholder responses only

Made with ❤️ for the MCP community

Target frameworks: net8.0 is included in the package. Compatibility was computed for the net8.0 platform variants (android, browser, ios, maccatalyst, macos, tvos, windows) and for net9.0 and net10.0 along with the same platform variants.

This package has no dependencies.

Version Downloads Last Updated
0.0.3 167 10/17/2025
0.0.2 169 10/17/2025
0.0.1 178 10/17/2025

Initial release of MCP.Evals evaluation framework