MCP.Evals 0.0.3

.NET 8.0

dotnet tool install --global MCP.Evals --version 0.0.3

This package contains a .NET tool you can call from the shell/command line.

dotnet new tool-manifest  # if you are setting up this repo

dotnet tool install --local MCP.Evals --version 0.0.3


#tool dotnet:?package=MCP.Evals&version=0.0.3

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

nuke :add-package MCP.Evals --version 0.0.3


MCP.Evals

An evaluation framework for testing Model Context Protocol (MCP) servers with language models. MCP.Evals provides automated testing capabilities to validate MCP server implementations across different scenarios and ensure reliable tool execution.

Features

🔍 Automated MCP Server Testing - Run evaluations against MCP servers
🤖 Multi-Language Model Support - Azure OpenAI (verified), OpenAI (needs verification), Anthropic (not implemented)*
📊 Flexible Configuration - YAML configuration support
🚀 Multiple Transport Methods - Support for stdio and HTTP transports
✅ Validation Framework - Configuration and request validation
🛠️ CLI Tool - Easy-to-use command-line interface

Installation

As a Global Tool (Recommended)

dotnet tool install -g MCP.Evals

As a NuGet Package

dotnet add package MCP.Evals

Quick Start

1. Create an Evaluation Configuration

Create a YAML configuration file (e.g., my-server-eval.yaml):

# Language model configuration
# Status: azure-openai (✅ verified), openai (⚠️ needs verification), anthropic (❌ not implemented)
model:
  provider: azure-openai  # Recommended: use azure-openai (verified working)
  name: gpt-4o

# MCP server configuration
server:
  transport: stdio
  path: "./my-mcp-server.exe"

# Evaluation test cases
evals:
  - name: basic_math_test
    description: Test basic math operations
    prompt: "Use the calculator tool to add 5 + 3"
    expectedResult: "Should return 8"
    
  - name: echo_test
    description: Test echo functionality
    prompt: "Use the echo tool to repeat 'Hello World'"
    expectedResult: "Should echo back 'Hello World'"
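
When a server exposes many tools, writing each test case by hand gets repetitive. The following is a minimal Python sketch, not part of MCP.Evals, that renders the evals section from a list of cases. It assumes only the four fields shown above (name, description, prompt, expectedResult); the function name render_evals is hypothetical. Strings are written with json.dumps, which works because JSON strings are valid YAML double-quoted scalars.

```python
import json


def render_evals(cases):
    """Render the `evals:` section of a config file as YAML text.

    Hypothetical helper: it only knows the four fields used in the
    example configuration above.
    """
    lines = ["evals:"]
    for case in cases:
        # First key of each list item carries the "- " marker.
        lines.append(f"  - name: {json.dumps(case['name'])}")
        for key in ("description", "prompt", "expectedResult"):
            lines.append(f"    {key}: {json.dumps(case[key])}")
    return "\n".join(lines) + "\n"


cases = [
    {
        "name": "basic_math_test",
        "description": "Test basic math operations",
        "prompt": "Use the calculator tool to add 5 + 3",
        "expectedResult": "Should return 8",
    },
]
print(render_evals(cases))
```

The generated text can be appended below the model and server sections of a configuration file, then checked with McpEval validate before running.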

2. Run Evaluations

# Run evaluations with API key
McpEval evaluate my-server-eval.yaml --api-key "your-api-key"

# Azure OpenAI with endpoint
McpEval evaluate my-server-eval.yaml --api-key "your-key" --endpoint "https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview"

# Save results to file
McpEval evaluate my-server-eval.yaml --output results.json --api-key "your-key"

# Validate configuration without running
McpEval validate my-server-eval.yaml

3. View Results

Results include detailed execution logs, scoring, and performance metrics:

{
  "configurationName": "my-server-eval",
  "totalEvaluations": 2,
  "successfulEvaluations": 2,
  "failedEvaluations": 0,
  "averageScore": 0.95,
  "evaluations": [
    {
      "name": "basic_math_test",
      "success": true,
      "score": 0.9,
      "executionTimeMs": 1234,
      "toolsUsed": ["calculator"]
    }
  ]
}
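
A results file like the one above can be used to gate a CI job. The sketch below is an illustration rather than a feature of MCP.Evals: it assumes the field names shown in the sample output, and the 0.8 threshold and the summarize function are hypothetical choices.

```python
import json

# Sample taken from the output shown above; in practice, load the file
# written by --output instead, e.g. json.load(open("results.json")).
SAMPLE = """
{
  "configurationName": "my-server-eval",
  "totalEvaluations": 2,
  "successfulEvaluations": 2,
  "failedEvaluations": 0,
  "averageScore": 0.95
}
"""


def summarize(results, min_average=0.8):
    """Return (ok, message) for a parsed results document.

    `min_average` is an assumed threshold, not something the tool defines.
    """
    failed = results["failedEvaluations"]
    average = results["averageScore"]
    ok = failed == 0 and average >= min_average
    message = (
        f"{results['successfulEvaluations']}/{results['totalEvaluations']} passed, "
        f"average score {average:.2f}"
    )
    return ok, message


ok, message = summarize(json.loads(SAMPLE))
print(message)  # 2/2 passed, average score 0.95
```

A CI wrapper could exit with a non-zero status when ok is False.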

Configuration Options

Language Model Providers

Azure OpenAI ✅ VERIFIED WORKING

model:
  provider: azure-openai
  name: gpt-4o

OpenAI ⚠️ NEEDS VERIFICATION

model:
  provider: openai
  name: gpt-4o

Anthropic ❌ NOT IMPLEMENTED

model:
  provider: anthropic
  name: claude-3-5-sonnet-20241022

Transport Types

Standard I/O (stdio)

server:
  transport: stdio
  path: "./my-server.exe"
  args: ["arg1", "arg2"]  # Optional command line arguments

HTTP

server:
  transport: http
  url: "http://localhost:3000/mcp"
  path: "./server.js"  # Optional: auto-start server if not running
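
Before running evaluations over HTTP, it can help to confirm that something is listening at the configured URL. The sketch below is a plain TCP reachability check written for illustration; how MCP.Evals itself decides whether the server is running is not documented here, and the is_listening helper is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def is_listening(url, timeout=1.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


print(is_listening("http://localhost:3000/mcp"))
```

This only shows that the port accepts connections; it does not verify that the process behind it speaks MCP.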

Examples

The repository includes example configurations for different scenarios:

C# MCP Server - Testing a C# MCP server implementation
TypeScript MCP Server - Testing a TypeScript MCP server

CLI Commands

evaluate

Run evaluations from a configuration file:

McpEval evaluate <config-path> [options]

Options:
  --api-key       API key for language model provider
  --endpoint      Endpoint URL (required for Azure OpenAI)
  --output, -o    Output file path for results (JSON format)
  --format, -f    Output format (json, summary, detailed, clean)
  --verbose, -v   Enable verbose logging

validate

Validate configuration without running evaluations:

McpEval validate <config-path>

Command-Line Configuration

API Key Configuration

API keys must be provided as a command-line argument:

Use the --api-key option when running commands

Azure OpenAI Configuration

For Azure OpenAI, you also need to provide the endpoint:

Use the --endpoint command-line option with your Azure OpenAI deployment URL (e.g., https://your-resource.openai.azure.com/openai/deployments/model-name/chat/completions?api-version=2025-01-01-preview)
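
The endpoint URL follows a fixed pattern, so it can be assembled from its parts. This is a small Python sketch based only on the example URL above; the azure_chat_endpoint helper is hypothetical, and the default api-version is simply the one used in this document.

```python
from urllib.parse import quote, urlencode


def azure_chat_endpoint(resource, deployment, api_version="2025-01-01-preview"):
    """Build a chat-completions URL in the shape shown in this README."""
    base = f"https://{resource}.openai.azure.com"
    # Percent-encode the deployment name in case it contains unusual characters.
    path = f"/openai/deployments/{quote(deployment, safe='')}/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


print(azure_chat_endpoint("your-resource", "gpt-4o"))
# https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview
```

The result is the value to pass to --endpoint.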

Logging Configuration

Control logging verbosity with the --verbose command line flag:

Default: Information level logging
With --verbose: Debug level logging

Requirements

.NET 8.0 or later
A valid API key for the chosen language model provider
An MCP server to evaluate

Architecture

MCP.Evals follows SOLID principles with a clean architecture:

Commands - CLI command implementations
Services - Core business logic (orchestration, scoring, transport management)
Models - Data transfer objects and configuration models
Abstractions - Interfaces for dependency injection
Validation - FluentValidation rules for configuration validation

Support

Acknowledgments

This project was inspired by mcp-evals by mclenhard.

Implementation Status:

Azure OpenAI: ✅ Verified working and recommended for production use
OpenAI: ⚠️ Implementation exists but needs verification testing
Anthropic: ❌ Not implemented - returns placeholder responses only

Made with ❤️ for the MCP community

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


This package has no dependencies.

Version	Downloads	Last Updated
0.0.3	167	10/17/2025
0.0.2	169	10/17/2025
0.0.1	178	10/17/2025

Initial release of MCP.Evals evaluation framework