FasterWhisper.NET 1.0.2

There is a newer version of this package available.
See the version list below for details.
dotnet add package FasterWhisper.NET --version 1.0.2
                    
NuGet\Install-Package FasterWhisper.NET -Version 1.0.2
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="FasterWhisper.NET" Version="1.0.2" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="FasterWhisper.NET" Version="1.0.2" />
                    
Directory.Packages.props
<PackageReference Include="FasterWhisper.NET" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add FasterWhisper.NET --version 1.0.2
                    
#r "nuget: FasterWhisper.NET, 1.0.2"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package FasterWhisper.NET@1.0.2
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=FasterWhisper.NET&version=1.0.2
                    
Install as a Cake Addin
#tool nuget:?package=FasterWhisper.NET&version=1.0.2
                    
Install as a Cake Tool

FasterWhisper.NET Banner

FasterWhisper.NET

by Qourex โ€” Bringing high-performance speech recognition to .NET

Build & Test NuGet Downloads License: MIT .NET


FasterWhisper.NET is a production-ready .NET SDK for OpenAI Whisper built on top of the high-performance CTranslate2 inference engine.

The project brings Whisper transcription, streaming, batching, diagnostics, audio analysis, subtitle generation, and deployment-focused tooling to modern .NET applications while remaining fully compatible with the CTranslate2 ecosystem.


โ“ Why FasterWhisper.NET?

FasterWhisper.NET focuses on delivering a complete, optimized .NET developer experience around Whisper inference. Key capabilities include:

  • ๐Ÿ”Œ Native .NET API Surface โ€” Elegant, idiomatic C# builder and async patterns.
  • โšก CTranslate2-Powered Inference โ€” High-performance inference powered by CTranslate2.
  • ๐Ÿ‘ฅ Shared-Weight Replica Pools โ€” Concurrent transcription powered by CTranslate2 replica pools that share underlying model weights.
  • ๐Ÿ”„ Streaming Transcription โ€” Real-time audio streaming powered by an async-push pipeline.
  • ๐Ÿ—ฃ๏ธ Voice Activity Detection (VAD) โ€” Fast, robust Silero VAD v5 ONNX integration.
  • ๐Ÿ“ˆ Audio Quality Assessment โ€” Built-in non-intrusive analyzer to grade audio signal quality before transcription.
  • ๐Ÿ” Hallucination Diagnostics โ€” Identifies repetitive loops and blank segments.
  • ๐Ÿ“ฆ Batched Inference Pipelines โ€” Concurrently processes partitioned segments for high throughput.
  • ๐Ÿ“Š Rich Profiling and Benchmarking โ€” Integrated resource performance metrics (CPU RAM/GPU VRAM/RTF).
  • ๐Ÿ“ Export Pipelines โ€” Formats segments directly into SRT, WebVTT, TSV, or JSON subtitles.
  • โœ… Extensive Automated Test Coverage โ€” Robust test suite validating all interop and wrapper features.

โš–๏ธ Ecosystem Positioning

FasterWhisper.NET and Python's faster-whisper both leverage the high-performance CTranslate2 inference engine under the hood.

  • faster-whisper provides an excellent Python-first experience and has become one of the most widely adopted Whisper implementations in the machine learning ecosystem.
  • FasterWhisper.NET focuses on the .NET ecosystem by providing:
    • Native C# APIs (fully typed, thread-safe, and disposable)
    • Async-first design (native async/await and IAsyncEnumerable<T>)
    • Seamless integration with ASP.NET Core, desktop apps, and enterprise .NET pipelines
    • Built-in host diagnostics, profiling, and quality analysis
    • Extended export format pipelines and subtitle rendering utilities
    • Deployment-oriented cross-platform packaging

๐Ÿฅ Project Health

Metric Value
Unit Tests 152
Native Backend CTranslate2 C++
License MIT
Target Frameworks .NET 8.0, .NET 9.0, .NET 10.0
Supported OS Windows (x64), Linux (x64), macOS (x64, ARM64)
Languages C#, C++, CUDA

๐ŸŽฏ Feature Matrix

Capability Support
Streaming Transcription โœ…
Voice Activity Detection (VAD) โœ…
Batched Inference Pipelines โœ…
Shared Replica Pools โœ…
Audio Quality Analysis โœ…
Hallucination Diagnostics โœ…
Subtitle & Transcript Export โœ…
Profiling & Benchmarking โœ…
Memory-Mapped Loading โœ…

๐Ÿ“ Architecture

Below is a high-level overview of the library's architecture:

graph TD
    App[Application] --> SDK[FasterWhisper.NET]
    subgraph SDK Features
        SDK --> Audio[Audio Processing]
        SDK --> Stream[Streaming & IAsyncEnumerable]
        SDK --> VAD[Silero VAD v5 ONNX]
        SDK --> Diag[Diagnostics & Quality Analysis]
        SDK --> Export[Export Subtitle Pipeline]
        SDK --> Pool[Replica Pool & Semaphores]
    end
    SDK --> CT2[CTranslate2 Engine C++]
    CT2 --> Models[Whisper Models]

โœจ Features

  • โšก High-Performance Inference โ€” Powered by CTranslate2 with INT8 / FP16 / INT16 quantization
  • ๐ŸŽฎ GPU & CPU Acceleration โ€” NVIDIA CUDA + cuDNN for GPU, Intel MKL-optimized CPU threads
  • ๐Ÿ“ฅ Automatic Model Downloader โ€” Downloads converted CTranslate2 models from Hugging Face Hub
  • ๐Ÿ”„ Streaming Transcription โ€” Real-time IAsyncEnumerable transcription with VAD-based segmentation
  • ๐Ÿ—ฃ๏ธ Silero VAD Integration โ€” Voice Activity Detection via ONNX Runtime to eliminate silence hallucinations
  • ๐Ÿ“ Word-Level Timestamps โ€” Cross-attention alignment for precise per-word timing
  • ๐ŸŒ Language Detection โ€” Auto-detect from 99+ languages with confidence scores
  • ๐ŸŽต Audio Processing Pipeline โ€” RMS normalization, high-pass filtering, spectral noise gate, Lanczos resampler, multi-format WAV decoder (8/16/24/32-bit PCM, IEEE float, A-law, ฮผ-law)
  • ๐Ÿ”ง Text Post-Processing โ€” Filler word filtering, stutter pruning, context-conditioned decoding
  • ๐Ÿ’ป Cross-Platform โ€” Native runtimes pre-compiled for Windows (win-x64), Linux (linux-x64), and macOS (osx-x64, osx-arm64 for Apple Silicon)
  • ๐Ÿ“ฆ Multi-Target โ€” Out-of-the-box support for .NET 8.0, .NET 9.0, and .NET 10.0

๐Ÿ“‹ Table of Contents


๐Ÿ“ฆ Installation

dotnet add package FasterWhisper.NET

Or via the Package Manager Console:

Install-Package FasterWhisper.NET

The NuGet package ships with pre-compiled native binaries for Windows (win-x64), Linux (linux-x64), and macOS (osx-x64, osx-arm64) out of the box. For GPU acceleration, see CUDA Prerequisites.


๐Ÿš€ Quick Start

Concurrency & Thread-Safety: WhisperModel supports concurrent transcription calls. Under the hood, concurrent calls are queued and processed safely using a SemaphoreSlim. If you configure the model with NumReplicas > 1, transcription calls will execute concurrently utilizing CTranslate2's native thread-safe replica pool, sharing the same loaded model weights in memory to minimize additional memory overhead.

using Qourex.FasterWhisper.NET;

// 1. Download and load the model (cached to ~/.cache/qourex-fasterwhisper)
using var model = await WhisperModel.LoadAsync(
    modelNameOrPath: "base",       // "tiny", "base", "small", "medium", "large-v3", etc.
    device:          "cpu",        // "cpu" or "cuda"
    computeType:     "default"     // "float32", "float16", "int8", "int8_float16", etc.
);

// 2. Configure transcription options
var options = new WhisperOptions
{
    BeamSize = 5,
    WordTimestamps = false
};

// 3. Configure Voice Activity Detection (optional)
var vadOptions = new VadOptions
{
    Enabled   = true,
    Threshold = 0.5f
};

// 4. Transcribe โ€” supports WAV, MP3, MP4, Opus (FFmpeg fallback)
var segments = model.Transcribe(
    mediaPath:  "audio.wav",
    language:   "en",           // pass null for auto-detection
    options:    options,
    vadOptions: vadOptions
);

// 5. Display results
foreach (var segment in segments)
{
    Console.WriteLine($"[{segment.Start:F2}s -> {segment.End:F2}s] {segment.Text}");
}

๐Ÿค– Available Models

Models are automatically downloaded from the Hugging Face Hub on first use and cached locally.

Model Parameters Size on Disk Approx. VRAM (FP16) Relative Speed Best For
tiny 39 M ~75 MB ~1 GB Fastest Quick prototyping, edge devices
base 74 M ~142 MB ~1 GB Very Fast Lightweight applications
small 244 M ~466 MB ~2 GB Fast Good accuracy/speed balance
medium 769 M ~1.5 GB ~5 GB Moderate High accuracy
large-v1 1550 M ~3.1 GB ~10 GB Slow Maximum accuracy (v1)
large-v2 1550 M ~3.1 GB ~10 GB Slow Maximum accuracy (v2)
large-v3 1550 M ~3.1 GB ~10 GB Slow Best overall accuracy
large-v3-turbo 809 M ~1.6 GB ~3 GB Fast Great accuracy/speed trade-off
faster-distil-whisper-large-v3 756 M ~1.5 GB ~3 GB Fast Excellent English-only speed & accuracy

Devices & Compute Types

Device Description
"cpu" CPU execution with Intel MKL optimization
"cuda" NVIDIA GPU via CUDA (requires CUDA Toolkit + cuDNN)
Compute Type Description
"default" Automatically selects the best type for the device
"float32" Full 32-bit floating point precision
"float16" Half precision โ€” faster on GPU, lower memory
"int8" 8-bit integer quantization โ€” fastest, smallest footprint
"int8_float16" Mixed INT8 compute with FP16 storage
"int16" 16-bit integer quantization

๐Ÿ“Š Performance Benchmarks

Below are the results of the performance and resource footprint benchmarks run using the console samples.

Running the Benchmarks

The benchmark runner is built directly into the repository's sample CLI tool. You can run these exact performance tests on your own machine using the following commands:

# Run on CPU with 'tiny' model and standard harvard.wav
dotnet run --project samples/Qourex.FasterWhisper.NET.Samples/Qourex.FasterWhisper.NET.Samples.csproj -- benchmark cpu tiny harvard.wav

# Run on GPU (CUDA) with 'faster-distil-whisper-large-v3' and custom audio
dotnet run --project samples/Qourex.FasterWhisper.NET.Samples/Qourex.FasterWhisper.NET.Samples.csproj -p:UseGpuTest=true -- benchmark cuda faster-distil-whisper-large-v3 harvard.wav

Benchmark Environment

  • CPU: Intel Core i7-4790 (4 Cores / 8 Threads)
  • RAM: 32 GB DDR3
  • GPU: NVIDIA GeForce GTX 1070 Ti (8 GB VRAM)
  • CUDA: 12.4 / cuDNN 9.1
  • OS: Windows 11 Pro
  • Model: faster-distil-whisper-large-v3 (756 million parameters)
  • Audio Length: 972.29 seconds (16.2 minutes)

Memory & Startup Overhead (Standard vs Memory-Mapped)

Memory-mapped loading can reduce model initialization time and may be beneficial for CPU-based deployments.

For CUDA deployments, standard path-based loading may provide lower steady-state host memory usage because model weights are uploaded to GPU memory and temporary host buffers can be released.

Choose the loading strategy that best fits your deployment environment.

Loading Strategy Load Time CPU RAM Delta GPU VRAM Delta Startup Speedup
Standard Load (No-Mmap) 3,172.8 ms 110.1 MB 3,814.0 MB Baseline
Memory-Mapped Load 2,506.2 ms 963.1 MB 3,713.0 MB 1.27x

Standard loading registers a lower C# process working set (CPU RAM Delta) because model loading is fully delegated to the native C++ heap. Memory-mapped loading allocates virtual memory buffers within C# before passing pinned pointers to C++, which is reflected in C# working set statistics.

Multi-Replica Resource Scaling (Shared weights in memory)

When scaling the number of concurrent execution threads via NumReplicas within the same process, CTranslate2's architecture shares weight tensors in memory. Scaling from 1 to 4 replicas shows that GPU memory remains effectively constant while CPU memory increases only modestly relative to model size:

Configuration Load Time CPU RAM GPU VRAM
NumReplicas = 1 2,516.7 ms 963.1 MB 3,712.0 MB
NumReplicas = 2 2,406.2 ms 1,454.2 MB 3,700.0 MB
NumReplicas = 4 2,496.4 ms 1,447.3 MB 3,712.0 MB

Scaling to multiple replicas does not double or quadruple the RAM/VRAM footprint because the underlying model weights are shared across workers. Memory footprint stays virtually constant, with only minor differences due to activation caching allocations.

Performance & Optimization Results

Benchmark Phase / Configuration Duration Speedup / RTF
Quantization 'default' (float32) 44,403.2 ms Real-Time Factor (RTF): 0.0457
Quantization 'int8' 31,708.0 ms Real-Time Factor (RTF): 0.0326
Quantization Speedup (int8 vs float32) โ€” 1.40x
4 Tasks Parallel (Blocked, Replica = 1) 179,217.2 ms Baseline
4 Tasks Parallel (Concurrent, Replica = 2) 155,992.5 ms 1.15x speedup
Pipeline Sequential (2x Audio) 89,092.7 ms Baseline
Pipeline Batched (2x Audio) 57,802.7 ms 1.54x speedup

๐Ÿ”ฌ Advanced Usage

Fluent Model Builder

For a cleaner and more discoverable configuration experience, you can use the WhisperModelBuilder:

using Qourex.FasterWhisper.NET;

using var model = await WhisperModelBuilder.Create("base")
    .WithDevice("cuda")
    .WithComputeType("float16")
    .WithNumReplicas(2)
    .WithVad(threshold: 0.5f)
    .WithWordTimestamps()
    .WithDenoising()
    .BuildAsync();

var segments = model.Transcribe("meeting.wav");

Concurrency & Multi-Replica Execution

By default, WhisperModel serializes transcription calls using a thread-safe semaphore. To enable true concurrent processing of multiple transcription requests without duplicating model weight memory, configure numReplicas > 1 (or use the WithNumReplicas builder option):

// Load model with 2 replicas (shares weights in memory but runs 2 parallel inferences)
using var model = await WhisperModel.LoadAsync(
    modelNameOrPath: "base",
    device: "cpu",
    numReplicas: 2
);

// Transcribe files concurrently
var tasks = new[] { "audio1.wav", "audio2.wav" }.Select(file => Task.Run(() =>
{
    var segments = model.Transcribe(file);
    Console.WriteLine($"Finished transcribing {file}");
}));

await Task.WhenAll(tasks);

Batched Inference Pipeline

For high-throughput batch processing of long audio files, use BatchedInferencePipeline. It splits audio into chunks (using VAD) and batches them together to process multiple segments concurrently. This can yield a 2-4x speedup on GPU:

using Qourex.FasterWhisper.NET;

using var model = await WhisperModel.LoadAsync("base", device: "cuda");
using var pipeline = new BatchedInferencePipeline(model, batchSize: 8);

var result = pipeline.Transcribe("long_podcast.mp3");
foreach (var segment in result.Segments)
{
    Console.WriteLine($"[{segment.Start}s -> {segment.End}s] {segment.Text}");
}

Word-Level Timestamps

Extract precise per-word timing using CTranslate2's native cross-attention alignment:

var options = new WhisperOptions
{
    WordTimestamps   = true,
    MedianFilterWidth = 7     // smoothing for cross-attention matrix
};

var segments = model.Transcribe("interview.wav", language: "en", options: options);

foreach (var segment in segments)
{
    Console.WriteLine($"[{segment.Start:F2}s -> {segment.End:F2}s] {segment.Text}");

    foreach (var word in segment.Words)
    {
        Console.WriteLine($"  '{word.Word}' [{word.Start:F2}s -> {word.End:F2}s] (p={word.Probability:F3})");
    }
}

Streaming Transcription

Transcribe audio in real-time from a live microphone or network stream. Uses VAD to automatically segment the audio and yield results as speech is detected:

// Simulate a real-time audio source (e.g., microphone capture)
async IAsyncEnumerable<float[]> GetAudioStream()
{
    // Each chunk: raw mono float32 PCM at 16 kHz
    while (capturing)
    {
        yield return await microphone.ReadChunkAsync();
    }
}

var options = new WhisperOptions { BeamSize = 1 }; // greedy for low latency
var vadOptions = new VadOptions
{
    Enabled              = true,
    Threshold            = 0.5f,
    MinSpeechDurationMs  = 250,
    MinSilenceDurationMs = 100
};

await foreach (var segment in model.TranscribeStreamAsync(
    GetAudioStream(),
    language: "en",
    options: options,
    vadOptions: vadOptions))
{
    Console.WriteLine($"[LIVE] {segment.Text}");
}

Voice Activity Detection (VAD)

Silero VAD v5 runs via ONNX Runtime to detect speech regions. The ONNX model (silero_vad.onnx) is auto-downloaded on first use. VAD dramatically improves transcription speed and eliminates silence-induced hallucinations.

var vadOptions = new VadOptions
{
    Enabled              = true,   // enable Silero VAD segmentation
    Threshold            = 0.5f,   // speech probability threshold (0.0โ€“1.0)
    MinSpeechDurationMs  = 250,    // discard speech shorter than this
    MinSilenceDurationMs = 100     // split on silence longer than this
};

var segments = model.Transcribe("meeting.mp3", language: null, vadOptions: vadOptions);

Enabling VAD is strongly recommended for long-form audio. It splits audio into speech-only regions before transcription, which speeds up processing and prevents the model from hallucinating text during silent passages.

In-Memory Model Loading

The loaded model files dictionary must contain either vocabulary.txt or vocabulary.json to properly initialize the Whisper tokenizer. Failing to include a valid vocabulary file will result in a KeyNotFoundException at runtime.

Load models entirely from byte arrays โ€” useful for embedded scenarios, encrypted model storage, or custom deployment pipelines:

// Load model files from any source (database, blob storage, embedded resource, etc.)
var modelFiles = new Dictionary<string, byte[]>
{
    ["model.bin"]       = File.ReadAllBytes("path/to/model.bin"),
    ["config.json"]     = File.ReadAllBytes("path/to/config.json"),
    ["vocabulary.txt"]  = File.ReadAllBytes("path/to/vocabulary.txt")
};

using var model = new WhisperModel(
    modelFiles,
    device: "cpu",
    computeType: "int8",
    cpuThreads: 4
);

var segments = model.Transcribe("audio.wav", language: "en");

Language Detection

Detect the spoken language before transcription, or let the library auto-detect:

// Explicit language detection
float[] pcm = LoadAudioAsPcm("speech.wav"); // 16 kHz mono float32
var languages = model.DetectLanguage(pcm);

foreach (var (language, probability) in languages.Take(5))
{
    Console.WriteLine($"  {language}: {probability:P1}");
}
// Output:
//   en: 98.5%
//   de: 0.8%
//   fr: 0.3%
//   ...

// Auto-detection during transcription โ€” just pass language as null
var segments = model.Transcribe("unknown_language.wav", language: null);

Audio Processing Options

Built-in audio preprocessing that runs before transcription:

var options = new WhisperOptions
{
    NormalizeAudio    = true,   // RMS volume normalization (default: true)
    CutLowFrequencies = true,  // High-pass filter at 80 Hz โ€” removes DC offset
                               //   and microphone hum (default: true)
    PreEmphasis       = false, // Pre-emphasis filter for improved high-freq clarity
    DenoiseAudio      = false  // Spectral noise gate for stationary background noise
};

Supported input formats: | Format | Method | |:-------|:-------| | WAV โ€” 8/16/24/32-bit PCM | Direct loading via managed reader | | WAV โ€” 32/64-bit IEEE float | Direct loading via managed reader | | WAV โ€” A-law / ฮผ-law (G.711) | Direct loading via managed reader | | MP3, MP4, Opus, FLAC, etc. | Auto-decoded via FFmpeg subprocess | | Raw PCM float[] | Pass directly to Transcribe(float[] pcm, ...) |

For non-WAV formats, FFmpeg must be installed and available on the system PATH. Audio is automatically resampled to 16 kHz mono using a Lanczos windowed-sinc resampler.

Text Post-Processing

Clean up transcription output with built-in filters:

var options = new WhisperOptions
{
    FilterFillerWords       = true,   // removes "uh", "um", "ah", "eh", "mhm", etc.
    PruneStutters           = true,   // removes consecutive duplicate words (stuttering)
    ConditionOnPreviousText = true    // conditions next chunk on previous text for
};                                    //   better context continuity (default: true)

Audio Quality Assessment

Before running transcription, you can assess the input audio quality using the built-in non-intrusive analyzer. It grades the audio (Excellent, Good, Fair, Poor) and provides suggestions:

using Qourex.FasterWhisper.NET;

float[] samples = WhisperModel.LoadAudio("low_quality.wav");
var report = AudioQualityReport.Assess(samples);

Console.WriteLine($"Audio Quality Grade: {report.OverallGrade}");
Console.WriteLine($"Estimated SNR: {report.SignalToNoiseRatio:F1} dB");

foreach (var suggestion in report.Suggestions)
{
    Console.WriteLine($"Suggestion: {suggestion}");
}

Subtitle & Export Formats

Easily export transcribed segments into standard subtitle files (SRT, WebVTT, TSV, JSON, etc.):

using Qourex.FasterWhisper.NET;

var segments = model.Transcribe("movie.wav");

// Export to SRT format string
string srtContent = SubtitleExporter.ToSrt(segments);
File.WriteAllText("movie.srt", srtContent);

// Export to WebVTT file directly
SubtitleExporter.WriteVtt(segments, "movie.vtt");

๐Ÿ’ก Common Use Cases

  • ASP.NET Core Transcription APIs โ€” Scalable microservices for processing user-uploaded media.
  • Meeting Transcription โ€” Splitting long audio, detecting speakers, and exporting structured transcripts.
  • Subtitle Generation โ€” Auto-generating SRT or WebVTT files with precise word-level alignments.
  • Media Content Indexing โ€” Fast transcription of audio archives for search and semantic analysis.
  • Podcast Processing โ€” Automated transcription pipelines with spectral denoising and filler-word removal.
  • Accessibility Workflows โ€” Real-time speech-to-text generation for live captioning.
  • Offline Speech-to-Text โ€” On-device transcription for desktop applications without cloud APIs.
  • Enterprise Media Pipelines โ€” Reliable, high-concurrency document and audio digitization workflows.

๐Ÿงช Automated Testing

The project includes 152 automated tests validating all interop boundaries, features, and edge cases:

  • Model Loading โ€” Verifies local paths, HuggingFace downloads, and in-memory loading.
  • Audio Decoding โ€” Tests standard PCM WAV layouts and external FFmpeg fallback streams.
  • VAD Segmentation โ€” Validates Silero speech boundaries and window offsets.
  • Streaming โ€” Verifies asynchronous chunk ingestion and event firing.
  • Timestamps โ€” Asserts exact start/end positions and cross-attention alignment.
  • Subtitle Export โ€” Verifies VTT, SRT, TSV, and JSON formatter correctness.
  • Batching โ€” Tests high-throughput batched pipelines and chunk splitting.
  • Replica Pools โ€” Asserts thread safety and semaphore queuing.
  • Diagnostics โ€” Tests SNR calculations, clipping checks, and quality grading.
  • Quantization & Error Handling โ€” Validates compute configurations and exception safety.

๐Ÿ“– API Reference

WhisperOptions

All decoding and processing parameters. Every property has a sensible default.

Property Type Default Description
BeamSize int 5 Beam size for beam search. Set to 1 for greedy decoding
Patience float 1.0 Beam search patience factor. Decoding continues until BeamSize ร— Patience hypotheses finish
LengthPenalty float 1.0 Exponential penalty applied to output length during beam search
RepetitionPenalty float 1.0 Penalty for previously generated tokens. Set > 1.0 to discourage repetition
NoRepeatNgramSize int 0 Prevent repetitions of n-grams of this size. 0 = disabled
MaxLength int 448 Maximum number of tokens to generate per segment
SamplingTopK int 1 Top-K sampling candidates. 1 = greedy, 0 = sample from full distribution
SamplingTemperature float 1.0 Sampling temperature. Higher values increase randomness
NumHypotheses int 1 Number of hypotheses to return
ReturnScores bool true Include average log-probability scores in results
ReturnNoSpeechProb bool true Include no-speech token probability in results
MaxInitialTimestampIndex int 50 Maximum index of the first predicted timestamp token
SuppressBlank bool true Suppress blank outputs at the beginning of sampling
SuppressTokens int[]? [-1] Token IDs to suppress during generation. null uses model defaults
WordTimestamps bool false Extract word-level timestamps via cross-attention alignment
MedianFilterWidth int 7 Median filter width applied to cross-attention matrix for word alignment
Temperatures float[] [0.0, 0.2, 0.4, 0.6, 0.8, 1.0] Temperature fallback sequence. Retries with next temperature if validation fails
LogProbThreshold float -1.0 Minimum average log-probability. Segments below this are retried or rejected
NoSpeechThreshold float 0.6 If no-speech probability exceeds this and log-prob is below threshold, segment is silence
CompressionRatioThreshold float 2.4 Maximum gzip compression ratio. Segments above this are considered repetitive
Prefix string? null Optional text prefix to guide the first transcription chunk
WithoutTimestamps bool false Suppress timestamp token generation
NormalizeAudio bool true Apply RMS volume normalization before transcription
CutLowFrequencies bool true Apply 80 Hz high-pass filter (removes DC offset and microphone hum)
ConditionOnPreviousText bool true Condition next chunk decoding on previous chunk's transcribed text
FilterFillerWords bool false Remove filler words (uh, um, ah, eh, uh-huh, mhm) from output
PruneStutters bool false Remove consecutive duplicate words (stuttering) from output
PreEmphasis bool false Apply pre-emphasis filter to boost high-frequency clarity
DenoiseAudio bool false Apply spectral noise gate for stationary background noise reduction
InitialPrompt string? null Optional prompt text context to guide transcription spelling/style
Hotwords string? null Comma-separated list of words/phrases to boost in the transcription prompt
HallucinationSilenceThreshold float 0 Skip silent sections longer than this (in seconds) if hallucination is detected. 0 = disabled
PrependPunctuations string "\"'โ€œยฟ([{-" Punctuation marks prepended to the following word
AppendPunctuations string "\".ใ€‚,๏ผŒ!๏ผ?๏ผŸ:๏ผš)โ€)]}ใ€" Punctuation marks appended to the preceding word
MaxNewTokens int 0 Maximum new tokens generated per chunk. 0 uses MaxLength
BestOf int 5 Number of candidates sampled when temperature > 0
PromptResetOnTemperature float 0.5 Reset prompt context when temperature fallback exceeds this threshold
ClipTimestamps List<(float, float)>? null Explicit start/end timestamp ranges to restrict transcription
Multilingual bool false Perform independent language detection per 30-second chunk
AdaptiveBeamSize bool true Use greedy decoding at temp=0 and full beam on fallback retries
RestoreTextFormatting bool false Apply rule-based capitalization and punctuation formatting to outputs
VocabularyBias Dictionary<string, float>? null Custom words/phrases with bias strengths to boost during decoding
MultiPassEnabled bool false Enable second-pass re-transcription for low-confidence segments
MultiPassConfidenceThreshold float 0.6 Confidence threshold below which segments are retranscribed
MultiPassBeamSize int 10 Beam size used for second-pass re-transcription

VadOptions

Voice Activity Detection configuration using Silero VAD v5.

Property Type Default Description
Enabled bool false Enable or disable VAD segmentation
Threshold float 0.5 Speech probability threshold (0.0โ€“1.0). Frames above this are classified as speech
MinSpeechDurationMs int 250 Minimum speech duration in milliseconds. Shorter segments are discarded
MinSilenceDurationMs int 2000 Minimum silence duration in milliseconds to trigger a segment split

WhisperSegment

Represents a transcribed audio segment.

Property Type Description
Text string Transcribed text content
Tokens int[] Raw token IDs generated by the model
Score float Average log-probability (generation quality score)
NoSpeechProb float Probability of the no-speech token
Start float Start timestamp in seconds
End float End timestamp in seconds
Words List<WhisperWord> Word-level timestamps (populated when WordTimestamps = true)

WhisperWord

Represents a single aligned word with timing information.

Property Type Description
Word string The word text
Start float Start time in seconds
End float End time in seconds
Probability float Alignment / token probability (0.0โ€“1.0)

๐Ÿ”จ Building from Source

Prerequisites

Tool Required For Download
CMake 3.18+ Native C++ build cmake.org
Visual Studio 2022 (MSVC) C++ compiler visualstudio.com
CUDA Toolkit 12.x GPU builds only NVIDIA
cuDNN 9.x GPU builds only NVIDIA
.NET SDK 8.0+ C# library dotnet.microsoft.com

Build Script

The repository includes a build.ps1 PowerShell script that automates the entire pipeline:

# Build with CUDA + GPU support (default)
.\build.ps1

# Build CPU-only (no CUDA/NVCC required)
.\build.ps1 -CpuOnly

The build script performs these steps:

  1. Configures and compiles the native C++ wrapper via CMake (generates qourex_fasterwhisper_native.dll)
  2. Copies native DLLs (CTranslate2, MKL, OpenMP) into the C# project's runtimes/win-x64/native/ folder
  3. Builds the .NET solution and packs the NuGet package into the ./artifacts/ directory

Manual Build

# 1. Build native C++ wrapper (run from VS Developer Command Prompt with vcvars64.bat)
cd src/Qourex.FasterWhisper.Native
mkdir build && cd build
cmake .. -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DOPENMP_RUNTIME=INTEL -DWITH_MKL=ON -DMKL_ROOT="C:\Program Files (x86)\Intel\oneAPI\mkl\latest" -DWITH_DNNL=OFF -DWITH_CUDA=ON -DWITH_CUDNN=ON
cmake --build .

# 2. Copy qourex_fasterwhisper_native.dll to the C# runtimes folder
cp qourex_fasterwhisper_native.dll ../../Qourex.FasterWhisper.NET/runtimes/win-x64/native/

# 3. Build and pack the C# library
cd ../../Qourex.FasterWhisper.NET
dotnet pack --configuration Release --output ../../artifacts

When compiling manually on Windows, ensure -DMKL_ROOT points to the correct installation path of the Intel oneMKL SDK on your system (e.g. C:\Program Files (x86)\Intel\oneAPI\mkl\latest). The build.ps1 script automatically locates this path.


๐Ÿ–ฅ๏ธ CUDA Prerequisites

These are only required when using device: "cuda" for GPU acceleration. CPU-only usage has no additional dependencies.

  1. NVIDIA CUDA Toolkit 12.x โ€” Download
  2. NVIDIA cuDNN 9.x โ€” Download

Ensure the following DLLs are accessible on your system PATH:

DLL Source
cudart64_*.dll CUDA Toolkit
cublas64_*.dll CUDA Toolkit
cublasLt64_*.dll CUDA Toolkit
cudnn64_*.dll cuDNN

Flash Attention

For additional performance on supported NVIDIA GPUs (Ampere and later), enable Flash Attention:

using var model = await WhisperModel.LoadAsync(
    modelNameOrPath: "large-v3",
    device: "cuda",
    computeType: "float16",
    flashAttention: true          // โšก Flash Attention on CUDA
);

Flash Attention requires CUDA device and a GPU with compute capability โ‰ฅ 8.0 (Ampere architecture or newer). It is silently ignored on CPU.


๐Ÿ“ Project Structure

Qourex.FasterWhisper/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ Qourex.FasterWhisper.Native/   # C++ CMake project (CTranslate2 wrapper)
โ”‚   โ”‚   โ”œโ”€โ”€ CMakeLists.txt
โ”‚   โ”‚   โ”œโ”€โ”€ qourex_fasterwhisper_native.cpp
โ”‚   โ”‚   โ””โ”€โ”€ qourex_fasterwhisper_native.h
โ”‚   โ””โ”€โ”€ Qourex.FasterWhisper.NET/      # C# library (NuGet package source)
โ”‚       โ”œโ”€โ”€ WhisperModel.cs            #   Model loading, transcription, streaming
โ”‚       โ”œโ”€โ”€ WhisperOptions.cs          #   Decoding options & VAD options
โ”‚       โ”œโ”€โ”€ WhisperSegment.cs          #   Segment & Word result types
โ”‚       โ”œโ”€โ”€ WhisperTokenizer.cs        #   BPE tokenizer (GPT-2 vocabulary)
โ”‚       โ”œโ”€โ”€ AudioProcessor.cs          #   Mel spectrogram, resampling, filtering
โ”‚       โ”œโ”€โ”€ SileroVad.cs               #   Silero VAD v5 (ONNX Runtime)
โ”‚       โ”œโ”€โ”€ ModelDownloader.cs         #   HuggingFace model downloader
โ”‚       โ”œโ”€โ”€ NativeMethods.cs           #   P/Invoke native interop layer
โ”‚       โ””โ”€โ”€ Qourex.FasterWhisper.NET.csproj
โ”œโ”€โ”€ samples/
โ”‚   โ””โ”€โ”€ Qourex.FasterWhisper.NET.Samples/ # Console app feature demo
โ”œโ”€โ”€ tests/
โ”‚   โ””โ”€โ”€ Qourex.FasterWhisper.NET.Tests/ # xUnit test suite
โ”œโ”€โ”€ build.ps1                          # Automated build script
โ”œโ”€โ”€ Qourex.FasterWhisper.slnx          # Solution file
โ”œโ”€โ”€ LICENSE                            # MIT License
โ””โ”€โ”€ README.md

๐Ÿš€ Deployment & Troubleshooting Guide

1. ASP.NET Core Integration (Dependency Injection)

WhisperModel loads heavy neural network weights into RAM/VRAM and initializes CTranslate2 engine replicas. This is a resource-intensive operation that should only occur once during application startup.

Always register WhisperModel as a Singleton in your ASP.NET Core application:

// Program.cs
builder.Services.AddSingleton<WhisperModel>(sp =>
{
    // Configure and load the model synchronously or asynchronously at startup
    return WhisperModel.LoadAsync(
        modelNameOrPath: "base",
        device: "cpu",
        computeType: "default",
        numReplicas: 2 // Allow 2 concurrent transcriptions on this model
    ).GetAwaiter().GetResult();
});

Because WhisperModel is thread-safe and internally coordinates concurrent access via a replica pool and a semaphore, you can safely inject it into scoped Controllers or Services:

[ApiController]
[Route("api/transcribe")]
public class TranscriptionController : ControllerBase
{
    private readonly WhisperModel _model;

    public TranscriptionController(WhisperModel model)
    {
        _model = model;
    }

    [HttpPost]
    public async Task<IActionResult> TranscribeAudio(IFormFile file)
    {
        using var stream = file.OpenReadStream();
        // Decode and transcribe the audio
        var segments = await Task.Run(() => _model.Transcribe(stream));
        return Ok(segments);
    }
}

2. Docker GPU Deployment (Linux Container)

To run the GPU-enabled package inside a Linux Docker container, you must use a base image containing the CUDA and cuDNN runtimes and configure the container to access the host's GPU:

# Use official NVIDIA CUDA runtime as base
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04

# Install .NET SDK/Runtime
RUN apt-get update && apt-get install -y dotnet-sdk-8.0

# Install cuDNN and basic dependencies
RUN apt-get install -y libcublas-12-4 libcudnn9-cuda-12

WORKDIR /app
COPY . .
RUN dotnet publish -c Release -o out

ENTRYPOINT ["dotnet", "out/YourApp.dll"]

Run the container with host GPU access:

docker run --gpus all -it your-whisper-app

3. Troubleshooting Common Runtime Errors

๐Ÿ”ด DllNotFoundException (Native Library Missing)
  • The Issue: The application fails to start or throws DllNotFoundException stating qourex_fasterwhisper_native or one of its dependencies could not be loaded.
  • The Cause: On Windows, native libraries depend on the C++ Redistributable and Intel MKL runtimes. On GPU builds, the system must have the correct CUDA Toolkit (12.x) and cuDNN versions installed and available in the system PATH.
  • The Solution:
    1. Install the Visual C++ Redistributable.
    2. For GPU execution, verify that cudart64_12.dll, cublas64_12.dll, and cudnn64_9.dll are in your Windows PATH environment variable. You can verify GPU compatibility by loading the model with device: "cpu" first to isolate dependency paths.
๐Ÿ”ด StackOverflowException in StreamingMelExtractor
  • The Issue: Process crashes abruptly without a catchable exception when initializing StreamingMelExtractor.
  • The Cause: Passing an excessively large fftSize parameter triggers stack allocation exhaustion.
  • The Solution: Keep the fftSize under 8192 (default is 400 or 512). A bounds check has been added to enforce safe thresholds.

โš–๏ธ Downstream Licensing Obligations

When deploying applications utilizing FasterWhisper.NET, downstream developers must conform to the licenses of bundled and external dependencies:

  1. Intel oneMKL (ISSL License): The Windows NuGet packages bundle Intel MKL runtime binaries (mkl_core.3.dll, etc.) under the Intel Simplified Software License (ISSL) to ensure high-performance CPU execution out-of-the-box. Downstream commercial users must be aware that the ISSL contains active reverse-engineering prohibitions (Section 3). For environments where ISSL is restricted, consider running on Linux (which links dynamically to standard OpenBLAS) or compiling the Windows C++ wrapper without MKL.
  2. FFmpeg (LGPL / GPL): The library does not bundle FFmpeg, but invokes the ffmpeg CLI as a subprocess for decoding non-WAV media formats. If your application bundles or distributes FFmpeg binaries to support this fallback:
    • Ensure you comply with FFmpeg's LGPL (default) or GPL (if compiled with non-free components like x264/x265) redistribution terms.
    • Tip: You can bypass FFmpeg entirely by pre-decoding your audio inputs to standard WAV format (PCM/Float/A-law/ฮผ-law) in your application pipeline, as the library processes WAV files using a 100% managed C# decoder.
  3. ONNX Runtime & Silero VAD (MIT License): Both dependencies are licensed under the permissive MIT license. Cryptographic integrity checking (SHA-256 validation) is enforced internally during Silero VAD downloads.

๐Ÿ“„ License

This project is licensed under the MIT License โ€” see the LICENSE file for details.

MIT License ยท Copyright (c) 2026 Qourex

<p align="center"> Built with โค๏ธ by Qourex <br /> <a href="https://github.com/qourex/fasterwhisper.net/issues">Report Bug</a> ยท <a href="https://github.com/qourex/fasterwhisper.net/issues">Request Feature</a> </p>

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 is compatible.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.0.6 78 6/30/2026
1.0.5 104 6/28/2026
1.0.4 100 6/26/2026
1.0.3 96 6/26/2026
1.0.2 117 6/25/2026
1.0.0 98 6/25/2026