FasterWhisper.NET.Gpu
1.0.6

by Qourex: bringing high-performance GPU-accelerated speech recognition to .NET
Read the Documentation for detailed guides, .NET 10.0 samples, and mobile deployment walkthroughs.
FasterWhisper.NET.Gpu is the GPU-accelerated release of the C# port of the popular Python library faster-whisper. It bundles pre-compiled native binaries built with CUDA and cuDNN enabled for CTranslate2, delivering blazing-fast transcription times on NVIDIA GPUs.
For CPU-only execution without CUDA prerequisites, please use the base FasterWhisper.NET package.
Key GPU Advantages
- CUDA & cuDNN Acceleration: Native GPU-bound inference for Whisper models.
- Flash Attention Support: Substantial throughput improvements on Ampere (RTX 30-series) and newer architectures.
- Mixed Precision Compute: Full support for the "float16" and "int8_float16" compute types to minimize GPU memory (VRAM) footprint.
- Parallel Mel Extraction: Managed multi-threaded audio pipeline that maximizes CPU core usage before GPU scheduling.
Installation
To install the GPU-enabled package:
dotnet add package FasterWhisper.NET.Gpu
CUDA Prerequisites
To run this package with GPU acceleration (device: "cuda"), you must have the following NVIDIA runtimes installed and configured on your host system:
Windows
- NVIDIA CUDA Toolkit 12.x (the bundled binaries are compiled with 12.8), available from CUDA Downloads
- NVIDIA cuDNN 8.9.x, available from the cuDNN Downloads Archive
Ensure that the following DLLs from these installations are available in your system PATH:
- cudart64_12.dll (or other CUDA 12 runtime versions)
- cublas64_12.dll
- cublasLt64_12.dll
- cudnn64_8.dll (specifically cuDNN v8)
Linux / WSL2
- NVIDIA CUDA Toolkit 12.x (the bundled binaries are compiled with 12.8), available from WSL/Linux CUDA Downloads
- NVIDIA cuDNN 8.9.x, available from the cuDNN Downloads Archive
Ensure that the following shared libraries from these installations are available in your LD_LIBRARY_PATH or system library paths (e.g. /usr/local/cuda/lib64):
- libcudart.so.12
- libcublas.so.12
- libcublasLt.so.12
- libcudnn.so.8 (specifically cuDNN v8)
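Before loading a model, it can help to confirm that the operating system loader can actually resolve the libraries listed above. The following is a minimal sketch, not part of the package: `CudaRuntimeCheck` and its members are hypothetical names introduced here for illustration, and the only real API it relies on is .NET's `NativeLibrary.TryLoad` and `NativeLibrary.Free`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical helper (not shipped with FasterWhisper.NET.Gpu).
public static class CudaRuntimeCheck
{
    // Library names copied from the prerequisite lists above.
    public static readonly string[] WindowsLibraries =
        { "cudart64_12.dll", "cublas64_12.dll", "cublasLt64_12.dll", "cudnn64_8.dll" };

    public static readonly string[] LinuxLibraries =
        { "libcudart.so.12", "libcublas.so.12", "libcublasLt.so.12", "libcudnn.so.8" };

    // Returns every name for which the supplied probe reports failure.
    public static List<string> FindMissing(IEnumerable<string> names, Func<string, bool> canLoad)
        => names.Where(name => !canLoad(name)).ToList();

    // Default probe: ask the OS loader to resolve the name, then release the handle.
    public static bool CanLoad(string name)
    {
        if (!NativeLibrary.TryLoad(name, out IntPtr handle))
        {
            return false;
        }
        NativeLibrary.Free(handle);
        return true;
    }

    // Picks the list for the current OS and probes each entry.
    public static List<string> FindMissingOnThisPlatform()
    {
        string[] names = RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? WindowsLibraries
            : LinuxLibraries;
        return FindMissing(names, CanLoad);
    }
}
```

Calling `CudaRuntimeCheck.FindMissingOnThisPlatform()` at startup and logging the returned names gives a clearer message than a native load failure later on. The probe is passed in as a delegate so the lookup logic can be exercised without a GPU machine.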
Docker Compilation (For Linux GPU Binaries)
For Linux and WSL2 environments, you can compile the CUDA native libraries inside a Docker container, so no compilers need to be installed on your host system.
Run the following command from the root of the repository:
docker run --rm --gpus all -v "$(pwd)":/workspace -w /workspace nvcr.io/nvidia/cuda:12.8.0-devel-ubuntu22.04 bash -c "
apt-get update && \
apt-get install -y ca-certificates gpg wget && \
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null && \
echo 'deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ jammy main' | tee /etc/apt/sources.list.d/kitware.list >/dev/null && \
apt-get update && \
apt-get install -y cmake build-essential libopenblas-dev ninja-build libcudnn8-dev git && \
./build.sh --gpu-only
"
Note: If your local Docker setup does not have the NVIDIA Container Toolkit configured, you can omit the --gpus all flag, as a physical GPU is not required during the compilation step.
This command compiles the wrapper and automatically stages the output qourex_fasterwhisper_native.so and libctranslate2.so files under the C# GPU project runtimes directory (src/Qourex.FasterWhisper.NET.Gpu/runtimes/linux-x64/native/).
Quick Start
Concurrency & Thread-Safety: WhisperModel is thread-safe and supports concurrent transcription calls. Under the hood, concurrent calls are queued and processed safely using a SemaphoreSlim. If you configure the model with NumReplicas > 1, transcription calls will execute concurrently utilizing CTranslate2's native thread-safe replica pool, sharing the same loaded model weights in memory to minimize VRAM overhead.
using Qourex.FasterWhisper.NET;

// 1. Download and load the model on CUDA (cached to ~/.cache/qourex-fasterwhisper)
using var model = await WhisperModel.LoadAsync(
    modelNameOrPath: "large-v3",
    device: "cuda",          // Use GPU
    computeType: "float16",  // Half-precision for optimal GPU performance
    flashAttention: true     // Enable Flash Attention (requires compute capability >= 8.0)
);

// 2. Configure transcription options
var options = new WhisperOptions
{
    BeamSize = 5,
    WordTimestamps = true
};

// 3. Transcribe
var segments = model.Transcribe(
    mediaPath: "audio.wav",
    language: "en",
    options: options
);

// 4. Print timing and text
foreach (var segment in segments)
{
    Console.WriteLine($"[{segment.Start:F2}s -> {segment.End:F2}s] {segment.Text}");
}
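The concurrency note at the top of this section says concurrent calls are queued with a SemaphoreSlim. The sketch below illustrates that general pattern only; it is not the library's implementation. `GatedTranscriber` is a hypothetical stand-in that simulates work with `Task.Delay` and records how many calls were ever inside the gate at once.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a model whose calls are admitted through a gate.
public sealed class GatedTranscriber
{
    private readonly SemaphoreSlim _gate;
    private int _active;
    private int _maxObserved;

    // maxConcurrency = 1 serializes calls; a larger value admits that many at once.
    public GatedTranscriber(int maxConcurrency = 1)
        => _gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

    // Highest number of calls seen inside the gate at the same time.
    public int MaxObservedConcurrency => Volatile.Read(ref _maxObserved);

    public async Task<string> TranscribeAsync(string mediaPath)
    {
        await _gate.WaitAsync(); // callers queue here when the gate is full
        try
        {
            int now = Interlocked.Increment(ref _active);

            // Record the high-water mark without taking a lock.
            int seen;
            while (now > (seen = Volatile.Read(ref _maxObserved)))
            {
                Interlocked.CompareExchange(ref _maxObserved, now, seen);
            }

            await Task.Delay(20); // simulated inference work
            return $"transcript of {mediaPath}";
        }
        finally
        {
            Interlocked.Decrement(ref _active);
            _gate.Release();
        }
    }
}
```

Starting several calls together, for example `await Task.WhenAll(paths.Select(gate.TranscribeAsync))`, still completes all of them, but with `maxConcurrency: 1` they run one after another. Raising the limit loosely mirrors what the note describes for NumReplicas > 1.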
GPU Configuration Options
Compute Types
Choose the optimal precision for your GPU memory and compute capabilities:
| Compute Type | Description |
|---|---|
| "default" | Selects float16 if supported by the GPU, otherwise falls back |
| "float16" | Recommended. Fast FP16 execution, low VRAM utilization |
| "float32" | Standard 32-bit floating point precision |
| "int8_float16" | INT8 quantized calculations with FP16 storage |
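To make the memory trade-off between compute types concrete, here is a back-of-the-envelope estimate of weight storage only. It is a rough sketch under stated assumptions: the Whisper large models have about 1.55 billion parameters, "int8_float16" is approximated as one byte per weight even though not every tensor is quantized, and activations, caches, and CUDA context overhead are ignored. `WeightMemoryEstimate` is a hypothetical helper, not a package API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: estimates VRAM used by model weights alone.
public static class WeightMemoryEstimate
{
    // Approximate bytes per stored weight for each compute type.
    // Assumption: int8_float16 is treated as 1 byte per weight.
    private static readonly Dictionary<string, double> BytesPerWeight = new()
    {
        ["float32"] = 4.0,
        ["float16"] = 2.0,
        ["int8_float16"] = 1.0,
    };

    // Returns the estimate in gigabytes (10^9 bytes).
    public static double GigabytesFor(long parameterCount, string computeType)
        => parameterCount * BytesPerWeight[computeType] / 1e9;
}
```

For 1,550,000,000 parameters this gives roughly 6.2 GB for "float32", 3.1 GB for "float16", and 1.55 GB for "int8_float16". Real usage is higher once activations and runtime overhead are included, so treat these figures as lower bounds when sizing a GPU.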
Flash Attention
Enable Flash Attention for compatible GPUs:
flashAttention: true
Note: Flash Attention requires an NVIDIA GPU with compute capability ≥ 8.0 (Ampere architecture or newer, e.g. RTX 30-series, 40-series, A100, H100).
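One way to decide whether to pass `flashAttention: true` is to read the GPU's compute capability and compare it against the 8.0 threshold from the note above. The sketch below is an assumption-laden illustration: `FlashAttentionCheck` is a hypothetical helper, and it assumes `nvidia-smi` is on the PATH and that the installed driver supports the `compute_cap` query field (older drivers may not).

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper (not part of FasterWhisper.NET.Gpu).
public static class FlashAttentionCheck
{
    private static readonly Version Minimum = new Version(8, 0);

    // True when a string such as "8.6" parses and is at least 8.0.
    public static bool MeetsMinimum(string computeCapability)
        => Version.TryParse(computeCapability?.Trim(), out var parsed) && parsed >= Minimum;

    // Returns the first GPU's compute capability, or "" when it cannot be determined.
    public static string QueryFirstGpu()
    {
        var info = new ProcessStartInfo(
            "nvidia-smi", "--query-gpu=compute_cap --format=csv,noheader")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };

        try
        {
            using var process = Process.Start(info);
            if (process is null)
            {
                return "";
            }
            string firstLine = process.StandardOutput.ReadLine() ?? "";
            process.WaitForExit();
            return process.ExitCode == 0 ? firstLine : "";
        }
        catch (Win32Exception)
        {
            // nvidia-smi is not installed or not on the PATH.
            return "";
        }
    }
}
```

With this in place, `FlashAttentionCheck.MeetsMinimum(FlashAttentionCheck.QueryFirstGpu())` evaluates to false whenever the capability is below 8.0 or cannot be read, which makes it a conservative value to feed into the `flashAttention` argument.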
License
This package is licensed under the MIT License; see the LICENSE file for details.
MIT License · Copyright (c) 2026 Qourex
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows is compatible. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows is compatible. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows is compatible. |
- net10.0
  - Microsoft.Extensions.Logging.Abstractions (>= 8.0.1)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)
- net10.0-windows
  - Microsoft.Extensions.Logging.Abstractions (>= 8.0.1)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)
- net8.0
  - Microsoft.Extensions.Logging.Abstractions (>= 8.0.1)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)
- net8.0-windows
  - Microsoft.Extensions.Logging.Abstractions (>= 8.0.1)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)
- net9.0
  - Microsoft.Extensions.Logging.Abstractions (>= 8.0.1)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)
- net9.0-windows
  - Microsoft.Extensions.Logging.Abstractions (>= 8.0.1)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)