ManySpeech.WhisperAsr
1.0.1
ManySpeech.WhisperAsr User Guide
I. Introduction
ManySpeech.WhisperAsr is a specialized speech recognition component in the ManySpeech speech processing suite. It supports models such as whisper, distil-whisper, and whisper-turbo. Under the hood, it uses Microsoft.ML.OnnxRuntime for decoding ONNX models, offering several advantages:
- Multi-environment support: Compatible with net461+, net60+, netcoreapp3.1, and netstandard2.0+, adapting to various development scenarios.
- Cross-platform compilation: Supports cross-platform compilation for systems like Windows, macOS, Linux, and Android, expanding application scope.
- AOT compilation support: Easy to use, facilitating quick integration into projects.
II. Installation Methods
Installing via the NuGet package manager is recommended. There are three ways to do this:
(I) Using Package Manager Console
Execute the following command in Visual Studio's "Package Manager Console":
Install-Package ManySpeech.WhisperAsr
(II) Using .NET CLI
Enter the following command in the command line to install:
dotnet add package ManySpeech.WhisperAsr
(III) Manual Installation
Search for "ManySpeech.WhisperAsr" in the NuGet Package Manager interface and click "Install".
III. Configuration Instructions (Reference: conf.json File)
Most parameters in the conf.json configuration file for decoding do not need modification, but specific parameters can be adjusted:
- "task": "transcribe": When using a Whisper multilingual model (e.g., whisper-tiny-onnx), setting task to transcribe enables transcription only (no translation); setting it to translate enables automatic translation to the specified language; if left empty, transcribe is used by default.
- "language": "zh": When using a Whisper multilingual model (e.g., whisper-tiny-onnx), you can specify the language. If not specified, the language is detected automatically.
- "without_timestamps": false: When false, recognition results include timestamps; when true, timestamps are excluded.
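As an illustration, the three adjustable parameters described above might appear in conf.json as follows. This is only a fragment under stated assumptions: the key names and values come from the descriptions above, while the other keys in the file and any nesting used by the real conf.json are omitted here and should be left as shipped with the model.

```json
{
  "task": "transcribe",
  "language": "zh",
  "without_timestamps": false
}
```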
IV. Code Calling Methods
(I) Offline (Non-streaming) Model Calling
- Add Project References: add the following using directives to your code:
using ManySpeech.WhisperAsr;
using ManySpeech.WhisperAsr.Model;
- Model Initialization and Configuration
- Whisper model initialization method:
string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
string modelName = "whisper-tiny-onnx";
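// Build the model file paths with Path.Combine so that directory separators are correct on every platform.
string encoderFilePath = System.IO.Path.Combine(applicationBase, modelName, "encoder.int8.onnx");
string decoderFilePath = System.IO.Path.Combine(applicationBase, modelName, "decoder.int8.onnx");
string configFilePath = System.IO.Path.Combine(applicationBase, modelName, "conf.json");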
OfflineRecognizer offlineRecognizer = new OfflineRecognizer(encoderFilePath: encoderFilePath, decoderFilePath: decoderFilePath, configFilePath: configFilePath, threadsNum: 1);
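Because the recognizer is built from three file paths, it can help to confirm that the model files exist before constructing it. The following is a minimal sketch of such a check; `ModelFiles` and `FindMissing` are hypothetical names introduced for illustration and are not part of ManySpeech.WhisperAsr.

```csharp
using System;
using System.IO;

public static class ModelFiles
{
    // Hypothetical helper (not part of ManySpeech.WhisperAsr):
    // returns the subset of the given paths that do not exist on disk.
    public static string[] FindMissing(params string[] paths)
    {
        return Array.FindAll(paths, p => !File.Exists(p));
    }
}
```

A caller could pass the encoder, decoder, and config paths to `ModelFiles.FindMissing(...)` and report the returned paths (for example by throwing a `FileNotFoundException`) before creating the `OfflineRecognizer`.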
- Calling Process
List<float[]> samples = new List<float[]>();
// Code for converting WAV files to samples is omitted here. For details, refer to the ManySpeech.WhisperAsr.Examples sample code.
List<OfflineStream> streams = new List<OfflineStream>();
foreach (var sample in samples)
{
    // Create one stream per audio clip and feed it the samples.
    OfflineStream stream = offlineRecognizer.CreateOfflineStream();
    stream.AddSamples(sample);
    streams.Add(stream);
}
// Decode all streams in one batch and print the recognized text.
List<OfflineRecognizerResultEntity> results = offlineRecognizer.GetResults(streams);
foreach (OfflineRecognizerResultEntity result in results)
{
    Console.WriteLine(result.Text);
}
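The calling process above omits the WAV-to-samples step. As a rough sketch of what that step could look like, the helper below reads a 16-bit PCM mono WAV file into a float array. Several things here are assumptions rather than the library's documented behavior: `WavSketch.ReadPcm16Mono` is a hypothetical helper, the scaling of samples to the range [-1, 1) is a guess at what the recognizer expects, and no resampling is performed (Whisper models work on 16 kHz audio). The ManySpeech.WhisperAsr.Examples project remains the reference for the actual conversion.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavSketch
{
    // Hypothetical helper (not part of ManySpeech.WhisperAsr).
    // Reads a 16-bit PCM mono WAV file and returns its samples as floats.
    public static float[] ReadPcm16Mono(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            // RIFF header: "RIFF", total size, "WAVE".
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadInt32();
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE")
                throw new InvalidDataException("Not a RIFF/WAVE file.");

            bool formatSeen = false;
            // Walk the chunks until the "data" chunk is found.
            while (reader.BaseStream.Position + 8 <= reader.BaseStream.Length)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                int chunkSize = reader.ReadInt32();
                if (chunkId == "fmt ")
                {
                    short format = reader.ReadInt16();
                    short channels = reader.ReadInt16();
                    reader.ReadInt32(); // sample rate (not checked in this sketch)
                    reader.ReadInt32(); // byte rate
                    reader.ReadInt16(); // block align
                    short bitsPerSample = reader.ReadInt16();
                    // Skip any extension bytes after the 16-byte base format.
                    reader.BaseStream.Seek(chunkSize - 16, SeekOrigin.Current);
                    if (format != 1 || channels != 1 || bitsPerSample != 16)
                        throw new NotSupportedException("Only 16-bit PCM mono is handled.");
                    formatSeen = true;
                }
                else if (chunkId == "data")
                {
                    if (!formatSeen)
                        throw new InvalidDataException("The fmt chunk must precede the data chunk.");
                    long remaining = reader.BaseStream.Length - reader.BaseStream.Position;
                    int count = (int)(Math.Min(chunkSize, remaining) / 2);
                    var samples = new float[count];
                    for (int i = 0; i < count; i++)
                        samples[i] = reader.ReadInt16() / 32768f; // assumed scaling
                    return samples;
                }
                else
                {
                    // Unknown chunk: skip it, including the pad byte for odd sizes.
                    reader.BaseStream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
                }
            }
            throw new InvalidDataException("No data chunk found.");
        }
    }
}
```

With a helper like this, each file's samples could be added to the `samples` list before the streams are created, for example `samples.Add(WavSketch.ReadPcm16Mono(wavPath))`, where `wavPath` is a placeholder for your own file path.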
V. Related Projects
- Voice Activity Detection: To split long audio into reasonable segments before recognition, add the ManySpeech.AliFsmnVad library. Install it using the following command:
dotnet add package ManySpeech.AliFsmnVad
- Text Punctuation Prediction: For recognition results lacking punctuation, add the ManySpeech.AliCTTransformerPunc library. Install it with:
dotnet add package ManySpeech.AliCTTransformerPunc
For specific calling examples, refer to the official documentation of the corresponding library or the ManySpeech.WhisperAsr.Examples project. This project is a console/desktop sample project that demonstrates basic speech recognition functions such as offline transcription and real-time recognition.
VI. Other Instructions
- Test Cases: Use ManySpeech.WhisperAsr.Examples as test cases.
- Test CPU: The test CPU used is Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (2.59 GHz).
- Supported Platforms:
- Windows: Windows 7 SP1 and later versions.
- macOS: macOS 10.13 (High Sierra) and later versions, including iOS.
- Linux: Compatible with Linux distributions, but specific dependencies must be met (see the list of Linux distributions supported by .NET 6).
- Android: Android 5.0 (API 21) and later versions.
VII. Model Downloads (Supported ONNX Models)
The following is information about ONNX models supported by ManySpeech.WhisperAsr, including model names, types, supported languages, punctuation support, timestamp support, and download links. Choose the appropriate model based on your needs:
| Model Name | Type | Supported Languages | Punctuation | Timestamp | Download Link |
|---|---|---|---|---|---|
| whisper-tiny-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-tiny-en-onnx | Non-streaming | English | Yes | No | modelscope |
| whisper-base-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-base-en-onnx | Non-streaming | English | Yes | No | modelscope |
| whisper-small-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-small-en-onnx | Non-streaming | English | Yes | No | modelscope |
| whisper-small-cantonese-onnx | Non-streaming | Cantonese, Chinese, English | Yes | No | modelscope |
| whisper-medium-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-medium-en-onnx | Non-streaming | English | Yes | No | modelscope |
| whisper-large-v1-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-large-v2-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-large-v3-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-large-v3-turbo-onnx | Non-streaming | Multilingual | Yes | No | modelscope |
| whisper-large-v3-turbo-zh-onnx | Non-streaming | Chinese, English, etc. | Yes | No | modelscope |
| distil-whisper-small-en-onnx | Non-streaming | English | Yes | No | modelscope |
| distil-whisper-medium-en-onnx | Non-streaming | English | Yes | No | modelscope |
| distil-whisper-large-v2-en-onnx | Non-streaming | English | Yes | No | modelscope |
| distil-whisper-large-v3-en-onnx | Non-streaming | English | Yes | No | modelscope |
| distil-whipser-large-v3.5-en-onnx | Non-streaming | English | Yes | No | modelscope |
| distil-whisper-large-v2-multi-hans-onnx | Non-streaming | Chinese | Yes | No | modelscope |
| distil-whisper-small-cantonese-onnx-alvanlii-20240404 | Non-streaming | Cantonese, Chinese, English | Yes | No | modelscope |
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-android35.0 is compatible. net9.0-browser was computed. net9.0-ios was computed. net9.0-ios18.0 is compatible. net9.0-maccatalyst was computed. net9.0-maccatalyst18.0 is compatible. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net9.0-windows10.0.19041 is compatible. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 is compatible. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
| .NET Framework | net461 is compatible. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 is compatible. net48 is compatible. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETCoreApp 3.1
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
  - SharpZipLib (>= 1.4.2)
  - System.Text.Json (>= 9.0.8)
- .NETFramework 4.6.1
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
  - SharpZipLib (>= 1.4.2)
  - System.Numerics.Vectors (>= 4.6.1)
  - System.Text.Json (>= 9.0.8)
- .NETFramework 4.7.2
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
  - SharpZipLib (>= 1.4.2)
  - System.Numerics.Vectors (>= 4.6.1)
  - System.Text.Json (>= 9.0.8)
- .NETFramework 4.8
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
  - SharpZipLib (>= 1.4.2)
  - System.Numerics.Vectors (>= 4.6.1)
  - System.Text.Json (>= 9.0.8)
- .NETStandard 2.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
  - SharpZipLib (>= 1.4.2)
  - System.Text.Json (>= 9.0.8)
- .NETStandard 2.1
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
  - SharpZipLib (>= 1.4.2)
  - System.Text.Json (>= 9.0.8)
- net6.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
- net8.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
- net9.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
- net9.0-android35.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
- net9.0-ios18.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
- net9.0-maccatalyst18.0
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)
- net9.0-windows10.0.19041
  - ManySpeech.Tiktoken (>= 1.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.1)