Manyeyes.Speech.AliParaformerAsr 1.0.1

.NET CLI:
dotnet add package Manyeyes.Speech.AliParaformerAsr --version 1.0.1

Package Manager (run this in the Visual Studio Package Manager Console, since it uses the NuGet module's version of Install-Package):
NuGet\Install-Package Manyeyes.Speech.AliParaformerAsr -Version 1.0.1

PackageReference (for projects that support PackageReference, copy this XML node into the project file):
<PackageReference Include="Manyeyes.Speech.AliParaformerAsr" Version="1.0.1" />

Paket CLI:
paket add Manyeyes.Speech.AliParaformerAsr --version 1.0.1

Script & Interactive (the #r directive can be used in F# Interactive and Polyglot Notebooks; copy it into the interactive tool or the script source):
#r "nuget: Manyeyes.Speech.AliParaformerAsr, 1.0.1"

Cake:
// Install Manyeyes.Speech.AliParaformerAsr as a Cake Addin
#addin nuget:?package=Manyeyes.Speech.AliParaformerAsr&version=1.0.1

// Install Manyeyes.Speech.AliParaformerAsr as a Cake Tool
#tool nuget:?package=Manyeyes.Speech.AliParaformerAsr&version=1.0.1

AliParaformerAsr

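Introduction:

The ASR model used in this project is the Paraformer-large ASR model provided by Alibaba DAMO Academy. The project targets .NET 6.0, is written in C#, and calls Microsoft.ML.OnnxRuntime to decode the ONNX model; it supports cross-platform compilation. It is consumed as a library, which makes deployment straightforward. The real-time factor (RTF) of the full ASR pipeline is about 0.03.

Use cases:

Paraformer is an efficient non-autoregressive end-to-end speech recognition framework proposed by the DAMO Academy speech team. This project uses the general-purpose Chinese Paraformer speech recognition model, which was trained on tens of thousands of hours of industrial-grade labeled audio to ensure good general-purpose recognition quality. The model can be applied to scenarios such as voice input methods, voice navigation, and intelligent meeting minutes.

Paraformer model structure: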

[Figure: Paraformer model structure]

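As shown in the figure above, the Paraformer model consists of five parts: Encoder, Predictor, Sampler, Decoder, and Loss function. The Encoder can use different network structures, such as self-attention, Conformer, or SAN-M. The Predictor is a two-layer FFN that predicts the number of target tokens and extracts the acoustic vectors corresponding to them. The Sampler is a module with no learnable parameters; it generates feature vectors that carry semantic information from the input acoustic vectors and target vectors. The Decoder has a structure similar to that of an autoregressive model, but it models context bidirectionally (an autoregressive decoder is unidirectional). The Loss function part includes, in addition to cross-entropy (CE) and the discriminative MWER objective, the MAE objective used to optimize the Predictor.

Its core points are:

Predictor module: a Predictor based on Continuous Integrate-and-Fire (CIF) extracts the acoustic feature vectors corresponding to the target tokens and predicts the number of target tokens in the speech more accurately.

Sampler: through sampling, the acoustic feature vectors and target token vectors are transformed into feature vectors that carry semantic information; combined with the bidirectional Decoder, this strengthens the model's ability to model context.

An MWER training criterion based on negative sampling.

For more details, see:

Paper: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Paper walkthrough: Paraformer: a single-pass non-autoregressive end-to-end speech recognition model with high recognition accuracy and high computational efficiency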
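
To make the CIF idea behind the Predictor more concrete, here is a minimal sketch of the integrate-and-fire step in C#. It is illustrative only and is not taken from this package or from the Paraformer implementation: the class name CifSketch, the method Integrate, and the firing threshold of 1.0 are assumptions, and each per-frame weight is assumed to lie between 0 and the threshold so that a frame triggers at most one firing. In the real model the weights come from the Predictor network.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of Continuous Integrate-and-Fire (CIF).
public static class CifSketch
{
    // Accumulates frame vectors weighted by their alphas and emits ("fires")
    // one integrated vector each time the accumulated weight reaches the threshold.
    // Assumes 0 <= alphas[t] <= threshold for every frame.
    public static List<float[]> Integrate(float[][] frames, float[] alphas, float threshold = 1.0f)
    {
        if (frames.Length != alphas.Length)
            throw new ArgumentException("frames and alphas must have the same length.");

        var fired = new List<float[]>();
        if (frames.Length == 0) return fired;

        int dim = frames[0].Length;
        float accumulated = 0f;          // weight integrated since the last firing
        float[] state = new float[dim];  // weighted sum of frames since the last firing

        for (int t = 0; t < frames.Length; t++)
        {
            float alpha = alphas[t];
            if (accumulated + alpha < threshold)
            {
                // Not enough weight yet: keep integrating.
                accumulated += alpha;
                for (int d = 0; d < dim; d++) state[d] += alpha * frames[t][d];
            }
            else
            {
                // Use only the part of alpha needed to reach the threshold, then fire.
                float used = threshold - accumulated;
                float[] emitted = new float[dim];
                for (int d = 0; d < dim; d++) emitted[d] = state[d] + used * frames[t][d];
                fired.Add(emitted);

                // The leftover weight of this frame starts the next integration.
                float remainder = alpha - used;
                accumulated = remainder;
                for (int d = 0; d < dim; d++) state[d] = remainder * frames[t][d];
            }
        }
        return fired;
    }
}
```

In this sketch the number of emitted vectors approximates the sum of the weights, which is how a CIF-style predictor can estimate the number of target tokens while also producing one acoustic vector per token.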

Common ASR parameters (see the asr.yaml file):

The configuration parameters used for decoding are defined in asr.yaml.

Download the paraformer-large offline ONNX model:

https://huggingface.co/manyeyes/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx

Offline (non-streaming) model usage:

1. Add the project reference

using AliParaformerAsr;

2. Initialize and configure the model
// The model files are expected in a folder named after the model, next to the executable.
string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
string modelName = "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx";
string modelDir = Path.Combine(applicationBase, modelName);
string modelFilePath = Path.Combine(modelDir, "model_quant.onnx");
string configFilePath = Path.Combine(modelDir, "asr.yaml");
string mvnFilePath = Path.Combine(modelDir, "am.mvn");
string tokensFilePath = Path.Combine(modelDir, "tokens.txt");
OfflineRecognizer offlineRecognizer = new OfflineRecognizer(modelFilePath, configFilePath, mvnFilePath, tokensFilePath);
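
Because the recognizer is built from four files, it can help to check that they are all present before constructing it. The following is a small sketch of such a check; the class ModelFileCheck and the method FindMissing are hypothetical helpers and not part of this package, while the file names are the ones used in the initialization step above.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Manyeyes.Speech.AliParaformerAsr.
public static class ModelFileCheck
{
    // File names taken from the initialization step of this README.
    private static readonly string[] RequiredFiles =
    {
        "model_quant.onnx", "asr.yaml", "am.mvn", "tokens.txt"
    };

    // Returns the names of the required files that do not exist in modelDir.
    public static string[] FindMissing(string modelDir)
    {
        return Array.FindAll(RequiredFiles, name => !File.Exists(Path.Combine(modelDir, name)));
    }
}
```

A caller could run FindMissing on the model directory and report the missing names before creating the OfflineRecognizer, which tends to give a clearer error than a failure while the model is loading.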
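3. Run recognition
// wavFilePath is the path of the audio file to recognize.
// AudioFileReader is assumed here to be NAudio.Wave.AudioFileReader, which decodes to 32-bit float samples; Select needs System.Linq.
List<float[]> samples = new List<float[]>();
AudioFileReader audioFileReader = new AudioFileReader(wavFilePath);
byte[] datas = new byte[audioFileReader.Length];
audioFileReader.Read(datas, 0, datas.Length);
TimeSpan duration = audioFileReader.TotalTime;
// Reinterpret the raw bytes as 32-bit floats (4 bytes per sample).
float[] wavdata = new float[datas.Length / 4];
Buffer.BlockCopy(datas, 0, wavdata, 0, wavdata.Length * 4);
// Scale the samples from [-1, 1] to the 16-bit PCM value range.
wavdata = wavdata.Select((float x) => x * 32768f).ToArray();
samples.Add(wavdata);
List<string> results_batch = offlineRecognizer.GetResults(samples);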
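
The multiplication by 32768 in the step above suggests that the recognizer works on samples expressed in the 16-bit PCM value range. Under that assumption, audio that is already available as raw little-endian 16-bit PCM bytes could be converted directly, without the float round trip. The sketch below shows one way to do this; PcmSketch and Pcm16ToFloat are hypothetical names, not part of this package, and the audio is assumed to be mono at the 16 kHz sample rate indicated by the model name.

```csharp
using System;

// Hypothetical helper, not part of Manyeyes.Speech.AliParaformerAsr.
public static class PcmSketch
{
    // Converts little-endian 16-bit PCM bytes into float samples
    // that keep the 16-bit value range [-32768, 32767].
    public static float[] Pcm16ToFloat(byte[] pcm)
    {
        int count = pcm.Length / 2;      // two bytes per sample; a trailing odd byte is ignored
        float[] samples = new float[count];
        for (int i = 0; i < count; i++)
        {
            // Combine low and high byte, then reinterpret as a signed 16-bit value.
            short value = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
            samples[i] = value;
        }
        return samples;
    }
}
```

The resulting array would take the place of wavdata in the samples list.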
4. Output:

正是因为存在绝对正义所以我们接受现实的相对正义但是不要因为现实的相对正义我们就认为这个世界没有正义因为如果当你认为这个世界没有正义

非常的方便但是现在不同啊英国脱欧欧盟内部完善的产业链的红利人

he must be home now for the light is on他一定在家因为灯亮着就是有一种推理或者解释的那种感觉

after early nightfall the yellow lamps would light up here in there the squalid quarter of the broffles

elapsed_milliseconds:1502.8828125
total_duration:40525.6875
rtf:0.037084696280599808
end!
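
The rtf value in the output above is the ratio of processing time to audio duration: 1502.8828125 / 40525.6875 is about 0.037. Since that ratio reproduces the printed rtf, total_duration appears to be in milliseconds as well. A minimal sketch of the calculation follows; RtfSketch and Compute are hypothetical names, not part of this package.

```csharp
using System;

// Hypothetical helper, not part of Manyeyes.Speech.AliParaformerAsr.
public static class RtfSketch
{
    // Real-time factor: processing time divided by audio duration (same unit for both).
    // Values below 1 mean recognition runs faster than real time.
    public static double Compute(double elapsedMilliseconds, double audioDurationMilliseconds)
    {
        if (audioDurationMilliseconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(audioDurationMilliseconds));
        return elapsedMilliseconds / audioDurationMilliseconds;
    }
}
```

An RTF of about 0.037 means that roughly 40.5 seconds of audio were recognized in about 1.5 seconds.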

Online (streaming) model usage:

For long audio, using this package together with Manyeyes.Speech.AliFsmnVad is recommended.

Compatible and additional computed target framework versions:

.NET: net5.0, net6.0, and net7.0 are compatible. Computed: net5.0-windows; net6.0 and net7.0 for android, ios, maccatalyst, macos, tvos, and windows; net8.0, plus net8.0 for android, browser, ios, maccatalyst, macos, tvos, and windows.
.NET Core: netcoreapp3.1 is compatible.


Version Downloads Last updated
1.0.1 199 11/22/2023
1.0.0 101 9/13/2023