Bitzsoft.Integrations.Ocr
1.0.0-alpha.7
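OCR text recognition integration -- image text recognition built on Tesseract and Emgu.CV, with multi-language support for Chinese and English plus CAPTCHA generation and recognition.
Features
- Multiple input sources: file path, byte array, Base64 string, URL
- Region recognition: run OCR on a specified rectangular region of the image
- Multi-language support: English (eng), Simplified Chinese (chi_sim), Traditional Chinese (chi_tra)
- Image preprocessing: an Emgu.CV-based pipeline covering grayscale conversion, binarization, denoising, and similar steps
- CAPTCHA generation: random CAPTCHA images with configurable size and length
- CAPTCHA recognition: automatically recognizes the text in a CAPTCHA image
- Confidence filtering: results below the MinConfidence threshold are automatically marked as failed
- Third-party request logging: a built-in RequestLogging DelegatingHandler records the HTTP requests and responses made for URL input sources, to help with troubleshooting
- Recognition results include region coordinates, per-region confidence, processing time, and other details
- Strongly typed Options configuration, with support for IConfigurationSection binding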
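Installation
dotnet add package Bitzsoft.Integrations.Ocr --prerelease
Or reference it directly in the project file:
<PackageReference Include="Bitzsoft.Integrations.Ocr" Version="*-*" />
Only prerelease versions (1.0.0-alpha.*) have been published so far, so the --prerelease flag and the *-* floating version are needed for NuGet to resolve the package.
Configuration
Add the OCR service configuration to appsettings.json: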
{
  "OCR": {
    "Enabled": true,
    "Language": "chi_sim+eng",
    "TessDataPath": "./tessdata",
    "EngineMode": "Default",
    "PreprocessImage": true,
    "SavePreprocessedImage": false,
    "MinConfidence": 60,
    "PageSegMode": "Auto"
  }
}
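Configuration values are only read at run time, so a typo can surface late. The sketch below checks a few of them up front. It is not part of the package: ValidateOcrSettings is a hypothetical helper, and the 0-100 range for MinConfidence is an assumption based on the 0-100 scale that the usage example documents for mean confidence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Bitzsoft.Integrations.Ocr.
// Returns a list of human-readable problems; an empty list means the values look sane.
static List<string> ValidateOcrSettings(string language, string tessDataPath, int minConfidence)
{
    var problems = new List<string>();

    if (string.IsNullOrWhiteSpace(language))
    {
        problems.Add("Language must not be empty.");
    }
    else if (language.Split('+').Any(code => code.Trim().Length == 0))
    {
        // Catches values such as "chi_sim+" or "+eng".
        problems.Add("Language contains an empty code between '+' separators.");
    }

    if (string.IsNullOrWhiteSpace(tessDataPath))
    {
        problems.Add("TessDataPath must not be empty.");
    }

    // Assumption: MinConfidence uses the same 0-100 scale as the reported confidence.
    if (minConfidence < 0 || minConfidence > 100)
    {
        problems.Add("MinConfidence should be between 0 and 100.");
    }

    return problems;
}

// Usage: the values from the appsettings.json fragment above pass the check.
List<string> issues = ValidateOcrSettings("chi_sim+eng", "./tessdata", 60);
Console.WriteLine(issues.Count == 0
    ? "OCR settings look valid"
    : string.Join(Environment.NewLine, issues));
```

Running a check like this before the service is registered means a bad value fails at startup and not on the first recognition call.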
Registering the service
using Bitzsoft.Integrations.Ocr;

// Option 1: bind from an IConfigurationSection
builder.Services.AddOCRService(builder.Configuration.GetSection("OCR"));

// Option 2: configure manually through a delegate
builder.Services.AddOCRService(options =>
{
    options.Enabled = true;
    options.Language = "chi_sim+eng";
    options.TessDataPath = "./tessdata";
    options.PreprocessImage = true;
    options.MinConfidence = 60;
});
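Third-party request logging
The package includes the Bitzsoft.Integrations.RequestLogging outbound request logging pipeline. By default it uses NullRequestLogStore, which does not persist anything.
// ① Default: the logging pipeline is enabled but nothing is persisted (log entries are discarded)
services.AddOCRService(options => { /* ... */ });

// ② Persistence: once the host registers an IRequestLogStore implementation, all outbound requests are written to that store automatically
services.AddRequestLogging<MyRequestLogStore>(opts =>
{
    opts.MaxBodyLength = 8192;
    opts.SensitiveFields.Add("mySecret");
});
services.AddOCRService(options => { /* ... */ });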
Usage example
The following example recognizes an invoice image, extracts key fields, and checks their confidence:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Bitzsoft.Integrations.Ocr;

/// <summary>
/// Invoice recognition service
/// </summary>
public class InvoiceOcrService
{
    private readonly IOCRService _ocr;

    /// <summary>
    /// Initializes a new instance of the invoice recognition service
    /// </summary>
    /// <param name="ocr">OCR text recognition service (injected by DI)</param>
    public InvoiceOcrService(IOCRService ocr)
    {
        _ocr = ocr;
    }

    /// <summary>
    /// Recognizes an invoice image and extracts key information
    /// </summary>
    /// <param name="invoiceImagePath">Path to the invoice image file</param>
    /// <param name="cancellationToken">Cancellation token</param>
    /// <returns>The recognized invoice text and the confidence of each region</returns>
    public async Task<InvoiceRecognitionResult> RecognizeInvoiceAsync(
        string invoiceImagePath,
        CancellationToken cancellationToken = default)
    {
        // Run full recognition over the whole invoice
        OCRResult fullResult = await _ocr.RecognizeFromFileAsync(
            invoiceImagePath,
            cancellationToken);

        if (!fullResult.Success)
        {
            return new InvoiceRecognitionResult
            {
                Success = false,
                ErrorMessage = fullResult.ErrorMessage
            };
        }

        // Drop low-confidence regions and keep only the reliable results
        var reliableRegions = fullResult.Regions
            .Where(r => r.Confidence >= 80)
            .ToList();

        // Run a targeted second pass on the invoice number (assumed to be in the upper-right area)
        OCRResult invoiceNoResult = await _ocr.RecognizeRegionAsync(
            imagePath: invoiceImagePath,
            x: 500,
            y: 50,
            width: 300,
            height: 40);

        return new InvoiceRecognitionResult
        {
            Success = true,
            FullText = fullResult.Text,
            MeanConfidence = fullResult.MeanConfidence,
            ProcessingTimeMs = fullResult.ProcessingTimeMs,
            ReliableRegions = reliableRegions,
            InvoiceNumber = invoiceNoResult.Success ? invoiceNoResult.Text.Trim() : null
        };
    }
    /// <summary>
    /// Generates a CAPTCHA and recognizes it (for testing or automation scenarios)
    /// </summary>
    /// <returns>The CAPTCHA text and the corresponding image data</returns>
    public async Task<CaptchaResult> GenerateAndRecognizeCaptchaAsync()
    {
        // Generate a 4-character CAPTCHA image
        byte[] captchaImage = _ocr.GenerateCaptcha(
            width: 120,
            height: 40,
            length: 4);

        // Recognize the generated CAPTCHA
        OCRResult result = await _ocr.RecognizeCaptchaAsync(captchaImage);

        return new CaptchaResult
        {
            Text = result.Text,
            Confidence = result.MeanConfidence,
            ImageData = captchaImage
        };
    }
}
/// <summary>
/// Invoice recognition result
/// </summary>
public class InvoiceRecognitionResult
{
    /// <summary>Whether recognition succeeded</summary>
    public bool Success { get; set; }

    /// <summary>Error message</summary>
    public string? ErrorMessage { get; set; }

    /// <summary>Full recognized text</summary>
    public string FullText { get; set; } = string.Empty;

    /// <summary>Mean confidence (0-100)</summary>
    public float MeanConfidence { get; set; }

    /// <summary>Processing time in milliseconds</summary>
    public long ProcessingTimeMs { get; set; }

    /// <summary>List of high-confidence regions</summary>
    public List<TextRegion> ReliableRegions { get; set; } = new();

    /// <summary>Invoice number</summary>
    public string? InvoiceNumber { get; set; }
}

/// <summary>
/// CAPTCHA result
/// </summary>
public class CaptchaResult
{
    /// <summary>CAPTCHA text</summary>
    public string Text { get; set; } = string.Empty;

    /// <summary>Recognition confidence</summary>
    public float Confidence { get; set; }

    /// <summary>CAPTCHA image data</summary>
    public byte[] ImageData { get; set; } = Array.Empty<byte>();
}
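The invoice-number region in the example (x: 500, y: 50, 300 x 40) is hard-coded, so it may extend past the edge of a smaller scan. The following is a minimal sketch of clamping a rectangle to the image bounds before passing it to RecognizeRegionAsync. ClampRegion is a hypothetical helper, the image dimensions are assumed to be known to the caller, and how the library itself handles out-of-range regions is not stated in this document.

```csharp
using System;

// Hypothetical helper, not part of Bitzsoft.Integrations.Ocr.
// Clamps a requested rectangle to an image of the given size.
// Returns null when the rectangle does not overlap the image at all.
static (int X, int Y, int Width, int Height)? ClampRegion(
    int x, int y, int width, int height, int imageWidth, int imageHeight)
{
    // Intersect [x, x + width) with [0, imageWidth), and likewise for y.
    int left = Math.Max(x, 0);
    int top = Math.Max(y, 0);
    int right = Math.Min(x + width, imageWidth);
    int bottom = Math.Min(y + height, imageHeight);

    if (right <= left || bottom <= top)
    {
        return null; // nothing left to recognize
    }

    return (left, top, right - left, bottom - top);
}

// Usage: the invoice-number region from the example, on a 700 x 1000 pixel scan.
var region = ClampRegion(500, 50, 300, 40, imageWidth: 700, imageHeight: 1000);
Console.WriteLine(region is null
    ? "Region lies outside the image"
    : $"x={region.Value.X}, y={region.Value.Y}, width={region.Value.Width}, height={region.Value.Height}");
// prints "x=500, y=50, width=200, height=40"
```

A null result is a cue to skip the second pass and rely on the full-page result.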
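Tesseract language data (tessdata)
Tesseract needs the .traineddata file for each language before it can run OCR. The tessdata/ directory in the project source contains only a readme file. The actual language data files are not distributed with the NuGet package because they are too large (15–56 MB per file).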
Download locations
| Dataset | Accuracy | Download URL | Use case |
|---|---|---|---|
| tessdata | Standard | https://github.com/tesseract-ocr/tessdata | Production environments (recommended) |
| tessdata_best | Highest | https://github.com/tesseract-ocr/tessdata_best | Scenarios where accuracy matters most |
| tessdata_fast | Fast | https://github.com/tesseract-ocr/tessdata_fast | Scenarios where speed matters most |
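Common language files
| Language code | Language | File name | Size (approx.) |
|---|---|---|---|
| eng | English | eng.traineddata | 22 MB |
| chi_sim | Simplified Chinese | chi_sim.traineddata | 42 MB |
| chi_tra | Traditional Chinese | chi_tra.traineddata | 56 MB |
| jpn | Japanese | jpn.traineddata | 34 MB |
| kor | Korean | kor.traineddata | 15 MB |
See the GitHub repositories listed above for the full list of languages.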
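Installation steps
- Download the .traineddata files for the languages you need from the corresponding repository
- Place the files in the tessdata/ folder under the directory the application runs from
- Specify the language codes in the configuration (join multiple languages with +):
{
  "OCR": {
    "Language": "chi_sim+eng"
  }
}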
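Note: The TessDataPath setting defaults to ./tessdata. Make sure this path resolves correctly relative to the application's working directory. In an ASP.NET Core project, tessdata/ is usually placed in the project root with "Copy to Output Directory" set to "Copy if newer".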
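A missing language file typically shows up only on the first recognition call, so it can help to check for the files at startup. The sketch below is one way to do that under stated assumptions: FindMissingTrainedData is a hypothetical helper (not part of the package), and it assumes each code in the Language string maps to a file named <code>.traineddata directly inside the tessdata directory, as in the table of common language files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical startup check, not part of Bitzsoft.Integrations.Ocr.
// Returns the .traineddata file names that the language string needs
// but that are missing from the tessdata directory.
static List<string> FindMissingTrainedData(string language, string tessDataPath)
{
    return language
        .Split('+', StringSplitOptions.RemoveEmptyEntries)
        .Select(code => code.Trim() + ".traineddata")
        .Where(fileName => !File.Exists(Path.Combine(tessDataPath, fileName)))
        .ToList();
}

// Usage: resolve the relative path against the current working directory,
// which is what a relative TessDataPath such as "./tessdata" depends on.
string tessDataDir = Path.GetFullPath("./tessdata");
List<string> missing = FindMissingTrainedData("chi_sim+eng", tessDataDir);

Console.WriteLine(missing.Count == 0
    ? $"All language files found in {tessDataDir}"
    : $"Missing from {tessDataDir}: {string.Join(", ", missing)}");
```

Printing the fully resolved directory makes a working-directory mismatch easy to spot, since a relative path can point somewhere different under dotnet run than under a published deployment.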
Related packages
- Bitzsoft.Integrations.Sms -- dual-channel SMS integration
- Bitzsoft.Integrations.CrawlingService -- client for the web crawling service integration
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 is compatible. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - Bitzsoft.Integrations.RequestLogging (>= 1.0.0-alpha.7)
  - Emgu.CV (>= 4.10.0.5680)
  - Emgu.CV.runtime.windows (>= 4.10.0.5680)
  - Microsoft.Extensions.Configuration.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Configuration.Binder (>= 10.0.9)
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Http (>= 10.0.9)
  - Microsoft.Extensions.Logging.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Options (>= 10.0.9)
  - Tesseract (>= 5.2.0)
- net5.0
  - Bitzsoft.Integrations.RequestLogging (>= 1.0.0-alpha.7)
  - Emgu.CV (>= 4.10.0.5680)
  - Emgu.CV.runtime.windows (>= 4.10.0.5680)
  - Microsoft.Bcl.AsyncInterfaces (>= 5.0.0)
  - Microsoft.Extensions.Configuration.Abstractions (>= 5.0.0)
  - Microsoft.Extensions.Configuration.Binder (>= 5.0.0)
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 5.0.0)
  - Microsoft.Extensions.Http (>= 5.0.0)
  - Microsoft.Extensions.Logging.Abstractions (>= 5.0.0)
  - Microsoft.Extensions.Options (>= 5.0.0)
  - System.IO.Pipelines (>= 5.0.1)
  - Tesseract (>= 5.2.0)
- net8.0
  - Bitzsoft.Integrations.RequestLogging (>= 1.0.0-alpha.7)
  - Emgu.CV (>= 4.10.0.5680)
  - Emgu.CV.runtime.windows (>= 4.10.0.5680)
  - Microsoft.Extensions.Configuration.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Configuration.Binder (>= 10.0.9)
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Http (>= 10.0.9)
  - Microsoft.Extensions.Logging.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Options (>= 10.0.9)
  - Tesseract (>= 5.2.0)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on Bitzsoft.Integrations.Ocr:
| Package | Downloads |
|---|---|
| Bitzsoft.Integrations.All: Bitzsoft third-party integration aggregate package, containing all Integration modules | |
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0-alpha.7 | 51 | 6/16/2026 |
| 1.0.0-alpha.6 | 56 | 6/16/2026 |
| 1.0.0-alpha.5 | 54 | 6/14/2026 |
| 1.0.0-alpha.3 | 61 | 6/7/2026 |
| 1.0.0-alpha.2 | 58 | 5/29/2026 |
| 1.0.0-alpha.1 | 58 | 5/28/2026 |