HtmlPDFContrastImage.Window
1.0.0
HtmlPDFContrastImage.Window
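Image comparison tool for HTML and PDF documents - NuGet package for Windows
📦 Features
✅ Multiple input types - accepts file paths, URLs, and byte[] input
✅ Image filtering - exclude specific images by passing them as byte[]
✅ Flexible configuration - every comparison parameter is configurable
✅ High-performance matching - uses perceptual hashing and the Hungarian algorithm
✅ Async API - fully asynchronous, supports high concurrency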
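To make the perceptual-hashing idea concrete, here is a minimal sketch of an average hash (aHash), the simplest member of that family. It is an illustration only: this page does not say which hash the package uses, and the helper names `AverageHash` and `HashSimilarity` are hypothetical. The input is assumed to be an image already reduced to 8x8 grayscale.

```csharp
using System;
using System.Numerics;

// Average hash: one bit per pixel, set when the pixel is brighter than the mean.
// Assumes the image was already downscaled to 8x8 grayscale (64 values).
static ulong AverageHash(ReadOnlySpan<byte> gray8x8)
{
    if (gray8x8.Length != 64) throw new ArgumentException("Expected 64 grayscale values.");
    int sum = 0;
    foreach (byte p in gray8x8) sum += p;
    double mean = sum / 64.0;

    ulong hash = 0;
    for (int i = 0; i < 64; i++)
        if (gray8x8[i] > mean) hash |= 1UL << i;
    return hash;
}

// Similarity in [0, 1]: the fraction of hash bits on which the two hashes agree.
static double HashSimilarity(ulong a, ulong b) =>
    1.0 - BitOperations.PopCount(a ^ b) / 64.0;

// Synthetic test picture: dark top half, bright bottom half.
byte[] original = new byte[64];
for (int i = 0; i < 64; i++) original[i] = (byte)(i < 32 ? 40 : 200);

// The same picture, uniformly brighter. The hash should not change.
byte[] brighter = new byte[64];
for (int i = 0; i < 64; i++) brighter[i] = (byte)(original[i] + 20);

double similarity = HashSimilarity(AverageHash(original), AverageHash(brighter));
Console.WriteLine($"Similarity: {similarity:P0}");
```

Because each bit only records "brighter than the mean", a uniform brightness shift leaves the hash unchanged, which is why hashes of this kind tolerate the re-encoding that images typically undergo when a page is converted to PDF.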
🚀 Quick start
Installation
dotnet add package HtmlPDFContrastImage.Window
Basic usage
1. File paths
using HtmlPDFContrastImage.Window;
// Create the comparer
await using var comparer = new HtmlPdfComparer();
// Run the comparison
var result = await comparer.CompareAsync(
    htmlSource: @"C:\path\to\file.html",
    pdfSource: @"C:\path\to\file.pdf"
);
// Print the results
Console.WriteLine($"Matched: {result.MatchedCount}/{result.HtmlImageCount}");
Console.WriteLine($"HTML match rate: {result.MatchRateByHtml:P2}");
Console.WriteLine($"PDF match rate: {result.MatchRateByPdf:P2}");
2. URLs
await using var comparer = new HtmlPdfComparer();
var result = await comparer.CompareAsync(
    htmlSource: "https://example.com/document.html",
    pdfSource: "https://example.com/document.pdf"
);
3. byte[] input
byte[] htmlBytes = File.ReadAllBytes("file.html");
byte[] pdfBytes = File.ReadAllBytes("file.pdf");
await using var comparer = new HtmlPdfComparer();
var result = await comparer.CompareAsync(
    htmlBytes: htmlBytes,
    pdfBytes: pdfBytes
);
4. Mixed input
// HTML is fetched from a URL, the PDF is read from a local file
await using var comparer = new HtmlPdfComparer();
var result = await comparer.CompareAsync(
    htmlSource: "https://example.com/document.html",
    pdfSource: @"C:\local\file.pdf"
);
⚙️ Advanced configuration
Custom options
using System.Collections.Immutable;
var options = new CompareOptions
{
    // Exclude images with specific file names
    ExcludeImageNames = ImmutableList.Create("logo.png", "header.jpg"),
    // Exclude images at specific paths
    ExcludeImagePaths = ImmutableList.Create(@"C:\temp\watermark.png"),
    // Hash similarity threshold (0.0-1.0)
    HashThreshold = 0.95,
    // Image similarity threshold (0.0-1.0)
    SimilarityThreshold = 0.85,
    // Matching algorithm: Greedy or Hungarian
    MatchAlgorithm = MatchAlgorithm.Hungarian,
    // Similarity method: PerceptualHash | Histogram | SSIM (structural similarity)
    SimilarityMethod = SimilarityMethod.PerceptualHash
};
await using var comparer = new HtmlPdfComparer(options);
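The choice between Greedy and Hungarian matters when several images look alike. The sketch below shows the difference on a 2x2 similarity matrix with a threshold of 0.85. It is a stand-in under stated assumptions, not the package's code: `GreedyMatch` and `OptimalMatch` are hypothetical helpers, and the optimal assignment is found by brute force over permutations, which returns the same result as the Hungarian algorithm on small square inputs but does not scale.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Greedy: repeatedly take the best remaining pair whose similarity clears the threshold.
static List<(int Html, int Pdf)> GreedyMatch(double[,] sim, double threshold)
{
    int rows = sim.GetLength(0), cols = sim.GetLength(1);
    var usedHtml = new bool[rows];
    var usedPdf = new bool[cols];
    var matches = new List<(int Html, int Pdf)>();

    var candidates =
        from h in Enumerable.Range(0, rows)
        from p in Enumerable.Range(0, cols)
        where sim[h, p] >= threshold
        orderby sim[h, p] descending
        select (h, p);

    foreach (var (html, pdf) in candidates)
    {
        if (usedHtml[html] || usedPdf[pdf]) continue;
        usedHtml[html] = usedPdf[pdf] = true;
        matches.Add((html, pdf));
    }
    return matches;
}

// Optimal assignment by brute force (square matrix, small n only): pick the
// permutation with the highest total similarity, then drop pairs below the threshold.
static List<(int Html, int Pdf)> OptimalMatch(double[,] sim, double threshold)
{
    int n = sim.GetLength(0);
    int[] best = Array.Empty<int>();
    double bestTotal = double.NegativeInfinity;

    void Permute(int[] perm, int k)
    {
        if (k == n)
        {
            double total = 0;
            for (int row = 0; row < n; row++) total += sim[row, perm[row]];
            if (total > bestTotal) { bestTotal = total; best = (int[])perm.Clone(); }
            return;
        }
        for (int i = k; i < n; i++)
        {
            (perm[k], perm[i]) = (perm[i], perm[k]);
            Permute(perm, k + 1);
            (perm[k], perm[i]) = (perm[i], perm[k]);
        }
    }

    Permute(Enumerable.Range(0, n).ToArray(), 0);
    return Enumerable.Range(0, n)
        .Where(h => sim[h, best[h]] >= threshold)
        .Select(h => (Html: h, Pdf: best[h]))
        .ToList();
}

// Rows are HTML images, columns are PDF images.
double[,] scores =
{
    { 0.90, 0.88 },
    { 0.89, 0.10 },
};

var greedy = GreedyMatch(scores, threshold: 0.85);
var optimal = OptimalMatch(scores, threshold: 0.85);
Console.WriteLine($"Greedy: {greedy.Count} pair(s), optimal: {optimal.Count} pair(s)");
// prints "Greedy: 1 pair(s), optimal: 2 pair(s)"
```

Greedy takes the 0.90 pair first, which leaves the second HTML image with only a 0.10 candidate, so it ends with one match. The optimal assignment accepts 0.88 and 0.89 instead and matches both images.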
Image filtering
// Images to filter out (they do not take part in the comparison)
var filterImages = new List<byte[]>
{
    File.ReadAllBytes("logo.png"),
    File.ReadAllBytes("watermark.png")
};
await using var comparer = new HtmlPdfComparer();
var result = await comparer.CompareAsync(
    htmlSource: "document.html",
    pdfSource: "document.pdf",
    filterImageBytes: filterImages // these images are excluded from the comparison
);
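How the package decides that an extracted image equals a filter image is not described here. One simple possibility, shown below purely as an assumption, is exact-byte matching via a SHA-256 fingerprint; the package may well compare images visually instead. `Fingerprint` is a hypothetical helper, and the byte arrays are placeholders rather than real image files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Fingerprint = hex-encoded SHA-256 of the raw bytes (detects exact copies only).
static string Fingerprint(byte[] data) => Convert.ToHexString(SHA256.HashData(data));

byte[] logo = { 0x89, 0x50, 0x4E, 0x47, 1 };   // placeholder bytes, not a real PNG
byte[] chart = { 0x89, 0x50, 0x4E, 0x47, 2 };

var filterImages = new List<byte[]> { logo };
var extracted = new List<byte[]> { (byte[])logo.Clone(), chart };

// Drop every extracted image whose fingerprint appears in the filter set.
var blocked = filterImages.Select(b => Fingerprint(b)).ToHashSet();
var kept = extracted.Where(img => !blocked.Contains(Fingerprint(img))).ToList();

Console.WriteLine($"Kept {kept.Count} of {extracted.Count} images");
// prints "Kept 1 of 2 images"
```

Exact-byte matching misses a logo that was re-encoded or resized, so a filter meant to catch such copies would need a perceptual comparison instead.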
Custom HttpClient
// Configure a custom HttpClient (used for URL downloads)
var httpClient = new HttpClient
{
    Timeout = TimeSpan.FromMinutes(5)
};
httpClient.DefaultRequestHeaders.Add("User-Agent", "MyApp/1.0");
await using var comparer = new HtmlPdfComparer(
    options: null,
    httpClient: httpClient
);
Concurrency control
// Limit the comparer to at most 4 concurrent operations
await using var comparer = new HtmlPdfComparer(
    options: null,
    httpClient: null,
    maxConcurrency: 4
);
// Batch comparison
var tasks = Enumerable.Range(1, 10).Select(async i =>
{
    return await comparer.CompareAsync(
        htmlSource: $"file{i}.html",
        pdfSource: $"file{i}.pdf"
    );
});
var results = await Task.WhenAll(tasks);
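A limit such as maxConcurrency is commonly implemented with a SemaphoreSlim that each operation acquires before starting work. The sketch below shows that general pattern. Whether the package works this way internally is an assumption, and `CompareOneAsync` is a hypothetical stand-in that simulates a comparison with Task.Delay.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxConcurrency = 4;
using var gate = new SemaphoreSlim(maxConcurrency);
object sync = new();
int running = 0, peak = 0;

// Hypothetical stand-in for one comparison; Task.Delay simulates the work.
async Task<int> CompareOneAsync(int id)
{
    await gate.WaitAsync();                 // wait for a free slot
    try
    {
        lock (sync) { running++; peak = Math.Max(peak, running); }
        await Task.Delay(50);
        return id;
    }
    finally
    {
        lock (sync) { running--; }
        gate.Release();                     // free the slot for the next caller
    }
}

// Start ten comparisons at once; the semaphore lets at most four run at a time.
int[] finished = await Task.WhenAll(
    Enumerable.Range(1, 10).Select(id => CompareOneAsync(id)));

Console.WriteLine($"Finished {finished.Length} comparisons, peak concurrency {peak}");
```

All ten tasks are created immediately, but the recorded peak never exceeds the limit, which is what keeps memory use bounded when each comparison holds decoded images.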
📊 Comparison results
CompareResult properties
public class CompareResult
{
    public bool Success { get; init; }                      // Whether the comparison succeeded
    public int HtmlImageCount { get; init; }                // Total images in the HTML
    public int PdfImageCount { get; init; }                 // Total images in the PDF
    public int MatchedCount { get; init; }                  // Number of matched pairs
    public int UnmatchedHtmlCount { get; init; }            // HTML images without a match
    public int UnmatchedPdfCount { get; init; }             // PDF images without a match
    public double MatchRateByHtml { get; init; }            // Match rate relative to HTML (0.0-1.0)
    public double MatchRateByPdf { get; init; }             // Match rate relative to PDF (0.0-1.0)
    public TimeSpan ElapsedTime { get; init; }              // Processing time
    public List<string> Errors { get; init; }               // Error messages
    public List<ImagePairInfo> MatchedPairs { get; init; }  // Matched image pairs
}
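The counts and rates above are presumably related in the obvious way: each rate is the matched count divided by the image count on that side, and each unmatched count is the remainder. The package does not document the formula on this page, so treat the sketch below as an assumption; `Rate` is a hypothetical helper.

```csharp
using System;

// Assumed formula: matched pairs divided by the images on that side; 0 when the side is empty.
static double Rate(int matched, int total) => total == 0 ? 0.0 : (double)matched / total;

int htmlImageCount = 8, pdfImageCount = 10, matchedCount = 6;

int unmatchedHtml = htmlImageCount - matchedCount;        // 2
int unmatchedPdf = pdfImageCount - matchedCount;          // 4
double rateByHtml = Rate(matchedCount, htmlImageCount);   // 0.75
double rateByPdf = Rate(matchedCount, pdfImageCount);     // 0.6

Console.WriteLine($"By HTML: {rateByHtml}, by PDF: {rateByPdf}, unmatched: {unmatchedHtml} and {unmatchedPdf}");
```

Under this reading the two rates differ whenever the documents contain different numbers of images, so a PDF with extra images lowers MatchRateByPdf without affecting MatchRateByHtml.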
Working with the result
var result = await comparer.CompareAsync(...);
if (result.Success)
{
    Console.WriteLine("✅ Comparison succeeded");
    Console.WriteLine($"📊 HTML: {result.HtmlImageCount} images");
    Console.WriteLine($"📊 PDF: {result.PdfImageCount} images");
    Console.WriteLine($"✓ Matched: {result.MatchedCount} pairs");
    Console.WriteLine($"✗ Unmatched in HTML: {result.UnmatchedHtmlCount}");
    Console.WriteLine($"✗ Unmatched in PDF: {result.UnmatchedPdfCount}");
    Console.WriteLine($"📈 HTML match rate: {result.MatchRateByHtml:P2}");
    Console.WriteLine($"📈 PDF match rate: {result.MatchRateByPdf:P2}");
    Console.WriteLine($"⏱ Elapsed: {result.ElapsedTime.TotalSeconds:F2} s");
    // Inspect the individual matches
    foreach (var pair in result.MatchedPairs)
    {
        Console.WriteLine($"  HTML[{pair.HtmlIndex}] ↔ PDF[{pair.PdfIndex}] similarity: {pair.Similarity:P2}");
    }
}
else
{
    Console.WriteLine("❌ Comparison failed");
    foreach (var error in result.Errors)
    {
        Console.WriteLine($"  - {error}");
    }
}
🎯 Complete example
using HtmlPDFContrastImage.Window;
using System.Collections.Immutable;
// Configure options
var options = new CompareOptions
{
    ExcludeImageNames = ImmutableList.Create("logo.png"),
    HashThreshold = 0.95,
    SimilarityThreshold = 0.85,
    MatchAlgorithm = MatchAlgorithm.Hungarian,
    SimilarityMethod = SimilarityMethod.PerceptualHash
};
// Prepare the images to filter out
var filterImages = new List<byte[]>
{
    await File.ReadAllBytesAsync("watermark.png")
};
// Create the comparer
await using var comparer = new HtmlPdfComparer(options, maxConcurrency: 2);
// Run the comparison (mixed input types are supported)
var result = await comparer.CompareAsync(
    htmlSource: "https://example.com/doc.html",          // from a URL
    pdfBytes: await File.ReadAllBytesAsync("local.pdf"), // from byte[]
    filterImageBytes: filterImages
);
// Print the results
if (result.Success)
{
    Console.WriteLine($"✅ Matched {result.MatchedCount}/{result.HtmlImageCount} images");
    Console.WriteLine($"📈 Match rate: {result.MatchRateByHtml:P2}");
}
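📋 System requirements
- .NET 8.0 or later
- Windows operating system
- Recommended memory: 4 GB or more
🔧 Dependencies
- HtmlAgilityPack - HTML parsing
- OpenCvSharp4 - image processing
- SixLabors.ImageSharp - image manipulation
📝 Notes
- Resource management: use await using or using so that resources are released correctly
- Concurrency control: set maxConcurrency sensibly to avoid running out of memory
- Timeouts: watch for network timeouts when downloading from URLs; a custom HttpClient can be supplied
- Image formats: common image formats are supported (PNG, JPG, BMP, GIF, etc.)
🤝 Support
For questions or suggestions, please open an issue on GitHub.
📄 License
MIT License
Version: 1.0.0
Author: paper
Last updated: 2025-12-18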
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0-windows7.0 is compatible. net9.0-windows was computed. net10.0-windows was computed. |
Dependencies (net8.0-windows7.0):
- HtmlAgilityPack (>= 1.11.71)
- HTMLAndPDFContrast (>= 1.0.0)
- OpenCvSharp4 (>= 4.11.0.20250507)
- OpenCvSharp4.runtime.win (>= 4.11.0.20250507)
- Pdf2RTF (>= 1.0.0)
- RtfContentExtract (>= 1.0.0)
- SixLabors.ImageSharp (>= 3.1.12)
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0 | 271 | 12/18/2025 |