HtmlPDFContrastImage.Window 1.0.0


HtmlPDFContrastImage.Window

A tool for comparing the images in HTML and PDF documents - NuGet package for Windows

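📦 Features

Multiple input types - accepts file paths, URLs, and byte[] input
Image filtering - excludes specified images supplied as byte[]
Flexible configuration - every comparison parameter is configurable
High-performance matching - uses perceptual hashing and the Hungarian algorithm
Async API - fully asynchronous, supports high concurrency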
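
The README does not describe how its perceptual hash is computed. As a rough illustration only, the sketch below uses an average hash (one simple kind of perceptual hash) over an 8x8 grayscale thumbnail and scores two hashes by Hamming distance. The class HashSimilaritySketch and its methods are hypothetical and are not part of HtmlPDFContrastImage.Window; the package may use a different hash entirely.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration; not part of HtmlPDFContrastImage.Window.
public static class HashSimilaritySketch
{
    // Builds a 64-bit average hash from an 8x8 grayscale thumbnail:
    // bit i is set when pixel i is at or above the mean brightness.
    public static ulong AverageHash(byte[] gray8x8)
    {
        if (gray8x8.Length != 64)
            throw new ArgumentException("Expected 64 grayscale pixels.", nameof(gray8x8));

        int sum = 0;
        foreach (byte p in gray8x8) sum += p;
        double mean = sum / 64.0;

        ulong hash = 0;
        for (int i = 0; i < 64; i++)
            if (gray8x8[i] >= mean) hash |= 1UL << i;
        return hash;
    }

    // Similarity in [0, 1]: one minus the fraction of differing bits.
    public static double Similarity(ulong a, ulong b)
        => 1.0 - BitOperations.PopCount(a ^ b) / 64.0;
}
```

A score like this is the kind of value a threshold in the 0.0-1.0 range would be compared against: two hashes that differ in 4 of 64 bits score 0.9375.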

🚀 Quick Start

Installation

dotnet add package HtmlPDFContrastImage.Window

Basic Usage

1. File paths
using HtmlPDFContrastImage.Window;

// Create the comparer
await using var comparer = new HtmlPdfComparer();

// Run the comparison
var result = await comparer.CompareAsync(
    htmlSource: @"C:\path\to\file.html",
    pdfSource: @"C:\path\to\file.pdf"
);

// Print the results
Console.WriteLine($"Matched: {result.MatchedCount}/{result.HtmlImageCount}");
Console.WriteLine($"HTML match rate: {result.MatchRateByHtml:P2}");
Console.WriteLine($"PDF match rate: {result.MatchRateByPdf:P2}");

2. URLs
await using var comparer = new HtmlPdfComparer();

var result = await comparer.CompareAsync(
    htmlSource: "https://example.com/document.html",
    pdfSource: "https://example.com/document.pdf"
);

3. byte[] input
byte[] htmlBytes = File.ReadAllBytes("file.html");
byte[] pdfBytes = File.ReadAllBytes("file.pdf");

await using var comparer = new HtmlPdfComparer();

var result = await comparer.CompareAsync(
    htmlBytes: htmlBytes,
    pdfBytes: pdfBytes
);

4. Mixed input
// HTML fetched from a URL, PDF read from a local file
await using var comparer = new HtmlPdfComparer();

var result = await comparer.CompareAsync(
    htmlSource: "https://example.com/document.html",
    pdfSource: @"C:\local\file.pdf"
);

⚙️ Advanced Configuration

Custom options

using System.Collections.Immutable;

var options = new CompareOptions
{
    // Exclude images with specific file names
    ExcludeImageNames = ImmutableList.Create("logo.png", "header.jpg"),

    // Exclude images at specific paths
    ExcludeImagePaths = ImmutableList.Create(@"C:\temp\watermark.png"),

    // Hash similarity threshold (0.0-1.0)
    HashThreshold = 0.95,

    // Image similarity threshold (0.0-1.0)
    SimilarityThreshold = 0.85,

    // Matching algorithm: Greedy or Hungarian
    MatchAlgorithm = MatchAlgorithm.Hungarian,

    // Similarity method: PerceptualHash | Histogram | SSIM (structural similarity)
    SimilarityMethod = SimilarityMethod.PerceptualHash
};

await using var comparer = new HtmlPdfComparer(options);
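
To make the Greedy versus Hungarian choice concrete, here is a minimal sketch of greedy matching over a similarity matrix. It is an illustration under assumptions, not the package's implementation: GreedyMatchSketch, its Match method, and the matrix layout (rows are HTML images, columns are PDF images) are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; not the package's implementation.
public static class GreedyMatchSketch
{
    // similarity[r, c] is the score between HTML image r and PDF image c.
    public static List<(int HtmlIndex, int PdfIndex, double Similarity)> Match(
        double[,] similarity, double threshold)
    {
        int rows = similarity.GetLength(0);
        int cols = similarity.GetLength(1);

        // Collect every pair that clears the threshold.
        var candidates = new List<(int HtmlIndex, int PdfIndex, double Similarity)>();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                if (similarity[r, c] >= threshold)
                    candidates.Add((r, c, similarity[r, c]));

        var usedHtml = new bool[rows];
        var usedPdf = new bool[cols];
        var matches = new List<(int HtmlIndex, int PdfIndex, double Similarity)>();

        // Take the best remaining pair first; each image is used at most once.
        foreach (var pair in candidates.OrderByDescending(p => p.Similarity))
        {
            if (usedHtml[pair.HtmlIndex] || usedPdf[pair.PdfIndex]) continue;
            usedHtml[pair.HtmlIndex] = true;
            usedPdf[pair.PdfIndex] = true;
            matches.Add(pair);
        }
        return matches;
    }
}
```

For the matrix {{0.90, 0.95}, {0.20, 0.90}} with a threshold of 0.85, this greedy pass returns only the pair (0, 1), because taking the 0.95 score first blocks both remaining candidates. An optimal assignment, which is what the Hungarian algorithm computes, would instead pair (0, 0) and (1, 1). That is the usual trade-off: greedy is simpler, while Hungarian can recover more matches.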

Image filtering

// Images to filter out (they take no part in the comparison)
var filterImages = new List<byte[]>
{
    File.ReadAllBytes("logo.png"),
    File.ReadAllBytes("watermark.png")
};

await using var comparer = new HtmlPdfComparer();

var result = await comparer.CompareAsync(
    htmlSource: "document.html",
    pdfSource: "document.pdf",
    filterImageBytes: filterImages  // these images are excluded from the comparison
);

Custom HttpClient

// Configure a custom HttpClient (used for URL downloads)
var httpClient = new HttpClient
{
    Timeout = TimeSpan.FromMinutes(5)
};
httpClient.DefaultRequestHeaders.Add("User-Agent", "MyApp/1.0");

await using var comparer = new HtmlPdfComparer(
    options: null,
    httpClient: httpClient
);

Concurrency control

// Cap concurrency at 4
await using var comparer = new HtmlPdfComparer(
    options: null,
    httpClient: null,
    maxConcurrency: 4
);

// Batch comparison
var tasks = Enumerable.Range(1, 10).Select(async i =>
{
    return await comparer.CompareAsync(
        htmlSource: $"file{i}.html",
        pdfSource: $"file{i}.pdf"
    );
});

var results = await Task.WhenAll(tasks);
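
The README does not say how maxConcurrency is enforced internally. A common pattern for this kind of cap is a SemaphoreSlim gate around each unit of work; the sketch below shows that pattern in isolation. ThrottleSketch and RunAsync are hypothetical names, and nothing here is taken from the package's source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration of a concurrency cap; not the package's implementation.
public static class ThrottleSketch
{
    // Runs work(item) for every item, with at most maxConcurrency running at once.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();       // wait for a free slot
            try { return await work(item); }
            finally { gate.Release(); }   // free the slot even if work throws
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

The same helper could also wrap calls from several comparers if a process-wide limit is wanted, for example to keep memory use bounded when many large documents are queued.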

📊 Comparison Results

CompareResult properties

public class CompareResult
{
    public bool Success { get; init; }              // Whether the comparison succeeded
    public int HtmlImageCount { get; init; }        // Total images in the HTML
    public int PdfImageCount { get; init; }         // Total images in the PDF
    public int MatchedCount { get; init; }          // Number of matched pairs
    public int UnmatchedHtmlCount { get; init; }    // Unmatched HTML images
    public int UnmatchedPdfCount { get; init; }     // Unmatched PDF images
    public double MatchRateByHtml { get; init; }    // Match rate relative to HTML (0.0-1.0)
    public double MatchRateByPdf { get; init; }     // Match rate relative to PDF (0.0-1.0)
    public TimeSpan ElapsedTime { get; init; }      // Processing time
    public List<string> Errors { get; init; }       // Error messages
    public List<ImagePairInfo> MatchedPairs { get; init; } // Matched image pairs
}
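
The README does not spell out how MatchRateByHtml and MatchRateByPdf are derived. A plausible reading, assumed here rather than confirmed by the source, is the matched count divided by the respective image count. The sketch below shows that arithmetic; MatchRateSketch is a hypothetical helper, and returning 0.0 for an empty document is also an assumption.

```csharp
// Hypothetical helper showing the assumed relationship between counts and rates.
public static class MatchRateSketch
{
    // Fraction of images that found a match; 0.0 when there are no images
    // (an assumed convention, chosen here to avoid dividing by zero).
    public static double Rate(int matchedCount, int imageCount)
        => imageCount == 0 ? 0.0 : (double)matchedCount / imageCount;
}
```

Under this reading, 3 matches against 4 HTML images and 6 PDF images would give an HTML rate of 0.75 and a PDF rate of 0.5, which is why the two rates can differ for the same comparison.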

Using the result data

var result = await comparer.CompareAsync(...);

if (result.Success)
{
    Console.WriteLine($"✅ Comparison succeeded");
    Console.WriteLine($"📊 HTML: {result.HtmlImageCount} images");
    Console.WriteLine($"📊 PDF: {result.PdfImageCount} images");
    Console.WriteLine($"✓ Matched: {result.MatchedCount} pairs");
    Console.WriteLine($"✗ Unmatched in HTML: {result.UnmatchedHtmlCount}");
    Console.WriteLine($"✗ Unmatched in PDF: {result.UnmatchedPdfCount}");
    Console.WriteLine($"📈 HTML match rate: {result.MatchRateByHtml:P2}");
    Console.WriteLine($"📈 PDF match rate: {result.MatchRateByPdf:P2}");
    Console.WriteLine($"⏱ Elapsed: {result.ElapsedTime.TotalSeconds:F2} s");

    // Inspect the individual matches
    foreach (var pair in result.MatchedPairs)
    {
        Console.WriteLine($"  HTML[{pair.HtmlIndex}] ↔ PDF[{pair.PdfIndex}] similarity: {pair.Similarity:P2}");
    }
}
else
{
    Console.WriteLine($"❌ Comparison failed");
    foreach (var error in result.Errors)
    {
        Console.WriteLine($"  - {error}");
    }
}

🎯 Complete Example

using HtmlPDFContrastImage.Window;
using System.Collections.Immutable;

// Configure the options
var options = new CompareOptions
{
    ExcludeImageNames = ImmutableList.Create("logo.png"),
    HashThreshold = 0.95,
    SimilarityThreshold = 0.85,
    MatchAlgorithm = MatchAlgorithm.Hungarian,
    SimilarityMethod = SimilarityMethod.PerceptualHash
};

// Prepare the images to filter out
var filterImages = new List<byte[]>
{
    await File.ReadAllBytesAsync("watermark.png")
};

// Create the comparer
await using var comparer = new HtmlPdfComparer(options, maxConcurrency: 2);

// Run the comparison (mixed input types are supported)
var result = await comparer.CompareAsync(
    htmlSource: "https://example.com/doc.html",  // from a URL
    pdfBytes: await File.ReadAllBytesAsync("local.pdf"),  // from byte[]
    filterImageBytes: filterImages
);

// Print the results
if (result.Success)
{
    Console.WriteLine($"✅ Matched {result.MatchedCount}/{result.HtmlImageCount} images");
    Console.WriteLine($"📈 Match rate: {result.MatchRateByHtml:P2}");
}

📋 System Requirements

  • .NET 8.0 or later
  • Windows operating system
  • Recommended memory: 4 GB or more

🔧 Dependencies

  • HtmlAgilityPack - HTML parsing
  • OpenCvSharp4 - image processing
  • SixLabors.ImageSharp - image manipulation

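📝 Notes

  1. Resource management: use await using so that resources are released correctly
  2. Concurrency control: set maxConcurrency sensibly to avoid running out of memory
  3. Timeouts: watch for network timeouts when downloading from URLs; a custom HttpClient can be supplied
  4. Image formats: common image formats are supported (PNG, JPG, BMP, GIF, and others)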

🤝 Support

For questions or suggestions, please open an issue on GitHub.

📄 License

MIT License


Version: 1.0.0
Author: paper
Updated: 2025-12-18

Compatible and computed target framework versions
.NET: net8.0-windows7.0 is compatible. net9.0-windows and net10.0-windows were computed.


Version Downloads Last Updated
1.0.0 271 12/18/2025