PdfOxide 0.3.52

.NET CLI:
dotnet add package PdfOxide --version 0.3.52

Package Manager Console (Visual Studio; uses the NuGet module's version of Install-Package):
NuGet\Install-Package PdfOxide -Version 0.3.52

PackageReference (for projects that support PackageReference, copy this XML node into the project file):
<PackageReference Include="PdfOxide" Version="0.3.52" />

Central Package Management (CPM): copy the first node into the solution's Directory.Packages.props file to version the package, and the second into the project file.
Directory.Packages.props:
<PackageVersion Include="PdfOxide" Version="0.3.52" />
Project file:
<PackageReference Include="PdfOxide" />

Paket:
paket add PdfOxide --version 0.3.52

F# Interactive and Polyglot Notebooks (copy this #r directive into the interactive tool or the script source):
#r "nuget: PdfOxide, 0.3.52"

C# file-based apps (.NET 10 preview 4 and later; place this #:package directive in the .cs file before any lines of code):
#:package PdfOxide@0.3.52

Cake addin:
#addin nuget:?package=PdfOxide&version=0.3.52

Cake tool:
#tool nuget:?package=PdfOxide&version=0.3.52

PDF Oxide for .NET — The Fastest PDF Toolkit for C# & .NET

The fastest .NET PDF library for text extraction, image extraction, and markdown conversion. Powered by a pure-Rust core, exposed to .NET through P/Invoke. 0.8ms mean per document, 5× faster than PyMuPDF, 15× faster than pypdf. 100% pass rate on 3,830 real-world PDFs. MIT / Apache-2.0 licensed.


Part of the PDF Oxide toolkit. Same Rust core, same speed, same 100% pass rate as the Rust, Python, Go, JavaScript / TypeScript, and WASM bindings.

Quick Start

dotnet add package PdfOxide
using PdfOxide.Core;

using var doc = PdfDocument.Open("paper.pdf");
string text = doc.ExtractText(0);
string markdown = doc.ToMarkdown(0);

Why pdf_oxide?

  • Fast — 0.8ms mean per document, 5× faster than PyMuPDF, 15× faster than pypdf, 29× faster than pdfplumber
  • Reliable — 100% pass rate on 3,830 test PDFs, zero panics, zero timeouts, no segfaults
  • Complete — Text extraction, image extraction, search, form fields, PDF creation, and editing in one package
  • Permissive license — MIT / Apache-2.0 — use freely in commercial and closed-source projects
  • Pure Rust core — Memory-safe, panic-free, no C dependencies beyond the P/Invoke layer
  • Native binaries included — Pre-built libraries for Windows, macOS, and Linux (x64 + ARM64) ship in the NuGet package
  • Idiomatic .NET: using statements, async counterparts, LINQ-friendly collections, nullable reference types

Performance

Benchmarked on 3,830 PDFs from three independent public test suites (veraPDF, Mozilla pdf.js, DARPA SafeDocs). Text extraction libraries only. Single-thread, 60s timeout, no warm-up.

Library     Mean     p99      Pass rate   License
PDF Oxide   0.8 ms   9 ms     100%        MIT / Apache-2.0
PyMuPDF     4.6 ms   28 ms    99.3%       AGPL-3.0
pypdfium2   4.1 ms   42 ms    99.2%       Apache-2.0
pdftext     7.3 ms   82 ms    99.0%       GPL-3.0
pdfminer    16.8 ms  124 ms   98.8%       MIT
pypdf       12.1 ms  97 ms    98.4%       BSD-3
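The headline multipliers are ratios of the mean times in this table. PyMuPDF's 4.6 ms divided by PDF Oxide's 0.8 ms is 5.75, and pypdf's 12.1 ms divided by 0.8 ms is about 15.1, so "5×" and "15×" are rounded down. The snippet below recomputes every ratio from the table. It compares means only; p99 ratios are smaller (28 ms / 9 ms ≈ 3.1 for PyMuPDF).

```csharp
using System;
using System.Globalization;

// Mean per-document extraction times in milliseconds, copied from the table above.
(string Library, double MeanMs)[] rows =
{
    ("PDF Oxide", 0.8),
    ("PyMuPDF", 4.6),
    ("pypdfium2", 4.1),
    ("pdftext", 7.3),
    ("pdfminer", 16.8),
    ("pypdf", 12.1),
};

double pdfOxideMeanMs = rows[0].MeanMs;

// Speedup of PDF Oxide over another library: the other library's mean divided by PDF Oxide's.
double Speedup(double otherMeanMs) => otherMeanMs / pdfOxideMeanMs;

foreach (var (library, meanMs) in rows[1..])
{
    Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
        "{0,-10} {1,6:F2}x faster", library, Speedup(meanMs)));
}
```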

99.5% text parity vs PyMuPDF and pypdfium2 across the full corpus. The .NET binding is sometimes faster than direct Rust calls on small documents because the P/Invoke path bypasses the Rust-side mutex used by other bindings.

Installation

dotnet add package PdfOxide

Pre-built native libraries for:

Platform   x64   ARM64
Windows    Yes   Yes
macOS      Yes   Yes (Apple Silicon)
Linux      Yes   Yes

Targets net8.0 and net10.0 (both LTS releases). Earlier targets (.NET Standard 2.1, .NET 5, .NET 6) have been dropped, so .NET Framework and Xamarin projects cannot reference the package. No system dependencies, no Rust toolchain required.

API Tour

Open a document

using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
Console.WriteLine($"Pages: {doc.PageCount}");
Console.WriteLine($"PDF version: {doc.Version.Major}.{doc.Version.Minor}");

// From a stream
using var stream = File.OpenRead("report.pdf");
using var docFromStream = PdfDocument.Open(stream);

// Encrypted PDFs
using var encrypted = PdfDocument.OpenWithPassword("secure.pdf", "user-password");

Text extraction

using var doc = PdfDocument.Open("document.pdf");

string text = doc.ExtractText(0);          // single page
string allText = doc.ExtractAllText();     // entire document

string markdown = doc.ToMarkdown(0);
string allMarkdown = doc.ToMarkdownAll();

string html = doc.ToHtml(0);
string allHtml = doc.ToHtmlAll();

Structured text

var words = doc.ExtractWords(0);
foreach (var (text, x, y, w, h) in words)
{
    Console.WriteLine($"\"{text}\" at ({x:F1}, {y:F1})");
}

// Text inside a rectangle
string regionText = doc.ExtractTextInRect(0, x: 50, y: 700, width: 200, height: 50);

// Tables
var tables = doc.ExtractTables(0);
foreach (var (rows, cols) in tables)
{
    Console.WriteLine($"{rows}x{cols} table");
}

Search

var results = doc.SearchAll("quarterly revenue");
foreach (var (page, text, x, y, w, h) in results)
{
    Console.WriteLine($"Page {page}: \"{text}\" at ({x}, {y})");
}

// Single-page case-sensitive search
var pageResults = doc.SearchPage(0, "exact phrase", caseSensitive: true);
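ExtractWords returns one box per word. When reading order matters, for example before passing text to a step that expects whole lines, the boxes can be regrouped into lines. This is a minimal sketch and not part of PdfOxide. GroupIntoLines is a hypothetical helper. It assumes two things: coordinates are PDF user-space points with the origin at the bottom-left, as the y: 700 rectangle in the example above suggests, and words on the same line differ in y by less than yTolerance. Multi-column layouts would need column detection first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Groups word boxes into lines of text. Assumes PDF user-space coordinates
// (origin at the bottom-left, y growing upward), so a higher y means higher on
// the page and lines come out top to bottom.
List<string> GroupIntoLines(
    IEnumerable<(string Text, double X, double Y, double W, double H)> words,
    double yTolerance = 2.0)
{
    var lines = new List<List<(string Text, double X, double Y, double W, double H)>>();
    foreach (var word in words.OrderByDescending(w => w.Y))
    {
        // Start a new line when this word's y differs from the current line's
        // first word by more than the tolerance.
        if (lines.Count == 0 || Math.Abs(lines[^1][0].Y - word.Y) > yTolerance)
            lines.Add(new List<(string Text, double X, double Y, double W, double H)>());
        lines[^1].Add(word);
    }
    // Within a line, order words left to right and join them with spaces.
    return lines
        .Select(line => string.Join(" ", line.OrderBy(w => w.X).Select(w => w.Text)))
        .ToList();
}

var sample = new List<(string Text, double X, double Y, double W, double H)>
{
    ("world", 60, 700.4, 30, 10),
    ("Hello", 20, 700, 35, 10),
    ("Second", 20, 680, 40, 10),
};

foreach (string line in GroupIntoLines(sample))
    Console.WriteLine(line);

// With a real document (assumes ExtractWords yields (text, x, y, w, h) tuples
// whose coordinates convert to double):
// var lines = GroupIntoLines(doc.ExtractWords(0)
//     .Select(t => (t.Item1, (double)t.Item2, (double)t.Item3, (double)t.Item4, (double)t.Item5)));
```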

Image extraction

using PdfOxide.Core;

using var doc = PdfDocument.Open("document.pdf");
var images = doc.ExtractImages(0);
foreach (var img in images)
{
    Console.WriteLine($"{img.Width}x{img.Height} {img.Format} ({img.Colorspace}, {img.BitsPerComponent} bpc, {img.Data.Length} bytes)");
}

Form fields

using PdfOxide.Core;

// Read form fields from an existing PDF
using var doc = PdfDocument.Open("form.pdf");
foreach (var f in doc.GetFormFields())
{
    Console.WriteLine($"{f.Name} ({f.FieldType}) = \"{f.Value}\"");
}

// Fill and flatten form fields via DocumentEditor
using var editor = DocumentEditor.Open("form.pdf");
editor.SetFormFieldValue("employee.name", "Jane Doe");
editor.SetFormFieldValue("employee.email", "jane@example.com");
editor.FlattenForms();
editor.Save("filled-form.pdf");
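The fill example addresses fields by dotted names such as employee.name. In AcroForm, a field's fully qualified name is formed by joining the partial names of the field and its ancestors with periods. Nested application data therefore maps directly onto those names. This is a sketch that assumes SetFormFieldValue accepts fully qualified names, as the example suggests. ToFieldValues is a hypothetical helper and not part of PdfOxide.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Flattens nested data into fully qualified AcroForm field names, which the PDF
// specification builds by joining partial field names with periods.
Dictionary<string, string> ToFieldValues(IReadOnlyDictionary<string, object> data, string prefix = "")
{
    var result = new Dictionary<string, string>();
    foreach (var (key, value) in data)
    {
        string name = prefix.Length == 0 ? key : prefix + "." + key;
        if (value is IReadOnlyDictionary<string, object> child)
        {
            // Nested group: recurse with the longer prefix.
            foreach (var (childName, childValue) in ToFieldValues(child, name))
                result[childName] = childValue;
        }
        else
        {
            // Leaf value: form fields take text, so format it as a string.
            result[name] = Convert.ToString(value, CultureInfo.InvariantCulture) ?? "";
        }
    }
    return result;
}

var data = new Dictionary<string, object>
{
    ["employee"] = new Dictionary<string, object>
    {
        ["name"] = "Jane Doe",
        ["email"] = "jane@example.com",
    },
};

foreach (var (field, value) in ToFieldValues(data))
{
    Console.WriteLine($"{field} = {value}");
    // With the DocumentEditor from the example above:
    // editor.SetFormFieldValue(field, value);
}
```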

Document editing — metadata

using PdfOxide.Core;

using var editor = DocumentEditor.Open("document.pdf");

// Read metadata
Console.WriteLine($"Title: {editor.Title}");
Console.WriteLine($"Author: {editor.Author}");
Console.WriteLine($"Pages: {editor.PageCount}");

// Update metadata (properties are get/set)
editor.Title = "Quarterly Report";
editor.Author = "Example Author";
editor.Subject = "Q1 2026 Results";

// Save (or save async)
editor.Save("edited.pdf");
// await editor.SaveAsync("edited.pdf");

Note: the .NET binding currently exposes document open/read/convert/create, page rendering, image extraction, form field read/fill/flatten, and metadata editing. Page operations, annotations, and signatures are available through the Rust core and other language bindings; equivalent .NET surface will be added in a future release. Track progress in the project's GitHub issues.

Creating PDFs

using PdfOxide.Core;

// From Markdown, HTML, or plain text
using (var pdf = Pdf.FromMarkdown("# Invoice\n\nTotal: **$42.00**"))
{
    pdf.Save("invoice.pdf");
}

using (var pdf = Pdf.FromHtml("<h1>Report</h1><p>Generated 2026-04-09</p>"))
{
    byte[] bytes = pdf.SaveToBytes();
    File.WriteAllBytes("report.pdf", bytes);
}

// Save to a stream
using (var pdf = Pdf.FromMarkdown("# Stream Example"))
using (var file = File.Create("output.pdf"))
{
    pdf.SaveToStream(file);
}
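Pdf.FromMarkdown takes a Markdown string, so generated documents such as invoices can be assembled with ordinary string building and then converted. This is a sketch. BuildInvoiceMarkdown and its line-item shape are hypothetical. The Markdown uses only headings, list items, and bold text, because the source does not say which other Markdown features the converter supports. Item names are inserted as-is, so characters such as * would need escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Builds a simple invoice as Markdown. Amounts are formatted with the invariant
// culture so the decimal separator does not depend on the machine's locale.
string BuildInvoiceMarkdown(string invoiceNumber, IEnumerable<(string Item, int Qty, decimal UnitPrice)> lines)
{
    var culture = CultureInfo.InvariantCulture;
    var sb = new StringBuilder();
    sb.Append(culture, $"# Invoice {invoiceNumber}\n\n");
    decimal total = 0m;
    foreach (var (item, qty, unitPrice) in lines)
    {
        decimal amount = qty * unitPrice;
        total += amount;
        sb.Append(culture, $"- {item} ({qty} x {unitPrice:F2}): ${amount:F2}\n");
    }
    sb.Append(culture, $"\nTotal: **${total:F2}**\n");
    return sb.ToString();
}

string markdown = BuildInvoiceMarkdown("2026-001", new[]
{
    ("Widget", 2, 10.50m),
    ("Gadget", 1, 21.00m),
});
Console.Write(markdown);

// Then convert it as in the example above:
// using var pdf = Pdf.FromMarkdown(markdown);
// pdf.Save("invoice.pdf");
```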

Page rendering

using var doc = PdfDocument.Open("document.pdf");

// Render to PNG
byte[] png = doc.RenderPage(0);
File.WriteAllBytes("page0.png", png);

// Render with zoom
byte[] zoomed = doc.RenderPageZoom(0, zoom: 2.0f);

// Render as JPEG
byte[] jpeg = doc.RenderPage(0, format: 1);

// Thumbnail
byte[] thumb = doc.RenderThumbnail(0);

Async support

using var doc = PdfDocument.Open("document.pdf");
string text = await doc.ExtractTextAsync(0);

using var pdf = Pdf.FromMarkdown("# Async");
await pdf.SaveAsync("output.pdf");
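Batch jobs, such as the enterprise document processing use case, can combine async calls with a concurrency limit so that a large folder of PDFs does not start every task at once. This is a sketch that uses SemaphoreSlim. ProcessAllAsync is a hypothetical helper. The release notes describe the API as thread-safe, but the sketch still opens one PdfDocument per task so that no instance is shared. The runnable part uses a stand-in workload, and the PdfOxide call appears only in comments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Runs `work` for every path with at most `maxParallel` calls in flight and
// returns the results in the same order as `paths`.
async Task<TResult[]> ProcessAllAsync<TResult>(
    IReadOnlyList<string> paths,
    Func<string, CancellationToken, Task<TResult>> work,
    int maxParallel,
    CancellationToken ct = default)
{
    using var gate = new SemaphoreSlim(maxParallel, maxParallel);
    var tasks = paths.Select(async path =>
    {
        await gate.WaitAsync(ct);
        try
        {
            return await work(path, ct);
        }
        finally
        {
            gate.Release();
        }
    }).ToArray();
    return await Task.WhenAll(tasks);
}

// Stand-in workload so the helper runs without any PDFs: it records how many
// calls overlap and returns the length of each path.
var sync = new object();
int inFlight = 0, peak = 0;
var files = Enumerable.Range(0, 20).Select(i => $"file{i}.pdf").ToList();

int[] lengths = await ProcessAllAsync(files, async (path, ct) =>
{
    lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(10, ct);
    lock (sync) { inFlight--; }
    return path.Length;
}, maxParallel: 4);

Console.WriteLine($"Processed {lengths.Length} files, peak concurrency {peak}");

// Hypothetical real workload using APIs shown above (one document per task):
// string[] texts = await ProcessAllAsync(pdfPaths, (path, ct) => Task.Run(() =>
// {
//     using var doc = PdfDocument.Open(path);
//     return doc.ExtractAllText();
// }, ct), maxParallel: Environment.ProcessorCount);
```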

OCR & Auto Mode

OCR ships in the prebuilt PdfOxide NuGet package as of v0.3.52, so no build from source is required. To use it, supply an ONNX Runtime shared library (set the ORT_DYLIB_PATH environment variable to its path) and the OCR models. pdf_oxide then chooses a strategy per page: native text where present, OCR where the page is image-only, and a graceful fallback when OCR is unavailable.

using PdfOxide.Core;

OcrEngine.PrefetchModels("english");                   // one-off provisioning

using var doc = PdfDocument.Open("scanned-or-mixed.pdf");
string text = doc.ExtractTextAuto(0);                  // recommended

For manual OcrEngine usage (Load(...) + ExtractText(doc, page)), page-type classification (doc.ClassifyPage(0)), configuration options, model selection, and ONNX Runtime install recipes, see the OCR Guide.
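The per-page routing depends on the ONNX Runtime library being found. A small preflight check can report a missing or mistyped ORT_DYLIB_PATH before any documents are processed. This is a sketch and not part of the PdfOxide API. DescribeOrtProblem is a hypothetical helper, and it only checks that the file exists, not that it is a compatible ONNX Runtime build. On Linux and macOS, a value set from inside the process with Environment.SetEnvironmentVariable may not be visible to the native library, so set the variable before the app starts.

```csharp
#nullable enable
using System;
using System.IO;

// Returns null when ORT_DYLIB_PATH names an existing file; otherwise returns a
// description of the problem. The value is a parameter so the check is easy to test.
string? DescribeOrtProblem(string? ortDylibPath)
{
    if (string.IsNullOrWhiteSpace(ortDylibPath))
        return "ORT_DYLIB_PATH is not set; OCR is expected to be unavailable.";
    if (!File.Exists(ortDylibPath))
        return $"ORT_DYLIB_PATH points at a file that does not exist: {ortDylibPath}";
    return null;
}

string? problem = DescribeOrtProblem(Environment.GetEnvironmentVariable("ORT_DYLIB_PATH"));
Console.WriteLine(problem ?? "ONNX Runtime library found.");
```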

Other languages

PDF Oxide ships the same Rust core through six bindings: Rust, Python, Go, JavaScript / TypeScript, WASM, and C# / .NET.

A bug fix in the Rust core lands in every binding on the next release.


Use Cases

  • RAG / LLM pipelines — Convert PDFs to clean Markdown for retrieval-augmented generation
  • Enterprise document processing — Extract text, images, and metadata from thousands of PDFs in seconds
  • Form processing — Read and fill AcroForm fields, flatten forms into static content
  • PDF generation — Create invoices, reports, certificates, and templated documents programmatically
  • Metadata editing — Update title, author, subject on existing PDFs without rewriting content
  • PyMuPDF alternative — MIT licensed, 5× faster, no AGPL restrictions, native .NET API

Why I built this

I needed PyMuPDF's speed without its AGPL license, and I needed it in more than one language. Nothing existed that ticked all three boxes — fast, MIT, multi-language — so I wrote it. The Rust core is what does the real work; the bindings for Python, Go, JS/TS, C#, and WASM are thin shells around the same code, so a bug fix in one lands in all of them. It now passes 100% of the veraPDF + Mozilla pdf.js + DARPA SafeDocs test corpora (3,830 PDFs) on every platform I've tested.

If it's useful to you, a star on GitHub genuinely helps. If something's broken or missing, open an issue — I read all of them.

— Yury

License

Dual-licensed under MIT or Apache-2.0 at your option. Unlike AGPL-licensed alternatives, pdf_oxide can be used freely in any project — commercial or open-source — with no copyleft restrictions.

Citation

@software{pdf_oxide,
  title = {PDF Oxide: Fast PDF Toolkit for Rust, Python, Go, JavaScript, and C#},
  author = {Yury Fedoseev},
  year = {2025},
  url = {https://github.com/yfedoseev/pdf_oxide}
}

C# + .NET + Rust core | MIT / Apache-2.0 | 100% pass rate on 3,830 PDFs | 0.8ms mean | 5× faster than PyMuPDF

Target frameworks

Included in the package: net8.0 and net10.0, both with no dependencies.
Computed as compatible by NuGet: net9.0, plus the android, browser, ios, maccatalyst, macos, tvos, and windows variants of net8.0, net9.0, and net10.0. This computation is based on the target framework alone; the bundled native libraries cover Windows, macOS, and Linux.


Version Downloads Last Updated
0.3.52 30 5/20/2026
0.3.51 45 5/19/2026
0.3.50 92 5/17/2026
0.3.49 92 5/16/2026
0.3.48 92 5/15/2026
0.3.47 93 5/13/2026
0.3.46 105 5/11/2026
0.3.45 99 5/7/2026
0.3.44 89 5/6/2026
0.3.43 91 5/3/2026
0.3.42 93 5/3/2026
0.3.41 95 5/1/2026
0.3.40 104 4/29/2026
0.3.39 108 4/27/2026
0.3.38 105 4/23/2026
0.3.37 251 4/21/2026
0.3.36 102 4/20/2026
0.3.35 108 4/19/2026
0.3.34 93 4/18/2026
0.3.33 98 4/17/2026

v0.3.30 — NativeAOT + LibraryImport migration (#333) + v0.3.30 shipping fixes
- v0.3.30 was canceled mid-release after the release pipeline surfaced
 binding.gyp framework wiring, binding.cc C++20 syntax, and a
 go/.gitignore negation that silently blocked the Go staticlib commits.
 v0.3.30 rolls the same feature set (#311-#335) forward with those three
 bugs fixed; no v0.3.30 artifact was ever published to NuGet.
- Migrated all 881 native declarations from DllImport to LibraryImport
 so the binding is NativeAOT-publish-ready and trim-safe.
- Target frameworks trimmed to net8.0 + net10.0 (both current LTS).
 Drops netstandard2.1 / net5.0 / net6.0 (all out of support).
- Package now sets IsAotCompatible / IsTrimmable so consumers get early
 analyzer warnings for any dynamic usage that would break AOT output.
- Thread-safe, idiomatic API (IDisposable, async/await with CancellationToken,
 fluent builders) and features unchanged: text/markdown/html extraction,
 search, form fields, annotations, rendering, digital signatures, OCR,
 compliance (PDF/A, PDF/X, PDF/UA), page editing, and PDF creation.
- Cross-platform native libraries bundled for win-x64/arm64, linux-x64/arm64,
 osx-x64/arm64.