Unpdf 0.1.5
Unpdf
.NET bindings for unpdf - High-performance PDF content extraction to Markdown, text, and JSON.
Installation
dotnet add package Unpdf
Quick Start
using Unpdf;
// Convert PDF to Markdown
string markdown = Pdf.ToMarkdown("document.pdf");
Console.WriteLine(markdown);
// Convert PDF to plain text
string text = Pdf.ToText("document.pdf");
Console.WriteLine(text);
// Convert PDF to JSON
string json = Pdf.ToJson("document.pdf", pretty: true);
Console.WriteLine(json);
// Get document information
var info = Pdf.GetInfo("document.pdf");
Console.WriteLine($"Title: {info.Title}");
Console.WriteLine($"Pages: {info.PageCount}");
// Get page count
int pages = Pdf.GetPageCount("document.pdf");
Console.WriteLine($"Total pages: {pages}");
// Check if file is a valid PDF
bool isValid = Pdf.IsPdf("document.pdf");
Console.WriteLine($"Is valid PDF: {isValid}");
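The calls above can be combined to convert a whole folder. The following is a minimal sketch, not part of the package: the folder names `./pdfs` and `./markdown` are hypothetical, and because the exception types Unpdf throws are not documented here, the sketch catches `Exception` broadly.

```csharp
using System;
using System.IO;
using Unpdf;

// Hypothetical folder names; adjust to your project layout.
string inputDir = "./pdfs";
string outputDir = "./markdown";
Directory.CreateDirectory(outputDir);

foreach (string pdfPath in Directory.EnumerateFiles(inputDir, "*.pdf"))
{
    // Skip files that carry a .pdf extension but are not valid PDFs.
    if (!Pdf.IsPdf(pdfPath))
    {
        Console.WriteLine($"Skipped (not a valid PDF): {pdfPath}");
        continue;
    }

    try
    {
        string markdown = Pdf.ToMarkdown(pdfPath);
        string target = Path.Combine(
            outputDir,
            Path.GetFileNameWithoutExtension(pdfPath) + ".md");
        File.WriteAllText(target, markdown);
        Console.WriteLine($"Wrote {target}");
    }
    catch (Exception ex)
    {
        // Assumption: any conversion failure surfaces as an exception.
        // Log it and continue with the next file.
        Console.WriteLine($"Failed: {pdfPath} ({ex.Message})");
    }
}
```

`Directory.EnumerateFiles` throws if the input folder does not exist, so create it or check for it first.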
Advanced Usage
Convert with Options
using Unpdf;
// Convert with frontmatter and image extraction
var options = new PdfOptions
{
    IncludeFrontmatter = true,
    ExtractImages = true,
    ImageOutputDir = "./images",
    Lenient = true
};
string markdown = Pdf.ToMarkdown("document.pdf", options);
Console.WriteLine(markdown);
Extract Images
using Unpdf;
// Extract all images from PDF
var images = Pdf.ExtractImages("document.pdf", "./output/images");
foreach (var image in images)
{
    Console.WriteLine($"Image: {image.Filename}");
    Console.WriteLine($" Path: {image.Path}");
    Console.WriteLine($" Type: {image.MimeType}");
    Console.WriteLine($" Size: {image.Width}x{image.Height}");
    Console.WriteLine($" Bytes: {image.SizeBytes}");
}
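The per-image properties shown above can also be aggregated. This is a sketch under assumptions: it presumes the return value of `Pdf.ExtractImages` works with LINQ (it is already enumerated with `foreach` above), and it uses `Convert.ToInt64` because the numeric type of `SizeBytes` is not stated in this document.

```csharp
using System;
using System.Linq;
using Unpdf;

var images = Pdf.ExtractImages("document.pdf", "./output/images");

// Group by MIME type and total the reported byte sizes.
var summary = images
    .GroupBy(image => image.MimeType)
    .Select(group => new
    {
        MimeType = group.Key,
        Count = group.Count(),
        // Convert.ToInt64 accepts any built-in numeric type (assumption:
        // SizeBytes is one of them).
        TotalBytes = group.Sum(image => Convert.ToInt64(image.SizeBytes))
    });

foreach (var entry in summary)
{
    Console.WriteLine($"{entry.MimeType}: {entry.Count} image(s), {entry.TotalBytes} bytes");
}
```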
API Reference
Pdf.ToMarkdown(string path)
Convert a PDF file to Markdown format.
Pdf.ToMarkdown(string path, PdfOptions options)
Convert a PDF file to Markdown format with options.
Pdf.ToText(string path)
Convert a PDF file to plain text.
Pdf.ToJson(string path, bool pretty = false)
Convert a PDF file to JSON format.
Pdf.GetInfo(string path)
Get document metadata (title, author, page count, etc.).
Pdf.GetPageCount(string path)
Get the number of pages in a PDF file.
Pdf.IsPdf(string path)
Check if a file is a valid PDF.
Pdf.ExtractImages(string path, string outputDir)
Extract all images from a PDF file to the specified directory.
Pdf.Version
Get the version of the native library.
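`Pdf.ToJson` returns its result as a string, and the JSON schema is not described here. A hedged way to explore it is to list the top-level keys with `System.Text.Json`. The `JsonInspector` helper below is hypothetical and not part of Unpdf; it makes no assumption about the schema beyond the output being valid JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class JsonInspector
{
    // Returns the top-level property names of a JSON object,
    // or an empty list when the root is not an object.
    public static List<string> TopLevelKeys(string json)
    {
        var keys = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in doc.RootElement.EnumerateObject())
            {
                keys.Add(property.Name);
            }
        }
        return keys;
    }
}
```

For example, `JsonInspector.TopLevelKeys(Pdf.ToJson("document.pdf"))` would list whatever keys the library emits for that file, in document order.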
PdfOptions
| Property | Type | Default | Description |
|---|---|---|---|
| ExtractImages | bool | false | Enable image extraction during conversion |
| ImageOutputDir | string? | null | Directory to save extracted images |
| IncludeFrontmatter | bool | false | Include YAML frontmatter with metadata |
| Lenient | bool | true | Continue parsing despite minor errors |
License
MIT License
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0: No dependencies.
- net9.0: No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Unpdf:
| Package | Downloads |
|---|---|
| FileFlux: Complete document processing SDK optimized for RAG systems. Transform PDF, DOCX, Excel, PowerPoint, Markdown and other formats into high-quality chunks with intelligent semantic boundary detection. Includes advanced chunking strategies, metadata extraction, and performance optimization. | |
| FileFlux.Core: Pure document extraction SDK for RAG systems. Zero AI dependencies. Extract text from PDF, DOCX, Excel, PowerPoint, Markdown, HTML, and text files. Provides IDocumentReader interface and implementations. Use FileFlux.Core for extraction-only scenarios. For AI-enhanced extraction (image OCR, captioning), use the FileFlux package. | |