OfficeIMO.Pdf
0.1.34
dotnet add package OfficeIMO.Pdf --version 0.1.34
OfficeIMO.Pdf - Dependency-Free PDF Engine
OfficeIMO.Pdf is the first-party PDF package for the OfficeIMO family. It is MIT licensed and is intended to stay free of runtime package dependencies.
The long-term direction is larger than a small PDF writer: this package should become the engine that PSWriteOffice can expose for PSWritePDF-style workflows, while PSWritePDF can eventually be archived with migration guidance.
Roadmap: Docs/officeimo.pdf.roadmap.md
Goals
- Keep the core package MIT licensed and dependency-free.
- Build a real document model, layout model, PDF syntax model, and content operator pipeline.
- Generate reports that look good enough to compete with QuestPDF-style output.
- Read and manipulate PDFs enough to replace common iText-backed PSWritePDF workflows through PSWriteOffice.
- Keep the public creation API Word-like and primitive-based; polished invoice/report/statement outputs belong in samples, visual fixtures, or wrappers, not as special engine concepts.
- Keep rasterizers and visual comparison tools in tests/dev tooling, not in the runtime package.
Relationship to OfficeIMO.Word.Pdf
OfficeIMO.Word.Pdf currently uses QuestPDF and SkiaSharp for Word-to-PDF export. Treat that package as a bridge.
The strategic target is for OfficeIMO.Pdf to become good enough that Word, Excel, and PowerPoint exporters can render through the first-party engine without bringing QuestPDF, SkiaSharp, iText, or other runtime PDF dependencies into the core PDF package.
Quick Start
using OfficeIMO.Pdf;

PdfDoc.Create(new PdfOptions {
        DefaultFont = PdfStandardFont.Helvetica,
        DefaultFontSize = 11
    })
    .Meta(title: "Hello PDF", author: "OfficeIMO")
    .H1("OfficeIMO.Pdf")
    .Paragraph(p => p
        .Text("A dependency-free PDF builder with ")
        .Bold("rich text")
        .Text(", links, tables, images, and a growing reader."))
    .Size(PageSize.FromInches(8.5, 11))
    .Margin(PageMargins.Normal)
    .Table(new[] {
        new[] { "Area", "Status" },
        new[] { "Runtime dependencies", "None in OfficeIMO.Pdf" },
        new[] { "PowerShell future", "Expose through PSWriteOffice" }
    })
    .Save("HelloWorld.pdf");
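The Quick Start sizes the page in inches. PDF user space defaults to 72 units (points) per inch, so an inch- or centimeter-based helper most likely reduces to a conversion like the one below. This is a minimal sketch, not the library's implementation: `InchesToPoints` and `CentimetersToPoints` are hypothetical names, and whether `PageSize.FromInches` performs exactly this conversion is an assumption.

```csharp
using System;

// Hypothetical helpers (not OfficeIMO.Pdf APIs).
// Default PDF user space uses 72 units per inch; one inch is 2.54 cm.
double InchesToPoints(double inches) => inches * 72.0;
double CentimetersToPoints(double centimeters) => centimeters / 2.54 * 72.0;

// US Letter, as used in the Quick Start.
Console.WriteLine($"{InchesToPoints(8.5)} x {InchesToPoints(11)}"); // prints "612 x 792"

// A4 is 21 cm x 29.7 cm; rounded to whole points for display.
Console.WriteLine($"{Math.Round(CentimetersToPoints(21))} x {Math.Round(CentimetersToPoints(29.7))}"); // prints "595 x 842"
```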
Current Feature Set
Generation:
- Automatic page flow with configurable page size, inch/centimeter-based page-size helpers, portrait/landscape orientation helpers, margins, inch/centimeter-based margin helpers, reusable Word-compatible PageMargins presets, and default text style through PdfOptions, document defaults, or page-scoped composition, including long rich paragraph and oversized list-item continuation across pages.
- Headings with Word-like reusable PdfHeadingStyle/PdfHeadingStyles typography and rhythm defaults, optional per-heading style/alignment/color overrides across direct, compose item/element, and row-column flows, and orphan prevention before following paragraphs; paragraphs; page breaks; hard line breaks in shared simple text wrapping for headings, list items, table cells, captions, and similar non-rich text; invisible Spacer(...) flow gaps for generic document rhythm without fake blank text; horizontal rules with reusable PdfHorizontalRuleStyle thickness, color, outer rhythm, and keep-with-next defaults; panels with reusable PanelStyle box, padding, alignment, color, keep-together, keep-with-next, and outer rhythm defaults; bullet and numbered lists with reusable PdfListStyle typography, indentation, marker gap, color, rhythm, keep-together, and keep-with-next page flow; rows/columns with a built-in Word-like gutter, reusable PdfRowStyle gutters, keep-together and keep-with-next page flow, and outer rhythm plus column-local item groups, headings, lists, panels, spacers, and compact tables; and simple top-level tables.
- Rich paragraph runs with bold, italic, underline, strike, color, superscript/subscript baseline shifts, links with annotation contents metadata, explicit line breaks, tab-as-word-spacing normalization, alignment, justification, proportional standard-font wrapping, configurable line height, left/right/first-line/hanging indents, spacing before/after, and Word-like keep-together, keep-with-next, and widow/orphan options for page-flow control.
- Table styling, default table styles through PdfOptions.DefaultTableStyle or PdfDoc.DefaultTableStyle(...), initial Word-like table presets (TableGrid, PlainTable1, GridTable1Light, ListTable1Light) with name-based resolution through TableStyles.FromWordTableStyle(...), captions, row/header/footer separators, side-specific per-cell border overrides, body column fills, per-cell fills, header-safe body row striping, horizontal/vertical cell alignment, generic header/body/footer typography, cell line-height controls, symmetric and side-specific cell padding, configurable header/footer row counts with render-time bounds validation, minimum row height, table left indentation and max-width caps with left/center/right placement, table spacing before/after, keep-together, keep-with-next first-row preflight that honors configured column widths, and row-break page-flow controls, fixed/min/max column widths, relative column width weights, column-scoped style bounds validation for sizing/fills/horizontal and vertical alignment, OfficeIMO.Drawing-backed auto-fit column sizing with token minimums, initial PdfTableCell column spans, row spans, rectangular merged cells with combined-box alignment, overlong row-span validation, header/footer boundary validation for row-spanned cells, row-spanned explicit cell fills/borders, explicit cell fill/border coordinate bounds validation, explicit cell fill/border coordinates that skip row-span and column-span continuation slots, row/header/footer separators, body-column background fills that skip merged-cell continuation columns, row/background fills, and default table border grids that skip row-spanned and rectangular merged-cell interiors, and cell-owned URI links, including linked column/row-spanned cell annotations over the merged text frame in top-level and compose/row-column table flows, row height calculation, proportional standard-font wrapped cell text and captions, row-by-row pagination, oversized-row splitting, and repeated header rows.
- Flow vector lines, rectangles, rounded rectangles, ellipses, polygons, paths, and reusable drawing scenes through shared OfficeIMO.Drawing descriptors, with solid fill, two-stop linear gradient fill, simple offset shadow, stroke, stroke width, dash style, line cap, line join, fill/stroke opacity, affine transforms, clipping paths, optional URI link annotations on generic shape/drawing blocks and vector convenience helpers, and reusable PdfDrawingStyle alignment, outer rhythm, and keep-with-next defaults.
- JPEG and simple non-interlaced 8-bit grayscale/grayscale-alpha/RGB/RGBA PNG image placement, including PNG alpha soft masks, reusable PdfImageStyle alignment, fit, clipping, outer rhythm, and keep-with-next defaults, first-party image metadata detection, shared OfficeImageFit stretch/contain/cover fitting, and shared OfficeClipPath clipping through OfficeIMO.Drawing.
- Header/footer literal text formats through PdfOptions, document-level PdfDoc.Header(...)/PdfDoc.Footer(...), or page-scoped PdfPageCompose.Header(...)/Footer(...), with visible page number tokens that continue across flows by default, configurable visible page-number starts through PageNumberStart(...), decimal/roman/alphabetic page-number styles through PageNumberStyle(...), left/center/right text zones through Zones(...), FirstPageZones(...), and EvenPagesZones(...), segment builders for composed header/footer text, configurable fonts, sizes, text colors, and margin-relative offsets, plus section-local first-page and odd/even header/footer overrides for Word-like section, cover-page, and report flows.
- Metadata: title, author, subject, and keywords.
- Optional reusable theme bundle through PdfTheme for default text, heading, list, panel, paragraph, horizontal rule, image, drawing, row, and table styles, applicable through PdfOptions.ApplyTheme(...), PdfDoc.Theme(...), or PdfPageCompose.Theme(...); PdfTheme.WordLike() provides a generic opt-in document rhythm without introducing invoice/report-specific engine APIs.
- Built-in Helvetica body/header/footer defaults for readable proportional no-options documents, with optional reusable default text style overrides through PdfTextStyle or fluent PdfDoc.DefaultTextStyle(...)/PdfPageCompose.DefaultTextStyle(...) configuration.
- Optional default heading styles for Word-like H1/H2/H3 typography, color, spacing, and keep-with-next behavior through PdfOptions.DefaultHeadingStyles, PdfOptions.SetDefaultHeadingStyle(...), PdfDoc.DefaultHeadingStyle(...), PdfPageCompose.DefaultHeadingStyle(...), or per-heading style: overrides; compose item/element and row-column heading helpers also expose explicit align and color overloads for local visual control without report-specific APIs.
- Optional default list style for Word-like bullet and numbered list font size, line height, left indent, marker gap, color, spacing before/after, inter-item rhythm, keep-together, and keep-with-next page flow through PdfOptions.DefaultListStyle, PdfDoc.DefaultListStyle(...), PdfPageCompose.DefaultListStyle(...), or per-list style: overrides.
- Optional default panel style for Word-like boxed paragraph appearance, rhythm, keep-together, and keep-with-next page flow through PdfOptions.DefaultPanelStyle, PdfDoc.DefaultPanelStyle(...), PdfPageCompose.DefaultPanelStyle(...), or per-panel style: overrides.
- Optional default horizontal rule style for Word-like separators, rhythm, and keep-with-next page flow through PdfOptions.DefaultHorizontalRuleStyle, PdfDoc.DefaultHorizontalRuleStyle(...), PdfPageCompose.DefaultHorizontalRuleStyle(...), or per-rule style: overrides.
- Optional default image style for Word-like image placement, fitting, clipping, rhythm, and keep-with-next page flow through PdfOptions.DefaultImageStyle, PdfDoc.DefaultImageStyle(...), PdfPageCompose.DefaultImageStyle(...), or per-image style: overrides.
- Optional default drawing style for Word-like shape and drawing-scene placement, rhythm, and keep-with-next page flow through PdfOptions.DefaultDrawingStyle, PdfDoc.DefaultDrawingStyle(...), PdfPageCompose.DefaultDrawingStyle(...), or per-shape/per-drawing style: overrides.
- Optional default row style for Word-like column gutters, row-level spacing, keep-together, and keep-with-next page flow through PdfOptions.DefaultRowStyle, PdfDoc.DefaultRowStyle(...), PdfPageCompose.DefaultRowStyle(...), PdfTheme.RowStyle, or per-row Style(...) overrides; multi-column rows use a built-in gutter unless callers explicitly set Gap(0) or PdfRowStyle { Gap = 0 }.
- Optional default paragraph style for Word-like reusable typography and page-flow settings when individual paragraphs do not provide their own style, either through PdfOptions.DefaultParagraphStyle or the fluent PdfDoc.DefaultParagraphStyle(...) setter.
- Page/section-scoped flow can be created through PdfDoc.Page(...), PdfDoc.Section(...), PdfDoc.Compose(...Page...), or PdfDoc.Compose(...Section...); page content can add direct Item(...) groups, nested element groups, Spacer(...) rhythm blocks, and PageBreak() page transitions before, between, or after columns/rows; and scoped defaults can set heading, list, panel, horizontal rule, image, drawing, row, paragraph, and table styles through PdfPageCompose.DefaultHeadingStyle(...), PdfPageCompose.DefaultListStyle(...), PdfPageCompose.DefaultPanelStyle(...), PdfPageCompose.DefaultHorizontalRuleStyle(...), PdfPageCompose.DefaultImageStyle(...), PdfPageCompose.DefaultDrawingStyle(...), PdfPageCompose.DefaultRowStyle(...), PdfPageCompose.DefaultParagraphStyle(...), and PdfPageCompose.DefaultTableStyle(...).
- Optional PDF outline generation from H1/H2/H3 headings plus generic Bookmark(...) anchors and LinkToBookmark(...) text links that emit simple PDF named destinations and GoTo annotations.
- First-party color interop with OfficeIMO.Drawing.OfficeColor.
- PDF RGB colors reject non-finite or out-of-range components before they can be written as invalid PDF color operators.
- ToBytes, path and stream Save, and path and stream SaveAsync.
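The RGB guard in the list above can be illustrated with a small standalone sketch. It assumes components are expressed in the PDF DeviceRGB range of 0 to 1 and are written with the non-stroking color operator `rg`; `ToFillColorOperator` is a hypothetical name, and the library's actual color types and validation code may differ.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not an OfficeIMO.Pdf API): validates DeviceRGB components
// and formats the PDF non-stroking color operator "r g b rg".
string ToFillColorOperator(double r, double g, double b) {
    foreach (double component in new[] { r, g, b }) {
        // NaN and infinities would otherwise be written as unparsable operands.
        if (!double.IsFinite(component) || component < 0.0 || component > 1.0) {
            throw new ArgumentOutOfRangeException(
                "component", component, "RGB components must be finite and within 0..1.");
        }
    }

    // Invariant culture keeps '.' as the decimal separator, as PDF syntax requires.
    return string.Format(CultureInfo.InvariantCulture, "{0:0.###} {1:0.###} {2:0.###} rg", r, g, b);
}

Console.WriteLine(ToFillColorOperator(1, 0.5, 0)); // prints "1 0.5 0 rg"
```

Rejecting bad values before formatting means an invalid color surfaces as an exception at the call site rather than as a corrupt content stream.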
Reading:
- Load from bytes, path, or stream into PdfReadDocument.
- Enumerate pages, metadata, and document outlines/bookmarks.
- Probe PDF header version, encryption markers, digital signature markers, form-field markers, annotation markers, outline/bookmark markers, catalog view-setting markers, page-label markers, catalog name-tree markers, named-destination markers, open-action markers, viewer-preference markers, tagged-structure markers, XMP metadata markers, catalog URI markers, output-intent markers, embedded-file markers, optional-content/layer markers, and active-content markers without full parsing through PdfInspector.Probe.
- Preflight PDFs through PdfInspector.Preflight to get wrapper-friendly CanRead, CanRewrite, parsed DocumentInfo, structured ReadBlockers and RewriteBlockers, HasReadBlocker(...)/HasRewriteBlocker(...) helpers, and diagnostics before invoking read or manipulation commands; unsupported page content stream filters are reported as read blockers so wrappers can explain why text extraction is not available for a real-world PDF.
- Inspect page count, page sizes, orientation, inherited page rotation, catalog page mode/layout/version/language values, simple page-label rules, simple document open-action targets, simple viewer preference entries, simple AcroForm field names/types/simple values, simple page URI and named-destination link annotation summaries, distinct document-level link URI and internal destination targets, document-level page-aware link lists, named destination names/targets, and per-page link annotations with contents metadata through PdfInspector.
- Extract document text and page-by-page text from bytes, paths, or streams; path helpers can write per-page text files.
- Extract text spans with positions.
- Extract page image XObjects from bytes, paths, streams, or parsed documents with PdfImageExtractor; ExtractImagesByPageRanges(..., PdfPageRange...) selects reusable page-range lists for wrapper pipelines, JPEG images are returned as JPEG files and simple PNG-predictor Flate images as PNG files, compatible grayscale/RGB Flate images with grayscale /SMask alpha are returned as gray-alpha/RGBA PNGs, and helpers can write extracted images to deterministic page-numbered files.
- Heuristic column-aware text extraction and simple structured extraction; PdfTextExtractor exposes layout-option overloads for bytes, paths, streams, page-range text/structured/table extraction with PdfPageRange lists, byte/path/stream whole-document text output to UTF-8 paths or caller-owned streams, and page-file output, plus structured-by-page and table-by-page extraction that preserves detected lines, lists, leader rows, simple table geometry, and selected source page numbers so wrappers can request readback without dropping to PdfReadDocument. Byte-, path-, and stream-based text/table extraction can also write deterministic source-page-0001.txt and source-page-0001-table-0001.csv files for all pages or selected page ranges, including option-aware selected text page output, with CSV escaping for table output.
- Decode common simple streams used by many PDFs, including uncompressed, Flate, ASCIIHex, ASCII85, RunLength, and LZW paths.
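The deterministic file names and CSV escaping mentioned in the Reading list can be sketched without the library. The helper names below are hypothetical, the four-digit zero padding is inferred only from the two example names quoted above, and the escaping follows common RFC 4180 conventions rather than the package's confirmed behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helpers (not OfficeIMO.Pdf APIs) mirroring the quoted naming pattern.
string PageTextFileName(int page) =>
    string.Format(CultureInfo.InvariantCulture, "source-page-{0:D4}.txt", page);

string PageTableFileName(int page, int table) =>
    string.Format(CultureInfo.InvariantCulture, "source-page-{0:D4}-table-{1:D4}.csv", page, table);

// Quote a field only when it contains a comma, quote, CR, or LF; double embedded quotes.
string EscapeCsvField(string value) {
    bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
}

string ToCsvRow(IEnumerable<string> cells) => string.Join(",", cells.Select(EscapeCsvField));

Console.WriteLine(PageTextFileName(1));      // prints "source-page-0001.txt"
Console.WriteLine(PageTableFileName(12, 2)); // prints "source-page-0012-table-0002.csv"
Console.WriteLine(ToCsvRow(new[] { "Total, net", "ok" })); // prints "\"Total, net\",ok" without the backslashes
```

Zero-padded names sort lexicographically in page order, which is what makes the output predictable for wrapper scripts that enumerate a directory.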
Manipulation:
- Merge parser-supported PDFs from bytes, streams, or paths into one new PDF with PdfMerger, including output stream helpers and enumerable file-list output to paths or streams for wrapper pipelines.
- Extract selected pages, one inclusive page range, or multiple inclusive page ranges from bytes, paths, or streams into a new PDF with PdfPageExtractor, including repeated selected-page/range cloning plus byte-returning path helpers and output stream helpers for byte, stream, and path inputs.
- Import selected, repeated, ranged, range-list, or all pages from one parser-supported PDF before, after, or inside another with PdfPageImporter, including byte-array, path, stream, and output helpers for wrapper pipelines. Insert helpers keep the target document as the primary catalog/metadata source even when source pages are inserted at page 1.
- Split a PDF from bytes, paths, or streams into single-page PDFs with PdfPageExtractor.SplitPages, or into generic inclusive PdfPageRange chunks with PdfPageExtractor.SplitPageRanges, including deterministic split-to-directory file output; PdfPageRange.Parse(...), TryParse(...), ParseMany("1-3,5"), and TryParseMany(...) provide one shared wrapper-friendly range grammar.
- Duplicate selected pages or inclusive page ranges/range lists, move selected pages or inclusive page ranges/range lists, delete selected pages or inclusive page ranges/range lists, reorder all pages from explicit page numbers or PdfPageRange lists, and rotate selected/all pages or inclusive page ranges/range lists from bytes, paths, or streams with PdfPageEditor, including byte-returning path helpers and output stream helpers for byte, stream, and path inputs.
- Update or replace document metadata from bytes, paths, or streams with PdfMetadataEditor, including byte-returning path helpers and output stream helpers for byte, stream, and path inputs.
- Add simple text/image stamps and text/image watermarks from bytes, paths, or streams with PdfStamper, including byte-returning path helpers plus output stream helpers for byte, stream, and path PDF inputs.
- Fill simple AcroForm field values from bytes, paths, or streams with PdfFormFiller.FillFields(...), using fully qualified field names and byte-returning/path/output-stream helpers; current support updates text/string-style values and button name values, generates simple text-widget normal appearance streams plus simple button-widget Off/selected appearance states for widgets with /Rect, marks /NeedAppearances true, and rejects signed or active-content PDFs.
- Flatten simple text-widget and button-widget AcroForms from bytes, paths, or streams with PdfFormFiller.FlattenFields(...), or update and flatten in one pass with FillAndFlattenFields(...), including byte-returning/path/output-stream helpers; current support paints text-widget appearances and simple button-widget normal appearance states into page content, generating minimal button appearances when needed, removes those widget annotations, removes the AcroForm tree, and rejects signed or active-content PDFs.
- Rewrite-style manipulation preserves simple direct catalog /PageMode, /PageLayout, /Version, /Lang, simple direct /PageLabels number trees, simple outline trees including simple GoTo action outline entries whose destinations point only at copied pages, direct /Dests dictionaries, simple /Names/Dests name trees, destination-array and simple GoTo dictionary /OpenAction entries, simple /ViewerPreferences dictionaries, simple catalog /Metadata XMP XML streams, simple catalog /URI base dictionaries, simple /OutputIntents metadata graphs, simple /Names/EmbeddedFiles attachment trees, simple catalog /AF associated-file arrays, and simple /OCProperties optional-content metadata, while pruning stale internal bookmark links whose named destinations no longer survive the selected pages. Copied-page label reindexing follows the trailer-root page tree, not stale catalog objects left behind by earlier revisions.
- The current manipulation path copies reachable page object graphs and preserves simple image streams, selected-page URI link annotations, and internal named-destination link annotations with contents metadata across extraction, split, duplicate, move, delete, reorder, rotate, metadata rewrite, merge, and stamp flows when their targets remain reachable, but it is not yet a full arbitrary-PDF editing engine.
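To make the shared range grammar concrete, here is a standalone sketch of how a string such as "1-3,5" could be parsed into inclusive 1-based ranges and expanded in caller order. It is a hypothetical stand-in, not the `PdfPageRange` implementation: the function names, the tuple shape, and the error handling are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-ins (not OfficeIMO.Pdf APIs).
int ParsePage(string token) {
    // NumberStyles.None accepts digits only, so signs and spaces inside a token are rejected.
    if (!int.TryParse(token.Trim(), NumberStyles.None, CultureInfo.InvariantCulture, out int page) || page < 1) {
        throw new FormatException($"Invalid page number: '{token}'.");
    }
    return page;
}

// Parses comma-separated entries that are either "N" or an inclusive "first-last" pair.
List<(int First, int Last)> ParseRanges(string text) {
    var ranges = new List<(int First, int Last)>();
    foreach (string part in text.Split(',')) {
        string[] bounds = part.Trim().Split('-');
        if (bounds.Length > 2) throw new FormatException($"Invalid range: '{part}'.");
        int first = ParsePage(bounds[0]);
        int last = bounds.Length == 2 ? ParsePage(bounds[1]) : first;
        if (last < first) throw new FormatException($"Range end precedes start: '{part}'.");
        ranges.Add((first, last));
    }
    return ranges;
}

// Expands ranges in caller order and keeps repeated pages.
IEnumerable<int> ExpandPages(IEnumerable<(int First, int Last)> ranges) {
    foreach (var (first, last) in ranges) {
        for (int page = first; page <= last; page++) {
            yield return page;
        }
    }
}

Console.WriteLine(string.Join(",", ExpandPages(ParseRanges("1-3,5")))); // prints "1,2,3,5"
```

Keeping repeats in caller order suits clone-style operations such as duplicate or import; an operation that treats the selection as a set would deduplicate the expanded pages afterwards.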
Quality Gates
The package now has tests that protect the dependency-free promise and start guarding visual quality:
- PackageDependencyGuardrailTests.DependencyLightProjects_HaveNoPackageReferences fails if OfficeIMO.Pdf gains a runtime PackageReference.
- PdfDocVisualQualityTests checks natural proportional-font word spacing, proportional-font alignment for simple text blocks and headers/footers, mixed Word-like flow rhythm across headings, paragraphs, invisible spacers, panels, lists, tables, images, shapes, and row columns, no-cramped-baseline, same-baseline text-collision, and ambiguous-run-gap guards, row/column text-frame bounds with explicit gutter clearance and baseline rhythm, generic line-item table rhythm without template APIs, heading wrapping with proportional wide/narrow glyph metrics in top-level and row/column flows, bullet/numbered-list wrapping with proportional wide/narrow glyph metrics in top-level and row/column flows, table-cell wrapping including proportional wide/narrow glyph metrics in top-level and row/column flows plus long unspaced token breaks, currency/percent/accounting-style numeric alignment in top-level and row/column table flows, header-relative body row striping, fixed/relative/min/max/content-aware table column widths plus table max-width, left-indent, column-span placement, row-span placement, header/footer row-count bounds, table keep-with-next preflight diagnostics for invalid table role/span models and column-scoped style bounds including horizontal alignment, rectangular merged-cell fill/border/link/alignment geometry, explicit cell fill/border coordinate bounds, row-spanned separator gaps, row-spanned and rectangular merged-cell default border gaps, row-spanned and column-spanned background-fill gaps, ignored explicit fill/border row-span and column-span continuation-slot coordinates, and linked merged-cell annotation rectangles in top-level and row/column table flows, table-cell link annotation output in top-level and row/column table flows, table keep-together and row-break page-flow behavior, and long-table pagination using rendered PDF text positions.
- Justified paragraph checks verify that wrapped lines expand inter-word spacing, final lines and explicit line-break lines keep natural spacing, and text remains extractable.
- Standard font handling uses shared validation for document options, compose default text style, stamp options, writer style selection, metric helpers, PDF base-font name conversion, and WinAnsi text encoding so invalid enum values cannot silently fall back to another font, unsupported generated/stamped characters cannot silently render as ?, and raw control characters cannot be emitted as invisible PDF text bytes; valid oblique and bold-oblique default-font selections preserve their Helvetica, Times, or Courier family, while generated layout, text span readback, and text stamp/watermark placement for Helvetica and Times family text use built-in glyph-width tables, including common WinAnsi punctuation and accented Latin letters, instead of average character widths.
- Page font resources are emitted only for fonts actually used by visible page content, including header/footer fonts only when headers, footers, or page numbers are enabled.
- Page setup rejects invalid intrinsic page sizes and margins at fluent assignment time, while page options report clear layout errors for default/header/footer font enum values, default/header/footer font sizes, header/footer alignment, header/footer placement, and impossible content frames.
- PdfDoc.Create(options) snapshots caller-provided options so later caller mutations cannot change document rendering.
- Reusable themes apply default text, heading, list, panel, horizontal rule, image, drawing, paragraph, row, and table styles at options, document, or page scope, snapshot caller-provided style objects before rendering, and include a rendered mixed-flow gate for the built-in PdfTheme.WordLike() document defaults.
- Reusable and fluent default text styles apply font, font size, and color to following page-flow content, snapshot caller-provided style objects, and reject invalid configuration delegates, font sizes, or standard font values before rendering.
- Default paragraph styles are snapshotted on assignment, fluent document configuration, and options cloning, apply to top-level and row/column paragraphs that do not provide their own style, and are bypassed by explicit per-paragraph styles.
- Default table styles are snapshotted on assignment, readback, fluent document configuration, and options cloning, apply to top-level and row/column tables that do not provide their own style, can be set from supported Word table style names, and are bypassed by explicit per-table styles.
- Compose page default heading, list, paragraph, and table styles are snapshotted per page, do not leak to later pages, and still allow explicit styles to override the page default.
- Compose page blocks expose read-only content block collections after composition so page-scoped model nodes are not caller-mutable lists.
- Page composition and header/footer templates report clear errors for null delegates, null header/footer text, invalid first-page/even-page header/footer text, invalid footer segment construction, and invalid externally-mutated footer segment state.
- Directly assigned footer segment templates render footer content without requiring the page-number flag, and footer placement validation applies to both page-number footers and segment-based footers.
- Footer segment lists are snapshotted on assignment and readback so caller mutations cannot change footer rendering after options are configured; first-page and even-page footer segments use the same snapshot and validation path.
- Save APIs report clear errors for null or non-writable streams and invalid path outputs; async path saves honor cancellation before creating directories, rendering, or writing files.
- Stream read APIs report clear errors for null or non-readable streams and read from the current stream position.
- Core path read APIs reject null, empty, or whitespace paths before attempting file reads.
- Page import APIs reject invalid source-page selections and invalid target insertion points before file reads, can read source streams from their current positions, can write byte, stream, or path inputs to the current output stream position, keep target metadata for target-edit insert operations, and AppendPageRanges, PrependPageRanges, InsertPageRange, and InsertPageRanges import inclusive source ranges from firstPage/lastPage pairs, reusable PdfPageRange values, or parsed range lists without wrappers materializing each page number; repeated or overlapping import ranges create cloned source pages in caller order.
- Encrypted PDFs fail with a clear unsupported diagnostic before parser-supported read/manipulation helpers attempt to process page content.
- Signed PDFs, form PDFs, complex outline/bookmark PDFs, complex page-label number-tree PDFs, unsupported catalog name-tree PDFs, unsupported named-destination name-tree PDFs, complex open-action dictionary PDFs, complex viewer-preference PDFs, complex XMP metadata PDFs, complex catalog URI PDFs, tagged PDFs, complex output-intent PDFs, complex embedded-file/associated-file PDFs, complex optional-content/layer PDFs, and active-content PDFs fail with clear unsupported diagnostics before rewrite-style manipulation helpers copy, merge, edit, metadata-rewrite, stamp, or watermark page content; simple direct catalog /PageMode, /PageLayout, /Version, /Lang, simple direct /PageLabels number trees, simple outline trees including simple GoTo action outline entries whose destinations point only at copied pages, direct /Dests dictionaries, simple /Names/Dests name trees including leaf /Kids, destination-array and simple GoTo dictionary /OpenAction entries, simple /ViewerPreferences dictionaries, simple catalog /Metadata XMP XML streams, simple catalog /URI base dictionaries, simple /OutputIntents metadata graphs, simple /Names/EmbeddedFiles attachment trees, simple catalog /AF associated-file arrays, and simple /OCProperties optional-content metadata are preserved during rewrite-style manipulation, while copied-page page labels are reindexed, stale named-destination links are pruned, and complex outlines, complex page labels, unsupported catalog name trees, malformed or unsupported named-destination name trees, complex open-action dictionaries, complex viewer preferences, complex XMP metadata, complex catalog URI dictionaries, complex output intents, complex embedded/associated files, and complex optional content remain blocked.
- Manipulation path input APIs reject null, empty, or whitespace input paths before attempting file reads.
- Page-by-page and page-range text extraction can validate/create output directories before reading inputs and write deterministic source-page-numbered text files for wrapper-friendly PSWritePDF parity; ExtractTextByPageRanges(...) accepts parsed range lists, preserves caller order, and treats overlapping selections as one page set.
- Image extraction can read from bytes, paths, or streams, validate/create output directories before reading path inputs, and write byte-, path-, or stream-based extracted image files with deterministic page-numbered names for wrapper-friendly PSWritePDF parity.
- Rich text runs and link annotations report clear errors for null run text, empty link text, non-absolute link URIs, empty link annotation contents, link contents without a link URI, image/shape/drawing link contents without a URI, and invalid table link coordinates before rendering.
- Paragraph, heading, image, shape, drawing-scene, vector convenience, and table-cell URI link annotations are emitted through a shared annotation dictionary builder, and paragraph LinkToBookmark(...) runs emit internal GoTo named-destination link annotations. Generated-PDF output checks verify /Annots, /Subtype /Link, /URI and /GoTo actions, escaped /Contents metadata, positive in-page link rectangles, aligned heading-link geometry, image placement geometry, fixed visual object geometry, missing bookmark-link diagnostics, and inspector readback, including wrapped heading lines, row/column headings, images, shapes, drawing scenes, vector helper calls, table cells, and bookmark links generated from compose and row/column flows.
- Heading-based PDF outlines are emitted through a shared outline dictionary builder and protected by generated-PDF output checks for /Outlines, title entries, nested tree links, counts, and /Dest destinations, plus inspector readback. Generic Bookmark(...) anchors emit sorted simple /Names/Dests named destinations, reject duplicate names before output, validate internal link targets, and are covered by generated-PDF and inspector readback checks for top-level and row/column flows.
- Lightweight probe/readback reports PDF header version, trailer-root catalog page mode/layout/version/language values, simple page-label rules, simple document outline targets including named destinations, simple document open-action targets, simple viewer preference entries, encryption markers, digital signature markers, form-field markers, annotation markers, simple page URI and named-destination link annotation counts, distinct document-level link URI and internal destination targets, document-level page-aware link lists, named destination names/targets, and per-page annotations with contents metadata, outline/bookmark markers, catalog view-setting markers, page-label markers, catalog name-tree markers, named-destination markers, open-action markers, viewer-preference markers, tagged-structure markers, XMP metadata markers, catalog URI markers, output-intent markers, embedded-file markers, optional-content/layer markers, active-content markers, structured preflight read and rewrite blockers, and read/rewrite diagnostics so wrappers can warn before invoking read or manipulation helpers; simple catalog view settings, simple outlines including simple GoTo action outline entries, simple direct page labels, supported catalog name trees, direct named destinations, simple destination name trees including leaf /Kids, destination-array and simple GoTo dictionary open actions, simple viewer preferences, simple catalog XMP metadata streams, simple catalog URI base dictionaries, simple output intents, simple embedded-file attachment trees, simple catalog associated-file arrays, and simple optional-content metadata are detected without blocking rewrite. Column-aware text readback now splits wide same-baseline runs before gutter detection so generated row/column documents can be extracted in left-column then right-column order, and structured readback keeps clear single-line table gaps so generated simple tables can round-trip into detected table rows.
- Generated metadata is protected by literal-string escaping checks for title, author, subject, and keywords, plus inspector readback of the original values.
- PDF object-boundary parsing ignores stream and endobj tokens inside literal strings so ordinary form/text values cannot truncate parsed objects during read/rewrite flows.
- Paragraph and panel paragraph text blocks reject invalid alignment enum values before layout while preserving supported justification.
- Paragraph and panel paragraph blocks snapshot rich text runs into read-only model collections.
- Paragraph scalar style properties reject invalid line height, spacing, and individual indents on assignment while combined text-frame width remains guarded during layout; paragraph style snapshots preserve line height, indents, first-line/hanging indents, spacing, keep-together, keep-with-next, and widow/orphan page-flow settings after the caller mutates the original style.
- Paragraph first-line and hanging indents affect both rich-text wrapping and rendered positions in top-level and row/column flows, with diagnostics when the first-line frame would leave the content area or collapse to a non-positive width.
- Mutable header, footer, panel-box, and table-caption alignment properties reject unsupported values on assignment instead of carrying invalid style state into rendering.
- Table column horizontal/vertical alignment lists reject unsupported values on assignment, reject out-of-grid entries during table layout/preflight, snapshot the assigned collection so later caller mutations cannot change the style, and are honored in both top-level and row/column table flows.
- Table captions render above the grid with configured alignment, color, font size, and spacing in both top-level and row/column table flows.
- Body column fills, per-cell fills, and per-cell borders render in both top-level and row/column table flows.
- Header, body, and footer row separators render as line strokes in both top-level and row/column table flows.
- Body row striping is calculated relative to the first body row and does not apply to configured header rows in both top-level and row/column table flows.
- Table column sizing lists reject non-positive/non-finite widths or weights on assignment and snapshot the assigned collection while leaving layout-dependent width conflicts to render-time diagnostics.
- Table body-column fills, cell fills, and cell borders snapshot assigned collections; cell fill/border coordinates are validated on assignment, and PdfCellBorder.Width rejects invalid intrinsic widths on assignment.
- Heading blocks reject empty or whitespace titles before layout so outlines and visible document structure cannot contain invisible headings. Bookmark blocks reject empty or whitespace names immediately and duplicate names during output so generated named destinations stay deterministic.
- Heading blocks reject unsupported alignment values before layout so Justify or invalid enum state cannot silently render as left-aligned headings.
- Heading style tests cover snapshotting, theme propagation, page-scoped defaults, rendered font size/color, and spacing-after rhythm so H1/H2/H3 can move toward Word-like style control instead of hardcoded renderer constants.
- Bullet and numbered list blocks snapshot caller-provided items and styles into read-only model state and reject unsupported alignment values before layout so Justify or invalid enum state cannot silently render as left-aligned lists. They also cover default/page/per-list style rendering for font size, color, left indent, marker gap, and vertical rhythm, and can keep a whole list together or keep it with the following visible block across top-level and row/column page flow.
- Image, shape, and drawing-scene blocks reject unsupported alignment values before layout so Justify or invalid enum state cannot silently render as left-aligned fixed-size content.
- Table captions reject empty or whitespace text before layout while null still means no caption.
- Table blocks reject unsupported table alignment values at model construction across top-level, compose, and link-enabled table APIs.
- Image blocks snapshot caller-provided bytes and reject invalid intrinsic model state at construction time; image, drawing, and horizontal rule styles reject invalid intrinsic spacing at construction time; fixed-size flow blocks such as images, horizontal rules, vector shapes, and drawing scenes still report clear layout errors when they are wider or taller than the available page content frame.
- Kept-together panels report a clear layout error when their measured height exceeds the available page content height.
- Panel scalar style properties reject invalid border width, padding, max width, and outer spacing on assignment while panel-box alignment and layout-dependent padding conflicts remain guarded.
- Panel paragraph blocks snapshot explicit panel styles at add time, and default panel style tests cover snapshotting, theme propagation, page-scoped defaults, rendered background color, max-width alignment, padding, spacing rhythm, and keep-with-next page flow for top-level and row/column panels.
- Horizontal rule style tests cover snapshotting, theme propagation, document and page-scoped defaults, rendered stroke color/thickness, spacing rhythm, and keep-with-next page flow for top-level and row/column rules.
- Image style tests cover snapshotting, theme propagation, document and page-scoped defaults, rendered alignment/fit coordinates, and spacing rhythm for top-level and row/column images.
- Drawing style tests cover snapshotting, theme propagation, document and page-scoped defaults, rendered shape and drawing-scene coordinates, and spacing rhythm for top-level and row/column vector objects.
- Row style tests cover snapshotting, theme propagation, document and page-scoped defaults, rendered gutter coordinates, row-level spacing rhythm, keep-together and keep-with-next page flow, and over-tall row diagnostics for reusable row/column primitives.
- Paragraph keep-together layout moves a whole paragraph to the next page in top-level and row/column flows when it would otherwise split, and reports a clear error when the kept paragraph is taller than the available page content height.
- Paragraph keep-with-next layout moves a paragraph with the following visible paragraph/list/panel/table/rule/image/shape/drawing/row-section neighbor in top-level and row/column flows when the first paragraph would otherwise be stranded at the bottom of a page.
- Paragraph widow/orphan layout can avoid leaving a single paragraph line at the bottom of a page in top-level and row/column flows.
- Heading layout keeps a heading with the following visible paragraph/list/panel/table/rule/image/shape/drawing/row-section neighbor in top-level and row/column flows when the heading would otherwise be orphaned at the bottom of a page.
- Row/column composition reports clear layout errors for empty rows, invalid gutters, non-finite, non-positive, or over-allocated column widths before they can corrupt rendered geometry; render-time diagnostics reject gutters that exceed the available content width and kept rows that exceed the available page content height, and row/column model collections expose read-only views after composition.
- Row/column visual-quality checks render ordinary Word-like column primitives, then verify extracted text lines remain inside their column frames, preserve explicit/default gutter clearance, and maintain readable baseline rhythm and row-level breathing room so composition regressions fail before they become cramped reports.
- Generic business-shaped visual fixtures, such as line-item tables, stay as proof documents for reusable Word-like primitives: weighted/min-width table columns, wrapped text, right-aligned numeric values, footer/summary row separation, margins, and follow-on rhythm are verified without adding invoice/report concepts to the engine.
- Word-like table presets and PdfTheme.WordLike() now include neutral footer separator defaults so summary/footer rows have document-style structure without requiring invoice/report-specific style APIs.
- Table scalar style properties reject invalid border widths, row/header/footer separator widths, padding, max width, left indent, row counts, row height, spacing, caption font size, header/body/footer font sizes, line height, and row baseline offsets on assignment while layout-dependent conflicts remain render-time diagnostics.
- Table styles report clear layout errors for invalid captions, unsupported caption justification, alignment enum values, cell fills/borders, explicit cell style coordinates outside the table grid, column-scoped style entries outside the table grid, oversized header/footer row counts, and impossible column sizing.
- Table header rows stay visually distinct from body row striping even when a style disables explicit header fill.
- Tables can move as a unit when PdfTableStyle.KeepTogether is enabled, including row/column flows, and report a clear layout error when the kept table is taller than the available page content height.
- Tables can keep with the first visible part of the following block when PdfTableStyle.KeepWithNext is enabled and the pair fits inside the page content frame.
- Oversized table rows split across pages by wrapped text line when PdfTableStyle.AllowRowBreakAcrossPages is enabled, including row/column flows, and report a clear layout error when row splitting is disabled.
- Table blocks snapshot input rows, styles, and link dictionaries into read-only model state and normalize null cells before layout so later caller mutations cannot change rendered output.
- Shape and drawing blocks snapshot shared OfficeIMO.Drawing descriptors, including linear gradient fills, at add time so later caller mutations cannot change rendered output.
- Page extraction stream helpers read from and write to current stream positions; repeated selected pages are emitted as distinct cloned page objects; PdfPageRange parses single pages, inclusive first-last/first..last ranges, and comma/semicolon-separated range lists for wrappers; path output helpers create parent directories but reject empty paths and existing directory targets before reading inputs or writing output; split helpers validate/create output directories before reading inputs and write deterministic page-numbered or page-range files for wrapper-friendly PSWritePDF parity.
- Page editing stream helpers read from and write to current stream positions, rejecting unreadable inputs or non-writable outputs before attempting parser work; path inputs can return edited PDF bytes, write to paths, or write to the current position of caller-owned output streams; DeletePageRange, DeletePageRanges, DuplicatePageRange, DuplicatePageRanges, MovePageRange, MovePageRanges, RotatePageRange, and RotatePageRanges accept either firstPage/lastPage pairs, reusable PdfPageRange values, or parsed range lists without making wrappers materialize each page number; DeletePageRanges, MovePageRanges, and RotatePageRanges treat overlapping ranges as one selection set; DuplicatePages and DuplicatePageRanges insert cloned copies immediately after selected source pages and honor repeated selections/ranges as repeated clones; MovePages moves selected source pages as a group in original relative order before another source page or to the end; path output helpers create parent directories but reject empty paths and existing directory targets before reading inputs or writing output.
- Metadata editing stream helpers read from and write to current stream positions while preserving the same update/replace semantics as byte and path inputs; path inputs can return bytes, write to paths, or write to the current position of caller-owned output streams; path output helpers create parent directories but reject empty paths and existing directory targets before reading inputs or writing output.
- Merge stream helpers read each input from its current stream position and can write merged PDFs to the current output stream position; file-list helpers accept enumerable paths for pipeline-collected inputs and can write to an output path or output stream.
- Merge file output helpers create parent directories but reject empty paths and existing directory targets before reading inputs or writing output.
- Text/image stamp and watermark helpers read the source PDF, plus image payloads for image stamps/watermarks, from the current stream position; path inputs can return stamped PDF bytes for wrapper pipelines, write to paths, or write to the current position of caller-owned output streams.
- Text/image stamp and watermark path output helpers create parent directories but reject empty paths and existing directory targets before reading inputs, image payloads, or writing output.
- Text/image stamp option models snapshot assigned page-number arrays, provide UsePageRange(...) overloads for firstPage/lastPage pairs or reusable PdfPageRange values, plus UsePageRanges(...) for parsed range lists without wrappers materializing page arrays; overlapping range-list selections are treated as one page selection set, and invalid intrinsic coordinates, sizes, rotation, fonts, and duplicate/non-positive page selections are rejected before stamping.
- Text/image stamp and watermark output is emitted through the shared internal content-stream helper and protected by content-stream checks for placement matrices, color/font operators, image dimensions/rotation, PNG alpha soft masks, and above/below-content layering order; custom image watermark sizing preserves watermark layering.
- PdfDocVisualBaselineTests keeps representative and professional report geometry snapshots for headings, paragraphs, rich text, panels, bullets, tables, images, PNG alpha soft masks, clipping, axial shading, and vector drawing content-stream signals.
- PdfDocRasterVisualBaselineTests can render the professional report, a two-page line-item statement fixture, a Word-like table style gallery, a landscape showcase dashboard, plus compact hello-world, core-layout, style-cheatsheet, styled-runs, drawing-gallery, row-columns, links-rules, lists-tables, default-styles, three-page flow-dsl, and two-page headers-footers scenarios through Poppler pdftoppm, then compare page PNGs against approved baselines. On mismatch it writes expected, actual, and diff PNG artifacts under %TEMP%\OfficeIMO.PdfRaster. Set OFFICEIMO_REQUIRE_PDF_RASTERIZER=1 to make missing Poppler fail the test lane, OFFICEIMO_UPDATE_PDF_RASTER_BASELINE=1 to refresh approved PNGs, OFFICEIMO_PDF_RASTER_PIXEL_TOLERANCE to allow small per-channel deltas, and OFFICEIMO_PDF_RASTER_ALLOWED_DIFF_PIXELS to allow a limited changed-pixel count.
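One item in the list above notes that object-boundary parsing ignores stream and endobj tokens inside literal strings. The Python sketch below illustrates that idea only; it is not the OfficeIMO.Pdf parser, and `find_endobj` is a hypothetical name. It assumes the PDF literal-string rules (parentheses delimit strings, balanced parentheses nest, a backslash escapes the next byte) and deliberately skips hex strings, comments, and stream bodies.

```python
def find_endobj(data: bytes, start: int = 0) -> int:
    """Return the index of the first `endobj` outside any literal string, or -1."""
    depth = 0  # nesting depth of unescaped parentheses
    i = start
    n = len(data)
    while i < n:
        b = data[i]
        if depth > 0:
            if b == 0x5C:      # backslash: skip the escaped byte as well
                i += 2
                continue
            if b == 0x28:      # '(' opens a nested level
                depth += 1
            elif b == 0x29:    # ')' closes the current level
                depth -= 1
        elif b == 0x28:        # '(' starts a literal string
            depth = 1
        elif data.startswith(b"endobj", i):
            return i
        i += 1
    return -1


# A form value that happens to contain the keyword does not end the object early.
sample = b"1 0 obj\n<< /V (text endobj \\) still inside) >>\nendobj\n"
boundary = find_endobj(sample)
```

A naive `sample.find(b"endobj")` would stop inside the string value, which is the truncation the item describes.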
Near-term work should keep adding small visual gates before broad feature growth. The roadmap tracks the intended sequence.
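The page-range grammar described for PdfPageRange above (single pages, inclusive first-last or first..last ranges, comma- or semicolon-separated lists) can be illustrated with a small language-neutral sketch. The Python below is not the OfficeIMO.Pdf implementation: `parse_page_ranges` and `selection_set` are hypothetical helpers, and whitespace tolerance and the rejection of reversed ranges are assumptions made for the example.

```python
import re

# One part: a page number, optionally followed by '-' or '..' and a second number.
_RANGE = re.compile(r"^\s*(\d+)\s*(?:(?:-|\.\.)\s*(\d+))?\s*$")


def parse_page_ranges(text: str) -> list[tuple[int, int]]:
    """Parse text such as '1, 3-5; 8..9' into inclusive 1-based (first, last) pairs."""
    ranges = []
    for part in re.split(r"[,;]", text):
        match = _RANGE.match(part)
        if not match:
            raise ValueError(f"invalid page range: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        ranges.append((first, last))
    return ranges


def selection_set(ranges: list[tuple[int, int]]) -> list[int]:
    """Collapse overlapping ranges into one sorted list of distinct page numbers."""
    pages = set()
    for first, last in ranges:
        pages.update(range(first, last + 1))
    return sorted(pages)
```

The parsed list keeps order and repeats, which matches operations that honor repeated ranges as repeated clones, while `selection_set` models the operations that treat overlapping ranges as one selection set.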
Support Matrix
The current create/read/manipulate/export coverage is tracked in Docs/officeimo.pdf.support-matrix.md.
Examples
Runnable samples live under OfficeIMO.Examples/Pdf. The professional report sample can be generated with:
dotnet run --project OfficeIMO.Examples -- --pdf-professional
The Word-like table style gallery can be generated with:
dotnet run --project OfficeIMO.Examples -- --pdf-table-styles
Known Gaps
- This is not yet a full QuestPDF replacement.
- This is not yet a full iText/PSWritePDF replacement.
- Font metrics are still simplified outside the built-in Helvetica and Times-family standard-font width tables.
- TrueType/OpenType embedding is not implemented yet; text outside the current standard-font WinAnsi path is rejected with a clear diagnostic instead of being substituted.
- Unsupported catalog name-tree preservation, malformed or unsupported named-destination name trees, full PNG coverage, advanced page import/editing, richer image transparency cases, rich/custom form appearance generation and flattening beyond simple field inspection, simple value fill, and simple text/button-widget flattening, signatures, encryption, redaction, and Office document rendering are roadmap items.
- The reader is intentionally pragmatic and does not yet cover the whole PDF specification.
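The WinAnsi gap listed above means unsupported text is rejected with a diagnostic rather than silently substituted. As a rough illustration only, the Python sketch below approximates WinAnsiEncoding with Python's cp1252 codec, which is close to but not identical to the PDF encoding. `require_winansi` is a hypothetical helper, and the message wording is invented for the example.

```python
def require_winansi(text: str) -> bytes:
    """Encode text for a WinAnsi-style font path, or fail with a clear diagnostic."""
    try:
        # Assumption: cp1252 stands in for PDF WinAnsiEncoding in this sketch.
        return text.encode("cp1252")
    except UnicodeEncodeError as error:
        bad = text[error.start]
        raise ValueError(
            f"character U+{ord(bad):04X} at index {error.start} "
            "is outside the supported standard-font encoding"
        ) from None
```

Failing early keeps the output honest: a caller sees which character is unsupported instead of finding a replacement glyph in the rendered page.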
Design Notes
- OfficeIMO.Pdf runtime code must not depend on PDF libraries, rasterizers, SkiaSharp, QuestPDF, iText, or commercial engines.
- OfficeIMO.Drawing is the preferred first-party reuse layer for shared color, font, image metadata, image fitting, text measurement, reusable drawing primitives, and eventually office-wide drawing scene concepts. Initial color interop is available through PdfColor, image placement can consume OfficeImageFit, and flow lines/rectangles/rounded rectangles/ellipses/polygons/paths plus grouped scenes can consume shared OfficeShape and OfficeDrawing descriptors, including stroke dash/cap/join, two-stop linear gradient fill intent, simple offset shadow intent, fill/stroke opacity, affine transform intent, and clipping path intent.
- PDF syntax and layout should move toward reusable internal models instead of one-off string writing. Initial reused content-stream helpers now cover common fill, stroke, stroke width, stroke cap, stroke join, dash arrays, fill-stroke, rectangle, line, path painting for ordinary and transformed rounded rectangles/ellipses/polygons/freeform paths, clipping paths for shapes/gradients/images, shading draws for gradient fills, local transform matrices, ExtGState resource application for opacity/shadows, save/restore wrappers for images/clipping/gradients/transformed shapes, text decorations, simple text, rich paragraph text, table-cell text, header/footer text, generated image placement, text/image stamp and watermark streams, and graphics-state operators. Generated and rewrite-style PDF names, literal strings, and indirect references share one syntax escaper; generated indirect-object creation, explicit object reservation/replacement, rewrite-style object wrapping, and stream body wrapping share one object-byte helper; generated page objects reference a reserved /Pages object directly instead of string-patched parent placeholders; generated page dictionaries including /MediaBox, /Resources, /Contents, and /Annots now use one page dictionary builder; generated URI link annotations now use one annotation dictionary builder with literal-string escaping and rectangle validation; generated outline root/item dictionaries now use one outline dictionary builder with title escaping, navigation links, child counts, and destination validation; generated PDFs, metadata editing, and merge outputs share one Info dictionary builder; generated PDFs, page extraction, and merge outputs share one /Pages dictionary builder; generated catalog dictionaries and rewrite-style catalog prefix/name/reference entries share one catalog dictionary builder; generated PDFs and rewrite-style manipulation outputs share one classic xref/trailer assembler; generated and stamp-injected standard Type1 WinAnsi font dictionaries share one font dictionary builder; generated/stamped JPEG and PNG image XObject dictionaries including soft masks share one image XObject dictionary builder; page resource reference dictionaries for Font, XObject, ExtGState, and Shading now share one formatter with PDF name escaping; and generated ExtGState alpha plus axial shading object bodies share one visual resource dictionary builder with opacity and finite-coordinate validation.
- Tests may use helper packages such as PdfPig to inspect output, because test dependencies do not ship with OfficeIMO.Pdf.
- External rasterizers such as Poppler belong only in development/test lanes; they must never become runtime dependencies of OfficeIMO.Pdf.
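The shared syntax escaper mentioned above covers PDF names and literal strings. The Python sketch below shows the escaping rules from the PDF specification, not the library's internal helper: `escape_literal_string` and `escape_name` are hypothetical names, and the sketch assumes its input is already encoded as bytes.

```python
# Delimiter characters plus '#', which must be written as #xx inside a name.
_NAME_SPECIALS = b"()<>[]{}/%#"


def escape_literal_string(raw: bytes) -> bytes:
    """Wrap bytes as a PDF literal string, escaping backslashes and parentheses."""
    out = bytearray(b"(")
    for b in raw:
        if b in b"\\()":
            out += b"\\" + bytes([b])
        elif b == 0x0A:
            out += b"\\n"
        elif b == 0x0D:
            out += b"\\r"
        else:
            out.append(b)
    out += b")"
    return bytes(out)


def escape_name(raw: bytes) -> bytes:
    """Write a PDF name, using #xx for specials and bytes outside '!'..'~'."""
    out = bytearray(b"/")
    for b in raw:
        if b < 0x21 or b > 0x7E or b in _NAME_SPECIALS:
            out += b"#%02X" % b
        else:
            out.append(b)
    return bytes(out)
```

Escaping every parenthesis is stricter than the specification requires, since balanced parentheses are legal unescaped, but it keeps the output independent of whether the input happens to be balanced.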
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 is compatible. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETFramework 4.7.2
  - OfficeIMO.Drawing (>= 1.0.13)
- .NETStandard 2.0
  - OfficeIMO.Drawing (>= 1.0.13)
- net10.0
  - OfficeIMO.Drawing (>= 1.0.13)
- net8.0
  - OfficeIMO.Drawing (>= 1.0.13)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on OfficeIMO.Pdf:
| Package | Downloads |
|---|---|
| OfficeIMO.Reader: Unified, read-only document extraction facade for OfficeIMO (Word/Excel/PowerPoint/Markdown/PDF) intended for AI ingestion. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.1.34 | 371 | 5/27/2026 |
| 0.1.33 | 438 | 5/26/2026 |
| 0.1.32 | 444 | 5/26/2026 |
| 0.1.31 | 462 | 5/23/2026 |
| 0.1.30 | 440 | 5/22/2026 |
| 0.1.29 | 449 | 5/21/2026 |
| 0.1.28 | 433 | 5/21/2026 |
| 0.1.27 | 433 | 5/20/2026 |
| 0.1.26 | 414 | 5/19/2026 |
| 0.1.25 | 405 | 5/18/2026 |
| 0.1.24 | 482 | 5/16/2026 |
| 0.1.23 | 435 | 5/14/2026 |
| 0.1.22 | 430 | 5/14/2026 |
| 0.1.21 | 426 | 5/7/2026 |
| 0.1.20 | 470 | 5/1/2026 |
| 0.1.19 | 429 | 4/27/2026 |
| 0.1.18 | 610 | 4/10/2026 |
| 0.1.17 | 150 | 4/9/2026 |
| 0.1.16 | 178 | 4/3/2026 |
| 0.1.15 | 156 | 4/1/2026 |