DocumentFormat.OpenXml.Markdown 1.0.1.1

dotnet add package DocumentFormat.OpenXml.Markdown --version 1.0.1.1
                    
NuGet\Install-Package DocumentFormat.OpenXml.Markdown -Version 1.0.1.1
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="DocumentFormat.OpenXml.Markdown" Version="1.0.1.1" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="DocumentFormat.OpenXml.Markdown" Version="1.0.1.1" />
                    
Directory.Packages.props
<PackageReference Include="DocumentFormat.OpenXml.Markdown" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add DocumentFormat.OpenXml.Markdown --version 1.0.1.1
                    
#r "nuget: DocumentFormat.OpenXml.Markdown, 1.0.1.1"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package DocumentFormat.OpenXml.Markdown@1.0.1.1
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=DocumentFormat.OpenXml.Markdown&version=1.0.1.1
                    
Install as a Cake Addin
#tool nuget:?package=DocumentFormat.OpenXml.Markdown&version=1.0.1.1
                    
Install as a Cake Tool

DocumentFormat.OpenXml.Markdown

A lightweight, powerful extension library for the official Open XML SDK. It reliably converts Word (.docx), Excel (.xlsx), and PowerPoint (.pptx) documents into clean, GitHub-Flavored Markdown.

Perfect for migrating internal documents, formatting wiki content, or pre-processing Office files into LLM-friendly plain text!

Features

  • 📝 Wordprocessing (.docx)
    Maps paragraph styles to standard Markdown headings. Parses inline text formatting like bold (**), italics (*), and strikethrough (~~). Supports multi-level nested lists and extracts hyperlinks and tables.
  • 📊 Spreadsheet (.xlsx)
    Resolves the SharedString table and extracts cached formula values. Renders worksheets as elegant GitHub-Flavored Markdown (GFM) tables.
  • 📽️ Presentation (.pptx)
    Iterates through slides sequentially. Re-maps slide titles to headings, and extracts nested text from Shapes, TextBoxes, and Tables.
  • 🖼️ Flexible Image Handling
    Easily control how embedded images are handled. You can safely ignore them, inline them directly via Base64 Data URIs, or export them to a local asset directory with custom URL prefixes.

Getting Started

Installation

Note: Instructions will be finalized when the package is published to NuGet.

dotnet add package DocumentFormat.OpenXml.Markdown

Basic Usage

Usage is incredibly simple and stream-aware. You can pass raw OpenXML types or let the library infer the document naturally via a standard Stream.

Convert from a Stream

The converter will inspect the package's contents and automatically identify if it's Word, Excel, or PowerPoint.

using DocumentFormat.OpenXml.Markdown;
using System.IO;

var fileStream = File.OpenRead("SampleDocument.docx");
string markdown = MarkdownConverter.ConvertToMarkdown(fileStream);

Console.WriteLine(markdown);
Convert from OpenXML objects

If you are already interacting with the Open XML SDK, you can convert specific documents you've already opened.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Markdown;

using (var wordDoc = WordprocessingDocument.Open("SampleDocument.docx", false))
{
    string markdown = MarkdownConverter.ConvertToMarkdown(wordDoc);
}

Advanced Usage: Image Handling Configurations

By default, any visual image found in the document stream is encoded internally as a Base64 string. However, you can change this behavior by instantiating MarkdownConverterSettings:

var settings = new MarkdownConverterSettings 
{
    // Mode: Ignore         -> Ignores images entirely
    // Mode: Base64         -> Default behavior (![alt](data:image/png;base64,...))
    // Mode: ExportToFolder -> Writes images to disk and generates links
    ImageExportMode = ImageExportMode.ExportToFolder,

    // If using ExportToFolder, define where to persist the binaries:
    AssetExportDirectory = "./docs/assets/",

    // (Optional) Define a prefix for generated URLs (e.g. for CDN hosting):
    AssetLinkUrlPrefix = "https://cdn.example.com/assets/"
};

using var wordDoc = WordprocessingDocument.Open("Report.docx", false);
string markdown = MarkdownConverter.ConvertToMarkdown(wordDoc, settings);

Complex Formatting Notes

  • Tables: Small, simple tables work best. For Word and PowerPoint tables containing lists or multiple paragraphs in a single cell, the library uses <br> tags to preserve visual structure within GFM-compliant rows.
  • Lists: Multi-level bullets are supported for both Word and PowerPoint, utilizing standard Markdown indentation logic.

Office Math to LaTeX Supported Expressions

The DocumentFormat.OpenXml.Markdown parser comes with exhaustive support for Office Math (OMML) conversion to high-fidelity LaTeX within Markdown output.

It recursively searches through Word (.docx) nested structures (including tables) and PowerPoint (.pptx) DrawingML containers to identify and seamlessly extract mathematical structures.

Supported OMML Mathematical Expressions

1. Basic Structures

  • Fractions (f): Automatically translated to \frac{numerator}{denominator} or \binom{n}{k} depending on the fractional bar presence.
  • Radicals (rad): N-th roots translated accurately (e.g. \sqrt[n]{x} vs square root \sqrt{x}).
  • Scripts & Indices: Subscripts (sSub), Superscripts (sSup), Sub-Superscripts (sSubSup), and Left Sub-Superscripts/Pre-scripts (sPre) (e.g., _{n}^{m}Y).

2. Groupings & Delimiters

  • Delimiters (d): Automatically detects and translates bracket mappings (parentheses, braces \{, brackets [, absolute values |, norms \|, and floor/ceiling). Dynamically scales using \left and \right.
  • Delimiters (d): Automatically detects and translates bracket mappings (parentheses, braces \{, brackets [, absolute values |, norms \|, and floor/ceiling). Dynamically scales using \left and \right.
  • Piecewise Functions (cases): Seamlessly identifies piecewise equations inside single-sided bracket structures { and maps them correctly to \begin{cases} ... \end{cases}.

3. Advanced Structures

  • Large Operators (nary): Supports Integrals (\int), Double/Triple Integrals, Summations (\sum), Products (\prod), and Contour Integrals, along with their respective lower bounds and upper limits.
  • Limiting Functions (limLow, limUpp): Detects layout instructions for generic functions (like \lim, \max, \sup) and properly applies subscript bounds via \lim_{x \to \infty} formatting.
  • Accents (acc): Replaces math accents correctly: \hat, \vec, \tilde, \bar, \dot, \ddot, \breve, etc.
  • Arrays & Matrices (eqArr, m): Maps equation alignments (eqArr) into \begin{aligned} systems, and native multi-dimensional cell arrays (m) into \begin{matrix} structures.
  • Group Characters (groupChr): Supports top-level and bottom-level layout formatting for grouping structures mapping to mathematical \overset or \underset.
  • Boxes and Underlines (box, borderBox, bar): Properly encapsulates equations inside \boxed{} and applies overlines/underlines (\overline, \underline) across structures.

4. Comprehensive Symbol Dictionary

Includes an extensive multi-character mapping dictionary designed to preserve visual parity during translation:

  • All Uppercase and Lowercase Greek Symbols (\alpha, \beta ... \Omega)
  • Set representations (\in, \subset, \cup, \cap)
  • Propositional logic & arrows (\forall, \exists, \rightarrow, \Leftrightarrow)
  • Math operations and relations (\pm, \times, \leq, \neq, \approx, \equiv)
  • Function representations natively wrapped (e.g., trigonometric \sin, \cos and logarithmic \ln)

Contributing

We welcome contributions! To collaborate on this repository:

  1. Ensure you agree to the .editorconfig formatting rules.
  2. Verify all modifications still allow for cross-compilation across .NET 10+ properly.
  3. Write sufficient xUnit coverage for robust edge cases.

License

This project is licensed under the same MIT License constraints as the official DocumentFormat.OpenXml SDK repository.

Product Compatible and additional computed target framework versions.
.NET net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.0.1.1 110 4/14/2026
1.0.1 95 4/14/2026
1.0.0 104 4/13/2026