SourceDocParser.Common
0.4.1-alpha
dotnet add package SourceDocParser.Common --version 0.4.1-alpha
NuGet\Install-Package SourceDocParser.Common -Version 0.4.1-alpha
<PackageReference Include="SourceDocParser.Common" Version="0.4.1-alpha" />
<PackageVersion Include="SourceDocParser.Common" Version="0.4.1-alpha" />
<PackageReference Include="SourceDocParser.Common" />
paket add SourceDocParser.Common --version 0.4.1-alpha
#r "nuget: SourceDocParser.Common, 0.4.1-alpha"
#:package SourceDocParser.Common@0.4.1-alpha
#addin nuget:?package=SourceDocParser.Common&version=0.4.1-alpha&prerelease
#tool nuget:?package=SourceDocParser.Common&version=0.4.1-alpha&prerelease
SourceDocParserLib
Roslyn-based .NET assembly walker that turns compiled .dll + .pdb + .xml triples into a strongly-typed API catalog (types, members, signatures, XML docs, inheritdoc, SourceLink) and hands it to a pluggable emitter for rendering.
The catalog is format-neutral. Emitters decide how to render it — Markdown for Zensical / mkdocs Material, or YAML for docfx ManagedReference, with room for other targets.
Packages
| Package | What it does |
|---|---|
| SourceDocParser | Core walker, merger, source-link resolution. Defines IAssemblySource, IDocumentationEmitter, IMetadataExtractor. |
| SourceDocParser.NuGet | IAssemblySource that fetches packages from nuget.org by owner / explicit list and exposes the per-TFM lib/ trees. |
| SourceDocParser.Zensical | IDocumentationEmitter that writes Markdown tuned for Zensical / mkdocs Material (admonitions, content tabs, mermaid). |
| SourceDocParser.Docfx | IDocumentationEmitter that writes docfx ManagedReference YAML pages (drop-in replacement for dotnet docfx metadata output) plus the docfx.json config-file shim that lets an existing docfx site drive the parser pipeline. |
Logging flows through Microsoft.Extensions.Logging.Abstractions source-generated [LoggerMessage] partials, so any host (Serilog, Console, NLog, …) plugs in without the libraries taking a dependency on a specific backend.
Quick start
```csharp
using Microsoft.Extensions.Logging; // AddConsole ships in the Microsoft.Extensions.Logging.Console package
// ...plus the using directives for the SourceDocParser packages' namespaces.

// `using var` disposes the factory on exit, which flushes queued console log messages.
using var loggerFactory = LoggerFactory.Create(b => b.AddConsole());

var source = new NuGetAssemblySource(
    rootDirectory: "/path/to/repo",   // contains nuget-packages.json
    apiPath: "/path/to/api",          // where lib/ + refs/ get extracted
    logger: loggerFactory.CreateLogger<NuGetAssemblySource>());

var emitter = new ZensicalDocumentationEmitter();

var result = await new MetadataExtractor().RunAsync(
    source,
    outputRoot: "/path/to/markdown-output",
    emitter,
    loggerFactory.CreateLogger<MetadataExtractor>());

Console.WriteLine($"Emitted {result.PagesEmitted} pages across {result.CanonicalTypes} types.");
```
Performance
The pipeline is built around a span-based XML scanner, pooled buffers, eager release of memory-mapped reference DLLs, and a streaming type merger that consumes catalogs as they land. The result is a small, predictable allocation budget and a fast wall-time per assembly.
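As a rough illustration of a merger that "consumes catalogs as they land", the C# sketch below folds each work item's result into a shared UID map as soon as it is produced, instead of collecting every catalog first. It also dispatches through a static lambda, with the shared state carried on each work item rather than captured. The lock-based merge, the tuple-shaped work items, and the sample UIDs/TFMs are assumptions for illustration; this is not the StreamingTypeMerger API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical merge state: UID -> set of TFMs that declare it.
var merged = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);
var context = (Merged: merged, Gate: new object());

// Each work item carries a reference to the shared context, so the dispatch
// lambda below can be `static` (it captures no locals).
var workItems = new[]
{
    (Tfm: "net8.0", Uids: new[] { "T:Foo", "T:Bar" }),
    (Tfm: "netstandard2.0", Uids: new[] { "T:Foo" }),
}.Select(work => (Work: work, Context: context)).ToArray();

await Parallel.ForEachAsync(workItems, static (item, cancellationToken) =>
{
    // Stand-in for walking one assembly into a per-assembly catalog.
    string[] catalog = item.Work.Uids;

    // Merge as soon as the catalog lands; no collection of catalogs is kept.
    lock (item.Context.Gate)
    {
        foreach (var uid in catalog)
        {
            if (!item.Context.Merged.TryGetValue(uid, out var tfms))
            {
                tfms = new HashSet<string>(StringComparer.Ordinal);
                item.Context.Merged[uid] = tfms;
            }

            tfms.Add(item.Work.Tfm);
        }
    }

    return ValueTask.CompletedTask;
});

Console.WriteLine($"{merged.Count} canonical types");
```

After the loop each catalog has already been folded in and is unreachable, so memory held by the merge grows with the number of distinct UIDs rather than with the number of walked assemblies.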
Benchmark workload. Numbers below are from the BenchmarkDotNet suite under src/benchmarks/, run on a Ryzen 7 5800X / .NET 10. The workload extracts three real NuGet packages from nuget.org — pulling each package's lib/ and ref/ trees and the matching reference assemblies, walking every public symbol across ~19 target-framework groups, parsing the shipped XML doc files for each assembly, resolving <inheritdoc/> chains, and emitting roughly 600 canonical type pages after cross-TFM merge. The local NuGet cache is warmed once during global setup so per-iteration timings measure the walk + merge + emit pipeline, not the network leg.
End-to-end (MetadataExtractor.RunAsync):
| Phase | Wall time | Allocated |
|---|---|---|
| Full pipeline (RunAsync) | ~1.4 s | ~650 MB |
| Discover (NuGet config + cache scan) | ~660 ms | ~240 MB |
| Load + walk (parallel, all groups) | ~1.5 s | ~670 MB |
| Merge (cross-TFM dedup) | 2 ms | ~550 KB |
| Emit (Zensical Markdown) | 79 ms | ~63 MB |
Peak working set is bounded too: per-TFM compilation loaders dispose as soon as their last assembly finishes walking, so the memory-mapped BCL reference views are released eagerly instead of accumulating until RunAsync exits.
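A minimal sketch of the reference-counted release, assuming one counter per TFM group initialised to that group's assembly count. The group names, assembly names, and the WalkAssembly placeholder are hypothetical, and adding the group name to a list stands in for disposing that group's loader.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical TFM groups; each group would share one expensive reference loader.
var groups = new Dictionary<string, string[]>
{
    ["net8.0"] = new[] { "A.dll", "B.dll", "C.dll" },
    ["netstandard2.0"] = new[] { "A.dll" },
};

// One outstanding-work counter per group, held in a one-element array so
// Interlocked can take a ref to it. The dictionary is only read in parallel.
var remaining = new Dictionary<string, int[]>();
var workItems = new List<(string Tfm, string Assembly)>();
foreach (var (tfm, assemblies) in groups)
{
    remaining[tfm] = new[] { assemblies.Length };
    foreach (var assembly in assemblies)
    {
        workItems.Add((tfm, assembly));
    }
}

var disposedGroups = new List<string>();

Parallel.ForEach(workItems, item =>
{
    WalkAssembly(item.Assembly);

    // Exactly one thread observes the transition to zero, so each group's
    // loader is released once, as soon as its last assembly is walked.
    if (Interlocked.Decrement(ref remaining[item.Tfm][0]) == 0)
    {
        lock (disposedGroups)
        {
            disposedGroups.Add(item.Tfm); // stand-in for loader.Dispose()
        }
    }
});

static void WalkAssembly(string assembly)
{
    // Placeholder for the real symbol walk.
}
```

Because Interlocked.Decrement returns the new value, the decision to dispose needs no extra lock around the loader itself.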
Per-call hotspots:
| Operation | Time | Allocated |
|---|---|---|
| XmlDocToMarkdown.Convert (plain summary) | ~25 ns | 176 B |
| XmlDocToMarkdown.Convert (tagged with <see> / <c> / <paramref>) | ~786 ns | 304 B |
| XmlDocToMarkdown.Convert (code block + bullet list) | ~1.0 µs | 440 B |
| TfmResolver.FindBestRefsTfm (exact match) | ~2 ns | 0 B |
| TfmResolver.FindBestRefsTfm (platform-suffix strip) | ~11 ns | 0 B |
| TfmResolver.FindBestRefsTfm (netstandard fallback) | ~471 ns | 1 KB |
| TypeMerger.Merge (600 types × 3 TFMs) | ~115 µs | 325 KB |
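Emitter cost for rendering type pages (no I/O, just markup formatting). Each cell is time / allocation for the whole workload; the two ratio columns give DocFx YAML relative to the Zensical Markdown baseline:
| Workload (types × members/type) | Zensical Markdown | DocFx YAML | Time ratio | Alloc ratio |
|---|---|---|---|---|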
| 100 × 5 | 78 µs / 420 KB | 305 µs / 1,410 KB | 3.9× | 3.4× |
| 100 × 30 | 288 µs / 1,334 KB | 1,618 µs / 6,184 KB | 5.5× | 4.6× |
| 600 × 5 | 459 µs / 2,522 KB | 1,823 µs / 8,461 KB | 3.9× | 3.4× |
| 600 × 30 | 1,938 µs / 8,006 KB | 10,820 µs / 37,106 KB | 5.7× | 4.6× |
| 2000 × 5 | 1,617 µs / 8,406 KB | 7,443 µs / 28,203 KB | 4.5× | 3.4× |
| 2000 × 30 | 8,528 µs / 26.7 MB | 37,166 µs / 123.7 MB | 4.4× | 4.6× |
DocFx YAML is heavier by design — every member duplicates uid / commentId / parent / name / nameWithType / fullName, and the page-level references: list adds another mapping per cross-referenced type. The emitter still hand-writes its YAML directly via StringBuilder (no YamlDotNet runtime dependency), with a single-allocation fast path for the qualified-name composites (type.Name + "." + member.Name) that round-trips identifiers as plain scalars when escape-safe.
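The plain-scalar fast path can be sketched as below. QualifiedName, IsPlainSafe, IsReservedWord and ToYamlScalar are hypothetical helpers, and the safety rule is a deliberately conservative assumption (ASCII identifier characters and dots only, never a word a YAML resolver could read as a boolean or null); anything else falls back to a double-quoted scalar.

```csharp
using System;
using System.Text;

Console.WriteLine(ToYamlScalar(QualifiedName("Foo", "Bar")));    // Foo.Bar
Console.WriteLine(ToYamlScalar(QualifiedName("Foo<T>", "Bar"))); // "Foo<T>.Bar"

// string.Concat(string, string, string) sizes the result first, so this is one allocation.
static string QualifiedName(string typeName, string memberName) =>
    string.Concat(typeName, ".", memberName);

// Conservative rule (an assumption): ASCII letters, digits, '_' and '.', starting with a
// letter or '_', and not a word that YAML may resolve to a boolean or null.
static bool IsPlainSafe(string value)
{
    if (value.Length == 0 || !(char.IsAsciiLetter(value[0]) || value[0] == '_'))
    {
        return false; // empty, numeric-looking, or starting with a YAML indicator
    }

    foreach (var c in value)
    {
        if (!(char.IsAsciiLetterOrDigit(c) || c == '_' || c == '.'))
        {
            return false;
        }
    }

    return !IsReservedWord(value);
}

// YAML 1.1 boolean/null spellings; lower-casing over-matches, which only means extra quoting.
static bool IsReservedWord(string value) =>
    value.ToLowerInvariant() is "true" or "false" or "yes" or "no" or "on" or "off" or "y" or "n" or "null";

static string ToYamlScalar(string value)
{
    if (IsPlainSafe(value))
    {
        return value; // plain scalar: emitted as-is, no extra allocation
    }

    // Double-quoted scalar: escape quotes, backslashes and control characters.
    var sb = new StringBuilder(value.Length + 2).Append('"');
    foreach (var c in value)
    {
        if (c == '"' || c == '\\')
        {
            sb.Append('\\').Append(c);
        }
        else if (char.IsControl(c))
        {
            sb.Append("\\u").Append(((int)c).ToString("X4"));
        }
        else
        {
            sb.Append(c);
        }
    }

    return sb.Append('"').ToString();
}
```

Most .NET qualified names pass the check, so the common case returns the composed string unchanged and only unusual identifiers pay for quoting.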
Side-by-side against dotnet docfx metadata. Two fully isolated standalone benchmark assemblies — benchmarks/Docfx.StandaloneBenchmarks/ (calls DotnetApiCatalog.GenerateManagedReferenceYamlFiles in-process) and benchmarks/SourceDocParser.Docfx.StandaloneBenchmarks/ (drives our pipeline through DocfxYamlEmitter) — both target the same 4 NuGet packages (ReactiveUI, Splat, DynamicData, System.Reactive), measured by BenchmarkDotNet's [ShortRunJob] on the same machine:
| Pipeline | Mean | Allocated |
|---|---|---|
| docfx 2.78.5: DotnetApiCatalog.GenerateManagedReferenceYamlFiles | 1.598 s | 6.72 MB |
| SourceDocParser + DocfxYamlEmitter | 2.031 s | 919.6 MB |
The two pipelines aren't walking identical inputs: docfx loads a synthesised Fixture.csproj that pulls the 4 packages as transitive PackageReferences and walks one effective TFM, while our pipeline resolves every shipped lib/ and ref/ slice across ~19 supported TFMs from nuget-packages.json and merges across them. That per-TFM walk explains both the wall-time delta and the allocation gap: each TFM spins up a fresh Roslyn compilation graph, and the cross-TFM merger holds catalogs while it dedupes UIDs. What the comparison pins down is output parity (every T:, M:, P:, E: UID docfx emits, our pipeline emits too) at the emit cost shown in the emitter table above.
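The parity contract amounts to a set-containment check. The sketch below assumes both outputs have already been reduced to lists of comment IDs; MissingFromOurs, IsComparedKind and the sample IDs are hypothetical. IDs outside the T:, M:, P:, E: prefixes are ignored, and extra IDs on our side do not count as failures.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comment IDs collected from each pipeline's YAML output.
var docfxIds = new[] { "T:Foo", "M:Foo.Bar", "P:Foo.Baz", "E:Foo.Changed", "N:FooNamespace" };
var ourIds = new[] { "T:Foo", "M:Foo.Bar", "P:Foo.Baz", "M:Foo.Extra" };

var missing = MissingFromOurs(docfxIds, ourIds);
Console.WriteLine(missing.Count == 0 ? "parity" : "missing: " + string.Join(", ", missing));

// Every T:/M:/P:/E: ID docfx emits must also appear in our output; extras are allowed.
static List<string> MissingFromOurs(IEnumerable<string> docfx, IEnumerable<string> ours)
{
    var ourSet = new HashSet<string>(ours, StringComparer.Ordinal);
    return docfx
        .Where(IsComparedKind)
        .Where(id => !ourSet.Contains(id))
        .Distinct(StringComparer.Ordinal)
        .OrderBy(id => id, StringComparer.Ordinal)
        .ToList();
}

// Only type, method, property and event IDs take part in the comparison.
static bool IsComparedKind(string id) =>
    id.Length > 2 && id[1] == ':' && (id[0] is 'T' or 'M' or 'P' or 'E');
```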
Strategies the pipeline uses
- Custom span-based XML scanner. A NuGet package typically ships an <assembly-name>.xml doc file alongside its .dll, holding the /// doc comments for every public symbol. The walker has to read each member's XML fragment per symbol, render its <see> / <c> / <list> / <inheritdoc> tags into Markdown, and do the same again per <param> / <exception> inside it, for thousands of symbols per assembly. XmlReader works for that, but its XmlTextReaderImpl allocates multi-KB internal buffers (NodeData[], NamespaceManager, char buffers, Entry[]) per construction, which dominates the doc-parse profile. So the pipeline ships a small ref struct DocXmlScanner that walks the doc text directly over ReadOnlySpan<char> and implements just the XML grammar that /// doc comments actually use. Both the per-symbol parser and the Markdown renderer drive the scanner, so per-element XML processing is allocation-free apart from the result string.
- Build-once-then-read-many XmlDocSource. Each .xml doc file is read once via File.ReadAllBytes + Encoding.UTF8.GetString, then indexed by per-member (offset, length) ranges. The substring is only materialised when a consumer calls Get(memberId), and the source is safe for concurrent reads under the parallel walker.
- Eager per-group loader disposal. Each TFM group has its own CompilationLoader with a private MetadataReferenceCache holding memory-mapped views of every reference DLL. As soon as the last assembly in a group finishes its walk, an interlocked counter drops to zero and the loader disposes, so peak working set scales with the slowest-finishing group, not with the total number of groups times their references.
- Streaming type merger. The parallel walk feeds ApiCatalog instances into StreamingTypeMerger one at a time and immediately drops its reference, instead of accumulating every catalog in a ConcurrentBag until the walk phase finishes.
- Capture-free parallel dispatch. The Parallel.ForEachAsync lambda is static: every dependency it touches is bundled into a WalkContext record attached to each work item, so dispatch never allocates a closure object per assembly.
- Pooled StringBuilder on the converter. XmlDocToMarkdown is per-walk by construction; reusing a single builder across every Convert call eliminates the per-element allocation that would otherwise dominate the renderer.
- Pre-sized buffers. Each nupkg zip entry is sized to its known uncompressed length up front, so the backing byte[] is allocated once at the right size instead of doubling and copying on every Write. SourceLink URL rewriting fuses the base URL and the line anchor into one interpolated-string handler call, so the GitHub / Bitbucket / GitLab / Azure DevOps blob URL is materialised in a single string.
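The build-once index behind XmlDocSource can be approximated in a few dozen lines. This sketch assumes the doc file's text is already in memory and that every member appears as <member name="...">...</member> or as a self-closing <member name="..." />; it uses plain span searches rather than a full scanner, and BuildIndex and Get are hypothetical names (the library's real Get(memberId) signature may differ).

```csharp
using System;
using System.Collections.Generic;

// In practice the text would come from File.ReadAllBytes + Encoding.UTF8.GetString.
var xml = "<doc><members>" +
          "<member name=\"T:Foo\"><summary>Foo.</summary></member>" +
          "<member name=\"M:Foo.Bar\"><summary>Bar.</summary></member>" +
          "</members></doc>";

var index = BuildIndex(xml);
Console.WriteLine(Get(xml, index, "M:Foo.Bar"));

// One pass over the text, recording where each <member> element starts and how long it is.
static Dictionary<string, (int Offset, int Length)> BuildIndex(string text)
{
    const string Open = "<member name=\"";
    const string Close = "</member>";
    var index = new Dictionary<string, (int Offset, int Length)>(StringComparer.Ordinal);
    ReadOnlySpan<char> span = text;
    var pos = 0;

    while (true)
    {
        var rel = span[pos..].IndexOf(Open, StringComparison.Ordinal);
        if (rel < 0) break;

        var start = pos + rel;
        var nameStart = start + Open.Length;
        var nameLength = span[nameStart..].IndexOf('"');
        var tagEnd = span[nameStart..].IndexOf('>');
        if (nameLength < 0 || tagEnd < 0) break; // truncated file: stop indexing
        tagEnd += nameStart;

        int end;
        if (span[tagEnd - 1] == '/')
        {
            end = tagEnd + 1; // self-closing <member name="..." />
        }
        else
        {
            var closeRel = span[tagEnd..].IndexOf(Close, StringComparison.Ordinal);
            if (closeRel < 0) break;
            end = tagEnd + closeRel + Close.Length;
        }

        index[text.Substring(nameStart, nameLength)] = (start, end - start);
        pos = end;
    }

    return index;
}

// Materialise a member's XML only when asked. The index is never mutated after
// BuildIndex returns, so concurrent readers need no locking.
static string? Get(string text, Dictionary<string, (int Offset, int Length)> index, string memberId) =>
    index.TryGetValue(memberId, out var range) ? text.Substring(range.Offset, range.Length) : null;
```

Only the member IDs and their (offset, length) pairs are stored up front; each member's XML becomes a separate string only for the symbols a consumer actually looks up.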
Repository layout
```
SourceDocParserLib/
  src/
    SourceDocParser/
    SourceDocParser.NuGet/
    SourceDocParser.Docfx/
    SourceDocParser.Zensical/
    tests/
      SourceDocParser.Tests/              unit tests (TUnit)
      SourceDocParser.IntegrationTests/   end-to-end + Zensical render-smoke
    Directory.Build.props                 shared lib config
    Directory.Packages.props              central package versions
    SourceDocParserLib.slnx
  Directory.Build.props
  version.json                            Nerdbank.GitVersioning
  .editorconfig
  stylecop.json
```
dotnet build from src/ packs every non-test project into artifacts/packages/ automatically (<GeneratePackageOnBuild>true</GeneratePackageOnBuild>). Consumers in other repos can wire that directory up as a local feed via nuget.config until the libraries are published.
Acknowledgements
The metadata extraction pipeline is inspired by — and lifts patterns from — dotnet/docfx (MIT licensed). docfx's Roslyn-based assembly walker, inheritdoc resolution, and overall metadata model shaped this library's design. See LICENSE for the original docfx attribution.
Built on:
- Roslyn (Microsoft.CodeAnalysis.CSharp) for compilation + symbol model
- ICSharpCode.Decompiler for transitive reference resolution
- NuGet.Frameworks + NuGet.Versioning for proper TFM compatibility and SemVer ordering
- Polly v8 for HTTP retry/rate-limit pipelines
License
MIT — see LICENSE for the full text and the docfx attribution.
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
net10.0: No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on SourceDocParser.Common:
| Package | Description |
|---|---|
| SourceDocParser.Zensical | Zensical / mkdocs Material emitter for SourceDocParser. Renders the parser's ApiCatalog into a flat tree of Markdown pages tuned for the Zensical theme (admonitions, content tabs, mermaid diagrams). |
| SourceDocParser.Docfx | docfx compatibility for SourceDocParser. Reads and writes docfx.json shapes (metadata + build sections) so an existing docfx site can plug into the parser pipeline, and emits docfx ManagedReference YAML pages so the parser output is consumable by docfx as a drop-in replacement for its own metadata extractor. |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 2.0.0 | 534 | 5/10/2026 |
| 1.4.2 | 774 | 5/2/2026 |
| 1.4.1 | 479 | 4/30/2026 |
| 1.3.1 | 225 | 4/28/2026 |
| 1.2.1 | 120 | 4/28/2026 |
| 1.1.1 | 121 | 4/28/2026 |
| 1.0.5 | 110 | 4/28/2026 |
| 1.0.3 | 102 | 4/28/2026 |
| 0.6.1-alpha | 104 | 4/28/2026 |
| 0.5.1-alpha | 115 | 4/28/2026 |
| 0.4.1-alpha | 102 | 4/28/2026 |
| 0.3.1-alpha | 108 | 4/27/2026 |
| 0.2.1-alpha | 109 | 4/27/2026 |