Mostlylucid.StyloExtract.Html 1.7.1

There is a newer prerelease version of this package available.
See the version list below for details.

dotnet add package Mostlylucid.StyloExtract.Html --version 1.7.1

NuGet\Install-Package Mostlylucid.StyloExtract.Html -Version 1.7.1

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Mostlylucid.StyloExtract.Html" Version="1.7.1" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Directory.Packages.props:

<PackageVersion Include="Mostlylucid.StyloExtract.Html" Version="1.7.1" />

Project file:

<PackageReference Include="Mostlylucid.StyloExtract.Html" />

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution's Directory.Packages.props file to version the package, and the version-less PackageReference node into the project file.

paket add Mostlylucid.StyloExtract.Html --version 1.7.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Mostlylucid.StyloExtract.Html, 1.7.1"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Mostlylucid.StyloExtract.Html@1.7.1

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=Mostlylucid.StyloExtract.Html&version=1.7.1

Install as a Cake Tool:

#tool nuget:?package=Mostlylucid.StyloExtract.Html&version=1.7.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Mostlylucid.StyloExtract.Html

AngleSharp-backed DOM parser and cleaner for StyloExtract.

What this package is

Provides two implementations of StyloExtract.Abstractions interfaces:

- AngleSharpHtmlDomParser (implements IHtmlDomParser): parses raw HTML into an AngleSharp IDocument, normalises encoding, and handles fragment inputs.
- DomCleaner (implements IDomCleaner): strips script, style, noscript, SVG, and other non-content nodes; normalises whitespace; and collapses empty containers.

The cleaner is a prerequisite for fingerprinting: noise in the DOM inflates shingle variance and produces false-negative matches. The default cleaning rules follow the same boilerplate-removal logic used by block classifiers.
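To make the fingerprinting point concrete, here is a minimal C# sketch, assuming a page's structure is reduced to a flat sequence of tag names and compared using k-shingles and Jaccard similarity. The shingle size, the tag sequences, and the `Shingles`/`Jaccard` helpers are hypothetical and are not StyloExtract's actual fingerprint; the sketch only shows how injected `script`/`style`/`noscript` nodes break up shingles that two renders of the same template would otherwise share.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration, not StyloExtract's fingerprint: structure is
// reduced to a tag sequence, cut into overlapping runs of k tags ("shingles"),
// and two pages are compared by Jaccard similarity of their shingle sets.
static HashSet<string> Shingles(IReadOnlyList<string> tags, int k)
{
    var set = new HashSet<string>();
    for (int i = 0; i + k <= tags.Count; i++)
        set.Add(string.Join(">", tags.Skip(i).Take(k)));
    return set;
}

static double Jaccard(HashSet<string> a, HashSet<string> b)
{
    if (a.Count == 0 && b.Count == 0) return 1.0;
    int shared = a.Intersect(b).Count();
    return (double)shared / (a.Count + b.Count - shared);
}

// Two renders of the same template; the second has noise nodes injected.
string[] clean = { "header", "nav", "main", "article", "h1", "p", "p", "footer" };
string[] noisy = { "header", "script", "nav", "main", "style", "article",
                   "h1", "p", "noscript", "p", "footer" };

// What a cleaner does, in miniature: drop non-content tags before shingling.
string[] stripped = noisy.Where(t => t is not ("script" or "style" or "noscript")).ToArray();

double raw = Jaccard(Shingles(clean, 3), Shingles(noisy, 3));
double cleaned = Jaccard(Shingles(clean, 3), Shingles(stripped, 3));
Console.WriteLine($"raw={raw:F2} cleaned={cleaned:F2}");
```

With k = 3 the noisy page shares only one of its nine shingles with the clean render, so the raw similarity is 1/14, low enough that a plausible match threshold would miss it. Stripping the three noise tags restores the identical shingle set and a similarity of 1.0.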

When to depend on this directly

Most consumers get this package transitively via Mostlylucid.StyloExtract.AspNetCore or Mostlylucid.StyloExtract.Core. Take a direct dependency only if you need AngleSharpHtmlDomParser or DomCleaner outside of the full extraction pipeline, for example in a standalone DOM analysis tool.

Usage

// Registration (handled automatically by AddStyloExtract)
// services: the application's IServiceCollection
services.AddSingleton<IHtmlDomParser, AngleSharpHtmlDomParser>();
services.AddSingleton<IDomCleaner, DomCleaner>();

// Direct usage (testing / standalone)
// rawHtml: the HTML source string to parse
var parser = new AngleSharpHtmlDomParser();
var doc = parser.Parse(rawHtml);

// Remove scripts, styles and other non-content nodes from the parsed document
var cleaner = new DomCleaner();
var cleaned = cleaner.Clean(doc);

AOT

This package sets IsAotCompatible=true. AngleSharp itself is AOT-safe on .NET 10.

Full documentation and package family

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


Dependencies for net10.0:
- AngleSharp (>= 1.3.0)
- Mostlylucid.StyloExtract.Abstractions (>= 1.7.1)

NuGet packages (2)

Showing the top 2 NuGet packages that depend on Mostlylucid.StyloExtract.Html:

Package	Downloads
Mostlylucid.StyloExtract.AspNetCore AddStyloExtract() DI extensions for ASP.NET Core. The response-policy framework (IResponsePolicy) is the canonical response-transformation primitive: Markdown content negotiation and cache-hint emission are the first two built-in instances. Brings in the full StyloExtract stack wired through Microsoft.Extensions.DependencyInjection. Opt-in middleware, per-action attributes, and Minimal API extensions transparently convert HTML responses to Markdown when clients send Accept: text/markdown. Browser-friendly query-string Accept override and opt-in IDistributedCache support included.	1.2K
Mostlylucid.StyloExtract.Core Layout-fingerprint matching with template-keyed extractor reuse. The ILayoutExtractor.ExtractAsync entry point: parse, clean, fingerprint, fast-path LSH match, slow-path pq-gram cosine, novel template induction, refit-as-version-event. Sub-millisecond match step; AOT-compatible.	852


GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.8.0-alpha.8	0	6/25/2026
1.8.0-alpha.4	0	6/25/2026
1.8.0-alpha.3	0	6/25/2026
1.8.0-alpha.2	0	6/25/2026
1.8.0-alpha.1	6	6/24/2026
1.7.1	84	6/23/2026
1.7.0	57	6/23/2026
1.6.2	57	6/23/2026
1.6.1	193	6/22/2026
1.6.0	103	6/22/2026
1.5.2	106	6/22/2026
1.4.0	109	6/21/2026
1.3.0	101	6/21/2026
1.2.0	106	6/21/2026
1.1.0	105	6/21/2026
1.0.1	102	6/21/2026
1.0.0	109	6/21/2026

StyloExtract 1.7.1 - 2026-06-23
================================

Patch release. One bug fix to DomMarkdownWalker: heavily indented
source HTML (typical of Tailwind / HTMX / framework-generated markup)
no longer produces markdown that CommonMark parses as indented code blocks.

Bug
---

* DomMarkdownWalker.AppendEscapedInline preserved leading whitespace at
line start, so consecutive text-node visits each emitted a single
space, and those spaces accumulated to 4+ ahead of links and paragraphs.
CommonMark then parsed those lines as indented code blocks, and the
resulting markdown rendered as raw `[text](href)` text instead of
clickable links. Leading whitespace is now skipped at line start;
whitespace inside a paragraph still collapses to single spaces as before.

Real-world repro: lucidVIEW loading mostlylucid.net (HTMX-driven blog
index). Before 1.7.1 every blog-post card after the first collapsed into
a code block; after 1.7.1 each card is a styled link with its summary
as its own paragraph beneath.
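A minimal C# sketch of the rule the fix describes, assuming the walker appends text nodes to a single StringBuilder. Both helpers are hypothetical: `AppendInlineBefore` is a simplified model of the old per-visit behaviour and `AppendInline` models the fix (consult the buffer, skip whitespace at line start, collapse runs elsewhere). Neither is the real `AppendEscapedInline`, and markdown escaping is omitted entirely.

```csharp
using System;
using System.Text;

// Hypothetical model of the pre-1.7.1 behaviour: whitespace is collapsed only
// within one text-node visit, so every visit may contribute a single space.
static void AppendInlineBefore(StringBuilder sb, string text)
{
    bool pendingSpace = false;
    foreach (char c in text)
    {
        if (char.IsWhiteSpace(c)) { pendingSpace = true; continue; }
        if (pendingSpace) { sb.Append(' '); pendingSpace = false; }
        sb.Append(c);
    }
    if (pendingSpace) sb.Append(' ');
}

// Hypothetical model of the fix: decide against the buffer, not the visit.
// Whitespace is dropped at line start and collapsed to one space elsewhere.
static void AppendInline(StringBuilder sb, string text)
{
    foreach (char c in text)
    {
        if (!char.IsWhiteSpace(c)) { sb.Append(c); continue; }
        bool atLineStart = sb.Length == 0 || sb[sb.Length - 1] == '\n';
        bool afterSpace = sb.Length > 0 && sb[sb.Length - 1] == ' ';
        if (!atLineStart && !afterSpace) sb.Append(' ');
    }
}

// Indentation-only text nodes from pretty-printed markup, then a link.
string[] visits = { "\n    ", "\n      ", "\n        ", "\n    ", "[Post](/blog/post)" };

var before = new StringBuilder();
var after = new StringBuilder();
foreach (var v in visits)
{
    AppendInlineBefore(before, v);
    AppendInline(after, v);
}

Console.WriteLine($"'{before}'"); // '    [Post](/blog/post)'  (4 spaces: indented code block)
Console.WriteLine($"'{after}'");  // '[Post](/blog/post)'
```

Four whitespace-only visits are enough to push the link to four leading spaces, which CommonMark treats as an indented code block. Tracking state in the output buffer rather than per visit is what keeps the spaces from accumulating across text nodes.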

----

StyloExtract 1.7.0 - 2026-06-23
================================

Structured markdown output. Previously every classified block flattened
to element.TextContent.Trim(), and the renderer emitted a wall of plain
paragraphs with all six heading levels collapsed to "# ". This release
makes ExtractedBlock.Markdown carry a real GFM rendition produced by
walking the block's DOM subtree.
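The DOM-walking idea can be sketched in a few lines of C#. This is an illustrative sketch, not the actual DomMarkdownWalker: it uses `System.Xml.Linq` as a stand-in DOM so that it needs no packages (and therefore only accepts well-formed XHTML), handles only headings, paragraphs, emphasis, inline code and links, and does no escaping or whitespace normalisation. The `Walk` helper is hypothetical.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical sketch, not DomMarkdownWalker: XElement stands in for the
// AngleSharp DOM so no packages are needed (input must be well-formed XHTML).
static void Walk(XNode node, StringBuilder md)
{
    if (node is XText text) { md.Append(text.Value); return; }
    if (node is not XElement el) return; // ignore comments etc.

    void Children() { foreach (var child in el.Nodes()) Walk(child, md); }

    string name = el.Name.LocalName.ToLowerInvariant();
    switch (name)
    {
        case "h1": case "h2": case "h3": case "h4": case "h5": case "h6":
            md.Append('#', name[1] - '0').Append(' '); // H1-H6 -> one to six '#'
            Children();
            md.Append("\n\n");
            break;
        case "p":
            Children();
            md.Append("\n\n");
            break;
        case "strong": case "b":
            md.Append("**"); Children(); md.Append("**");
            break;
        case "em": case "i":
            md.Append('*'); Children(); md.Append('*');
            break;
        case "code":
            md.Append('`').Append(el.Value).Append('`');
            break;
        case "a":
            md.Append('['); Children();
            md.Append("](").Append((string?)el.Attribute("href") ?? "").Append(')');
            break;
        default:
            Children(); // unknown containers contribute only their content
            break;
    }
}

var article = XElement.Parse(
    "<article><h2>Install</h2><p>Run <code>dotnet add</code>, then read " +
    "<a href=\"/docs\">the <em>docs</em></a>.</p></article>");
var output = new StringBuilder();
Walk(article, output);
Console.WriteLine(output.ToString().TrimEnd());
```

Because inline elements recurse into their children, nesting such as emphasis inside a link composes naturally (`[the *docs*](/docs)`), which is the property a flattened `TextContent` projection loses.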

Highlights
----------

* Heading levels H1-H6 emit one to six "#" characters respectively.
* Inline content preserved: links, **bold**, *italic*, `code`, images,
hard breaks.
* Lists, fenced code blocks (with language hint), blockquotes (single
and multi-paragraph following GFM convention), and figures all render
with their structure intact.
* GFM tables built from a WHATWG slot grid: colspan/rowspan respected,
caption rendered above as a bold paragraph, alignment markers derived
by majority vote from the align attribute or style="text-align", pipes
escaped, and newlines converted to <br>. Complex tables (multi-row thead,
nested tables, block content in a cell) fall back to raw HTML, which
CommonMark passes through.
* Sidebar and RelatedLinks now use the DOM walker. The classic "on this
page" TOC pattern renders as a proper markdown list with anchor links
instead of flattening to indented text.
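The alignment vote and cell escaping described for GFM tables can be sketched as follows. It is a hypothetical C# illustration, not the library's code: it assumes the WHATWG slot grid has already been resolved into a rectangular array of `(Text, Align)` cells (so colspan/rowspan handling, captions and the raw-HTML fallback are out of scope), and it takes each cell's alignment hint directly instead of reading `align` or `style`. `EscapeCell`, `DelimiterFor` and `RenderTable` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not StyloExtract's renderer. Assumes the slot grid is
// already resolved: every row has the same number of (Text, Align) cells,
// where Align is "left", "center", "right" or null (no hint).
static string EscapeCell(string text) =>
    text.Replace("|", "\\|").Replace("\r\n", "<br>").Replace("\n", "<br>");

// Majority vote over a column's non-null hints. OrderByDescending is a stable
// sort, so a tie goes to the hint seen first; no hints -> plain "---".
static string DelimiterFor(IEnumerable<string?> hints)
{
    string? winner = hints.Where(h => h is not null)
                          .GroupBy(h => h!)
                          .OrderByDescending(g => g.Count())
                          .Select(g => g.Key)
                          .FirstOrDefault();
    return winner switch
    {
        "left" => ":---",
        "center" => ":---:",
        "right" => "---:",
        _ => "---",
    };
}

static string RenderTable((string Text, string? Align)[][] rows)
{
    static string Row(IEnumerable<string> cells) => "| " + string.Join(" | ", cells) + " |";
    int columns = rows[0].Length;

    var lines = new List<string>
    {
        Row(rows[0].Select(c => EscapeCell(c.Text))),                  // header row
        Row(Enumerable.Range(0, columns)
            .Select(i => DelimiterFor(rows.Select(r => r[i].Align)))), // delimiter row
    };
    lines.AddRange(rows.Skip(1).Select(r => Row(r.Select(c => EscapeCell(c.Text)))));
    return string.Join("\n", lines);
}

var table = new (string, string?)[][]
{
    new (string, string?)[] { ("Name", null), ("Price", "right") },
    new (string, string?)[] { ("a|b", null), ("1.00", "right") },
    new (string, string?)[] { ("line1\nline2", "left"), ("2.50", null) },
};
Console.WriteLine(RenderTable(table));
```

Voting per column rather than trusting the header cell alone means a single mis-styled cell cannot flip a column's alignment; how the real implementation breaks ties is not stated in these notes.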

Performance
-----------

Walker on Apple M5 / .NET 10, full pipeline numbers in parentheses:

Small article :  1.3 us /   8 KB  (full pipeline: 370 us / 925 KB)
Medium doc    : 25.2 us /  72 KB  (full pipeline: 491 us / 823 KB)
Large doc     : 34.1 us / 114 KB  (full pipeline: 642 us / 843 KB)
Table-heavy   : 69.2 us / 165 KB  (full pipeline: 641 us / 688 KB)

Walker share of ExtractAsync total time fell from 25-55% to 5-11% across
the four scenarios. ExtractAsync continues to sit well under the spec's
15 ms p99 budget on a cache hit.

Compatibility
-------------

Backwards-compatible. ExtractedBlock.Text continues to project the
flattened plain-text view unchanged; the new markdown rendition is read
via ExtractedBlock.Markdown. Existing extraction profiles behave
identically; the only observable change is that the markdown emitted by
TypedMarkdownRenderer is now reader-grade rather than flat prose.

Tests
-----

329 tests across 7 projects, all green. 51 unit tests on the new walker
cover inline composition, list and code rendering, and the full GFM
table reconstruction path including the complexity-detection fallback
to raw HTML. Four end-to-end pipeline tests exercise the spec's headline
gaps (heading levels, inline links, lists, GFM tables) through
parse -> clean -> segment -> classify -> render -> SQLite.

See CHANGELOG.md for the full record.