Mostlylucid.StyloExtract.Abstractions 1.7.1


Mostlylucid.StyloExtract.Abstractions

Core interfaces, records, and signal catalog for StyloExtract. Zero runtime dependencies beyond the mostlylucid.ephemeral signal sink.

What this package is

Abstractions defines the entire public contract of StyloExtract:

  • ILayoutExtractor - the main extraction entry point
  • IHtmlDomParser, IDomCleaner, IBlockSegmenter, IBlockClassifier - parse/clean/segment/classify pipeline interfaces
  • IStructuralFingerprinter, IMarkdownRenderer, IExtractorInducer, IExtractorApplicator - fingerprint and rendering interfaces
  • ITemplateIndex - template store interface
  • IRenderedHtmlFetcher - Playwright abstraction
  • ITemplateVersionEventSink - version change event consumer interface
  • ExtractionResult, ExtractionOptions, LayoutMatch, MatchStatus, ExtractionStats - result records
  • StyloExtractSignals - string constants for all 11 extraction signals
  • StyloExtractSignal - typed signal payload for TypedSignalSink<StyloExtractSignal>

When to depend on this directly

Take a direct dependency on Abstractions when you are:

  • Writing a custom implementation of any StyloExtract interface
  • Building a consumer that reads ExtractionResult records but does not perform extraction
  • Subscribing to TypedSignalSink<StyloExtractSignal> signals in a StyloFlow pipeline
  • Writing tests that mock ILayoutExtractor or ITemplateIndex

Normal application code should depend on Mostlylucid.StyloExtract.AspNetCore instead, which pulls this package transitively.

Key types

// The extraction result
var result = await extractor.ExtractAsync(html, sourceUri);
result.Markdown        // extracted content
result.Match.Status    // FastPathHit | SlowPathMatch | Novel | Refit
result.Match.TemplateId
result.Match.TemplateVersion

// Signal catalog
StyloExtractSignals.MatchFastPathHit   // "stylo.extract.match.fastpath.hit"
StyloExtractSignals.TemplateRefit      // "stylo.extract.template.refit"
StyloExtractSignals.VersionDetected    // "stylo.extract.version.detected"
// ... 11 signals total
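
A consumer that only sees signal names can route on the dotted segments. The sketch below is a hypothetical helper, not part of the package: it assumes every signal name starts with "stylo.extract." as the three names shown above do, and returns the segment that follows.

```csharp
using System;

// Hypothetical helper (not part of StyloExtract): returns the first segment
// after the "stylo.extract." prefix, or "" when the prefix is missing.
static string SignalCategory(string signalName)
{
    // Assumption: all signal names share this prefix, as the documented ones do.
    const string prefix = "stylo.extract.";
    if (!signalName.StartsWith(prefix, StringComparison.Ordinal)) return "";

    string rest = signalName.Substring(prefix.Length);
    int dot = rest.IndexOf('.');
    return dot < 0 ? rest : rest.Substring(0, dot);
}
```

With the names above, SignalCategory yields "match", "template", and "version" respectively. In real code, compare against the StyloExtractSignals constants rather than string literals.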

AOT

This package sets IsAotCompatible=true: no reflection, no dynamic code generation.


Full documentation and package family

Target frameworks

Included in the package: net10.0.
Computed as compatible: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

NuGet packages (11)

Showing the top 5 NuGet packages that depend on Mostlylucid.StyloExtract.Abstractions:

Mostlylucid.StyloExtract.Heuristics

Block classifier and extractor inducer for StyloExtract novel-template path. Recogniser data (footer phrases, cookie banner phrases, nav class hints, ad hints) lives in embedded JSON resources; combinator code lives in C#. Source-gen JSON deserialisation keeps it AOT-compatible.

Mostlylucid.StyloExtract.Markdown

Profile-aware deterministic Markdown renderer for StyloExtract block maps. Four profiles (MainContentOnly, RagFull, AgentNavigation, DebugFull) control which roles emit. The model never generates Markdown; only deterministic rules do.

Mostlylucid.StyloExtract.Html

AngleSharp-backed DOM parser and cleaner for the StyloExtract pipeline. Strips script, style, template, noscript, and svg nodes; normalises whitespace; preserves semantic tags and ARIA roles.

Mostlylucid.StyloExtract.Core

Layout-fingerprint matching with template-keyed extractor reuse. The ILayoutExtractor.ExtractAsync entry point: parse, clean, fingerprint, fast-path LSH match, slow-path pq-gram cosine, novel template induction, refit-as-version-event. Sub-millisecond match step; AOT-compatible.

Mostlylucid.StyloExtract.Templates

SQLite-backed template index for StyloExtract. Single-writer coordination via mostlylucid.ephemeral; learned-extractor centroids that drift and refit with EWMA accumulation; refit-as-version-event for site-template-version monitoring. JSON export and import for portable template bundles.


Version Downloads Last Updated
1.8.0-alpha.18 0 6/26/2026
1.8.0-alpha.17 0 6/26/2026
1.8.0-alpha.16 0 6/26/2026
1.8.0-alpha.15 0 6/26/2026
1.8.0-alpha.14 0 6/26/2026
1.8.0-alpha.13 0 6/26/2026
1.8.0-alpha.12 0 6/26/2026
1.8.0-alpha.11 0 6/26/2026
1.8.0-alpha.10 0 6/26/2026
1.8.0-alpha.9 47 6/25/2026
1.8.0-alpha.8 50 6/25/2026
1.8.0-alpha.4 55 6/25/2026
1.8.0-alpha.3 63 6/25/2026
1.8.0-alpha.2 55 6/25/2026
1.8.0-alpha.1 61 6/24/2026
1.7.1 146 6/23/2026
1.7.0 116 6/23/2026
1.6.2 124 6/23/2026
1.6.1 290 6/22/2026
1.6.0 179 6/22/2026

StyloExtract 1.7.1 - 2026-06-23
================================

Patch release. One bug fix to DomMarkdownWalker so heavily-indented
source HTML (typical of Tailwind / HTMX / framework-generated markup)
stops producing markdown that CommonMark parses as indented code blocks.

Bug
---

* DomMarkdownWalker.AppendEscapedInline preserved leading whitespace at
 line-start, so consecutive text-node visits each emitted a single
 space and accumulated to 4+ spaces ahead of links and paragraphs.
 CommonMark then parsed those lines as indented code blocks and the
 resulting markdown rendered as raw `[text](href)` text instead of
 clickable links. Leading whitespace is now skipped at line-start;
 inner-paragraph whitespace still collapses to single spaces as before.
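
The rule described above can be sketched as follows. This is a minimal illustration, not the actual DomMarkdownWalker code: AppendCollapsed is a hypothetical name, and the sketch assumes the output is accumulated in a StringBuilder.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the walker's inline-text step.
// Collapses each whitespace run to one space and drops whitespace
// entirely while the output is still at the start of a line.
static void AppendCollapsed(StringBuilder sb, string text)
{
    foreach (char c in text)
    {
        if (char.IsWhiteSpace(c))
        {
            bool atLineStart = sb.Length == 0 || sb[sb.Length - 1] == '\n';
            bool afterSpace = sb.Length > 0 && sb[sb.Length - 1] == ' ';
            if (!atLineStart && !afterSpace) sb.Append(' ');
        }
        else
        {
            sb.Append(c);
        }
    }
}
```

Feeding several whitespace-only text nodes through AppendCollapsed before a link leaves the line with no leading spaces, so CommonMark cannot read it as an indented code block. Whitespace between words on the same line still collapses to a single space.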

Real-world repro: lucidVIEW loading mostlylucid.net (HTMX-driven blog
index). Before 1.7.1 every blog-post card after the first collapsed into
a code block; after 1.7.1 each card is a styled link with its summary
as its own paragraph beneath.

----

StyloExtract 1.7.0 - 2026-06-23
================================

Structured markdown output. Previously every classified block was
flattened to element.TextContent.Trim(), so the renderer emitted a wall
of plain paragraphs and collapsed all six heading levels to "# ". This
release makes ExtractedBlock.Markdown carry a real GFM rendition
produced by walking the block's DOM subtree.

Highlights
----------

* Heading levels H1-H6 emit one-through-six "#" characters.
* Inline content preserved: links, **bold**, *italic*, `code`, images,
 hard breaks.
* Lists, fenced code blocks (with language hint), blockquotes (single
 and multi-paragraph following GFM convention), and figures all render
 with their structure intact.
* GFM tables built from a WHATWG slot grid: colspan/rowspan respected,
 caption rendered above as bold paragraph, alignment markers derived
 from align attribute or style="text-align" via majority-vote, pipes
 escaped, newlines converted to <br>. Complex tables (multi-row thead,
 nested tables, block content in a cell) fall back to raw HTML which
 CommonMark passes through.
* Sidebar and RelatedLinks now use the DOM walker. The classic "on this
 page" TOC pattern renders as a proper markdown list with anchor links
 instead of flattening to indented text.
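
Two of the rules above are small enough to sketch directly. The helpers below are hypothetical illustrations of the heading-level and table-cell rules, not the library's implementation; the names HeadingPrefix and EscapeTableCell are assumptions.

```csharp
using System;

// Hypothetical: maps "h1".."h6" (case-insensitive) to "# ".."###### ".
// Returns "" for any other tag name.
static string HeadingPrefix(string tagName)
{
    bool isHeading = tagName.Length == 2
        && (tagName[0] == 'h' || tagName[0] == 'H')
        && tagName[1] >= '1' && tagName[1] <= '6';
    return isHeading ? new string('#', tagName[1] - '0') + " " : "";
}

// Hypothetical: makes text safe for a single GFM table cell by escaping
// pipes and converting newlines to <br>.
static string EscapeTableCell(string text) =>
    text.Replace("|", "\\|").Replace("\r\n", "<br>").Replace("\n", "<br>");
```

For example, HeadingPrefix("h3") gives "### ", and a cell containing a pipe and a line break comes out as one line with `\|` and `<br>` in their place.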

Performance
-----------

Walker on Apple M5 / .NET 10, full pipeline numbers in parentheses:

 Small article: 1.3 us / 8 KB    (full pipeline:  370 us /  925 KB)
 Medium doc  : 25.2 us / 72 KB   (full pipeline:  491 us /  823 KB)
 Large doc   : 34.1 us / 114 KB  (full pipeline:  642 us /  843 KB)
 Table-heavy : 69.2 us / 165 KB  (full pipeline:  641 us /  688 KB)

Walker share of ExtractAsync total time fell from 25-55% to 5-11% across
the four scenarios. ExtractAsync continues to sit well under the spec's
15ms p99 budget on a cache hit.

Compatibility
-------------

Backwards-compatible. ExtractedBlock.Text continues to project the
flattened plain-text view unchanged; the new markdown rendition is read
via ExtractedBlock.Markdown. Existing extraction profiles behave
identically; the only observable change is that the markdown emitted by
TypedMarkdownRenderer is now reader-grade rather than flat prose.

Tests
-----

329 tests across 7 projects, all green. 51 unit tests on the new walker
cover inline composition, list and code rendering, and the full GFM
table reconstruction path including the complexity-detection fallback
to raw HTML. Four end-to-end pipeline tests exercise the spec's headline
gaps (heading levels, inline links, lists, GFM tables) through
parse -> clean -> segment -> classify -> render -> SQLite.

See CHANGELOG.md for the full record.