Mostlylucid.StyloExtract.Abstractions
1.7.1
A newer prerelease version of this package is available. See the version list below for details.
- .NET CLI: dotnet add package Mostlylucid.StyloExtract.Abstractions --version 1.7.1
- Package Manager: NuGet\Install-Package Mostlylucid.StyloExtract.Abstractions -Version 1.7.1
- PackageReference: <PackageReference Include="Mostlylucid.StyloExtract.Abstractions" Version="1.7.1" />
- Central Package Management (Directory.Packages.props): <PackageVersion Include="Mostlylucid.StyloExtract.Abstractions" Version="1.7.1" />
- Central Package Management (project file): <PackageReference Include="Mostlylucid.StyloExtract.Abstractions" />
- Paket CLI: paket add Mostlylucid.StyloExtract.Abstractions --version 1.7.1
- Script & Interactive: #r "nuget: Mostlylucid.StyloExtract.Abstractions, 1.7.1"
- File-based apps: #:package Mostlylucid.StyloExtract.Abstractions@1.7.1
- Cake addin: #addin nuget:?package=Mostlylucid.StyloExtract.Abstractions&version=1.7.1
- Cake tool: #tool nuget:?package=Mostlylucid.StyloExtract.Abstractions&version=1.7.1
Mostlylucid.StyloExtract.Abstractions
Core interfaces, records, and signal catalog for StyloExtract. Zero runtime dependencies beyond the mostlylucid.ephemeral signal sink.
What this package is
Abstractions defines the entire public contract of StyloExtract:
- ILayoutExtractor - the main extraction entry point
- IHtmlDomParser, IDomCleaner, IBlockSegmenter, IBlockClassifier - parse/clean/segment/classify pipeline interfaces
- IStructuralFingerprinter, IMarkdownRenderer, IExtractorInducer, IExtractorApplicator - fingerprint and rendering interfaces
- ITemplateIndex - template store interface
- IRenderedHtmlFetcher - Playwright abstraction
- ITemplateVersionEventSink - version change event consumer interface
- ExtractionResult, ExtractionOptions, LayoutMatch, MatchStatus, ExtractionStats - result records
- StyloExtractSignals - string constants for all 11 extraction signals
- StyloExtractSignal - typed signal payload for TypedSignalSink<StyloExtractSignal>
When to depend on this directly
Take a direct dependency on Abstractions when you are:
- Writing a custom implementation of any StyloExtract interface
- Building a consumer that reads ExtractionResult records but does not perform extraction
- Subscribing to TypedSignalSink<StyloExtractSignal> signals in a StyloFlow pipeline
- Writing tests that mock ILayoutExtractor or ITemplateIndex
Normal application code should depend on Mostlylucid.StyloExtract.AspNetCore instead, which pulls this package transitively.
Key types
// The extraction result
var result = await extractor.ExtractAsync(html, sourceUri);
result.Markdown // extracted content
result.Match.Status // FastPathHit | SlowPathMatch | Novel | Refit
result.Match.TemplateId
result.Match.TemplateVersion
// Signal catalog
StyloExtractSignals.MatchFastPathHit // "stylo.extract.match.fastpath.hit"
StyloExtractSignals.TemplateRefit // "stylo.extract.template.refit"
StyloExtractSignals.VersionDetected // "stylo.extract.version.detected"
// ... 11 signals total
AOT
This package sets IsAotCompatible=true. It uses no reflection and no dynamic codegen.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies

- net10.0
  - AngleSharp (>= 1.3.0)
  - Mostlylucid.Ephemeral (>= 2.6.4)
NuGet packages (11)
Showing the top 5 NuGet packages that depend on Mostlylucid.StyloExtract.Abstractions:
| Package | Downloads |
|---|---|
| Mostlylucid.StyloExtract.Heuristics: Block classifier and extractor inducer for StyloExtract novel-template path. Recogniser data (footer phrases, cookie banner phrases, nav class hints, ad hints) lives in embedded JSON resources; combinator code lives in C#. Source-gen JSON deserialisation keeps it AOT-compatible. | |
| Mostlylucid.StyloExtract.Markdown: Profile-aware deterministic Markdown renderer for StyloExtract block maps. Four profiles (MainContentOnly, RagFull, AgentNavigation, DebugFull) control which roles emit. The model never generates Markdown; only deterministic rules do. | |
| Mostlylucid.StyloExtract.Html: AngleSharp-backed DOM parser and cleaner for the StyloExtract pipeline. Strips script, style, template, noscript, and svg nodes; normalises whitespace; preserves semantic tags and ARIA roles. | |
| Mostlylucid.StyloExtract.Core: Layout-fingerprint matching with template-keyed extractor reuse. The ILayoutExtractor.ExtractAsync entry point: parse, clean, fingerprint, fast-path LSH match, slow-path pq-gram cosine, novel template induction, refit-as-version-event. Sub-millisecond match step; AOT-compatible. | |
| Mostlylucid.StyloExtract.Templates: SQLite-backed template index for StyloExtract. Single-writer coordination via mostlylucid.ephemeral; learned-extractor centroids that drift and refit with EWMA accumulation; refit-as-version-event for site-template-version monitoring. JSON export and import for portable template bundles. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.8.0-alpha.18 | 0 | 6/26/2026 |
| 1.8.0-alpha.17 | 0 | 6/26/2026 |
| 1.8.0-alpha.16 | 0 | 6/26/2026 |
| 1.8.0-alpha.15 | 0 | 6/26/2026 |
| 1.8.0-alpha.14 | 0 | 6/26/2026 |
| 1.8.0-alpha.13 | 0 | 6/26/2026 |
| 1.8.0-alpha.12 | 0 | 6/26/2026 |
| 1.8.0-alpha.11 | 0 | 6/26/2026 |
| 1.8.0-alpha.10 | 0 | 6/26/2026 |
| 1.8.0-alpha.9 | 47 | 6/25/2026 |
| 1.8.0-alpha.8 | 50 | 6/25/2026 |
| 1.8.0-alpha.4 | 55 | 6/25/2026 |
| 1.8.0-alpha.3 | 63 | 6/25/2026 |
| 1.8.0-alpha.2 | 55 | 6/25/2026 |
| 1.8.0-alpha.1 | 61 | 6/24/2026 |
| 1.7.1 | 146 | 6/23/2026 |
| 1.7.0 | 116 | 6/23/2026 |
| 1.6.2 | 124 | 6/23/2026 |
| 1.6.1 | 290 | 6/22/2026 |
| 1.6.0 | 179 | 6/22/2026 |
StyloExtract 1.7.1 - 2026-06-23
================================
Patch release. One bug fix to DomMarkdownWalker so heavily-indented
source HTML (typical of Tailwind / HTMX / framework-generated markup)
stops producing markdown that CommonMark parses as indented code blocks.
Bug
---
* DomMarkdownWalker.AppendEscapedInline preserved leading whitespace at
line-start, so consecutive text-node visits each emitted a single
space and accumulated to 4+ spaces ahead of links and paragraphs.
CommonMark then parsed those lines as indented code blocks and the
resulting markdown rendered as raw `[text](href)` text instead of
clickable links. Leading whitespace is now skipped at line-start;
inner-paragraph whitespace still collapses to single spaces as before.
Real-world repro: lucidVIEW loading mostlylucid.net (HTMX-driven blog
index). Before 1.7.1 every blog-post card after the first collapsed into
a code block; after 1.7.1 each card is a styled link with its summary
as its own paragraph beneath.
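
The failure mode above can be reproduced in miniature. The sketch below is not the DomMarkdownWalker source: AppendInline, EmitSpace, and the skipAtLineStart flag are hypothetical names chosen for illustration, and the per-node whitespace handling is an assumption based only on the description in this note. It shows how four indentation-only text nodes add up to four leading spaces, which CommonMark treats as an indented code block, and how skipping whitespace at line-start avoids that.

```csharp
using System;
using System.Text;

// Hypothetical helper (not the library's code): emit one collapsed space,
// unless the fix is enabled and the output is at the start of a line.
void EmitSpace(StringBuilder sb, bool skipAtLineStart)
{
    bool atLineStart = sb.Length == 0 || sb[sb.Length - 1] == '\n';
    if (skipAtLineStart && atLineStart) return;
    sb.Append(' ');
}

// Hypothetical stand-in for an inline appender: each whitespace run inside
// one text node collapses to a single space.
void AppendInline(StringBuilder sb, string text, bool skipAtLineStart)
{
    bool pendingSpace = false;
    foreach (char c in text)
    {
        if (char.IsWhiteSpace(c)) { pendingSpace = true; continue; }
        if (pendingSpace) { EmitSpace(sb, skipAtLineStart); pendingSpace = false; }
        sb.Append(c);
    }
    if (pendingSpace) EmitSpace(sb, skipAtLineStart);
}

// Indentation-only text nodes, as found between nested tags in deeply indented HTML.
string[] nodes = { "\n    ", "\n      ", "\n        ", "\n    " };

string Render(bool skipAtLineStart)
{
    var sb = new StringBuilder();
    foreach (var node in nodes) AppendInline(sb, node, skipAtLineStart);
    sb.Append("[Post title](/blog/post)");
    return sb.ToString();
}

string before = Render(false); // four leading spaces: parsed as an indented code block
string after = Render(true);   // no leading spaces: parsed as a link

Console.WriteLine($"before: \"{before}\"");
Console.WriteLine($"after:  \"{after}\"");
```

Whitespace inside a paragraph is untouched by the flag in this sketch: a run between two words still collapses to one space whether or not the line-start skip is enabled.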
----
StyloExtract 1.7.0 - 2026-06-23
================================
Structured markdown output. Previously every classified block flattened
to element.TextContent.Trim() and the renderer emitted a wall of plain
paragraphs with "# " collapsing all six heading levels. This release
makes ExtractedBlock.Markdown carry a real GFM rendition produced by
walking the block's DOM subtree.
Highlights
----------
* Heading levels H1-H6 emit one-through-six "#" characters.
* Inline content preserved: links, **bold**, *italic*, `code`, images,
hard breaks.
* Lists, fenced code blocks (with language hint), blockquotes (single
and multi-paragraph following GFM convention), and figures all render
with their structure intact.
* GFM tables built from a WHATWG slot grid: colspan/rowspan respected,
caption rendered above as bold paragraph, alignment markers derived
from align attribute or style="text-align" via majority-vote, pipes
escaped, newlines converted to <br>. Complex tables (multi-row thead,
nested tables, block content in a cell) fall back to raw HTML which
CommonMark passes through.
* Sidebar and RelatedLinks now use the DOM walker. The classic "on this
page" TOC pattern renders as a proper markdown list with anchor links
instead of flattening to indented text.
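
Three of the rules listed above are small enough to sketch directly. The helpers below are illustrative only: HeadingPrefix, EscapeCell, and AlignmentMarker are hypothetical names, not StyloExtract APIs, and the tie-breaking rule (the alignment seen first wins) and the plain "---" default are assumptions that this note does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// H1-H6 map to one through six '#' characters (levels outside 1-6 are clamped).
string HeadingPrefix(int level) =>
    new string('#', Math.Clamp(level, 1, 6)) + " ";

// Escape pipes and convert newlines to <br> so a cell stays on one table row.
string EscapeCell(string text) =>
    text.Replace("|", "\\|").Replace("\r\n", "<br>").Replace("\n", "<br>");

// Majority vote over the alignments declared by the cells of one column.
// An empty string means the cell declared no alignment and does not vote.
// Assumption: on a tie, the alignment that appeared first wins.
string AlignmentMarker(IEnumerable<string> cellAlignments)
{
    string winner = cellAlignments
        .Where(a => a.Length > 0)
        .GroupBy(a => a)
        .OrderByDescending(g => g.Count()) // stable sort keeps first-seen order on ties
        .Select(g => g.Key)
        .FirstOrDefault() ?? "";

    return winner switch
    {
        "left" => ":---",
        "center" => ":---:",
        "right" => "---:",
        _ => "---", // no votes: leave the column alignment unspecified
    };
}

Console.WriteLine(HeadingPrefix(3) + "Install");                     // ### Install
Console.WriteLine(EscapeCell("a|b\nc"));                             // a\|b<br>c
Console.WriteLine(AlignmentMarker(new[] { "right", "right", "" }));  // ---:
```

In a real walker the votes would come from each cell's align attribute or style="text-align" value; here they are passed in as plain strings to keep the sketch self-contained.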
Performance
-----------
Walker on Apple M5 / .NET 10, full pipeline numbers in parentheses:
Small article: 1.3 us / 8 KB (full pipeline: 370 us / 925 KB)
Medium doc : 25.2 us / 72 KB (full pipeline: 491 us / 823 KB)
Large doc : 34.1 us / 114 KB (full pipeline: 642 us / 843 KB)
Table-heavy : 69.2 us / 165 KB (full pipeline: 641 us / 688 KB)
Walker share of ExtractAsync total time fell from 25-55% to 5-11% across
the four scenarios. ExtractAsync continues to sit well under the spec's
15ms p99 budget on a cache hit.
Compatibility
-------------
Backwards-compatible. ExtractedBlock.Text continues to project the
flattened plain-text view unchanged; the new markdown rendition is read
via ExtractedBlock.Markdown. Existing extraction profiles behave
identically; the only observable change is that the markdown emitted by
TypedMarkdownRenderer is now reader-grade rather than flat prose.
Tests
-----
329 tests across 7 projects, all green. 51 unit tests on the new walker
cover inline composition, list and code rendering, and the full GFM
table reconstruction path including the complexity-detection fallback
to raw HTML. Four end-to-end pipeline tests exercise the spec's headline
gaps (heading levels, inline links, lists, GFM tables) through
parse -> clean -> segment -> classify -> render -> SQLite.
See CHANGELOG.md for the full record.