Nick.HtmlParser 1.0.10

Nick.HtmlParser

A lightweight, dependency-free HTML parser for .NET that converts HTML into a flat, typed list of INode objects (returned as IReadOnlyList<INode>). It suits scenarios where you need to search, filter, or traverse parsed HTML quickly without the overhead of a full DOM tree.

Features

Flat structure – Parses HTML into a flat list of INode objects, making it easy to query with LINQ.
Parent & child references – Each node exposes Parent and Children properties for tree-style navigation when needed.
Attribute parsing – Extracts tag attributes into a Dictionary<string, string>.
Typed nodes – Every node has a NodeType enum value for fast type checking (e.g., NodeType.div, NodeType.a, NodeType.img).
Void / self-closing tag support – Correctly handles <br>, <img>, <input>, <meta>, and all other HTML void elements.
Script & style skipping – Automatically skips <script> and <style> tag contents (including nested script tags) to avoid parse errors from < characters in code.
Comment & DOCTYPE handling – Comments (<!-- ... -->), <!DOCTYPE>, and XML processing instructions (<?xml ?>) are skipped during parsing.
Malformed HTML recovery – Gracefully recovers from common issues such as missing closing tags, rogue closing tags without openers, and malformed attribute quotes.
Optional raw content loading – Pass loadContent: true to retain the original HTML text of each node.
Zero dependencies – Only relies on the .NET base class library.
Targets .NET 10

Installation

Install via NuGet:

dotnet add package Nick.HtmlParser

Or via the NuGet Package Manager:

Install-Package Nick.HtmlParser

Quick Start

using HtmlParser;

var html = "<html><body><div class=\"container\"><p>Hello World</p></div></body></html>";

// Parse without loading raw content
IReadOnlyList<INode> nodes = Parser.Parse(html);

// Parse with raw content loaded into each node
IReadOnlyList<INode> nodesWithContent = Parser.Parse(html, loadContent: true);

API Reference

`Parser.Parse(string html, bool loadContent = false)`

Parses an HTML string and returns an IReadOnlyList<INode>.

Parameter	Type	Default	Description
`html`	`string`	—	The HTML string to parse.
`loadContent`	`bool`	`false`	When `true`, populates each node's `Content` property with the raw HTML text. Uses more memory.

`INode` Interface

Property	Type	Description
`Name`	`string`	The tag name (e.g., `"div"`, `"a"`, `"img"`).
`Type`	`NodeType`	The parsed enum type of the tag. Returns `NodeType.unknown` for non-standard tags.
`Content`	`string?`	The raw HTML of the node. Only populated when `loadContent` is `true`.
`Attributes`	`Dictionary<string, string>`	The tag's attributes as key-value pairs.
`OpenPosition`	`int`	Character position of the opening `<` in the source HTML.
`ClosedPosition`	`int`	Character position of the closing `>` in the source HTML. `-1` if unclosed.
`Depth`	`int`	The nesting depth of the node (0-based).
`Parent`	`INode?`	Reference to the parent node, or `null` for top-level nodes.
`Children`	`IReadOnlyCollection<INode>?`	Direct child nodes, or `null` if the node has no children.
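
The position properties above can also serve as a rough health check, since ClosedPosition is documented as -1 for unclosed nodes. The following is a minimal sketch that assumes only the members listed in the table. The sample HTML is made up for illustration, and how the parser classifies each recovered tag is not specified here, so no output is shown.

```csharp
using System;
using System.Linq;
using HtmlParser;

// Illustrative input with <p> tags that are never explicitly closed.
var html = "<div><p>First paragraph<p>Second paragraph</div>";
var nodes = Parser.Parse(html);

// Print where each node opens and whether the parser found its end.
foreach (var node in nodes)
{
    var state = node.ClosedPosition == -1
        ? "unclosed"
        : $"closed at {node.ClosedPosition}";
    Console.WriteLine($"{node.Name}: depth {node.Depth}, opens at {node.OpenPosition}, {state}");
}

// Collect the nodes the parser reports as unclosed (ClosedPosition == -1).
var unclosed = nodes.Where(n => n.ClosedPosition == -1).ToList();
Console.WriteLine($"{unclosed.Count} node(s) reported as unclosed");
```

Because parsing errors are handled silently, a check like this may be the simplest way to notice markup that needed recovery.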

`NodeType` Enum

Contains values for all standard HTML tags (div, p, a, img, span, table, etc.) plus unknown for unrecognized tags.

Usage Examples

Find all links

var links = nodes.Where(n => n.Type == NodeType.a);
foreach (var link in links)
{
    if (link.Attributes.TryGetValue("href", out var href))
        Console.WriteLine(href);
}

Find nodes by depth

// Get all top-level nodes
var topLevel = nodes.Where(n => n.Depth == 0);

Navigate parent/child relationships

var divs = nodes.Where(n => n.Type == NodeType.div);
foreach (var div in divs)
{
    Console.WriteLine($"Div at depth {div.Depth} has {div.Children?.Count ?? 0} children");
    if (div.Parent != null)
        Console.WriteLine($"  Parent: {div.Parent.Name}");
}
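
Since Children is null for nodes without children, a recursive walk needs a null guard. The sketch below shows one way to enumerate every descendant of a node. Descendants is a hypothetical helper written for this example, not part of the package, and the sample HTML is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlParser;

var nodes = Parser.Parse("<div><ul><li>One</li><li>Two</li></ul></div>");
var root = nodes.First(n => n.Depth == 0);

// Print each descendant, indented by its reported depth.
foreach (var descendant in Descendants(root))
    Console.WriteLine($"{new string(' ', descendant.Depth * 2)}{descendant.Name}");

// Hypothetical helper (not part of Nick.HtmlParser): depth-first walk over
// Children, treating a null Children collection as empty.
static IEnumerable<INode> Descendants(INode node)
{
    foreach (var child in node.Children ?? Array.Empty<INode>())
    {
        yield return child;
        foreach (var nested in Descendants(child))
            yield return nested;
    }
}
```

The same result could probably be obtained from the flat list by filtering on Parent, but walking Children keeps the traversal in document order for a single subtree.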

Get raw HTML content

var nodesWithContent = Parser.Parse(html, loadContent: true);
var firstDiv = nodesWithContent.First(n => n.Type == NodeType.div);
Console.WriteLine(firstDiv.Content); // e.g. <div class="container"><p>Hello World</p></div>

Find elements by attribute

var elementsWithClass = nodes.Where(n => n.Attributes.ContainsKey("class"));
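// Match "container" as a whole class token, so "container-fluid" is not included
var specificClass = nodes.Where(n =>
    n.Attributes.TryGetValue("class", out var cls) &&
    cls.Split(' ', StringSplitOptions.RemoveEmptyEntries).Contains("container"));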

Supported Void (Self-Closing) Tags

The parser treats the following tags as self-closing and does not look for a matching closing tag:

area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr

Limitations

Script & style content is skipped – The parser does not produce nodes for content inside <script> or <style> tags, though the tags themselves are captured.
No CSS selector support – Use LINQ to query the flat node list instead.
No modification / serialization – This is a read-only parser; it does not support modifying or serializing HTML back to a string.
Error reporting – Parsing errors (e.g., duplicate attributes, unclosed tags) are silently handled rather than reported. Error reporting is planned for a future release.
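
To illustrate the LINQ alternative to CSS selectors, here is a minimal sketch of rough equivalents for the child selector `div.container > p` and the descendant selector `div.container p`. It relies only on the Type, Parent, and Attributes members documented above. HasClass and HasAncestor are hypothetical helpers written for this example, and the sample HTML is illustrative.

```csharp
using System;
using System.Linq;
using HtmlParser;

var html = "<div class=\"container\"><p>First</p><section><p>Nested</p></section></div>";
var nodes = Parser.Parse(html);

// Rough equivalent of the CSS selector: div.container > p
var directParagraphs = nodes.Where(n =>
    n.Type == NodeType.p &&
    n.Parent is { Type: NodeType.div } parent &&
    HasClass(parent, "container")).ToList();

// Rough equivalent of the CSS selector: div.container p
var nestedParagraphs = nodes.Where(n =>
    n.Type == NodeType.p &&
    HasAncestor(n, a => a.Type == NodeType.div && HasClass(a, "container"))).ToList();

Console.WriteLine($"Direct: {directParagraphs.Count}, any depth: {nestedParagraphs.Count}");

// Hypothetical helper: true when the class attribute contains className as a whole token.
static bool HasClass(INode node, string className) =>
    node.Attributes.TryGetValue("class", out var cls) &&
    cls.Split(' ', StringSplitOptions.RemoveEmptyEntries).Contains(className);

// Hypothetical helper: walks up the Parent chain until the predicate matches.
static bool HasAncestor(INode node, Func<INode, bool> predicate)
{
    for (var ancestor = node.Parent; ancestor != null; ancestor = ancestor.Parent)
        if (predicate(ancestor))
            return true;
    return false;
}
```

Small predicates like these compose with ordinary LINQ operators, which is likely enough for simple queries; anything close to full selector syntax would need a dedicated library.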

Building & Testing

# Build the library
dotnet build Nick.HtmlParser/Nick.HtmlParser.csproj

# Run tests
dotnet test Test/Test.csproj

License

See LICENSE for details.

Changelog

v1.0.10

Bugfix: Improved bounds checking to prevent crashes on malformed HTML (unterminated comments, unclosed tags, lone chevrons).
Bugfix: Fixed handling of unterminated DOCTYPE and XML processing instructions.
Upgrade to .NET 10.
Increased test coverage for edge cases.

v1.0.9

Skipped so that the patch version (10) matches .NET 10.

v1.0.8

Bugfix: Handle attributes that contain tags, including malformed attributes.
Upgrade to .NET 8.0.

v1.0.7

Reference parent and child nodes from INode.

v1.0.6

Skip <script> and <style> tag contents.
Bugfix: Handle nested script tags.

v1.0.4

Cater for non-standard HTML tags.
Ignore XML processing instructions.
Improve parsing recovery for malformed HTML documents.
Load raw node text into parsed INode objects (opt-in via loadContent).

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


Dependencies

net10.0: No dependencies.


Version	Downloads	Last Updated
1.0.10	364	2/8/2026
1.0.8	387	1/16/2024
1.0.7	302	5/2/2023
1.0.6	487	1/2/2022
1.0.4	443	1/1/2022
1.0.2	444	1/1/2022
1.0.0	446	12/31/2021
