HtmlTag 1.0.0-beta-1.0.0
dotnet add package HtmlTag --version 1.0.0-beta-1.0.0
NuGet\Install-Package HtmlTag -Version 1.0.0-beta-1.0.0
<PackageReference Include="HtmlTag" Version="1.0.0-beta-1.0.0" />
<PackageVersion Include="HtmlTag" Version="1.0.0-beta-1.0.0" />
<PackageReference Include="HtmlTag" />
paket add HtmlTag --version 1.0.0-beta-1.0.0
#r "nuget: HtmlTag, 1.0.0-beta-1.0.0"
#:package HtmlTag@1.0.0-beta-1.0.0
#addin nuget:?package=HtmlTag&version=1.0.0-beta-1.0.0&prerelease
#tool nuget:?package=HtmlTag&version=1.0.0-beta-1.0.0&prerelease
HtmlTag
A high-performance, single-pass, zero-allocation HTML opening tag parser for .NET 8+
Overview
HtmlTag is a lightweight, high-performance HTML tag parser optimized for parsing opening HTML tags with minimal memory allocations. It's designed for scenarios where you're already processing HTML and need to quickly extract tag names, attributes, and positional information from opening tags, while avoiding the headache of crazy HTML issues found in the wild.
Key Features
- ⚡ High Performance - Single-pass parsing with
ReadOnlySpan<char>for zero-allocation operations - 🎯 Precise Parsing - Extracts tag name, attributes, and positional information
- 🔧 Robust - Handles malformed and "ugly" HTML that browsers accept
- 📦 Minimal Allocations - Dictionary only allocated if attributes exist
- 🔄 Reusable - Same instance can parse multiple tags
- 🎨 Flexible API - Accepts both
stringandReadOnlySpan<char>inputs
Supported HTML Patterns
HtmlTag handles a wide variety of HTML tag formats:
- ✅ Self-closing tags:
<img />,<br/> - ✅ Tags with quoted attributes:
<div class="red"> - ✅ Tags with unquoted attributes:
<input type=text> - ✅ Boolean attributes:
<input disabled> - ✅ Mixed quote styles:
<div class='a' title="b"> - ✅ Malformed but browser-accepted HTML:
<div class = " red " foo =bar> - ✅ Complex attribute values: URLs, special characters, etc.
Installation
dotnet add package HtmlTag
Or add to your .csproj:
<PackageReference Include="HtmlTag" Version="1.0.0" />
Quick Start
Following examples assume:
using static System.Console;
using static Xunit.Assert;
Basic Usage
using DotNetXtensions.Html;
// Parse a simple tag
var tag = new HtmlTag();
bool parsed = tag.Parse("<div class='container'>");
True(parsed);
Equal("div", tag.TagName);
Equal("container", tag.Attributes["class"]);
Parsing with Attributes
var tag = new HtmlTag();
bool parsed = tag.Parse("<img src='photo.jpg' alt='My Photo' width=800 />");
True(parsed);
Equal("img", tag.TagName);
True(tag.IsSelfClosed);
Equal(3, tag.Attributes.Count);
Equal("photo.jpg", tag.Attributes["src"]);
Equal("My Photo", tag.Attributes["alt"]);
Equal("800", tag.Attributes["width"]);
High-Performance Span API
For zero-allocation parsing when you already have a ReadOnlySpan<char>:
ReadOnlySpan<char> html = "<div class='test'>".AsSpan();
var tag = new HtmlTag();
bool parsed = tag.Parse(html, startIndex: 0);
True(parsed);
Equal("div", tag.TagName);
Equal("test", tag.Attributes["class"]);
Parsing from Within a Larger HTML Document
string html = "<html><body><div id='main'>Content</div></body></html>";
var tag = new HtmlTag();
// Find and parse the div tag starting at position 12
bool parsed = tag.Parse(html, startIndex: 12);
True(parsed);
Equal("div", tag.TagName);
Equal(12, tag.TagStartIndex);
Equal(15, tag.TagLength);
Equal("main", tag.Attributes["id"]);
API Reference
Parse Methods
bool Parse(string inputHtml, int startIndex = 0, bool findTagEnd = true)
Parses an HTML opening tag from a string.
- inputHtml: The HTML string containing the tag
- startIndex: Zero-based index where the tag starts (at
<) - findTagEnd: If true, searches for the closing
>; if false, assumes input ends at> - Returns:
trueif parsing succeeded,falseif malformed
bool Parse(ReadOnlySpan<char> htmlSpan, int startIndex = 0, bool findTagEnd = true)
High-performance overload that accepts a ReadOnlySpan<char> for zero-allocation parsing.
Properties
| Property | Type | Description |
|---|---|---|
TagName |
string |
The HTML tag name (e.g., "div", "img") |
Attributes |
Dictionary<string, string> |
Attribute name-value pairs (null if no attributes) |
TagStartIndex |
int |
Zero-based index where tag starts (at <) |
TagLength |
int |
Total length of tag including < and > |
IsSelfClosed |
bool |
Whether tag ends with /> |
InnerTagStartIndex |
int |
Index one character after < |
InnerTagLength |
int |
Length excluding < and > |
NoAtts |
bool |
Whether tag has no attributes |
Examples
Boolean Attributes
var tag = new HtmlTag();
bool parsed = tag.Parse("<input type=text disabled required>");
True(parsed);
Equal("text", tag.Attributes["type"]);
Null(tag.Attributes["disabled"]);
Null(tag.Attributes["required"]);
Complex Real-World Example
var tag = new HtmlTag();
string html = @"<img
class='thumb-image'
data-image='https://example.com/photo.png'
data-image-dimensions=1920x1080
data-image-focal-point='0.5,0.5'
alt='My Photo'
src='https://example.com/photo.png?format=1200w'
data-load=false />";
bool parsed = tag.Parse(html);
True(parsed);
Equal("img", tag.TagName);
True(tag.IsSelfClosed);
Equal(7, tag.Attributes.Count);
Equal("thumb-image", tag.Attributes["class"]);
Equal("https://example.com/photo.png", tag.Attributes["data-image"]);
Equal("1920x1080", tag.Attributes["data-image-dimensions"]);
Equal("0.5,0.5", tag.Attributes["data-image-focal-point"]);
Equal("My Photo", tag.Attributes["alt"]);
Equal("https://example.com/photo.png?format=1200w", tag.Attributes["src"]);
Equal("false", tag.Attributes["data-load"]);
Parsing Multiple Tags (Reusable Instance)
NOTE! This is a dubious example, as HtmlTag is designed to parse a single tag at a time, not to stream through and correctly find all tags. This is a KISS, simply type, and not a full HTML parser. In any case... this still may demonstrate reusability.
string html = "<div><p>Hello</p> \t<img src='pic.jpg'/></div>";
var tag = new HtmlTag();
int pos = 0;
var foundTags = new List<(string name, int position)>();
while (pos < html.Length) {
if (html[pos] == '<') {
if (tag.Parse(html, startIndex: pos)) {
foundTags.Add((tag.TagName, tag.TagStartIndex));
pos = tag.TagStartIndex + tag.TagLength;
continue;
}
}
pos++;
}
Equal(3, foundTags.Count);
Equal(("div", 0), foundTags[0]);
Equal(("p", 5), foundTags[1]);
Equal(("img", 20), foundTags[2]);
Handling Malformed HTML
HtmlTag is forgiving like browsers, handling common malformations:
var tag = new HtmlTag();
// Extra spaces around equals
bool parsed = tag.Parse("<div class = 'test'>");
True(parsed);
Equal("test", tag.Attributes["class"]);
// Unquoted attribute values
tag.Parse("<img width=800 height=600>");
Equal("800", tag.Attributes["width"]);
Equal("600", tag.Attributes["height"]);
Performance Characteristics
Memory Allocations
- Zero allocations for tag and attribute name parsing (uses spans)
- Minimal allocations: Only for final tag name and attribute values
- Dictionary only allocated if attributes exist - tags without attributes have zero dictionary overhead
- Reusable instances - Parse multiple tags with the same
HtmlTagobject
Parsing Speed
- Single-pass parsing - Scans input once
- Aggressive inlining on hot paths
- SIMD-optimized
IndexOfoperations viaReadOnlySpan<char> - Fast-path validation for common ASCII tag/attribute names
Typical Performance
Parsing 1000 typical HTML tags (3-5 attributes each):
- Time: ~100-200 microseconds
- Allocations: ~50-100 KB total
- GC pressure: Minimal
Edge Cases and Behavior
Duplicate Attributes
When duplicate attribute names exist, the last one wins (matching browser behavior):
var tag = new HtmlTag();
bool parsed = tag.Parse("<div class='first' class='second'>");
True(parsed);
Equal("second", tag.Attributes["class"]); // Last one wins
Empty Attribute Values
var tag = new HtmlTag();
bool parsed = tag.Parse("<div title='' data-test>");
True(parsed);
Equal("", tag.Attributes["title"]);
Null(tag.Attributes["data-test"]);
Equal(2, tag.Attributes.Count);
Special Characters in Attributes
var tag = new HtmlTag();
bool parsed = tag.Parse(@"<div data-json='{""key"":""value""}'>");
True(parsed);
// Attributes are NOT unescaped - returned as-is
Equal(@"{""key"":""value""}", tag.Attributes["data-json"]);
Limitations
- ❌ Only parses opening tags - Does not handle closing tags or tag content
- ❌ Does not unescape HTML entities - Attribute values returned as-is
- ❌ No validation of tag or attribute names against HTML spec, but does validate according to XML-valid names
- ❌ No namespace support - Treats namespaces as part of the tag name
Requirements
- .NET 8.0 or higher
- C# 12.0 or higher
Related Projects
- HtmlAgilityPack - Full-featured HTML parser
- AngleSharp - Comprehensive HTML/CSS/SVG parser
HtmlTag differs by being laser-focused on fast, single-tag parsing with minimal overhead, rather than full document parsing.
Benchmarks
Coming soon hopefully.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net8.0
- DotNetXtensions.Mini (>= 1.0.0-alpha-1.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0-beta-1.0.0 | 235 | 11/27/2025 |