HtmlTag 1.0.0-beta-1.0.0

This is a prerelease version of HtmlTag.
dotnet add package HtmlTag --version 1.0.0-beta-1.0.0
                    
NuGet\Install-Package HtmlTag -Version 1.0.0-beta-1.0.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="HtmlTag" Version="1.0.0-beta-1.0.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="HtmlTag" Version="1.0.0-beta-1.0.0" />
                    
Directory.Packages.props
<PackageReference Include="HtmlTag" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add HtmlTag --version 1.0.0-beta-1.0.0
                    
#r "nuget: HtmlTag, 1.0.0-beta-1.0.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package HtmlTag@1.0.0-beta-1.0.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=HtmlTag&version=1.0.0-beta-1.0.0&prerelease
                    
Install as a Cake Addin
#tool nuget:?package=HtmlTag&version=1.0.0-beta-1.0.0&prerelease
                    
Install as a Cake Tool

HtmlTag

A high-performance, single-pass, zero-allocation HTML opening tag parser for .NET 8+

.NET License

Overview

HtmlTag is a lightweight, high-performance HTML tag parser optimized for parsing opening HTML tags with minimal memory allocations. It's designed for scenarios where you're already processing HTML and need to quickly extract tag names, attributes, and positional information from opening tags, while avoiding the headache of crazy HTML issues found in the wild.

Key Features

  • High Performance - Single-pass parsing with ReadOnlySpan<char> for zero-allocation operations
  • 🎯 Precise Parsing - Extracts tag name, attributes, and positional information
  • 🔧 Robust - Handles malformed and "ugly" HTML that browsers accept
  • 📦 Minimal Allocations - Dictionary only allocated if attributes exist
  • 🔄 Reusable - Same instance can parse multiple tags
  • 🎨 Flexible API - Accepts both string and ReadOnlySpan<char> inputs

Supported HTML Patterns

HtmlTag handles a wide variety of HTML tag formats:

  • ✅ Self-closing tags: <img />, <br/>
  • ✅ Tags with quoted attributes: <div class="red">
  • ✅ Tags with unquoted attributes: <input type=text>
  • ✅ Boolean attributes: <input disabled>
  • ✅ Mixed quote styles: <div class='a' title="b">
  • ✅ Malformed but browser-accepted HTML: <div class = " red " foo =bar>
  • ✅ Complex attribute values: URLs, special characters, etc.

Installation

dotnet add package HtmlTag

Or add to your .csproj:

<PackageReference Include="HtmlTag" Version="1.0.0" />

Quick Start

Following examples assume:


using static System.Console;
using static Xunit.Assert;

Basic Usage

using DotNetXtensions.Html;

// Parse a simple tag
var tag = new HtmlTag();
bool parsed = tag.Parse("<div class='container'>");

True(parsed);
Equal("div", tag.TagName);
Equal("container", tag.Attributes["class"]);

Parsing with Attributes

var tag = new HtmlTag();
bool parsed = tag.Parse("<img src='photo.jpg' alt='My Photo' width=800 />");

True(parsed);
Equal("img", tag.TagName);
True(tag.IsSelfClosed);
Equal(3, tag.Attributes.Count);
Equal("photo.jpg", tag.Attributes["src"]);
Equal("My Photo", tag.Attributes["alt"]);
Equal("800", tag.Attributes["width"]);

High-Performance Span API

For zero-allocation parsing when you already have a ReadOnlySpan<char>:

ReadOnlySpan<char> html = "<div class='test'>".AsSpan();
var tag = new HtmlTag();

bool parsed = tag.Parse(html, startIndex: 0);

True(parsed);
Equal("div", tag.TagName);
Equal("test", tag.Attributes["class"]);

Parsing from Within a Larger HTML Document

string html = "<html><body><div id='main'>Content</div></body></html>";
var tag = new HtmlTag();

// Find and parse the div tag starting at position 12
bool parsed = tag.Parse(html, startIndex: 12);

True(parsed);
Equal("div", tag.TagName);
Equal(12, tag.TagStartIndex);
Equal(15, tag.TagLength);
Equal("main", tag.Attributes["id"]);

API Reference

Parse Methods

bool Parse(string inputHtml, int startIndex = 0, bool findTagEnd = true)

Parses an HTML opening tag from a string.

  • inputHtml: The HTML string containing the tag
  • startIndex: Zero-based index where the tag starts (at <)
  • findTagEnd: If true, searches for the closing >; if false, assumes input ends at >
  • Returns: true if parsing succeeded, false if malformed
bool Parse(ReadOnlySpan<char> htmlSpan, int startIndex = 0, bool findTagEnd = true)

High-performance overload that accepts a ReadOnlySpan<char> for zero-allocation parsing.

Properties

Property Type Description
TagName string The HTML tag name (e.g., "div", "img")
Attributes Dictionary<string, string> Attribute name-value pairs (null if no attributes)
TagStartIndex int Zero-based index where tag starts (at <)
TagLength int Total length of tag including < and >
IsSelfClosed bool Whether tag ends with />
InnerTagStartIndex int Index one character after <
InnerTagLength int Length excluding < and >
NoAtts bool Whether tag has no attributes

Examples

Boolean Attributes

var tag = new HtmlTag();
bool parsed = tag.Parse("<input type=text disabled required>");

True(parsed);
Equal("text", tag.Attributes["type"]);
Null(tag.Attributes["disabled"]);
Null(tag.Attributes["required"]);

Complex Real-World Example

var tag = new HtmlTag();
string html = @"<img 
    class='thumb-image' 
    data-image='https://example.com/photo.png' 
    data-image-dimensions=1920x1080 
    data-image-focal-point='0.5,0.5' 
    alt='My Photo' 
    src='https://example.com/photo.png?format=1200w' 
    data-load=false />";

bool parsed = tag.Parse(html);

True(parsed);
Equal("img", tag.TagName);
True(tag.IsSelfClosed);
Equal(7, tag.Attributes.Count);
Equal("thumb-image", tag.Attributes["class"]);
Equal("https://example.com/photo.png", tag.Attributes["data-image"]);
Equal("1920x1080", tag.Attributes["data-image-dimensions"]);
Equal("0.5,0.5", tag.Attributes["data-image-focal-point"]);
Equal("My Photo", tag.Attributes["alt"]);
Equal("https://example.com/photo.png?format=1200w", tag.Attributes["src"]);
Equal("false", tag.Attributes["data-load"]);

Parsing Multiple Tags (Reusable Instance)

NOTE! This is a dubious example, as HtmlTag is designed to parse a single tag at a time, not to stream through and correctly find all tags. This is a KISS, simply type, and not a full HTML parser. In any case... this still may demonstrate reusability.

string html = "<div><p>Hello</p>  \t<img src='pic.jpg'/></div>";
var tag = new HtmlTag();
int pos = 0;
var foundTags = new List<(string name, int position)>();

while (pos < html.Length) {
    if (html[pos] == '<') {
        if (tag.Parse(html, startIndex: pos)) {
            foundTags.Add((tag.TagName, tag.TagStartIndex));
            pos = tag.TagStartIndex + tag.TagLength;
            continue;
        }
    }
    pos++;
}

Equal(3, foundTags.Count);
Equal(("div", 0), foundTags[0]);
Equal(("p", 5), foundTags[1]);
Equal(("img", 20), foundTags[2]);

Handling Malformed HTML

HtmlTag is forgiving like browsers, handling common malformations:

var tag = new HtmlTag();

// Extra spaces around equals
bool parsed = tag.Parse("<div class  =  'test'>");
True(parsed);
Equal("test", tag.Attributes["class"]);

// Unquoted attribute values
tag.Parse("<img width=800 height=600>");
Equal("800", tag.Attributes["width"]);
Equal("600", tag.Attributes["height"]);

Performance Characteristics

Memory Allocations

  • Zero allocations for tag and attribute name parsing (uses spans)
  • Minimal allocations: Only for final tag name and attribute values
  • Dictionary only allocated if attributes exist - tags without attributes have zero dictionary overhead
  • Reusable instances - Parse multiple tags with the same HtmlTag object

Parsing Speed

  • Single-pass parsing - Scans input once
  • Aggressive inlining on hot paths
  • SIMD-optimized IndexOf operations via ReadOnlySpan<char>
  • Fast-path validation for common ASCII tag/attribute names

Typical Performance

Parsing 1000 typical HTML tags (3-5 attributes each):

  • Time: ~100-200 microseconds
  • Allocations: ~50-100 KB total
  • GC pressure: Minimal

Edge Cases and Behavior

Duplicate Attributes

When duplicate attribute names exist, the last one wins (matching browser behavior):

var tag = new HtmlTag();
bool parsed = tag.Parse("<div class='first' class='second'>");

True(parsed);
Equal("second", tag.Attributes["class"]); // Last one wins

Empty Attribute Values

var tag = new HtmlTag();
bool parsed = tag.Parse("<div title='' data-test>");

True(parsed);
Equal("", tag.Attributes["title"]);
Null(tag.Attributes["data-test"]);
Equal(2, tag.Attributes.Count);

Special Characters in Attributes

var tag = new HtmlTag();
bool parsed = tag.Parse(@"<div data-json='{""key"":""value""}'>");

True(parsed);
// Attributes are NOT unescaped - returned as-is
Equal(@"{""key"":""value""}", tag.Attributes["data-json"]);

Limitations

  • Only parses opening tags - Does not handle closing tags or tag content
  • Does not unescape HTML entities - Attribute values returned as-is
  • No validation of tag or attribute names against HTML spec, but does validate according to XML-valid names
  • No namespace support - Treats namespaces as part of the tag name

Requirements

  • .NET 8.0 or higher
  • C# 12.0 or higher

HtmlTag differs by being laser-focused on fast, single-tag parsing with minimal overhead, rather than full document parsing.

Benchmarks

Coming soon hopefully.

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 was computed.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.0.0-beta-1.0.0 235 11/27/2025