HeroParser 2.4.1
dotnet add package HeroParser --version 2.4.1
NuGet\Install-Package HeroParser -Version 2.4.1
<PackageReference Include="HeroParser" Version="2.4.1" />
<PackageVersion Include="HeroParser" Version="2.4.1" />
<PackageReference Include="HeroParser" />
paket add HeroParser --version 2.4.1
#r "nuget: HeroParser, 2.4.1"
#:package HeroParser@2.4.1
#addin nuget:?package=HeroParser&version=2.4.1
#tool nuget:?package=HeroParser&version=2.4.1
HeroParser - High-Performance, AI-Native CSV, Fixed-Width, Excel (.xlsx), JSONL & HTB Parser
HeroParser is a zero-allocation, SIMD-accelerated tabular data parser and writer for .NET 8, 9, and 10. Designed for extreme speed, memory efficiency, and Native AOT compatibility, it also offers first-class integrations for AI agents, vector embeddings, and LLM pipelines.
Why Choose HeroParser?
- Extreme Performance: Engineered with AVX-512, AVX2, and ARM NEON SIMD optimizations to deliver ultra-high-throughput reading and writing.
- AI-Native integrations: Built-in support for token-budgeted chunking, LLM output structured repair, vector embedding pipelines, and agent tool mapping.
- Zero Dependencies & Low Footprint: Operates with zero external packages. Employs a fixed 112-byte heap memory footprint on the reading hot-path regardless of file size.
- Unified Attributes: Annotate your C# classes once, and use them across CSV, Excel, Fixed-Width, and HTB APIs.
Performance & Memory
Tested under .NET 10.0 on an AMD Ryzen AI 9 HX PRO 370 CPU:
- Read Throughput: SIMD-accelerated UTF-8 (
byte[]) read paths on both quoted and unquoted data. - Write Throughput: Highly optimized CSV/JSONL serialization achieving massive throughput.
- GC Allocations: Fixed 112-byte allocation throughout parsing, representing a 97% memory reduction compared to traditional reflection-based parsers.
- String Generation: Up to 64% speedup on synchronous text generation via pre-allocated capacities.
View live performance graphs and history on the HeroParser Performance Portal.
Install
dotnet add package HeroParser
Simple Quick Starts
Define your record type. Decorate it with [GenerateBinder] to enable source-generated, reflection-free, and Native AOT-safe binding:
using HeroParser;
[GenerateBinder]
public class Product
{
public int Id { get; set; }
public string Name { get; set; } = "";
[Validate(RangeMin = 0)]
public decimal Price { get; set; }
[Parse(Format = "yyyy-MM-dd")]
public DateTime ReleaseDate { get; set; }
}
CSV
// Read a file in one line (zero-allocation binding)
List<Product> products = Csv.Read<Product>().FromFile("products.csv").ToList();
// Async stream millions of rows without buffering
await foreach (Product p in Csv.Read<Product>().FromFileAsync("products.csv"))
{
Console.WriteLine($"{p.Name}: {p.Price:C}");
}
// Write collection to a file
Csv.Write<Product>().ToFile("out.csv", products);
Excel (.xlsx)
Reads and writes Excel workbooks with zero external dependencies (utilizes only standard .NET compression and XML packages).
// Read Excel files
List<Product> products = Excel.Read<Product>().FromFile("products.xlsx");
// Read specific sheet
var sheetData = Excel.Read<Product>().FromSheet("Sales").FromFile("workbook.xlsx");
// Write Excel workbook
Excel.Write<Product>().ToFile("out.xlsx", products);
Fixed-Width
Map properties to specific character boundaries using the [PositionalMap] attribute:
[GenerateBinder]
public class Employee
{
[PositionalMap(Start = 0, Length = 10)]
public string Id { get; set; } = "";
[PositionalMap(Start = 10, Length = 30)]
public string Name { get; set; } = "";
[PositionalMap(Start = 40, Length = 10, Alignment = FieldAlignment.Right)]
public decimal Salary { get; set; }
}
// Read positional records
var employees = FixedWidth.Read<Employee>().FromFile("employees.dat").Records;
// Write positional records
FixedWidth.Write<Employee>().ToFile("out.dat", employees);
JSON Lines (JSONL)
Perfect for AI fine-tuning datasets, streamed LLM responses, and database bulk loading.
// Read JSONL files (AOT-safe)
var records = Jsonl.Read<Product>().FromFile("products.jsonl").ToList();
// Write JSONL files
Jsonl.Write<Product>().ToFile("out.jsonl", records);
High-Throughput Tabular Binary (HTB)
A custom, high-speed binary serialization format optimized for zero allocations, platform independence, and vector embedding storage (supporting float[] arrays).
Why HTB?
- Zero Heap Allocations: Utilizes Roslyn source generators (
[GenerateBinder]) to map properties directly with zero-boxing and zero-reflection overhead. - Vector Embedding Native: Natively supports floating-point arrays (
float[]), enabling ultra-fast vector embedding serialization without string parsing overhead. - Platform-Independent Endianness: Automatically handles big-endian byte-order reversal for floats, doubles, and ints for cross-architecture safety.
- Allocation-Free CSV ↔ HTB Conversion: Stream-convert CSV directly to HTB (and vice-versa) with zero heap allocation overhead.
- AOT & Trim Ready: Fully compatible with Native AOT compilation out-of-the-box.
// Read HTB binary files (AOT-safe)
List<Product> products = Htb.Read<Product>().FromFile("products.htb").ToList();
// Async stream HTB records
await foreach (Product p in Htb.Read<Product>().FromFileAsync("products.htb"))
{
Console.WriteLine($"{p.Name}: {p.Price:C}");
}
// Write records to an HTB file
Htb.Write<Product>().ToFile("out.htb", products);
// Direct, allocation-free CSV ↔ HTB conversions
Htb.ConvertFromCsv("products.csv", "products.htb", HtbSchema.FromType<Product>());
Unified Attribute System
Annotate a single record class once, and read or write it across multiple formats:
| Attribute | Purpose | CSV | Excel | Fixed-Width | HTB |
|---|---|---|---|---|---|
[GenerateBinder] |
Emits Roslyn source-generated, reflection-free mapping binder | Yes | Yes | Yes | Yes |
[TabularMap(Name, Index)] |
Maps property to column header or index | Yes | Yes | No | Yes |
[PositionalMap(Start, Length...)] |
Declares character position, alignment, and pad characters | No | No | Yes | No |
[Parse(Format)] |
Converts raw values to custom types (e.g. DateTime format) | Yes | Yes | Yes | No |
[Format(WriteFormat...)] |
Customizes output formatting during serialization | Yes | Yes | Yes | No |
[Validate(Range, Pattern...)] |
Validates properties bidirectionally (Strict/Lenient modes) | Yes | Yes | Yes | Yes |
AI-Native Tabular Capabilities
HeroParser includes first-class support for LLM, vector search, and RAG pipelines:
1. LLM Structured Output Repair (LlmRepair)
Repairs truncated final rows (unclosed quotes/escapes) and strips markdown code blocks from raw LLM text streams.
using HeroParser.AI;
// Repaired on-the-fly and parsed directly into strongly-typed records
await foreach (var dev in LlmRepair.ReadFromTextAsync<Developer>(rawLlmResponse))
{
Console.WriteLine($"{dev.Name} is a {dev.Role}");
}
2. Tabular Embedding Pipeline (ToLlmEmbeddingsAsync)
Batches streamed records and pairs them with vector embeddings with a zero-allocation, token-budgeted streaming wrapper.
using HeroParser.AI;
await foreach (var chunk in developers.ToLlmEmbeddingsAsync(
async (texts, ct) => await GetEmbeddingsFromApiAsync(texts, ct),
options: new LlmChunkOptions { MaxTokensPerChunk = 250 },
batchSize: 16))
{
Console.WriteLine($"Chunk of {chunk.Chunk.EndRow - chunk.Chunk.StartRow + 1} rows embedded.");
}
3. Agent Tool Argument Mapper (SchemaMetadata.MapFromToolCall)
Maps flat dictionaries of case-insensitive arguments returned by tool calling models into typed record models, executing validation constraints and raising rich validation feedback.
using HeroParser.AI;
Developer dev = SchemaMetadata.MapFromToolCall<Developer>(arguments);
4. Tabular Context Profiler (TabularContextProfiler)
Generates structured statistical profile cards in markdown directly from datasets to inject into LLM system prompts.
string contextCard = developers.GenerateContextCard(datasetName: "Engineering Team");
5. Token-Bounded JSON Chunker (JsonLlmChunker)
Chunks datasets into valid, token-bounded JSON array blocks, optimized for ingestion into RAG context windows.
await foreach (var jsonChunk in developers.ToJsonLlmChunksAsync(options)) { ... }
Key Features
- SIMD-accelerated CSV parsing — AVX-512, AVX2, and ARM NEON instruction sets; PCLMULQDQ-based branchless quote tracking
- Zero allocations — fixed 4 KB stack footprint regardless of column count or file size;
ArrayPoolfor buffers - AOT/trimming ready — source generators emit reflection-free binders; annotated with
[RequiresUnreferencedCode]where reflection is unavoidable - Async streaming —
IAsyncEnumerable<T>for CSV, Fixed-Width, Excel, JSONL, and HTB; true non-blocking I/O with sync fast paths - Excel without extra dependencies — reads and writes
.xlsxusing onlySystem.IO.CompressionandSystem.Xml - JSONL for AI/ML pipelines —
Jsonl.Read<T>()/Jsonl.Write<T>()mirror the CSV builder pattern; AOT-safe viaJsonTypeInfo<T>;CsvToJsonlConverterprojects tabular data into OpenAI/Anthropic fine-tuning shapes - HTB binary format — custom, high-speed, zero-allocation binary format featuring float-array embedding support, big-endian byte-order reversal, and direct CSV ↔ HTB conversion
- Embedding-API batching —
IAsyncEnumerable<T>.BatchAsync(size)groups streamed records into fixed-size batches for OpenAI/Voyage/Cohere/Anthropic embedding calls - Inline vector parser —
VectorParser.ParseFloats(span)handles pre-computed embeddings ("[0.1,0.2,…]", comma/semicolon/whitespace separators, culture-aware) - DataReader support —
Csv.CreateDataReader(),FixedWidth.CreateDataReader(),Excel.CreateDataReader(),Jsonl.CreateDataReader()for database bulk loading viaSqlBulkCopy - PipeReader integration —
Csv.ReadFromPipeReaderAsync(pipe)for network streaming without buffering the entire payload - Multi-schema CSV — discriminator-based row routing to different record types; source-generated dispatch for ~2.85x faster throughput
- Delimiter detection — auto-detect comma, semicolon, pipe, or tab from sample rows with a confidence score
- CSV validation — pre-flight structural checks with detailed per-row error reporting
- Field validation —
[Validate]constraints (NotNull, NotEmpty, Range, Pattern) collected lazily;result.ThrowIfAnyError()for fail-fast - CSV injection protection — configurable sanitization modes for user-data exports
- Progress reporting — row/byte callbacks for large-file UX
- Custom type converters — register converters for domain types on any reader or writer
- Write capacity pre-allocation — backing buffer capacity is pre-allocated via estimated record counts when writing collections to strings, yielding up to 64% speedup and completely eliminating buffer resize copy overhead
- Multi-framework — .NET 8, 9, 10; CI validates all three on Windows, Linux, and macOS
Detailed Documentation
For advanced features and full API guides, see the files under the docs folder:
- CSV Guide — Fluent readers/writers, validation, PipeReader, and multi-schema dispatching.
- Excel Guide — Multi-sheet workbooks, custom formatting, and progress tracking.
- Fixed-Width Guide — Positional mapping, alignment, padding, and custom type converters.
- JSONL Guide — Fine-tuning templates, vector parsing, and Native AOT setups.
- HTB Guide — High-Throughput Tabular Binary format, fluent APIs, CSV ↔ HTB conversion, and Native AOT support.
- Benchmarks Guide — Execution environments, detailed CPU metrics, and comparisons.
Building & Testing
# Build all projects
dotnet build
# Run unit tests
dotnet test --filter Category=Unit
# Run integration tests
dotnet test --filter Category=Integration
# Run all tests
dotnet test
# Check code formatting
dotnet format --verify-no-changes
# Run benchmarks
dotnet run -c Release --project benchmarks/HeroParser.Benchmarks
License
MIT
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net10.0
- No dependencies.
-
net8.0
- System.IO.Pipelines (>= 10.0.8)
-
net9.0
- System.IO.Pipelines (>= 10.0.8)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 2.4.1 | 0 | 6/6/2026 |
| 2.4.0 | 87 | 5/29/2026 |
| 2.3.0 | 104 | 5/28/2026 |
| 2.2.1 | 94 | 5/26/2026 |
| 2.2.0 | 98 | 5/23/2026 |
| 2.1.3 | 124 | 4/24/2026 |
| 2.1.2 | 156 | 3/17/2026 |
| 2.1.1 | 121 | 3/16/2026 |
| 2.1.0 | 125 | 3/16/2026 |
| 2.0.0 | 121 | 3/14/2026 |
| 1.7.1 | 125 | 3/13/2026 |
| 1.7.0 | 112 | 3/12/2026 |
| 1.6.4 | 127 | 3/11/2026 |
| 1.6.3 | 130 | 1/13/2026 |
| 1.6.2 | 128 | 1/12/2026 |
| 1.6.1 | 127 | 1/10/2026 |
| 1.6.0 | 126 | 1/9/2026 |
| 1.5.4 | 127 | 12/29/2025 |
| 1.5.3 | 137 | 12/29/2025 |
| 1.5.2 | 121 | 12/27/2025 |
v2.4.1 delivers comprehensive security hardening, streaming reliability improvements, and zero-allocation parser optimizations: global XML entity expansion / Zip-bomb mitigations (XXE prevention) in Excel sheet reading, thread-safe cached Regex and ReDoS timeout protections in LLM validation, token-aligned PipeReader CRLF boundary splits safeguards, HTB float array endianness fixes, and AOT-safe progressive JSONL stream enhancements. Full changelog: https://github.com/KoalaFacts/HeroParser/blob/main/CHANGELOG.md