Dataplat.Dbatools.Csv 1.1.15


Dataplat.Dbatools.Csv


A performance-optimized CSV reader and writer for .NET, built for SQL Server workloads, from the trusted dbatools project.

What makes this library unique:

  • Native IDataReader - Stream directly to SqlBulkCopy with zero intermediate allocations
  • Schema Inference - Auto-detect SQL Server column types (int, bigint, decimal, datetime2, bit, uniqueidentifier, varchar/nvarchar)
  • Built-in compression - GZip, Brotli, Deflate, ZLib with decompression bomb protection
  • Real-world data handling - Lenient parsing, smart quotes, duplicate headers, field count mismatches
  • Faster than LumenWorks & CsvHelper - ~1.5x faster with modern .NET (Span<T>, ArrayPool)
  • Cancellation & Progress - CancellationToken support and progress callbacks for long imports

Installation

dotnet add package Dataplat.Dbatools.Csv

Or via Package Manager:

Install-Package Dataplat.Dbatools.Csv

Features

  • Streaming IDataReader - Works seamlessly with SqlBulkCopy and other ADO.NET consumers
  • Schema Inference - Analyze CSV data to determine optimal SQL Server column types
  • Strongly Typed Columns - Define column types for automatic conversion with built-in and custom converters
  • High Performance - ~1.5x faster than LumenWorks/CsvHelper with ArrayPool-based memory management
  • Parallel Processing - Optional multi-threaded parsing for large files (25K+ rows/sec)
  • String Interning - Reduce memory for files with repeated values
  • Compression Support - Automatic handling of GZip, Deflate, Brotli (.NET 8+), and ZLib (.NET 8+)
  • Culture-Aware Parsing - Configurable type converters for dates, numbers, booleans, and GUIDs
  • Flexible Delimiters - Single or multi-character delimiters (e.g., ::, ||); see the sketch after this list
  • Robust Error Handling - Collect errors, throw on first error, or skip bad rows
  • Security Built-in - Decompression bomb protection, max field length limits
  • Smart Quote Handling - Normalize curly/smart quotes from Word/Excel
  • Lenient Parsing Mode - Handle real-world malformed CSV data gracefully
  • Duplicate Header Support - Rename, ignore, or use first/last occurrence
  • Field Count Mismatch Handling - Pad with nulls, truncate, or fail on row length mismatches
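
A minimal sketch of the multi-character delimiter and string-interning options from this list (the sample file name is hypothetical):

// Pipe-delimited export with many repeated values (e.g., status codes).
var options = new CsvReaderOptions
{
    Delimiter = "||",        // multi-character delimiter
    InternStrings = true     // reuse string instances for repeated values
};

using var reader = new CsvDataReader("export.psv", options);
while (reader.Read())
{
    // Fields are split on "||"; repeated values share interned string instances.
}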

Performance

Benchmark: 100,000 rows × 10 columns (.NET 8, AVX-512)

Single column read (typical SqlBulkCopy/IDataReader pattern):

Library Time (ms) vs Dataplat
Sep 18 ms 3.7x faster
Sylvan 27 ms 2.5x faster
Dataplat 67 ms baseline
CsvHelper 76 ms 1.1x slower
LumenWorks 395 ms 5.9x slower

All columns read (full row processing):

Library Time (ms) vs Dataplat
Sep 30 ms 1.8x faster
Sylvan 35 ms 1.6x faster
Dataplat 55 ms baseline
CsvHelper 97 ms 1.8x slower
LumenWorks 102 ms 1.9x slower

Understanding the performance tradeoffs

Sep achieves 21 GB/s by using Span<T> and only materializing strings when explicitly requested. Sylvan uses similar techniques. Both avoid allocations until the last possible moment.

Why Dataplat can't match this: The IDataReader interface requires GetValue() to return actual object instances. For string columns, this means creating real string objects—we can't return spans. This is a fundamental architectural tradeoff for SqlBulkCopy compatibility.
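
For context, an abridged excerpt of the BCL contract involved (System.Data.IDataRecord, which IDataReader extends) shows why values must be materialized:

// Abridged from System.Data; SqlBulkCopy pulls values through these members.
public interface IDataRecord
{
    object GetValue(int i);           // must hand back a real, allocated object
    int GetValues(object[] values);   // fills an object[] with materialized values
    string GetString(int i);          // must return an actual string instance
    // ... remaining members omitted
}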

When each library shines:

Scenario Bottleneck Winner
CSV → SqlBulkCopy → SQL Server Network/disk I/O, not parsing Dataplat (integrated)
CSV.gz → SQL Server Decompression overhead Dataplat (built-in)
Messy enterprise exports Error handling complexity Dataplat (lenient mode)
Raw in-memory parsing benchmark CPU/allocations Sep/Sylvan

For database import workflows, the complete file.csv.gz → SqlBulkCopy → SQL Server pipeline with Dataplat is often comparable to combining Sep + manual decompression + custom IDataReader wrapper, while requiring less code.
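
As a rough illustration of that pipeline, the compressed-file constructor and SqlBulkCopy usage shown elsewhere in this README combine directly (connection string and table name are placeholders):

// Decompression, parsing, and bulk load in one streaming pass.
using var reader = new CsvDataReader("data.csv.gz");   // compression auto-detected from the extension
using var connection = new SqlConnection(connectionString);
connection.Open();

using var bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer(reader);   // rows stream from the decompressed CSV straight to SQL Server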

Quick Start

Reading CSV Files

using Dataplat.Dbatools.Csv.Reader;

// Simple usage
using var reader = new CsvDataReader("data.csv");
while (reader.Read())
{
    var name = reader.GetString(0);
    var value = reader.GetInt32(1);
}

// With options
var options = new CsvReaderOptions
{
    Delimiter = ";",
    HasHeaderRow = true,
    Culture = CultureInfo.GetCultureInfo("de-DE")
};
using var reader = new CsvDataReader("data.csv", options);

Reading Compressed Files

// Automatically detects compression from extension (.gz, .gzip, .deflate, .br, .zlib)
using var reader = new CsvDataReader("data.csv.gz");

// Or specify explicitly
var options = new CsvReaderOptions
{
    CompressionType = CompressionType.GZip,
    MaxDecompressedSize = 100 * 1024 * 1024  // 100MB limit
};
using var reader = new CsvDataReader(stream, options);

Bulk Loading to SQL Server

using var reader = new CsvDataReader("data.csv");
using var connection = new SqlConnection(connectionString);
connection.Open();

using var bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer(reader);  // Streams directly, low memory usage

Parallel Processing (Large Files)

var options = new CsvReaderOptions
{
    EnableParallelProcessing = true,
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

using var reader = new CsvDataReader("large-file.csv", options);
// Process as normal - parallel parsing happens automatically

Writing CSV Files

using Dataplat.Dbatools.Csv.Writer;

var options = new CsvWriterOptions
{
    Delimiter = ",",
    QuoteAllFields = false,
    Culture = CultureInfo.InvariantCulture
};

using var writer = new CsvWriter("output.csv", options);
writer.WriteHeader(new[] { "Id", "Name", "Date" });
writer.WriteRecord(new object[] { 1, "Test", DateTime.Now });

Error Handling

var options = new CsvReaderOptions
{
    CollectParseErrors = true,
    MaxParseErrors = 100,
    ParseErrorAction = CsvParseErrorAction.AdvanceToNextLine
};

using var reader = new CsvDataReader("data.csv", options);
while (reader.Read())
{
    // Process valid records
}

// Check collected errors
foreach (var error in reader.ParseErrors)
{
    Console.WriteLine($"Row {error.RowIndex}, Line {error.LineNumber}: {error.Message}");
}

Handling Malformed Data

// Handle files with duplicate column names
var options = new CsvReaderOptions
{
    DuplicateHeaderBehavior = DuplicateHeaderBehavior.Rename  // Name, Name_2, Name_3
};

// Handle rows with wrong number of fields
var options = new CsvReaderOptions
{
    MismatchedFieldAction = MismatchedFieldAction.PadOrTruncate
};

// Handle malformed quotes (e.g., unmatched quotes, backslash escapes)
var options = new CsvReaderOptions
{
    QuoteMode = QuoteMode.Lenient
};

// Normalize smart/curly quotes from Word/Excel
var options = new CsvReaderOptions
{
    NormalizeQuotes = true
};

// Distinguish between null and empty string (see examples below)
var options = new CsvReaderOptions
{
    DistinguishEmptyFromNull = true
};

Cancellation Support

using var cts = new CancellationTokenSource();

var options = new CsvReaderOptions
{
    CancellationToken = cts.Token
};

// In another thread or after timeout
cts.CancelAfter(TimeSpan.FromSeconds(30));

try
{
    using var reader = new CsvDataReader("large-file.csv", options);
    while (reader.Read())
    {
        // Process records - will throw OperationCanceledException when cancelled
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Import was cancelled");
}

Progress Reporting

var options = new CsvReaderOptions
{
    ProgressReportInterval = 10000,  // Report every 10,000 records
    ProgressCallback = progress =>
    {
        Console.WriteLine($"Processed {progress.RecordsRead:N0} records " +
                          $"({progress.RowsPerSecond:N0} rows/sec)");

        if (progress.PercentComplete >= 0)
            Console.WriteLine($"Progress: {progress.PercentComplete:F1}%");
    }
};

using var reader = new CsvDataReader("large-file.csv", options);
while (reader.Read())
{
    // Process records
}

Schema Inference

Automatically detect optimal SQL Server column types from CSV data. No more nvarchar(MAX) for everything:

using Dataplat.Dbatools.Csv.Reader;

// Fast: Sample first 1000 rows (tiny risk if data changes after sample)
var columns = CsvSchemaInference.InferSchemaFromSample("data.csv");

// Safe: Scan entire file with progress reporting (zero risk of type mismatches)
var columns = CsvSchemaInference.InferSchema("data.csv", null, progress => {
    Console.WriteLine($"Progress: {progress:P0}");
});

// Examine inferred types
foreach (var col in columns)
{
    Console.WriteLine($"{col.ColumnName}: {col.SqlDataType} {(col.IsNullable ? "NULL" : "NOT NULL")}");
}
// Output:
// Id: int NOT NULL
// Name: nvarchar(100) NULL
// Price: decimal(10,2) NOT NULL
// Created: datetime2 NULL

// Generate CREATE TABLE statement
string sql = CsvSchemaInference.GenerateCreateTableStatement(columns, "Products", "dbo");
// CREATE TABLE [dbo].[Products] (
//     [Id] int NOT NULL,
//     [Name] nvarchar(100) NULL,
//     [Price] decimal(10,2) NOT NULL,
//     [Created] datetime2 NULL
// );

// Use inferred types with CsvDataReader
var typeMap = CsvSchemaInference.ToColumnTypes(columns);
var options = new CsvReaderOptions { ColumnTypes = typeMap };
using var reader = new CsvDataReader("data.csv", options);

Detected types: uniqueidentifier, bit, int, bigint, decimal(p,s), datetime2, varchar(n), nvarchar(n) (when Unicode is detected)

InferredColumn properties:

Property Type Description
ColumnName string Column header name
SqlDataType string SQL Server data type (e.g., int, decimal(10,2), nvarchar(50))
IsNullable bool True if any NULL/empty values were found
IsUnicode bool True if non-ASCII characters detected
MaxLength int Maximum string length observed
Precision int Decimal precision (total digits)
Scale int Decimal scale (digits after decimal point)
Ordinal int Column position (0-based)
TotalCount long Total rows analyzed
NonNullCount long Rows with non-null values

Strongly Typed Columns

Define column types explicitly for automatic conversion during reading:

var options = new CsvReaderOptions
{
    ColumnTypes = new Dictionary<string, Type>
    {
        ["Id"] = typeof(int),
        ["Price"] = typeof(decimal),
        ["IsActive"] = typeof(bool),
        ["Created"] = typeof(DateTime),
        ["UniqueId"] = typeof(Guid)
    }
};

using var reader = new CsvDataReader("data.csv", options);
while (reader.Read())
{
    int id = reader.GetInt32(0);           // Already converted from string
    decimal price = reader.GetDecimal(1);  // Culture-aware parsing
    bool active = reader.GetBoolean(2);    // Handles true/false/yes/no/1/0
    DateTime created = reader.GetDateTime(3);
    Guid guid = reader.GetGuid(4);
}

Built-in type converters: Guid, bool, DateTime, short, int, long, float, double, decimal, byte, string, money, vector (SQL Server 2025)

Combine with schema inference:

// Infer types from CSV data, then use them for reading
var columns = CsvSchemaInference.InferSchemaFromSample("data.csv");
var typeMap = CsvSchemaInference.ToColumnTypes(columns);

var options = new CsvReaderOptions { ColumnTypes = typeMap };
using var reader = new CsvDataReader("data.csv", options);

Custom type converters:

using Dataplat.Dbatools.Csv.TypeConverters;

// Create a custom converter for enums or custom types
public class StatusConverter : TypeConverterBase<OrderStatus>
{
    public override bool TryConvert(string value, out OrderStatus result)
    {
        return Enum.TryParse(value, true, out result);
    }
}

// Register and use
var registry = TypeConverterRegistry.Default;
registry.Register(new StatusConverter());

var options = new CsvReaderOptions
{
    TypeConverterRegistry = registry,
    ColumnTypes = new Dictionary<string, Type> { ["Status"] = typeof(OrderStatus) }
};

Null vs Empty String Handling

CSV files can represent missing data in two ways: an empty field (,,) or an explicitly quoted empty string (,"",...). The DistinguishEmptyFromNull option controls how these are interpreted.

Example CSV:

Name,Description,Notes
Alice,,""
Bob,"",
Charlie,"Has value","Also has value"

Default behavior (DistinguishEmptyFromNull = false):

Both empty fields and quoted empty strings become empty string (""):

var options = new CsvReaderOptions { DistinguishEmptyFromNull = false }; // default
using var reader = new CsvDataReader("data.csv", options);

reader.Read(); // Alice row
reader.IsDBNull(1);  // false - Description is ""
reader.IsDBNull(2);  // false - Notes is ""
reader.GetString(1); // ""
reader.GetString(2); // ""

With DistinguishEmptyFromNull = true:

Empty fields become null, quoted empty strings remain empty string:

var options = new CsvReaderOptions { DistinguishEmptyFromNull = true };
using var reader = new CsvDataReader("data.csv", options);

reader.Read(); // Alice row
reader.IsDBNull(1);  // true  - Description (,,) is NULL
reader.IsDBNull(2);  // false - Notes ("") is empty string
reader.GetString(1); // throws InvalidCastException (value is null)
reader.GetValue(1);  // DBNull.Value
reader.GetString(2); // ""

When to use this option:

Use Case Recommendation
SQL bulk import where NULL matters Enable (true)
Database columns with NOT NULL constraints Disable (false) - default
Preserving exact semantics from source system Enable (true)
Simple data processing Disable (false) - default

Quick reference:

CSV Input DistinguishEmptyFromNull = false DistinguishEmptyFromNull = true
,, (empty field) "" (empty string) null (DBNull.Value)
,"", (quoted empty) "" (empty string) "" (empty string)
,value, "value" "value"

LumenWorks Compatibility

For projects migrating from LumenWorks CsvReader, these methods provide familiar APIs:

using var reader = new CsvDataReader("data.csv");

while (reader.Read())
{
    // Get column index by name (-1 if not found, unlike GetOrdinal which throws)
    int idx = reader.GetFieldIndex("ColumnName");

    // Get current record as reconstructed CSV string (useful for error logging)
    string rawData = reader.GetCurrentRawData();

    // Efficiently copy all fields to an array
    string[] values = new string[reader.FieldCount];
    reader.CopyCurrentRecordTo(values);

    // Check if current record had issues
    if (reader.MissingFieldFlag)
        Console.WriteLine("Record had missing fields (padded with nulls)");
    if (reader.ParseErrorFlag)
        Console.WriteLine("Record had a parse error that was skipped");
}

// Check if stream is fully consumed
if (reader.EndOfStream)
    Console.WriteLine("Finished reading all data");

Empty Header Handling

CSV files with empty or whitespace-only headers are automatically assigned default names:

// CSV: Name,,Value
// Headers become: Name, Column1, Value

var options = new CsvReaderOptions
{
    DefaultHeaderName = "Field"  // Custom prefix (default is "Column")
};
// Headers become: Name, Field1, Value
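
Because CsvDataReader implements IDataReader, the standard GetName accessor can confirm the assigned header names at runtime:

using var reader = new CsvDataReader("data.csv", options);
for (int i = 0; i < reader.FieldCount; i++)
    Console.WriteLine(reader.GetName(i));   // Name, Field1, Value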

Configuration Options

CsvReaderOptions

Option Default Description
Delimiter "," Field delimiter (supports multi-character)
HasHeaderRow true First row contains column names
SkipRows 0 Number of rows to skip before reading
Culture InvariantCulture Culture for parsing numbers/dates
ParseErrorAction ThrowException How to handle parse errors
CollectParseErrors false Collect errors instead of throwing
MaxParseErrors 1000 Maximum errors to collect
TrimmingOptions None Whitespace trimming options
CompressionType None Compression format (auto-detected by default)
MaxDecompressedSize 10GB Limit for decompression bomb protection
MaxQuotedFieldLength 0 Limit for quoted field length (0 = unlimited)
QuoteMode Strict RFC 4180 strict or lenient parsing mode
DuplicateHeaderBehavior ThrowException How to handle duplicate column names
MismatchedFieldAction ThrowException How to handle rows with wrong field count
NormalizeQuotes false Convert smart/curly quotes to ASCII quotes
DistinguishEmptyFromNull false Distinguish ,, (null) from ,"", (empty)
EnableParallelProcessing false Enable multi-threaded parsing
MaxDegreeOfParallelism 0 Worker threads (0 = processor count)
InternStrings false Intern common string values
CancellationToken None Token to monitor for cancellation requests
ProgressReportInterval 10000 Records between progress reports (0 = disabled)
ProgressCallback null Callback receiving CsvProgress updates
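
Options compose freely; a representative combination drawn from the table above (the specific values are illustrative, not recommendations):

var options = new CsvReaderOptions
{
    Delimiter = ";",
    HasHeaderRow = true,
    SkipRows = 2,                                              // skip a report preamble above the header
    Culture = CultureInfo.GetCultureInfo("de-DE"),
    CollectParseErrors = true,
    MaxParseErrors = 50,
    ParseErrorAction = CsvParseErrorAction.AdvanceToNextLine,
    DuplicateHeaderBehavior = DuplicateHeaderBehavior.Rename,
    MismatchedFieldAction = MismatchedFieldAction.PadOrTruncate,
    NormalizeQuotes = true,
    MaxQuotedFieldLength = 1_000_000                           // cap runaway quoted fields at ~1 MB
};

using var reader = new CsvDataReader("export.csv", options);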

Thread Safety

When parallel processing is enabled, CsvDataReader provides the following thread-safety guarantees:

Method/Property Thread-Safe Notes
GetValue() Yes Returns consistent snapshot of current record
GetValues() Yes Atomic copy of all values in current record
CurrentRecordIndex Yes No torn reads on 64-bit values
Close() / Dispose() Yes Safely stops parallel pipeline from any thread
Read() No Only one thread should call Read()

Usage Pattern

var options = new CsvReaderOptions
{
    EnableParallelProcessing = true,
    MaxDegreeOfParallelism = 4
};

using var reader = new CsvDataReader("large-file.csv", options);

while (reader.Read())  // Main thread only
{
    // Safe to read values from multiple threads concurrently
    Parallel.For(0, 4, _ =>
    {
        var values = new object[reader.FieldCount];
        reader.GetValues(values);  // Thread-safe
        ProcessValues(values);
    });
}

Important Notes

  • Sequential mode (parallel processing disabled): The reader is not thread-safe. All access should be from a single thread.
  • Snapshot semantics: Values returned by GetValue()/GetValues() represent a snapshot that may change after the next Read() call.
  • Single reader thread: Only one thread should call Read() at a time. Concurrent Read() calls are not supported.

Target Frameworks

  • .NET Framework 4.7.2
  • .NET 8.0

Security Considerations

  • QuoteMode.Lenient: Deviates from RFC 4180 and may parse data differently than expected. Use only for known malformed data sources.
  • MismatchedFieldAction.PadWithNulls/TruncateExtra: May mask data corruption or cause silent data loss. Use with caution on untrusted data.
  • MaxDecompressedSize: Always set an appropriate limit when processing compressed files from untrusted sources to prevent decompression bomb attacks.
  • MaxQuotedFieldLength: Set a limit when processing untrusted data to prevent memory exhaustion from malformed multiline quoted fields.
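
The limits above can be combined when ingesting files from untrusted sources; a minimal hardening sketch (the byte limits are illustrative):

var options = new CsvReaderOptions
{
    MaxDecompressedSize = 512L * 1024 * 1024,   // refuse archives that expand beyond 512 MB
    MaxQuotedFieldLength = 1 * 1024 * 1024,     // cap any single quoted field at 1 MB
    // Leave QuoteMode at its Strict (RFC 4180) default for untrusted input.
};

using var reader = new CsvDataReader("untrusted-upload.csv.gz", options);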

License

MIT License - see the LICENSE file for details.

Development

This CSV library was created using Claude Code (Opus 4.5) with the following initial prompt:

the dbatools repo is at C:\github\dbatools and this repo is at C:\github\dbatools.library

i would like to create a replacement for LumenWorks.Framework.IO.dll PLUS the additional functionality requested in dbatools issues on github which you can find using the gh command

the source code for lumenworks is https://github.com/phatcher/CsvReader/tree/master/code/LumenWorks.Framework.IO

This library was written over a decade ago. considering the advances in .NET and SqlClient etc, please add a CSV reader of better quality (more functionality often seen in paid systems, faster) using recent .NET and Microsoft Data best practices

Please ultrathink about the best way to go about creating this new, extensive functionality within the dbatools library. if it should be a new project that is linked or whatever, do it in this repo.

Additional refinements included a security review and feature additions based on dbatools GitHub issues.


Version History

1.1.15 (12/28/2025)
1.1.10 (12/26/2025) - SQL Server schema inference: auto-detect column types (int, bigint, decimal, datetime2, bit, uniqueidentifier, varchar/nvarchar)
1.1.5 (12/4/2025) - Updated package metadata and URL
1.1.1 (12/4/2025) - ~25% performance improvement for all-columns reads
1.1.0 (12/4/2025) - Added CancellationToken and progress reporting support
1.0.2 (12/3/2025)