Dataplat.Dbatools.Csv 1.1.15
A performance-optimized CSV reader and writer for .NET, built for SQL Server workloads by the dbatools project.
What makes this library unique:
- Native IDataReader - Stream directly to SqlBulkCopy with zero intermediate allocations
- Schema Inference - Auto-detect SQL Server column types (int, bigint, decimal, datetime2, bit, uniqueidentifier, varchar/nvarchar)
- Built-in compression - GZip, Brotli, Deflate, ZLib with decompression bomb protection
- Real-world data handling - Lenient parsing, smart quotes, duplicate headers, field count mismatches
- Faster than LumenWorks & CsvHelper - ~1.5x faster with modern .NET (Span<T>, ArrayPool)
- Cancellation & Progress - CancellationToken support and progress callbacks for long imports
Installation
dotnet add package Dataplat.Dbatools.Csv
Or via Package Manager:
Install-Package Dataplat.Dbatools.Csv
Features
- Streaming IDataReader - Works seamlessly with SqlBulkCopy and other ADO.NET consumers
- Schema Inference - Analyze CSV data to determine optimal SQL Server column types
- Strongly Typed Columns - Define column types for automatic conversion with built-in and custom converters
- High Performance - ~1.5x faster than LumenWorks/CsvHelper with ArrayPool-based memory management
- Parallel Processing - Optional multi-threaded parsing for large files (25K+ rows/sec)
- String Interning - Reduce memory for files with repeated values
- Compression Support - Automatic handling of GZip, Deflate, Brotli (.NET 8+), and ZLib (.NET 8+)
- Culture-Aware Parsing - Configurable type converters for dates, numbers, booleans, and GUIDs
- Flexible Delimiters - Single or multi-character delimiters (e.g., ::, ||)
- Robust Error Handling - Collect errors, throw on first error, or skip bad rows
- Security Built-in - Decompression bomb protection, max field length limits
- Smart Quote Handling - Normalize curly/smart quotes from Word/Excel
- Lenient Parsing Mode - Handle real-world malformed CSV data gracefully
- Duplicate Header Support - Rename, ignore, or use first/last occurrence
- Field Count Mismatch Handling - Pad with nulls, truncate, or fail on row length mismatches
Performance
Benchmark: 100,000 rows × 10 columns (.NET 8, AVX-512)
Single column read (typical SqlBulkCopy/IDataReader pattern):
| Library | Time (ms) | vs Dataplat |
|---|---|---|
| Sep | 18 ms | 3.7x faster |
| Sylvan | 27 ms | 2.5x faster |
| Dataplat | 67 ms | baseline |
| CsvHelper | 76 ms | 1.1x slower |
| LumenWorks | 395 ms | 5.9x slower |
All columns read (full row processing):
| Library | Time (ms) | vs Dataplat |
|---|---|---|
| Sep | 30 ms | 1.8x faster |
| Sylvan | 35 ms | 1.6x faster |
| Dataplat | 55 ms | baseline |
| CsvHelper | 97 ms | 1.8x slower |
| LumenWorks | 102 ms | 1.9x slower |
Understanding the performance tradeoffs
Sep achieves 21 GB/s by using Span<T> and only materializing strings when explicitly requested. Sylvan uses similar techniques. Both avoid allocations until the last possible moment.
Why Dataplat can't match this: The IDataReader interface requires GetValue() to return actual object instances. For string columns, this means creating real string objects—we can't return spans. This is a fundamental architectural tradeoff for SqlBulkCopy compatibility.
When each library shines:
| Scenario | Bottleneck | Winner |
|---|---|---|
| CSV → SqlBulkCopy → SQL Server | Network/disk I/O, not parsing | Dataplat (integrated) |
| CSV.gz → SQL Server | Decompression overhead | Dataplat (built-in) |
| Messy enterprise exports | Error handling complexity | Dataplat (lenient mode) |
| Raw in-memory parsing benchmark | CPU/allocations | Sep/Sylvan |
For database import workflows, the complete file.csv.gz → SqlBulkCopy → SQL Server pipeline with Dataplat is often comparable to combining Sep + manual decompression + custom IDataReader wrapper, while requiring less code.
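As a rough sketch of that pipeline (assuming Microsoft.Data.SqlClient; the connection string and table name are placeholders), a compressed export can be streamed straight into SqlBulkCopy without a manual decompression step or a custom IDataReader wrapper:

```csharp
using Dataplat.Dbatools.Csv.Reader;
using Microsoft.Data.SqlClient;

// Compression is auto-detected from the .gz extension
using var reader = new CsvDataReader("file.csv.gz");

using var connection = new SqlConnection(connectionString);
connection.Open();

using var bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "dbo.ImportTarget"; // placeholder table name
bulkCopy.WriteToServer(reader); // streams rows as they are decompressed and parsed
```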
Quick Start
Reading CSV Files
using Dataplat.Dbatools.Csv.Reader;
// Simple usage
using var reader = new CsvDataReader("data.csv");
while (reader.Read())
{
var name = reader.GetString(0);
var value = reader.GetInt32(1);
}
// With options
var options = new CsvReaderOptions
{
Delimiter = ";",
HasHeaderRow = true,
Culture = CultureInfo.GetCultureInfo("de-DE")
};
using var reader = new CsvDataReader("data.csv", options);
Reading Compressed Files
// Automatically detects compression from extension (.gz, .gzip, .deflate, .br, .zlib)
using var reader = new CsvDataReader("data.csv.gz");
// Or specify explicitly
var options = new CsvReaderOptions
{
CompressionType = CompressionType.GZip,
MaxDecompressedSize = 100 * 1024 * 1024 // 100MB limit
};
using var reader = new CsvDataReader(stream, options);
Bulk Loading to SQL Server
using var reader = new CsvDataReader("data.csv");
using var connection = new SqlConnection(connectionString);
connection.Open();
using var bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer(reader); // Streams directly, low memory usage
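For very large imports, the usual SqlBulkCopy knobs from Microsoft.Data.SqlClient apply on top of the streaming reader; the batch size, thresholds, and column names below are illustrative only:

```csharp
using var reader = new CsvDataReader("data.csv");
using var connection = new SqlConnection(connectionString);
connection.Open();

using var bulkCopy = new SqlBulkCopy(connection)
{
    DestinationTableName = "MyTable",
    BatchSize = 50_000,      // commit in batches rather than one huge transaction
    BulkCopyTimeout = 0,     // no timeout for long-running imports
    EnableStreaming = true,  // stream IDataReader rows instead of buffering them
    NotifyAfter = 100_000    // raise SqlRowsCopied every 100,000 rows
};
bulkCopy.SqlRowsCopied += (_, e) => Console.WriteLine($"{e.RowsCopied:N0} rows copied");

// Optional: map CSV headers to destination columns by name
bulkCopy.ColumnMappings.Add("Id", "Id");
bulkCopy.ColumnMappings.Add("Name", "Name");

bulkCopy.WriteToServer(reader);
```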
Parallel Processing (Large Files)
var options = new CsvReaderOptions
{
EnableParallelProcessing = true,
MaxDegreeOfParallelism = Environment.ProcessorCount
};
using var reader = new CsvDataReader("large-file.csv", options);
// Process as normal - parallel parsing happens automatically
Writing CSV Files
using Dataplat.Dbatools.Csv.Writer;
var options = new CsvWriterOptions
{
Delimiter = ",",
QuoteAllFields = false,
Culture = CultureInfo.InvariantCulture
};
using var writer = new CsvWriter("output.csv", options);
writer.WriteHeader(new[] { "Id", "Name", "Date" });
writer.WriteRecord(new object[] { 1, "Test", DateTime.Now });
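Reader and writer compose naturally for re-emitting a cleaned copy of a file. A minimal sketch, assuming CsvWriter accepts the options type shown above and using only members shown in this README plus the standard IDataReader GetName (file names are placeholders):

```csharp
using System.Linq;
using Dataplat.Dbatools.Csv.Reader;
using Dataplat.Dbatools.Csv.Writer;

using var reader = new CsvDataReader("input.csv");
using var writer = new CsvWriter("output.csv", new CsvWriterOptions());

// Copy the header row, then every record, reusing one buffer per row
writer.WriteHeader(Enumerable.Range(0, reader.FieldCount)
                             .Select(reader.GetName)
                             .ToArray());
var buffer = new string[reader.FieldCount];
while (reader.Read())
{
    reader.CopyCurrentRecordTo(buffer);
    writer.WriteRecord(buffer);
}
```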
Error Handling
var options = new CsvReaderOptions
{
CollectParseErrors = true,
MaxParseErrors = 100,
ParseErrorAction = CsvParseErrorAction.AdvanceToNextLine
};
using var reader = new CsvDataReader("data.csv", options);
while (reader.Read())
{
// Process valid records
}
// Check collected errors
foreach (var error in reader.ParseErrors)
{
Console.WriteLine($"Row {error.RowIndex}, Line {error.LineNumber}: {error.Message}");
}
Handling Malformed Data
// Handle files with duplicate column names
var options = new CsvReaderOptions
{
DuplicateHeaderBehavior = DuplicateHeaderBehavior.Rename // Name, Name_2, Name_3
};
// Handle rows with wrong number of fields
var options = new CsvReaderOptions
{
MismatchedFieldAction = MismatchedFieldAction.PadOrTruncate
};
// Handle malformed quotes (e.g., unmatched quotes, backslash escapes)
var options = new CsvReaderOptions
{
QuoteMode = QuoteMode.Lenient
};
// Normalize smart/curly quotes from Word/Excel
var options = new CsvReaderOptions
{
NormalizeQuotes = true
};
// Distinguish between null and empty string (see examples below)
var options = new CsvReaderOptions
{
DistinguishEmptyFromNull = true
};
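The snippets above set one option at a time for clarity; in practice they can be combined in a single CsvReaderOptions. A sketch with a placeholder file name, not a recommended default:

```csharp
var options = new CsvReaderOptions
{
    DuplicateHeaderBehavior = DuplicateHeaderBehavior.Rename,
    MismatchedFieldAction = MismatchedFieldAction.PadOrTruncate,
    QuoteMode = QuoteMode.Lenient,
    NormalizeQuotes = true,
    DistinguishEmptyFromNull = true
};
using var reader = new CsvDataReader("messy-export.csv", options);
```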
Cancellation Support
using var cts = new CancellationTokenSource();
var options = new CsvReaderOptions
{
CancellationToken = cts.Token
};
// In another thread or after timeout
cts.CancelAfter(TimeSpan.FromSeconds(30));
try
{
using var reader = new CsvDataReader("large-file.csv", options);
while (reader.Read())
{
// Process records - will throw OperationCanceledException when cancelled
}
}
catch (OperationCanceledException)
{
Console.WriteLine("Import was cancelled");
}
Progress Reporting
var options = new CsvReaderOptions
{
ProgressReportInterval = 10000, // Report every 10,000 records
ProgressCallback = progress =>
{
Console.WriteLine($"Processed {progress.RecordsRead:N0} records " +
$"({progress.RowsPerSecond:N0} rows/sec)");
if (progress.PercentComplete >= 0)
Console.WriteLine($"Progress: {progress.PercentComplete:F1}%");
}
};
using var reader = new CsvDataReader("large-file.csv", options);
while (reader.Read())
{
// Process records
}
Schema Inference
Automatically detect optimal SQL Server column types from CSV data. No more nvarchar(MAX) for everything:
using Dataplat.Dbatools.Csv.Reader;
// Fast: Sample first 1000 rows (tiny risk if data changes after sample)
var columns = CsvSchemaInference.InferSchemaFromSample("data.csv");
// Safe: Scan entire file with progress reporting (zero risk of type mismatches)
var columns = CsvSchemaInference.InferSchema("data.csv", null, progress => {
Console.WriteLine($"Progress: {progress:P0}");
});
// Examine inferred types
foreach (var col in columns)
{
Console.WriteLine($"{col.ColumnName}: {col.SqlDataType} {(col.IsNullable ? "NULL" : "NOT NULL")}");
}
// Output:
// Id: int NOT NULL
// Name: nvarchar(100) NULL
// Price: decimal(10,2) NOT NULL
// Created: datetime2 NULL
// Generate CREATE TABLE statement
string sql = CsvSchemaInference.GenerateCreateTableStatement(columns, "Products", "dbo");
// CREATE TABLE [dbo].[Products] (
// [Id] int NOT NULL,
// [Name] nvarchar(100) NULL,
// [Price] decimal(10,2) NOT NULL,
// [Created] datetime2 NULL
// );
// Use inferred types with CsvDataReader
var typeMap = CsvSchemaInference.ToColumnTypes(columns);
var options = new CsvReaderOptions { ColumnTypes = typeMap };
using var reader = new CsvDataReader("data.csv", options);
Detected types: uniqueidentifier, bit, int, bigint, decimal(p,s), datetime2, varchar(n), nvarchar(n) (when Unicode is detected)
InferredColumn properties:
| Property | Type | Description |
|---|---|---|
| ColumnName | string | Column header name |
| SqlDataType | string | SQL Server data type (e.g., int, decimal(10,2), nvarchar(50)) |
| IsNullable | bool | True if any NULL/empty values were found |
| IsUnicode | bool | True if non-ASCII characters detected |
| MaxLength | int | Maximum string length observed |
| Precision | int | Decimal precision (total digits) |
| Scale | int | Decimal scale (digits after decimal point) |
| Ordinal | int | Column position (0-based) |
| TotalCount | long | Total rows analyzed |
| NonNullCount | long | Rows with non-null values |
Strongly Typed Columns
Define column types explicitly for automatic conversion during reading:
var options = new CsvReaderOptions
{
ColumnTypes = new Dictionary<string, Type>
{
["Id"] = typeof(int),
["Price"] = typeof(decimal),
["IsActive"] = typeof(bool),
["Created"] = typeof(DateTime),
["UniqueId"] = typeof(Guid)
}
};
using var reader = new CsvDataReader("data.csv", options);
while (reader.Read())
{
int id = reader.GetInt32(0); // Already converted from string
decimal price = reader.GetDecimal(1); // Culture-aware parsing
bool active = reader.GetBoolean(2); // Handles true/false/yes/no/1/0
DateTime created = reader.GetDateTime(3);
Guid guid = reader.GetGuid(4);
}
Built-in type converters: Guid, bool, DateTime, short, int, long, float, double, decimal, byte, string, money, vector (SQL Server 2025)
Combine with schema inference:
// Infer types from CSV data, then use them for reading
var columns = CsvSchemaInference.InferSchemaFromSample("data.csv");
var typeMap = CsvSchemaInference.ToColumnTypes(columns);
var options = new CsvReaderOptions { ColumnTypes = typeMap };
using var reader = new CsvDataReader("data.csv", options);
Custom type converters:
using Dataplat.Dbatools.Csv.TypeConverters;
// Create a custom converter for enums or custom types
public class StatusConverter : TypeConverterBase<OrderStatus>
{
public override bool TryConvert(string value, out OrderStatus result)
{
return Enum.TryParse(value, true, out result);
}
}
// Register and use
var registry = TypeConverterRegistry.Default;
registry.Register(new StatusConverter());
var options = new CsvReaderOptions
{
TypeConverterRegistry = registry,
ColumnTypes = new Dictionary<string, Type> { ["Status"] = typeof(OrderStatus) }
};
Null vs Empty String Handling
CSV files can represent missing data in two ways: an empty field (,,) or an explicitly quoted empty string (,"",...). The DistinguishEmptyFromNull option controls how these are interpreted.
Example CSV:
Name,Description,Notes
Alice,,""
Bob,"",
Charlie,"Has value","Also has value"
Default behavior (DistinguishEmptyFromNull = false):
Both empty fields and quoted empty strings become empty string (""):
var options = new CsvReaderOptions { DistinguishEmptyFromNull = false }; // default
using var reader = new CsvDataReader("data.csv", options);
reader.Read(); // Alice row
reader.IsDBNull(1); // false - Description is ""
reader.IsDBNull(2); // false - Notes is ""
reader.GetString(1); // ""
reader.GetString(2); // ""
With DistinguishEmptyFromNull = true:
Empty fields become null, quoted empty strings remain empty string:
var options = new CsvReaderOptions { DistinguishEmptyFromNull = true };
using var reader = new CsvDataReader("data.csv", options);
reader.Read(); // Alice row
reader.IsDBNull(1); // true - Description (,,) is NULL
reader.IsDBNull(2); // false - Notes ("") is empty string
reader.GetString(1); // throws InvalidCastException (value is null)
reader.GetValue(1); // DBNull.Value
reader.GetString(2); // ""
When to use this option:
| Use Case | Recommendation |
|---|---|
| SQL bulk import where NULL matters | Enable (true) |
| Database columns with NOT NULL constraints | Disable (false) - default |
| Preserving exact semantics from source system | Enable (true) |
| Simple data processing | Disable (false) - default |
Quick reference:
| CSV Input | DistinguishEmptyFromNull = false | DistinguishEmptyFromNull = true |
|---|---|---|
| ,, (empty field) | "" (empty string) | null (DBNull.Value) |
| ,"", (quoted empty) | "" (empty string) | "" (empty string) |
| ,value, | "value" | "value" |
LumenWorks Compatibility
For projects migrating from LumenWorks CsvReader, these methods provide familiar APIs:
using var reader = new CsvDataReader("data.csv");
while (reader.Read())
{
// Get column index by name (-1 if not found, unlike GetOrdinal which throws)
int idx = reader.GetFieldIndex("ColumnName");
// Get current record as reconstructed CSV string (useful for error logging)
string rawData = reader.GetCurrentRawData();
// Efficiently copy all fields to an array
string[] values = new string[reader.FieldCount];
reader.CopyCurrentRecordTo(values);
// Check if current record had issues
if (reader.MissingFieldFlag)
Console.WriteLine("Record had missing fields (padded with nulls)");
if (reader.ParseErrorFlag)
Console.WriteLine("Record had a parse error that was skipped");
}
// Check if stream is fully consumed
if (reader.EndOfStream)
Console.WriteLine("Finished reading all data");
Empty Header Handling
CSV files with empty or whitespace-only headers are automatically assigned default names:
// CSV: Name,,Value
// Headers become: Name, Column1, Value
var options = new CsvReaderOptions
{
DefaultHeaderName = "Field" // Custom prefix (default is "Column")
};
// Headers become: Name, Field1, Value
Configuration Options
CsvReaderOptions
| Option | Default | Description |
|---|---|---|
| Delimiter | "," | Field delimiter (supports multi-character) |
| HasHeaderRow | true | First row contains column names |
| SkipRows | 0 | Number of rows to skip before reading |
| Culture | InvariantCulture | Culture for parsing numbers/dates |
| ParseErrorAction | ThrowException | How to handle parse errors |
| CollectParseErrors | false | Collect errors instead of throwing |
| MaxParseErrors | 1000 | Maximum errors to collect |
| TrimmingOptions | None | Whitespace trimming options |
| CompressionType | None | Compression format (auto-detected by default) |
| MaxDecompressedSize | 10GB | Limit for decompression bomb protection |
| MaxQuotedFieldLength | 0 | Limit for quoted field length (0 = unlimited) |
| QuoteMode | Strict | RFC 4180 strict or lenient parsing mode |
| DuplicateHeaderBehavior | ThrowException | How to handle duplicate column names |
| MismatchedFieldAction | ThrowException | How to handle rows with wrong field count |
| NormalizeQuotes | false | Convert smart/curly quotes to ASCII quotes |
| DistinguishEmptyFromNull | false | Distinguish ,, (null) from ,"", (empty) |
| EnableParallelProcessing | false | Enable multi-threaded parsing |
| MaxDegreeOfParallelism | 0 | Worker threads (0 = processor count) |
| InternStrings | false | Intern common string values |
| CancellationToken | None | Token to monitor for cancellation requests |
| ProgressReportInterval | 10000 | Records between progress reports (0 = disabled) |
| ProgressCallback | null | Callback receiving CsvProgress updates |
Thread Safety
When parallel processing is enabled, CsvDataReader provides the following thread-safety guarantees:
| Method/Property | Thread-Safe | Notes |
|---|---|---|
| GetValue() | Yes | Returns consistent snapshot of current record |
| GetValues() | Yes | Atomic copy of all values in current record |
| CurrentRecordIndex | Yes | No torn reads on 64-bit values |
| Close() / Dispose() | Yes | Safely stops parallel pipeline from any thread |
| Read() | No | Only one thread should call Read() |
Usage Pattern
var options = new CsvReaderOptions
{
EnableParallelProcessing = true,
MaxDegreeOfParallelism = 4
};
using var reader = new CsvDataReader("large-file.csv", options);
while (reader.Read()) // Main thread only
{
// Safe to read values from multiple threads concurrently
Parallel.For(0, 4, _ =>
{
var values = new object[reader.FieldCount];
reader.GetValues(values); // Thread-safe
ProcessValues(values);
});
}
Important Notes
- Sequential mode (parallel processing disabled): The reader is not thread-safe. All access should be from a single thread.
- Snapshot semantics: Values returned by GetValue()/GetValues() represent a snapshot that may change after the next Read() call.
- Single reader thread: Only one thread should call Read() at a time. Concurrent Read() calls are not supported.
Target Frameworks
- .NET Framework 4.7.2
- .NET 8.0
Security Considerations
- QuoteMode.Lenient: Deviates from RFC 4180 and may parse data differently than expected. Use only for known malformed data sources.
- MismatchedFieldAction.PadWithNulls/TruncateExtra: May mask data corruption or cause silent data loss. Use with caution on untrusted data.
- MaxDecompressedSize: Always set an appropriate limit when processing compressed files from untrusted sources to prevent decompression bomb attacks.
- MaxQuotedFieldLength: Set a limit when processing untrusted data to prevent memory exhaustion from malformed multiline quoted fields.
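When the input comes from untrusted uploads, the mitigations above can be combined; a minimal sketch with illustrative limits (untrustedStream is a placeholder for the incoming stream):

```csharp
var options = new CsvReaderOptions
{
    QuoteMode = QuoteMode.Strict,                                  // keep RFC 4180 parsing
    MismatchedFieldAction = MismatchedFieldAction.ThrowException,  // fail loudly on bad rows
    MaxDecompressedSize = 500L * 1024 * 1024,                      // 500 MB decompression ceiling
    MaxQuotedFieldLength = 1024 * 1024,                            // 1 MB cap per quoted field
    CollectParseErrors = true,
    MaxParseErrors = 100
};
using var reader = new CsvDataReader(untrustedStream, options);
```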
Related Projects
- dbatools - PowerShell module for SQL Server DBAs
- dbatools.library - Core library for dbatools
License
MIT License - see the LICENSE file for details.
Development
This CSV library was created using Claude Code (Opus 4.5) with the following initial prompt:
the dbatools repo is at C:\github\dbatools and this repo is at C:\github\dbatools.library
i would like to create a replacement for LumenWorks.Framework.IO.dll PLUS the additional functionality requested in dbatools issues on github which you can find using the
gh command. the source code for lumenworks is https://github.com/phatcher/CsvReader/tree/master/code/LumenWorks.Framework.IO
This library was written over a decade ago. considering the advances in .NET and SqlClient etc, please add a CSV reader of better quality (more functionality often seen in paid systems, faster) using recent .NET and Microsoft Data best practices
Please ultrathink about the best way to go about creating this new, extensive functionality within the dbatools library. if it should be a new project that is linked or whatever, do it in this repo.
Additional refinements included a security review and feature additions based on dbatools GitHub issues.
Dependencies
- .NET Framework 4.7.2
  - System.Buffers (>= 4.5.1)
  - System.Memory (>= 4.5.5)
- net8.0
  - No dependencies.
Release Notes
- v1.1.10: SQL Server schema inference - auto-detect column types (int, bigint, decimal, datetime2, bit, uniqueidentifier, varchar/nvarchar).
- v1.1.5: Updated package metadata and URL.
- v1.1.1: ~25% performance improvement for all-columns reads.
- v1.1.0: Added CancellationToken and progress reporting support.