ParquetSharpLINQ 1.1.0-alpha.3
.NET CLI:
dotnet add package ParquetSharpLINQ --version 1.1.0-alpha.3
Package Manager:
NuGet\Install-Package ParquetSharpLINQ -Version 1.1.0-alpha.3
PackageReference:
<PackageReference Include="ParquetSharpLINQ" Version="1.1.0-alpha.3" />
Central Package Management:
<PackageVersion Include="ParquetSharpLINQ" Version="1.1.0-alpha.3" />
<PackageReference Include="ParquetSharpLINQ" />
Paket CLI:
paket add ParquetSharpLINQ --version 1.1.0-alpha.3
Script & Interactive:
#r "nuget: ParquetSharpLINQ, 1.1.0-alpha.3"
File-based apps:
#:package ParquetSharpLINQ@1.1.0-alpha.3
Cake:
#addin nuget:?package=ParquetSharpLINQ&version=1.1.0-alpha.3&prerelease
#tool nuget:?package=ParquetSharpLINQ&version=1.1.0-alpha.3&prerelease
ParquetSharpLINQ
A high-performance LINQ provider for querying Hive-partitioned and Delta Lake Parquet files with automatic query optimization.
Features
- Source Generated Mappers - Zero reflection for data mapping
- Delta Lake Support - Automatic transaction log reading
- Partition Pruning - Only scans matching partitions
- Column Projection - Only reads requested columns
- Indexed Column Predicates - Row-group pruning for indexed properties
- Type Safe - Compile-time validation
- Cross-Platform - Works on Windows and Linux
- Azure Blob Storage - Stream directly from cloud storage
Quick Start
Installation
dotnet add package ParquetSharpLINQ
For Azure Blob Storage support:
dotnet add package ParquetSharpLINQ.Azure
Define Your Entity
using ParquetSharpLINQ.Attributes;
public class SalesRecord
{
[ParquetColumn("id")]
public long Id { get; set; }
[ParquetColumn("product_name")]
public string ProductName { get; set; }
[ParquetColumn("total_amount")]
public decimal TotalAmount { get; set; }
[ParquetColumn("year", IsPartition = true)]
public int Year { get; set; }
[ParquetColumn("region", IsPartition = true)]
public string Region { get; set; }
}
Query Local Files
using ParquetSharpLINQ;
using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/sales");
var count = table.Count(s => s.Region == "eu-west" && s.Year == 2024);
var results = table.Where(s => s.TotalAmount > 1000).ToList();
Query Delta Lake Tables
using ParquetSharpLINQ;
// Automatically detects and reads _delta_log/
using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/delta-sales");
var results = table.Where(s => s.Year == 2024).ToList();
Query Azure Blob Storage
using ParquetSharpLINQ;
using ParquetSharpLINQ.Azure; // Extension methods for Azure support
using var table = ParquetTable<SalesRecord>.Factory.FromAzureBlob(
connectionString: "DefaultEndpointsProtocol=https;AccountName=...",
containerName: "sales-data");
var results = table.Where(s => s.Year == 2024).ToList();
Directory Structure
Hive-style partitioning:
/data/sales/
├── year=2023/
│ ├── region=us-east/
│ │ └── data.parquet
│ └── region=eu-west/
│ └── data.parquet
└── year=2024/
└── region=us-east/
└── data.parquet
Delta Lake:
/data/delta-sales/
├── _delta_log/
│ ├── 00000000000000000000.json
│ └── 00000000000000000001.json
└── year=2024/
└── data.parquet
Key Features
Partition Pruning
All LINQ methods with predicates support automatic partition pruning:
- Count(predicate), LongCount(predicate)
- Any(predicate), All(predicate)
- First(predicate), FirstOrDefault(predicate)
- Single(predicate), SingleOrDefault(predicate)
- Last(predicate), LastOrDefault(predicate)
- Where(predicate)
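For example, a predicate that only references partition columns can be answered by scanning just the matching directories; a minimal sketch, assuming the SalesRecord entity and /data/sales layout shown above:
using ParquetSharpLINQ;
using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/sales");
// Only directories under year=2024/region=eu-west/ need to be scanned.
bool hasEuSales = table.Any(s => s.Year == 2024 && s.Region == "eu-west");
// Only directories under year=2023/ need to be scanned.
var firstOf2023 = table.FirstOrDefault(s => s.Year == 2023);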
Column Projection
Only requested columns are read. When there is no Select projection, only mapped entity columns are read (partition columns are enriched from directory metadata, not read from Parquet files):
var summary = table
.Select(s => new { s.Id, s.ProductName })
.ToList();
Indexed Column Predicates
Mark properties as indexed to enable fast row-group pruning for Count and Where predicates.
using ParquetSharpLINQ.Attributes;
public class SalesRecord
{
[ParquetColumn("id", Indexed = true)]
public long Id { get; set; }
[ParquetColumn("product_name", Indexed = true, ComparerType = typeof(StringComparer))]
public string ProductName { get; set; }
}
Notes:
- Indexing uses values read per row group and an in-memory cache per column/file.
- ComparerType is optional; if omitted, the property type must implement IComparable or IComparable<T>.
- Currently optimized constraints: equality, inequality, range comparisons, and string.StartsWith with StringComparison.Ordinal.
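These constraints translate directly into LINQ predicates; a minimal sketch of queries that could benefit from row-group pruning, assuming the indexed SalesRecord entity just defined and the /data/sales layout from the Quick Start:
using ParquetSharpLINQ;
using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/sales");
// Equality and range predicates on indexed columns can skip non-matching row groups.
var matchingId = table.Count(r => r.Id == 42);
var highIds = table.Where(r => r.Id >= 1_000_000).ToList();
// StartsWith is only optimized with StringComparison.Ordinal.
var widgets = table
    .Where(r => r.ProductName.StartsWith("Wid", StringComparison.Ordinal))
    .ToList();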
AllowMissing
Use AllowMissing to permit missing columns (nullable properties only):
public class SalesRecord
{
[ParquetColumn("optional_note", AllowMissing = true)]
public string? OptionalNote { get; set; }
}
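A short usage sketch, assuming (not verified here) that records from files lacking the optional column simply yield null for the property:
using ParquetSharpLINQ;
using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/sales");
// Records from files without "optional_note" are expected to come back with
// OptionalNote == null; filtering happens in memory after materialization.
var withNotes = table
    .ToList()
    .Where(s => s.OptionalNote != null)
    .ToList();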
Automatic Type Conversion
Partition directory names (strings) are converted to property types:
"06"→6(int)"2024-12-07"→DateTimeorDateOnly
Case-Insensitive Matching
Column names and partition values are case-insensitive. For partition filtering, use lowercase values or case-insensitive comparison:
// Recommended: lowercase (matches normalized values)
table.Where(s => s.Region == "us-east")
// Alternative: case-insensitive comparison
table.Where(s => s.Region.Equals("US-EAST", StringComparison.OrdinalIgnoreCase))
Delta Lake Support
Automatically detects _delta_log/ directory and:
- Reads transaction log files
- Queries only active files (respects deletes/updates)
- Falls back to Hive-style scanning if no Delta log found
Supported: Add, Remove, Metadata, Protocol actions
Not supported: Time travel, Checkpoints (uses JSON logs only)
Testing
# All tests
dotnet test
# Unit tests only
dotnet test --filter "Category=Unit"
# Integration tests
dotnet test --filter "Category=Integration"
See ParquetSharpLINQ.Tests/README.md for details.
Performance
Benchmark results with 180 partitions (900K records):
| Query | Partitions Read | Speedup |
|---|---|---|
| Full scan | 180/180 | 1.0x |
| region='eu-west' | 36/180 | ~5x |
| year=2024 AND region='eu-west' | 12/180 | ~15x |
Indexed column benchmarks (180 partitions, 540 parquet files, 5,400,000 records):
| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Indexed<br>.Count(r => r.ClientId.StartsWith("46")) | 7.035 ms | 0.1072 ms | 0.0166 ms | 105.62 KB |
| Non Indexed<br>.Count(r => r.ClientId.StartsWith("46")) | 5,938.280 ms | 1,191.0275 ms | 309.3061 ms | 6,973,873.87 KB |
| Indexed<br>.Where(r => r.ClientId.StartsWith("46"))<br>.ToList() | 343.405 ms | 8.3807 ms | 2.1765 ms | 352,443.88 KB |
| Non Indexed<br>.Where(r => r.ClientId.StartsWith("46"))<br>.ToList() | 6,274.230 ms | 1,088.8686 ms | 168.5036 ms | 6,974,388.8 KB |
| Indexed<br>.Where(r => r.ClientId.StartsWith("46"))<br>.Select(r => r.ProductName)<br>.ToList() | 254.965 ms | 11.1857 ms | 2.9049 ms | 250,093.02 KB |
| Non Indexed<br>.Where(r => r.ClientId.StartsWith("46"))<br>.Select(r => r.ProductName)<br>.ToList() | 3,902.831 ms | 499.7335 ms | 129.7792 ms | 4,668,789.84 KB |
Benchmarks
cd ParquetSharpLINQ.Benchmarks
# Generate test data
dotnet run -c Release -- generate ./data 5000
# Run benchmarks
dotnet run -c Release -- analyze ./data
See ParquetSharpLINQ.Benchmarks/README.md for details.
Requirements
- .NET 8.0 or higher
- ParquetSharp 21.0.0+
Architecture
- ParquetSharpLINQ - Core LINQ query provider with composition-based design
  - ParquetTable<T> - Main queryable interface (implements IQueryable<T>)
  - ParquetTableFactory<T> - Factory for creating ParquetTable instances
  - ParquetQueryProvider<T> - LINQ expression tree visitor and query optimizer
  - ParquetEnumerationStrategy<T> - Executes queries with partition pruning, indexing, and column projection
  - IPartitionDiscoveryStrategy - Pluggable partition discovery interface
  - IParquetReader - Pluggable Parquet reading interface
  - FileSystemPartitionDiscovery - Discovers Hive partitions and Delta logs from the local filesystem
  - ParquetSharpReader - Reads Parquet files from the local filesystem
- ParquetSharpLINQ.Generator - Source generator for zero-reflection data mappers
- ParquetSharpLINQ.Azure - Azure Blob Storage extension package
  - ParquetTableFactoryExtensions - Adds the FromAzureBlob() factory method
  - AzureBlobPartitionDiscovery - Discovers partitions and Delta logs from Azure Blob Storage
  - AzureBlobParquetReader - Streams Parquet files from Azure with LRU caching
- ParquetSharpLINQ.Tests - Unit and integration tests
- ParquetSharpLINQ.Benchmarks - Performance testing
License
MIT License - see LICENSE for details.
Author
Kornél Naszály - GitHub
| Product | Compatible and additional computed target frameworks |
|---|---|
| .NET | net8.0, net9.0, and net10.0 are compatible. Platform-specific targets (android, browser, ios, maccatalyst, macos, tvos, windows) for each were computed. |
Dependencies
- net8.0
  - ParquetSharp (>= 21.0.0)
- net9.0
  - ParquetSharp (>= 21.0.0)
- net10.0
  - ParquetSharp (>= 21.0.0)
NuGet packages (1)
Showing the top NuGet package that depends on ParquetSharpLINQ:
| Package | Description |
|---|---|
| ParquetSharpLINQ.Azure | Azure Blob Storage support for ParquetSharpLINQ with Delta Lake. Stream Parquet and Delta Lake tables directly from Azure without downloading to disk, with automatic caching and the same LINQ query capabilities including Delta transaction log support. |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.1.0-alpha.3 | 128 | 12/23/2025 |
| 1.1.0-alpha.2 | 130 | 12/23/2025 |
| 1.1.0-alpha.1 | 113 | 12/21/2025 |
| 1.0.0-alpha.6 | 75 | 12/11/2025 |
| 1.0.0-alpha.5 | 389 | 12/10/2025 |
| 1.0.0-alpha.4 | 385 | 12/9/2025 |
| 1.0.0-alpha.3 | 383 | 12/9/2025 |
| 1.0.0-alpha.2 | 398 | 12/8/2025 |
| 1.0.0-alpha.1 | 367 | 12/8/2025 |
Initial release with LINQ query support, Hive partitioning, Delta Lake support, source generation, and partition pruning.