ParquetSharpLINQ 1.1.0-alpha.3

This is a prerelease version of ParquetSharpLINQ.

ParquetSharpLINQ

A high-performance LINQ provider for querying Hive-partitioned and Delta Lake Parquet files with automatic query optimization.

Features

  • Source Generated Mappers - Zero reflection for data mapping
  • Delta Lake Support - Automatic transaction log reading
  • Partition Pruning - Only scans matching partitions
  • Column Projection - Only reads requested columns
  • Indexed Column Predicates - Row-group pruning for indexed properties
  • Type Safe - Compile-time validation
  • Cross-Platform - Works on Windows and Linux
  • Azure Blob Storage - Stream directly from cloud storage

Quick Start

Installation

dotnet add package ParquetSharpLINQ

For Azure Blob Storage support:

dotnet add package ParquetSharpLINQ.Azure

Define Your Entity

using ParquetSharpLINQ.Attributes;

public class SalesRecord
{
    [ParquetColumn("id")]
    public long Id { get; set; }
    
    [ParquetColumn("product_name")]
    public string ProductName { get; set; }
    
    [ParquetColumn("total_amount")]
    public decimal TotalAmount { get; set; }
    
    [ParquetColumn("year", IsPartition = true)]
    public int Year { get; set; }
    
    [ParquetColumn("region", IsPartition = true)]
    public string Region { get; set; }
}

Query Local Files

using ParquetSharpLINQ;

using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/sales");

var count = table.Count(s => s.Region == "eu-west" && s.Year == 2024);
var results = table.Where(s => s.TotalAmount > 1000).ToList();

Query Delta Lake Tables

using ParquetSharpLINQ;

// Automatically detects and reads _delta_log/
using var table = ParquetTable<SalesRecord>.Factory.FromFileSystem("/data/delta-sales");

var results = table.Where(s => s.Year == 2024).ToList();

Query Azure Blob Storage

using ParquetSharpLINQ;
using ParquetSharpLINQ.Azure;  // Extension methods for Azure support

using var table = ParquetTable<SalesRecord>.Factory.FromAzureBlob(
    connectionString: "DefaultEndpointsProtocol=https;AccountName=...",
    containerName: "sales-data");

var results = table.Where(s => s.Year == 2024).ToList();

Directory Structure

Hive-style partitioning:

/data/sales/
├── year=2023/
│   ├── region=us-east/
│   │   └── data.parquet
│   └── region=eu-west/
│       └── data.parquet
└── year=2024/
    └── region=us-east/
        └── data.parquet

Delta Lake:

/data/delta-sales/
├── _delta_log/
│   ├── 00000000000000000000.json
│   └── 00000000000000000001.json
└── year=2024/
    └── data.parquet

Key Features

Partition Pruning

All LINQ methods with predicates support automatic partition pruning:

  • Count(predicate), LongCount(predicate)
  • Any(predicate), All(predicate)
  • First(predicate), FirstOrDefault(predicate)
  • Single(predicate), SingleOrDefault(predicate)
  • Last(predicate), LastOrDefault(predicate)
  • Where(predicate)
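
To make the idea of partition pruning concrete, here is a minimal, library-independent sketch. It is not ParquetSharpLINQ's implementation; the class and method names (PartitionPruningSketch, ParsePartition, Prune) are hypothetical. It assumes partitions are identified by relative Hive-style paths such as year=2024/region=eu-west and shows how a filter over the directory names can discard partitions before any file is opened.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ParquetSharpLINQ: illustrates pruning
// over Hive-style directory names only.
public static class PartitionPruningSketch
{
    // Parses "year=2024/region=eu-west" into { year = "2024", region = "eu-west" }.
    public static Dictionary<string, string> ParsePartition(string relativePath)
    {
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var segment in relativePath.Split('/', StringSplitOptions.RemoveEmptyEntries))
        {
            var separator = segment.IndexOf('=');
            if (separator <= 0) continue; // not a key=value segment (for example a file name)
            values[segment[..separator]] = segment[(separator + 1)..];
        }
        return values;
    }

    // Keeps only the partitions whose key/value pairs satisfy the filter,
    // so files under the other directories never need to be read.
    public static List<string> Prune(
        IEnumerable<string> partitionPaths,
        Func<IReadOnlyDictionary<string, string>, bool> filter)
    {
        return partitionPaths.Where(path => filter(ParsePartition(path))).ToList();
    }
}
```

With the three partitions from the Directory Structure example, a filter such as p => p["year"] == "2024" leaves only year=2024/region=us-east to scan.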

Column Projection

Only requested columns are read. Without a Select projection, only the columns mapped on the entity are read. Partition columns are populated from the directory names rather than read from the Parquet files:

var summary = table
    .Select(s => new { s.Id, s.ProductName })
    .ToList();

Indexed Column Predicates

Mark properties as indexed to enable fast row-group pruning for Count and Where predicates.

using ParquetSharpLINQ.Attributes;

public class SalesRecord
{
    [ParquetColumn("id", Indexed = true)]
    public long Id { get; set; }

    [ParquetColumn("product_name", Indexed = true, ComparerType = typeof(StringComparer))]
    public string ProductName { get; set; }
}

Notes:

  • Indexing uses values read per row group and an in-memory cache per column/file.
  • ComparerType is optional; if omitted, the property type must implement IComparable or IComparable<T>.
  • Currently optimized constraints: equality, inequality, range comparisons, and string.StartsWith with StringComparison.Ordinal.
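
The following is a rough sketch of how row-group pruning for a StartsWith predicate could work in principle. It is an assumption-laden illustration, not the library's actual index: the RowGroupIndexSketch class and its members are hypothetical, and it assumes the index keeps the distinct, non-null values of one column per row group, sorted ordinally. Under that assumption, a binary search finds the first value not smaller than the prefix, and the row group can match only if that value starts with the prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of row-group pruning; not ParquetSharpLINQ's index.
public sealed class RowGroupIndexSketch
{
    // One sorted array of distinct column values per row group.
    private readonly List<string[]> _rowGroups = new();

    // Registers the (non-null) values of the indexed column for one row group.
    public void AddRowGroup(IEnumerable<string> columnValues)
    {
        _rowGroups.Add(columnValues
            .Distinct()
            .OrderBy(value => value, StringComparer.Ordinal)
            .ToArray());
    }

    // Returns the indexes of row groups that contain at least one value
    // starting with the prefix (ordinal comparison). Other groups can be skipped.
    public List<int> RowGroupsMatchingPrefix(string prefix)
    {
        var matches = new List<int>();
        for (var i = 0; i < _rowGroups.Count; i++)
        {
            var values = _rowGroups[i];
            var position = Array.BinarySearch(values, prefix, StringComparer.Ordinal);
            if (position < 0) position = ~position; // first value greater than the prefix
            if (position < values.Length &&
                values[position].StartsWith(prefix, StringComparison.Ordinal))
            {
                matches.Add(i);
            }
        }
        return matches;
    }
}
```

The ordinal sort matters here: in ordinal order, all strings sharing a prefix are contiguous and start at the first value not smaller than the prefix, which is why a single lookup per row group is enough.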

AllowMissing

Use AllowMissing to permit missing columns (nullable properties only):

public class SalesRecord
{
    [ParquetColumn("optional_note", AllowMissing = true)]
    public string? OptionalNote { get; set; }
}

Automatic Type Conversion

Partition directory names (strings) are converted to property types:

  • "06" → 6 (int)
  • "2024-12-07" → DateTime or DateOnly
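
As an illustration of this kind of conversion, the sketch below turns a partition directory value into a typed value using standard .NET parsing with the invariant culture. It is a simplified assumption of how such a conversion might look, not the code the source generator emits; the PartitionValueConversionSketch class, the set of supported types, and the yyyy-MM-dd date format are choices made for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ParquetSharpLINQ.
public static class PartitionValueConversionSketch
{
    // Converts the string taken from a directory name (the part after "=")
    // into the type of the mapped property.
    public static object ConvertPartitionValue(string raw, Type targetType)
    {
        var culture = CultureInfo.InvariantCulture;

        if (targetType == typeof(string)) return raw;
        if (targetType == typeof(int)) return int.Parse(raw, culture);      // "06" parses as 6
        if (targetType == typeof(long)) return long.Parse(raw, culture);
        if (targetType == typeof(DateOnly)) return DateOnly.ParseExact(raw, "yyyy-MM-dd", culture);
        if (targetType == typeof(DateTime)) return DateTime.ParseExact(raw, "yyyy-MM-dd", culture);

        throw new NotSupportedException($"No conversion defined for {targetType.Name}.");
    }
}
```

Parsing with the invariant culture keeps the result independent of the machine's regional settings, which matters when the same directory layout is read on different hosts.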

Case-Insensitive Matching

Column names and partition values are case-insensitive. For partition filtering, use lowercase values or case-insensitive comparison:

// Recommended: lowercase (matches normalized values)
table.Where(s => s.Region == "us-east");

// Alternative: case-insensitive comparison
table.Where(s => s.Region.Equals("US-EAST", StringComparison.OrdinalIgnoreCase));

Delta Lake Support

When a _delta_log/ directory is present, the library detects it automatically and:

  • Reads transaction log files
  • Queries only active files (respects deletes/updates)
  • Falls back to Hive-style scanning if no Delta log found

Supported: Add, Remove, Metadata, Protocol actions
Not supported: Time travel, Checkpoints (uses JSON logs only)
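
To show what "queries only active files" means, here is a minimal sketch of replaying Delta JSON commits. It is not the library's reader; DeltaLogReplaySketch and ActiveFiles are hypothetical names. It assumes each commit file contains one JSON action per line, that commits are supplied in version order, and that add and remove actions carry a path field as described by the Delta transaction log protocol. Other actions (metaData, protocol, commitInfo) are ignored in this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical illustration of Delta log replay; not ParquetSharpLINQ's code.
public static class DeltaLogReplaySketch
{
    // commits: the text of each _delta_log/*.json file, ordered by version.
    // Returns the data file paths that are still part of the table.
    public static HashSet<string> ActiveFiles(IEnumerable<string> commits)
    {
        var active = new HashSet<string>(StringComparer.Ordinal);
        foreach (var commit in commits)
        {
            foreach (var line in commit.Split('\n', StringSplitOptions.RemoveEmptyEntries))
            {
                if (string.IsNullOrWhiteSpace(line)) continue;

                using var action = JsonDocument.Parse(line);
                var root = action.RootElement;

                if (root.TryGetProperty("add", out var add))
                {
                    active.Add(add.GetProperty("path").GetString()!);
                }
                else if (root.TryGetProperty("remove", out var remove))
                {
                    active.Remove(remove.GetProperty("path").GetString()!);
                }
                // Other action types do not change the file set in this sketch.
            }
        }
        return active;
    }
}
```

A file that is added in one commit and removed in a later one is absent from the result, which is how deletes and updates are respected without rewriting data in place.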

Testing

# All tests
dotnet test

# Unit tests only
dotnet test --filter "Category=Unit"

# Integration tests
dotnet test --filter "Category=Integration"

See ParquetSharpLINQ.Tests/README.md for details.

Performance

Benchmark results with 180 partitions (900K records):

Query                            Partitions Read   Speedup
Full scan                        180/180           1.0x
region='eu-west'                 36/180            ~5x
year=2024 AND region='eu-west'   12/180            ~15x

Indexed column benchmarks (180 partitions, 540 parquet files, 5,400,000 records):

Variant       Query                                                                          Mean           Error          StdDev        Allocated
Indexed       .Count(r => r.ClientId.StartsWith("46"))                                       7.035 ms       0.1072 ms      0.0166 ms     105.62 KB
Non Indexed   .Count(r => r.ClientId.StartsWith("46"))                                       5,938.280 ms   1,191.0275 ms  309.3061 ms   6,973,873.87 KB
Indexed       .Where(r => r.ClientId.StartsWith("46")).ToList()                              343.405 ms     8.3807 ms      2.1765 ms     352,443.88 KB
Non Indexed   .Where(r => r.ClientId.StartsWith("46")).ToList()                              6,274.230 ms   1,088.8686 ms  168.5036 ms   6,974,388.8 KB
Indexed       .Where(r => r.ClientId.StartsWith("46")).Select(r => r.ProductName).ToList()   254.965 ms     11.1857 ms     2.9049 ms     250,093.02 KB
Non Indexed   .Where(r => r.ClientId.StartsWith("46")).Select(r => r.ProductName).ToList()   3,902.831 ms   499.7335 ms    129.7792 ms   4,668,789.84 KB

Benchmarks

cd ParquetSharpLINQ.Benchmarks

# Generate test data
dotnet run -c Release -- generate ./data 5000

# Run benchmarks  
dotnet run -c Release -- analyze ./data

See ParquetSharpLINQ.Benchmarks/README.md for details.

Requirements

  • .NET 8.0 or higher
  • ParquetSharp 21.0.0+

Architecture

  • ParquetSharpLINQ - Core LINQ query provider with composition-based design
    • ParquetTable<T> - Main queryable interface (implements IQueryable<T>)
    • ParquetTableFactory<T> - Factory for creating ParquetTable instances
    • ParquetQueryProvider<T> - LINQ expression tree visitor and query optimizer
    • ParquetEnumerationStrategy<T> - Executes queries with partition pruning, indexing, and column projection
    • IPartitionDiscoveryStrategy - Pluggable partition discovery interface
    • IParquetReader - Pluggable Parquet reading interface
    • FileSystemPartitionDiscovery - Discovers Hive partitions and Delta logs from local filesystem
    • ParquetSharpReader - Reads Parquet files from local filesystem
  • ParquetSharpLINQ.Generator - Source generator for zero-reflection data mappers
  • ParquetSharpLINQ.Azure - Azure Blob Storage extension package
    • ParquetTableFactoryExtensions - Adds FromAzureBlob() factory method
    • AzureBlobPartitionDiscovery - Discovers partitions and Delta logs from Azure Blob Storage
    • AzureBlobParquetReader - Streams Parquet files from Azure with LRU caching
  • ParquetSharpLINQ.Tests - Unit and integration tests
  • ParquetSharpLINQ.Benchmarks - Performance testing

License

MIT License - see LICENSE for details.

Author

Kornél Naszály - GitHub

Target frameworks

  • Compatible (included in package): net8.0, net9.0, net10.0
  • Computed (platform-specific variants of each of the above): android, browser, ios, maccatalyst, macos, tvos, windows

NuGet packages (1)

Showing the top 1 NuGet packages that depend on ParquetSharpLINQ:

Package Downloads
ParquetSharpLINQ.Azure

Azure Blob Storage support for ParquetSharpLINQ with Delta Lake. Stream Parquet and Delta Lake tables directly from Azure without downloading to disk, with automatic caching and the same LINQ query capabilities including Delta transaction log support.

Version         Downloads   Last Updated
1.1.0-alpha.3   128         12/23/2025
1.1.0-alpha.2   130         12/23/2025
1.1.0-alpha.1   113         12/21/2025
1.0.0-alpha.6   75          12/11/2025
1.0.0-alpha.5   389         12/10/2025
1.0.0-alpha.4   385         12/9/2025
1.0.0-alpha.3   383         12/9/2025
1.0.0-alpha.2   398         12/8/2025
1.0.0-alpha.1   367         12/8/2025

Initial release with LINQ query support, Hive partitioning, Delta Lake support, source generation, and partition pruning.