DataLinq.Spark 1.1.0

Installation

.NET CLI
dotnet add package DataLinq.Spark --version 1.1.0

Package Manager Console (Visual Studio)
NuGet\Install-Package DataLinq.Spark -Version 1.1.0

PackageReference (copy into the project file)
<PackageReference Include="DataLinq.Spark" Version="1.1.0" />

Central Package Management (CPM)
<!-- Directory.Packages.props -->
<PackageVersion Include="DataLinq.Spark" Version="1.1.0" />
<!-- Project file -->
<PackageReference Include="DataLinq.Spark" />

Paket CLI
paket add DataLinq.Spark --version 1.1.0

F# Interactive / Polyglot Notebooks
#r "nuget: DataLinq.Spark, 1.1.0"

C# file-based apps (.NET 10 preview 4+; place before any lines of code)
#:package DataLinq.Spark@1.1.0

Cake Addin
#addin nuget:?package=DataLinq.Spark&version=1.1.0

Cake Tool
#tool nuget:?package=DataLinq.Spark&version=1.1.0

DataLinq.Spark


LINQ-native Apache Spark integration for DataLinq.NET.

Migrating from dotnet/spark? Microsoft deprecated it in March 2025. DataLinq.Spark is the maintained successor - same cluster, cleaner API, no spark-submit required. Migration guide →

dotnet add package DataLinq.Spark --version 1.1.0

Free dev tier included - 1,000 rows per query, no license key, no credit card. The core DataLinq.NET package (streaming, SUPRA pattern, Cases, EF Core) is a dependency of this package and is free under Apache 2.0.

📖 LINQ-to-Spark Guide | DataLinq.NET on GitHub | 🌐 Product Website

Features

  • Native LINQ Translation - Write C# LINQ, execute it as distributed Spark jobs
  • Streaming Results - Efficient processing with DataFrames
  • Type Safety - Strong typing with automatic column mapping
  • Distributed Processing - Scale to petabytes with Apache Spark
  • O(1) Memory Writes - Batched streaming for table writes
  • Window Functions - Rank, Lead, Lag, and running aggregates with expression syntax
  • Cases Pattern - Multi-output conditional routing
  • Auto-UDF - Custom methods in Where/Select auto-translate to Spark UDFs (static, instance, lambda)
  • ForEach - Distributed side effects with automatic field sync-back to the driver
  • In-Memory Push - context.Push(data) for test data injection

Quick Start

using DataLinq.Spark;

// Connect to Spark (local mode)
using var context = Spark.Connect("local[*]", "MyApp");

// Production cluster examples:
// using var context = Spark.Connect("spark://spark-master:7077", "MyApp");
// using var context = Spark.Connect("yarn", "MyApp");

// Query with LINQ (cluster-side execution)
var stats = context.Read.Table<Order>("sales.orders")
    .Where(o => o.Amount > 1000)
    .GroupBy(o => o.Region)
    .Select(g => new { Region = g.Key, Total = g.Sum(o => o.Amount) })
    .ToList();

// Side effects with ForEach (executes on Spark executors - NOT locally!)
int processed = 0;
context.Read.Table<Order>("sales.orders")
    .ForEach(o => processed++)
    .Do();  // ← Triggers distributed execution; field sync-back happens here
Console.WriteLine($"Processed {processed} orders");
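
The Order type used throughout these examples is not defined in this README. A minimal hypothetical shape, assuming property names map 1:1 to the columns of sales.orders via the automatic column mapping listed under Features:

```csharp
// Hypothetical POCO for the examples in this README. Property names are
// assumed to match the column names of sales.orders; DataLinq.Spark's
// automatic column mapping (see Features) is assumed to bind them.
public class Order
{
    public string OrderId { get; set; } = "";
    public string Region  { get; set; } = "";
    public decimal Amount { get; set; }
}
```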

Write Operations

using DataLinq.Spark;

// From SparkQuery (server-side)
await context.Read.Table<Order>("orders")
    .Where(o => o.Amount > 1000)
    .WriteParquet("/output/high_value");

await context.Read.Table<Order>("orders")
    .WriteTable("analytics.summary", overwrite: true);

// From local IEnumerable (client → server, context required)
await data.WriteTable(context, "orders", overwrite: true, bufferSize: 10_000);
await data.WriteParquet(context, "hdfs://data/orders.parquet", bufferSize: 50_000);
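
The local-IEnumerable overloads above stream in batches, so driver memory stays flat (the O(1) memory writes listed under Features). A sketch of seeding a table from local data; the Order type and table name are hypothetical:

```csharp
// Hypothetical local data written to the cluster in batches of bufferSize
// rows. Smaller buffers flush more often; larger ones amortize round-trips.
var orders = new List<Order>
{
    new Order { OrderId = "A-1", Region = "EU", Amount = 1250m },
    new Order { OrderId = "A-2", Region = "US", Amount =  480m },
};

await orders.WriteTable(context, "sales.orders_test", overwrite: true, bufferSize: 1_000);
```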

Test Coverage

Tier               Tests  Pass  Fail  Coverage
Unit Tests           122   122     0      100%
Integration Tests    250   250     0      100%
Adversarial Audit    306   306     0      100%
TOTAL                678   678     0      100%

Requirements

  • .NET 8.0+
  • DataLinq.NET 1.0.0+
  • Apache Spark 3.5.0+
  • DataLinq.Spark license for production

Before You Run

DataLinq.Spark is the developer layer — your DevOps/infra team owns the Spark cluster setup.

# Verify Spark is available:
spark-submit --version

# Or verify your cluster master is reachable:
curl http://spark-master:8080

If Spark.Connect(...) fails immediately, the issue is most likely your Spark environment. See the Apache Spark installation guide.
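
Since an unreachable cluster is the most common failure, it can help to fail fast with a pointer to those checks. A sketch using only Spark.Connect from the Quick Start; the exception type thrown on connection failure is not documented here, so the catch is deliberately broad:

```csharp
using DataLinq.Spark;

try
{
    using var context = Spark.Connect("spark://spark-master:7077", "MyApp");
    // ... queries ...
}
catch (Exception ex)
{
    // Failures here almost always mean the Spark environment, not
    // DataLinq.Spark: check `spark-submit --version` and that the
    // master UI (http://spark-master:8080) is reachable.
    Console.Error.WriteLine($"Could not reach Spark: {ex.Message}");
    throw;
}
```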

ForEach — Distributed Side Effects with Sync-Back

ForEach runs your code on Spark executors, then automatically syncs field mutations back to the driver:

// Static fields sync back after Do():
query.ForEach(OrderStats.ProcessOrder).Do();
Console.WriteLine(OrderStats.TotalAmount);  // ← Updated correctly

// Lambda closures sync back:
int count = 0;
query.ForEach(o => count++).Do();
Console.WriteLine(count);  // ← Updated correctly

// Instance fields sync back:
var processor = new OrderProcessor();
query.ForEach(processor.Process).Do();
Console.WriteLine(processor.Processed);  // ← Updated correctly

Limitations: Collections (List<T>, arrays) are not synchronized — use scalar accumulators. The Roslyn analyzer warns at compile time (DFSP001, DFSP002).
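
The OrderStats and OrderProcessor helpers referenced above are not defined in this README. Minimal hypothetical shapes, using only scalar accumulators so the collection limitation above is respected:

```csharp
// Hypothetical accumulator types for the ForEach examples above.
// Scalars only: List<T>/array mutations would NOT sync back (DFSP002).
public static class OrderStats
{
    public static decimal TotalAmount;           // static field, synced back after Do()

    public static void ProcessOrder(Order o) => TotalAmount += o.Amount;
}

public class OrderProcessor
{
    public int Processed;                        // instance field, synced back after Do()

    public void Process(Order o) => Processed++;
}
```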

Support & Issues

📧 Contact: support@get-datalinq.net
🐛 Report Issues: github.com/improveTheWorld/DataLinq.NET/issues

License

Free Tier (No Setup Required)

DataLinq.Spark works out of the box with no license and no configuration. The free tier allows up to 1,000 rows per query — exceeding this throws a LicenseException. No environment variables, no opt-in needed. Just install and run.
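
In code that might cross the 1,000-row cap, the LicenseException can be handled explicitly. A sketch; the exception's namespace is assumed to come with the DataLinq.Spark package:

```csharp
try
{
    var all = context.Read.Table<Order>("sales.orders").ToList();
}
catch (LicenseException)
{
    // Free tier: materializing more than 1,000 rows throws. Either narrow
    // the query with Where, or set DATALINQ_LICENSE_KEY for production.
    Console.Error.WriteLine("Free-tier row cap hit - set DATALINQ_LICENSE_KEY.");
}
```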

Production License

For production workloads (unlimited rows), obtain a license from the product website linked above.

Set your license key as an environment variable (auto-detected at runtime):

# PowerShell
$env:DATALINQ_LICENSE_KEY="your-license-key"

# Bash/Linux/macOS
export DATALINQ_LICENSE_KEY="your-license-key"

# Docker / Kubernetes
ENV DATALINQ_LICENSE_KEY=your-license-key

Security: The license key never appears in source code. Set it in your deployment environment (CI/CD secrets, Azure Key Vault, AWS Secrets Manager, etc.).

Compatible Target Frameworks

.NET net8.0 is compatible (included in the package). net9.0 and net10.0, plus the platform-specific variants of net8.0 through net10.0 (android, browser, ios, maccatalyst, macos, tvos, windows), were computed as compatible.
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version  Downloads  Last Updated
1.1.0           87     3/27/2026
1.0.0           86     3/18/2026

v1.1.0: Task-returning Write API (CS4014 safety) and expression-based MergeTable updateOnly. BREAKING: the SaveMode enum is replaced by bool overwrite/createIfMissing parameters. Full notes: https://github.com/improveTheWorld/DataLinq.NET/blob/main/releasenotes/DataLinq.Spark_1.1.0.md
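
For the breaking change, migration is mechanical. A before/after sketch; the old SaveMode-based call is assumed from typical 1.0.0 usage (only the 1.1.0 form appears in this README):

```csharp
// 1.0.0 (assumed old shape - SaveMode enum, removed in 1.1.0):
// context.Read.Table<Order>("orders").WriteTable("analytics.summary", SaveMode.Overwrite);

// 1.1.0 - bool parameters, and Write* now returns Task (await it to avoid CS4014):
await context.Read.Table<Order>("orders")
    .WriteTable("analytics.summary", overwrite: true);
```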