DataFlow.Spark
1.1.0
dotnet add package DataFlow.Spark --version 1.1.0
NuGet\Install-Package DataFlow.Spark -Version 1.1.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="DataFlow.Spark" Version="1.1.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="DataFlow.Spark" Version="1.1.0" />
<PackageReference Include="DataFlow.Spark" />
For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution's Directory.Packages.props file to version the package, and the PackageReference node into the project file.
paket add DataFlow.Spark --version 1.1.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: DataFlow.Spark, 1.1.0"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package DataFlow.Spark@1.1.0
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=DataFlow.Spark&version=1.1.0
#tool nuget:?package=DataFlow.Spark&version=1.1.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
LINQ-native Apache Spark integration for DataFlow.NET.
Quality Metrics
| Metric | Value |
|---|---|
| Tests | 182 (100% passing) |
| LINQ Coverage | ~85% of common operations |
Features
- Native LINQ Translation - Write C# LINQ, execute distributed Spark
- Streaming Results - Efficient processing with DataFrames
- Type Safety - Strong typing with automatic column mapping
- Distributed Processing - Scale to petabytes with Apache Spark
- O(1) Memory Writes - Batched streaming for table writes
- Window Functions - Rank, Lead, Lag, running aggregates with expression syntax
- Cases Pattern - Multi-output conditional routing
- In-Memory Push - context.Push(data) for test data injection
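For readers new to window functions, the sketch below computes Lag, Lead, a running total and Rank over a small in-memory array with plain LINQ to Objects, to show what each one returns per row. It only illustrates the semantics: it does not use DataFlow.Spark or its WithWindowTyped API (whose signature is not documented on this page), and the sales figures are made up.

```csharp
using System;
using System.Linq;

// Made-up daily sales for one partition, already ordered by day.
var sales = new[] { 100m, 250m, 175m, 300m };

// Lag(1): previous row's value within the ordered partition (null on the first row).
var lag = sales.Select((_, i) => i == 0 ? (decimal?)null : sales[i - 1]).ToArray();

// Lead(1): next row's value (null on the last row).
var lead = sales.Select((_, i) => i == sales.Length - 1 ? (decimal?)null : sales[i + 1]).ToArray();

// Running total: SUM from the start of the partition to the current row.
// (Quadratic here for brevity; a real engine computes it in one pass.)
var running = sales.Select((_, i) => sales.Take(i + 1).Sum()).ToArray();

// Rank by amount, descending: 1 + number of strictly larger values,
// so ties share a rank and leave gaps, like SQL RANK().
var rank = sales.Select(a => 1 + sales.Count(other => other > a)).ToArray();

for (int i = 0; i < sales.Length; i++)
    Console.WriteLine($"{sales[i]}\tlag={(lag[i]?.ToString() ?? "null")}\tlead={(lead[i]?.ToString() ?? "null")}\trunning={running[i]}\trank={rank[i]}");
```

On Spark, the same calculations are evaluated per partition on the executors rather than over a local array.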
Quick Start
```csharp
using DataFlow.Spark;

// Order is your own class whose properties map to the table's columns
// (hypothetical shape here: Region, Amount, OrderId, ...).

// Connect to Spark (local mode)
using var context = Spark.Connect("local[*]", "MyApp");

// Production cluster examples:
// using var context = Spark.Connect("spark://spark-master:7077", "MyApp");
// using var context = Spark.Connect("yarn", "MyApp");

// Query with LINQ (executed on the cluster)
var stats = context.Read.Table<Order>("sales.orders")
    .Where(o => o.Amount > 1000)
    .GroupBy(o => o.Region)
    .Select(g => new { Region = g.Key, Total = g.Sum(o => o.Amount) })
    .ToList();

// Side effects with ForEach run on the Spark executors, NOT locally!
// (Metrics is a placeholder for your own metrics client.)
context.Read.Table<Order>("sales.orders")
    .ForEach(o => Metrics.Increment("orders_processed")) // runs on the cluster
    .Show();
```
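Because the query is ordinary LINQ, the same pipeline can also be evaluated locally with LINQ to Objects, for example to compute the expected result in a unit test that injects rows with context.Push(data). This sketch does not call DataFlow.Spark at all; it assumes Order has Region and Amount properties and stands it in with tuples. Results are sorted before comparing because Spark does not guarantee row order after a GroupBy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory sample standing in for Order rows.
var orders = new List<(string Region, decimal Amount)>
{
    ("EU", 1500m),
    ("EU", 800m),    // filtered out: Amount <= 1000
    ("US", 2500m),
    ("US", 1200m),
    ("APAC", 999m),  // filtered out
};

// Same pipeline as the Spark query, evaluated locally.
var expected = orders
    .Where(o => o.Amount > 1000)
    .GroupBy(o => o.Region)
    .Select(g => new { Region = g.Key, Total = g.Sum(o => o.Amount) })
    .OrderBy(r => r.Region)  // deterministic order for comparison
    .ToList();

foreach (var row in expected)
    Console.WriteLine($"{row.Region}: {row.Total}");
```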
Write Operations
```csharp
// Bulk write with O(1) memory streaming
await data.WriteTable(context, "orders", bufferSize: 10_000).Overwrite();

// File exports with custom buffer size
await data.WriteParquet(context, "hdfs://data/orders.parquet", bufferSize: 50_000);
await data.WriteCsv(context, "s3://bucket/orders.csv", bufferSize: 10_000);

// Async streaming with timeout flush (for IAsyncEnumerable sources)
await asyncStream.WriteParquet(context, "path.parquet",
    bufferSize: 5_000,
    flushInterval: TimeSpan.FromSeconds(30));

// Merge (upsert) into Delta tables, keyed on OrderId
await data.MergeTable(context, "delta/orders", o => o.OrderId, bufferSize: 10_000);
```
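In general terms, batched streaming keeps memory bounded by the buffer size rather than by the total number of rows: items are pulled into a buffer of at most bufferSize and handed off as one batch before the next is read. The sketch below shows that general technique with Enumerable.Chunk (.NET 6+). It is not DataFlow.Spark's implementation; WriteBatched and its writeBatch callback are hypothetical stand-ins for the real transfer to Spark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: writeBatch stands in for whatever ships one batch to Spark.
int WriteBatched<T>(IEnumerable<T> source, int bufferSize, Action<T[]> writeBatch)
{
    if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));
    int batches = 0;
    // Chunk pulls at most bufferSize items at a time from the source,
    // so live memory is bounded by one batch, not by the source length.
    foreach (var batch in source.Chunk(bufferSize))
    {
        writeBatch(batch);
        batches++;
    }
    return batches;
}

// Usage: 25,000 generated rows written in batches of 10,000.
var sizes = new List<int>();
int count = WriteBatched(Enumerable.Range(0, 25_000), 10_000, b => sizes.Add(b.Length));
Console.WriteLine($"{count} batches: {string.Join(", ", sizes)}");
```

The last batch is simply smaller than bufferSize; a timed flush such as flushInterval additionally sends a partial batch when an async source goes quiet.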
Requirements
- .NET 8.0+
- DataFlow.Net 1.1.0+
- Apache Spark 3.5.0+
- DataFlow.Spark license for production
Support & Issues
📧 Contact: tecnet.paris@gmail.com
🐛 Report Issues: github.com/improveTheWorld/DataFlow.NET/issues
License
Free development tier (DEBUG builds, 1,000 row limit).
Production licensing: Coming soon at https://get-dataflow.net/pricing
📧 Urgent inquiries: tecnet.paris@gmail.com
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Dependencies
net8.0
- DataFlow.Net (>= 1.1.0)
- Microsoft.Spark (>= 2.3.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
v1.1.0: Unified Context API. O(1) memory streaming writes. Push() for in-memory data. WithWindowTyped() with expression aggregates. 182 tests (100% pass).