AzureDataLakeTools.Storage
1.0.0.4
Azure Data Lake Tools
A comprehensive .NET library for working with Azure Data Lake Storage Gen2, supporting JSON, Parquet, and CSV file formats with thread-safe client caching and an intuitive API.
Features
- Multiple File Format Support
- JSON files with Newtonsoft.Json
- Parquet files with Parquet.Net and custom attribute mapping
- CSV files with CsvHelper and flexible column mapping
- Thread-Safe Client Management: Efficient caching of Azure Data Lake clients
- Comprehensive File Operations: Read, write, update, and validate files
- Directory Operations: List files with filtering and metadata
- Async/Await Pattern: All I/O operations are asynchronous
- Dependency Injection Ready: Easy integration with .NET DI containers
- Robust Error Handling: Detailed error messages and proper exception handling
- Automatic Directory Creation: Creates directories as needed
- Flexible Configuration: Multiple connection string resolution options
Prerequisites
- .NET 6.0 or later
- Azure Data Lake Storage Gen2 account
- Connection string with appropriate permissions
Installation
dotnet add package AzureDataLakeTools.Storage
Configuration
Add your Data Lake connection string to your appsettings.json:
{
"DataLakeConnectionString": "DefaultEndpointsProtocol=https;AccountName=..."
// OR
"DataLake": {
"ConnectionString": "DefaultEndpointsProtocol=https;AccountName=..."
}
}
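The same keys can also be supplied through environment variables or user secrets instead of appsettings.json. A minimal sketch of the environment-variable form, relying on the standard .NET configuration convention that `:` in a key is written as `__` (double underscore) in an environment variable:

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Set the key via an environment variable instead of appsettings.json.
// "DataLake__ConnectionString" maps to the "DataLake:ConnectionString" key.
Environment.SetEnvironmentVariable(
    "DataLake__ConnectionString",
    "DefaultEndpointsProtocol=https;AccountName=...");

var configuration = new ConfigurationBuilder()
    .AddEnvironmentVariables()
    .Build();

// Reads the value set above through the ":"-separated key.
Console.WriteLine(configuration["DataLake:ConnectionString"]);
```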
Quick Start
using AzureDataLakeTools.Storage;
using Microsoft.Extensions.Configuration;
// Set up configuration
var configuration = new ConfigurationBuilder()
.AddJsonFile("appsettings.json")
.AddEnvironmentVariables()
.Build();
// Create a new instance of AzureDataLakeContext
var dataLakeContext = new AzureDataLakeContext(configuration);
// Store an object as JSON
var user = new { Id = 1, Name = "John Doe", Email = "john@example.com" };
await dataLakeContext.StoreItemAsJson(user, "users", "myfilesystem", "user_1.json");
// Read it back
var retrievedUser = await dataLakeContext.ReadJsonFile<User>("users/user_1.json", "myfilesystem");
Detailed Usage Examples
JSON Operations
// Store a single item as JSON
var product = new Product { Id = 1, Name = "Laptop", Price = 999.99 };
await dataLakeContext.StoreItemAsJson(
product,
"products/electronics",
"myfilesystem",
"laptop.json");
// Store with custom JSON settings
var settings = new JsonSerializerSettings
{
Formatting = Formatting.Indented,
NullValueHandling = NullValueHandling.Ignore
};
await dataLakeContext.StoreItemAsJson(
product,
"products",
"myfilesystem",
jsonSettings: settings);
// Read a single JSON object
var product = await dataLakeContext.ReadJsonFile<Product>(
"products/laptop.json",
"myfilesystem");
// Read a JSON array
var products = await dataLakeContext.ReadJsonItems<Product>(
"products/all-products.json",
"myfilesystem");
// Update an existing JSON file
product.Price = 899.99;
await dataLakeContext.UpdateJsonFile(
product,
"products/laptop.json",
"myfilesystem");
Parquet Operations
To use Parquet storage, implement the IParquetSerializable<T> interface:
public class SensorData : IParquetSerializable<SensorData>
{
[ParquetColumn("sensor_id")]
public string SensorId { get; set; }
[ParquetColumn("temperature")]
public double Temperature { get; set; }
[ParquetColumn("timestamp")]
public DateTime Timestamp { get; set; }
public void SerializeToParquet(ParquetRowGroupWriter writer)
{
// Implementation provided by library
}
public static SensorData DeserializeFromParquet(ParquetRowGroupReader reader)
{
// Implementation provided by library
return new SensorData();
}
}
// Store items as Parquet
var readings = new List<SensorData>
{
new() { SensorId = "S001", Temperature = 23.5, Timestamp = DateTime.UtcNow },
new() { SensorId = "S002", Temperature = 24.1, Timestamp = DateTime.UtcNow }
};
await dataLakeContext.StoreItemsAsParquet(
readings,
"sensor-data/2024",
"myfilesystem",
"readings.parquet");
// Read Parquet file
var data = await dataLakeContext.ReadParquetItems<SensorData>(
"sensor-data/2024/readings.parquet",
"myfilesystem");
// Validate Parquet file
bool isValid = await dataLakeContext.IsValidParquetFile(
"sensor-data/2024/readings.parquet",
"myfilesystem");
CSV Operations
// Store items as CSV with custom delimiter
var orders = new List<Order>
{
new() { OrderId = 1, CustomerName = "Alice", Total = 150.00 },
new() { OrderId = 2, CustomerName = "Bob", Total = 200.00 }
};
await dataLakeContext.StoreItemsAsCsv(
orders,
"reports/orders",
"myfilesystem",
"daily-orders.csv",
delimiter: ",",
hasHeader: true);
// Store with custom column mapping
var columnMapping = new Dictionary<string, string>
{
["OrderId"] = "Order ID",
["CustomerName"] = "Customer",
["Total"] = "Total Amount"
};
await dataLakeContext.StoreItemsAsCsv(
orders,
"reports/orders",
"myfilesystem",
columnMapping: columnMapping);
// Read CSV file
var orders = await dataLakeContext.ReadCsvItems<Order>(
"reports/orders/daily-orders.csv",
"myfilesystem",
delimiter: ",",
hasHeader: true);
// Update CSV file
orders.Add(new Order { OrderId = 3, CustomerName = "Charlie", Total = 175.00 });
await dataLakeContext.UpdateCsvFileWithItems(
orders,
"reports/orders/daily-orders.csv",
"myfilesystem");
Directory and File Listing
// List all files in a directory
var files = await dataLakeContext.ListFiles(
"products",
"myfilesystem",
recursive: true);
foreach (var file in files)
{
Console.WriteLine($"File: {file}");
}
// List files by date range
var recentFiles = await dataLakeContext.ListFilesByDateRange(
"logs",
"myfilesystem",
fromDate: DateTime.UtcNow.AddDays(-7),
toDate: DateTime.UtcNow,
recursive: true);
// List files with metadata
var filesWithInfo = await dataLakeContext.ListFilesWithMetadata(
"data",
"myfilesystem",
recursive: true);
foreach (var fileInfo in filesWithInfo)
{
Console.WriteLine($"Path: {fileInfo.Path}");
Console.WriteLine($"Size: {fileInfo.Size} bytes");
Console.WriteLine($"Modified: {fileInfo.LastModified}");
Console.WriteLine($"Is Directory: {fileInfo.IsDirectory}");
}
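ListFiles returns a flat collection of paths, so ad-hoc filtering can be layered on top with LINQ. A small client-side sketch, assuming the listing enumerates paths as strings:

```csharp
using System;
using System.Linq;

var files = await dataLakeContext.ListFiles("products", "myfilesystem", recursive: true);

// Keep only the JSON files from the listing (filtered on the client,
// after the full listing has been retrieved).
var jsonFiles = files
    .Where(f => f.EndsWith(".json", StringComparison.OrdinalIgnoreCase))
    .ToList();
```

This runs against a live storage account, so no expected output is shown here.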
Client Management
// Get or create a service client (cached)
var serviceClient = dataLakeContext.GetOrCreateServiceClient(connectionString);
// Get or create a file system client (cached)
var fileSystemClient = await dataLakeContext.GetOrCreateFileSystemClient(
"myfilesystem",
connectionString);
// Dispose to clean up all cached clients
dataLakeContext.Dispose();
API Reference
AzureDataLakeContext
The main class for interacting with Azure Data Lake Storage.
Constructor
AzureDataLakeContext(IConfiguration configuration, ILogger<AzureDataLakeContext>? logger = null)
JSON Methods
- StoreItemAsJson<T> - Store a single item as JSON
- UpdateJsonFile<T> - Update an existing JSON file
- ReadJsonFile<T> - Read a single JSON object
- ReadJsonItems<T> - Read a collection of JSON objects
Parquet Methods
- StoreItemAsParquet<T> - Store a single item as Parquet
- StoreItemsAsParquet<T> - Store multiple items as Parquet
- UpdateParquetFile<T> - Update with a single item
- UpdateParquetFileWithItems<T> - Update with multiple items
- ReadParquetFile<T> - Read a single Parquet object
- ReadParquetItems<T> - Read multiple Parquet objects
- IsValidParquetFile - Validate Parquet file format
CSV Methods
- StoreItemAsCsv<T> - Store a single item as CSV
- StoreItemsAsCsv<T> - Store multiple items as CSV
- UpdateCsvFile<T> - Update with a single item
- UpdateCsvFileWithItems<T> - Update with multiple items
- ReadCsvFile<T> - Read a single CSV object
- ReadCsvItems<T> - Read multiple CSV objects
Directory Operations
- ListFiles - List all files in a directory
- ListFilesByDateRange - List files filtered by modification date
- ListFilesWithMetadata - List files with detailed metadata
Client Management
- GetOrCreateServiceClient - Get a cached service client
- GetOrCreateFileSystemClient - Get a cached file system client
DataLakeFileInfo
Metadata information for files and directories:
- Path - Full path to the file
- Name - File name only
- Size - File size in bytes
- LastModified - Last modification timestamp
- CreatedOn - Creation timestamp (currently null)
- IsDirectory - Whether the item is a directory
Advanced Features
Custom Parquet Attributes
Use ParquetColumnAttribute to control Parquet serialization:
public class MetricData : IParquetSerializable<MetricData>
{
[ParquetColumn("metric_name")]
public string Name { get; set; }
[ParquetColumn("value", ParquetType = typeof(double))]
public decimal Value { get; set; }
[ParquetColumn("tags")]
public Dictionary<string, string> Tags { get; set; }
}
Error Handling
try
{
await dataLakeContext.ReadJsonFile<User>("users/user.json", "myfilesystem");
}
catch (FileNotFoundException ex)
{
// Handle missing file
}
catch (InvalidOperationException ex)
{
// Handle configuration issues
}
catch (Azure.RequestFailedException ex)
{
// Handle Azure service errors
}
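When a missing file is an expected case rather than an error, the try-catch pattern above can be wrapped once in a small helper. This is a hypothetical extension method, not part of the library, assuming the ReadJsonFile<T> signature shown in the Quick Start:

```csharp
using System.IO;
using System.Threading.Tasks;
using AzureDataLakeTools.Storage;

public static class DataLakeContextExtensions
{
    // Hypothetical helper: returns null when the file does not exist,
    // instead of letting FileNotFoundException propagate.
    public static async Task<T?> TryReadJsonFileAsync<T>(
        this AzureDataLakeContext context, string path, string fileSystem)
        where T : class
    {
        try
        {
            return await context.ReadJsonFile<T>(path, fileSystem);
        }
        catch (FileNotFoundException)
        {
            return null;
        }
    }
}
```

Configuration and Azure service errors still propagate, since those usually indicate a real problem rather than an absent file.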
Best Practices
- Use Dependency Injection: Register AzureDataLakeContext as a singleton
- Handle Exceptions: Always wrap operations in try-catch blocks
- Validate File Paths: Ensure paths use forward slashes
- Dispose Properly: Call Dispose() when done to clean up clients
- Use Appropriate Formats:
  - JSON for flexible, human-readable data
  - Parquet for large datasets and columnar analytics
  - CSV for simple tabular data and Excel compatibility
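The singleton recommendation above can be followed with the standard Microsoft.Extensions.DependencyInjection container. A minimal registration sketch, assuming the constructor shown in the API reference:

```csharp
using AzureDataLakeTools.Storage;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

var configuration = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json")
    .Build();

var services = new ServiceCollection();
services.AddSingleton<IConfiguration>(configuration);
services.AddSingleton<AzureDataLakeContext>();

// One AzureDataLakeContext (and its cached clients) is reused for the
// lifetime of the provider and disposed together with it.
await using var provider = services.BuildServiceProvider();
var dataLakeContext = provider.GetRequiredService<AzureDataLakeContext>();
```

The optional ILogger<AzureDataLakeContext> parameter resolves automatically if logging is registered, and falls back to its default otherwise.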
Troubleshooting
Common Issues
Connection String Not Found
- Verify configuration keys: DataLakeConnectionString or DataLake:ConnectionString
- Check environment variables and user secrets

File Not Found Errors
- Ensure file paths use forward slashes (/)
- Verify the file system name is correct
- Check permissions on the storage account

Parquet Serialization Errors
- Ensure the class implements IParquetSerializable<T>
- Verify ParquetColumnAttribute usage
- Check that all properties have appropriate types
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Hamza Abdagic
Acknowledgments
- Built on Azure.Storage.Files.DataLake
- JSON support via Newtonsoft.Json
- Parquet support via Parquet.Net
- CSV support via CsvHelper
Product | Compatible and computed target frameworks
---|---
.NET | net8.0 is compatible; net9.0, net10.0, and the platform-specific targets (android, browser, ios, maccatalyst, macos, tvos, windows) were computed.
Dependencies (net8.0)
- Azure.Storage.Files.DataLake (>= 12.22.0)
- CsvHelper (>= 33.0.1)
- Microsoft.Extensions.Configuration.Abstractions (>= 9.0.5)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.5)
- Newtonsoft.Json (>= 13.0.3)
- Parquet.Net (>= 5.1.1)