AzureDataLakeTools.Storage 1.0.0.4


Azure Data Lake Tools

License: MIT

A comprehensive .NET library for working with Azure Data Lake Storage Gen2, supporting JSON, Parquet, and CSV file formats with thread-safe client caching and an intuitive API.

Features

  • Multiple File Format Support
    • JSON files with Newtonsoft.Json
    • Parquet files with Parquet.Net and custom attribute mapping
    • CSV files with CsvHelper and flexible column mapping
  • Thread-Safe Client Management: Efficient caching of Azure Data Lake clients
  • Comprehensive File Operations: Read, write, update, and validate files
  • Directory Operations: List files with filtering and metadata
  • Async/Await Pattern: All I/O operations are asynchronous
  • Dependency Injection Ready: Easy integration with .NET DI containers
  • Robust Error Handling: Detailed error messages and proper exception handling
  • Automatic Directory Creation: Creates directories as needed
  • Flexible Configuration: Multiple connection string resolution options

Prerequisites

  • .NET 6.0 or later
  • Azure Data Lake Storage Gen2 account
  • Connection string with appropriate permissions

Installation

dotnet add package AzureDataLakeTools.Storage

Configuration

Add your Data Lake connection string to your appsettings.json:

{
  "DataLakeConnectionString": "DefaultEndpointsProtocol=https;AccountName=..."
  
  // OR
  
  "DataLake": {
    "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=..."
  }
}
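Either configuration shape works; a minimal sketch of the lookup order (illustrative only, assuming the flat key is checked first, and using a plain dictionary in place of IConfiguration, which flattens nested JSON sections into colon-separated keys):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the connection string lookup (not the library's actual code).
// { "DataLake": { "ConnectionString": ... } } is addressed as "DataLake:ConnectionString".
static string? ResolveConnectionString(IReadOnlyDictionary<string, string> config)
{
    if (config.TryGetValue("DataLakeConnectionString", out var flat))
        return flat;
    if (config.TryGetValue("DataLake:ConnectionString", out var nested))
        return nested;
    return null;
}

var settings = new Dictionary<string, string>
{
    ["DataLake:ConnectionString"] = "DefaultEndpointsProtocol=https;AccountName=example"
};

Console.WriteLine(ResolveConnectionString(settings)); // falls through to the nested key
```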

Quick Start

using AzureDataLakeTools.Storage;
using Microsoft.Extensions.Configuration;

// Set up configuration
var configuration = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json")
    .AddEnvironmentVariables()
    .Build();

// Create a new instance of AzureDataLakeContext
var dataLakeContext = new AzureDataLakeContext(configuration);

// Store an object as JSON (User is a simple POCO with Id, Name, and Email properties)
var user = new User { Id = 1, Name = "John Doe", Email = "john@example.com" };
await dataLakeContext.StoreItemAsJson(user, "users", "myfilesystem", "user_1.json");

// Read it back
var retrievedUser = await dataLakeContext.ReadJsonFile<User>("users/user_1.json", "myfilesystem");

Detailed Usage Examples

JSON Operations

// Store a single item as JSON
var product = new Product { Id = 1, Name = "Laptop", Price = 999.99 };
await dataLakeContext.StoreItemAsJson(
    product, 
    "products/electronics", 
    "myfilesystem", 
    "laptop.json");

// Store with custom JSON settings
var settings = new JsonSerializerSettings 
{ 
    Formatting = Formatting.Indented,
    NullValueHandling = NullValueHandling.Ignore
};
await dataLakeContext.StoreItemAsJson(
    product, 
    "products", 
    "myfilesystem", 
    jsonSettings: settings);

// Read a single JSON object
var loadedProduct = await dataLakeContext.ReadJsonFile<Product>(
    "products/laptop.json", 
    "myfilesystem");

// Read a JSON array
var products = await dataLakeContext.ReadJsonItems<Product>(
    "products/all-products.json", 
    "myfilesystem");

// Update an existing JSON file
product.Price = 899.99;
await dataLakeContext.UpdateJsonFile(
    product, 
    "products/laptop.json", 
    "myfilesystem");

Parquet Operations

To use Parquet storage, implement the IParquetSerializable<T> interface:

public class SensorData : IParquetSerializable<SensorData>
{
    [ParquetColumn("sensor_id")]
    public string SensorId { get; set; }
    
    [ParquetColumn("temperature")]
    public double Temperature { get; set; }
    
    [ParquetColumn("timestamp")]
    public DateTime Timestamp { get; set; }
    
    public void SerializeToParquet(ParquetRowGroupWriter writer)
    {
        // Serialization is driven by the [ParquetColumn] attributes above;
        // the library supplies the default mapping.
    }
    
    public static SensorData DeserializeFromParquet(ParquetRowGroupReader reader)
    {
        // Deserialization is likewise attribute-driven; the placeholder
        // return keeps the example compilable.
        return new SensorData();
    }
}

// Store items as Parquet
var readings = new List<SensorData>
{
    new() { SensorId = "S001", Temperature = 23.5, Timestamp = DateTime.UtcNow },
    new() { SensorId = "S002", Temperature = 24.1, Timestamp = DateTime.UtcNow }
};

await dataLakeContext.StoreItemsAsParquet(
    readings, 
    "sensor-data/2024", 
    "myfilesystem", 
    "readings.parquet");

// Read Parquet file
var data = await dataLakeContext.ReadParquetItems<SensorData>(
    "sensor-data/2024/readings.parquet", 
    "myfilesystem");

// Validate Parquet file
bool isValid = await dataLakeContext.IsValidParquetFile(
    "sensor-data/2024/readings.parquet", 
    "myfilesystem");

CSV Operations

// Store items as CSV with custom delimiter
var orders = new List<Order>
{
    new() { OrderId = 1, CustomerName = "Alice", Total = 150.00 },
    new() { OrderId = 2, CustomerName = "Bob", Total = 200.00 }
};

await dataLakeContext.StoreItemsAsCsv(
    orders, 
    "reports/orders", 
    "myfilesystem", 
    "daily-orders.csv",
    delimiter: ",",
    hasHeader: true);

// Store with custom column mapping
var columnMapping = new Dictionary<string, string>
{
    ["OrderId"] = "Order ID",
    ["CustomerName"] = "Customer",
    ["Total"] = "Total Amount"
};

await dataLakeContext.StoreItemsAsCsv(
    orders, 
    "reports/orders", 
    "myfilesystem", 
    columnMapping: columnMapping);

// Read CSV file
var loadedOrders = await dataLakeContext.ReadCsvItems<Order>(
    "reports/orders/daily-orders.csv", 
    "myfilesystem",
    delimiter: ",",
    hasHeader: true);

// Update CSV file
loadedOrders.Add(new Order { OrderId = 3, CustomerName = "Charlie", Total = 175.00 });
await dataLakeContext.UpdateCsvFileWithItems(
    loadedOrders,
    "reports/orders/daily-orders.csv",
    "myfilesystem");

Directory and File Listing

// List all files in a directory
var files = await dataLakeContext.ListFiles(
    "products", 
    "myfilesystem", 
    recursive: true);

foreach (var file in files)
{
    Console.WriteLine($"File: {file}");
}

// List files by date range
var recentFiles = await dataLakeContext.ListFilesByDateRange(
    "logs",
    "myfilesystem",
    fromDate: DateTime.UtcNow.AddDays(-7),
    toDate: DateTime.UtcNow,
    recursive: true);

// List files with metadata
var filesWithInfo = await dataLakeContext.ListFilesWithMetadata(
    "data", 
    "myfilesystem", 
    recursive: true);

foreach (var fileInfo in filesWithInfo)
{
    Console.WriteLine($"Path: {fileInfo.Path}");
    Console.WriteLine($"Size: {fileInfo.Size} bytes");
    Console.WriteLine($"Modified: {fileInfo.LastModified}");
    Console.WriteLine($"Is Directory: {fileInfo.IsDirectory}");
}

Client Management

// Get or create a service client (cached)
var serviceClient = dataLakeContext.GetOrCreateServiceClient(connectionString);

// Get or create a file system client (cached)
var fileSystemClient = await dataLakeContext.GetOrCreateFileSystemClient(
    "myfilesystem", 
    connectionString);

// Dispose to clean up all cached clients
dataLakeContext.Dispose();
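The thread-safe caching behavior described above can be sketched with ConcurrentDictionary.GetOrAdd (a hypothetical illustration, not the library's actual implementation):

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative sketch of thread-safe client caching: each connection string
// maps to a single cached client instance, even under concurrent access.
var cache = new ConcurrentDictionary<string, object>();

object GetOrCreateClient(string connectionString) =>
    cache.GetOrAdd(connectionString, _ => new object()); // stand-in for DataLakeServiceClient

var first = GetOrCreateClient("conn-1");
var second = GetOrCreateClient("conn-1");

Console.WriteLine(ReferenceEquals(first, second)); // True: the cached instance is reused
```

Disposing the context would then clear this cache and release any disposable clients it holds.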

API Reference

AzureDataLakeContext

The main class for interacting with Azure Data Lake Storage.

Constructor
  • AzureDataLakeContext(IConfiguration configuration, ILogger<AzureDataLakeContext>? logger = null)
JSON Methods
  • StoreItemAsJson<T> - Store a single item as JSON
  • UpdateJsonFile<T> - Update an existing JSON file
  • ReadJsonFile<T> - Read a single JSON object
  • ReadJsonItems<T> - Read a collection of JSON objects
Parquet Methods
  • StoreItemAsParquet<T> - Store a single item as Parquet
  • StoreItemsAsParquet<T> - Store multiple items as Parquet
  • UpdateParquetFile<T> - Update with a single item
  • UpdateParquetFileWithItems<T> - Update with multiple items
  • ReadParquetFile<T> - Read a single Parquet object
  • ReadParquetItems<T> - Read multiple Parquet objects
  • IsValidParquetFile - Validate Parquet file format
CSV Methods
  • StoreItemAsCsv<T> - Store a single item as CSV
  • StoreItemsAsCsv<T> - Store multiple items as CSV
  • UpdateCsvFile<T> - Update with a single item
  • UpdateCsvFileWithItems<T> - Update with multiple items
  • ReadCsvFile<T> - Read a single CSV object
  • ReadCsvItems<T> - Read multiple CSV objects
Directory Operations
  • ListFiles - List all files in a directory
  • ListFilesByDateRange - List files filtered by modification date
  • ListFilesWithMetadata - List files with detailed metadata
Client Management
  • GetOrCreateServiceClient - Get cached service client
  • GetOrCreateFileSystemClient - Get cached file system client

DataLakeFileInfo

Metadata information for files and directories:

  • Path - Full path to the file
  • Name - File name only
  • Size - File size in bytes
  • LastModified - Last modification timestamp
  • CreatedOn - Creation timestamp (currently null)
  • IsDirectory - Whether the item is a directory

Advanced Features

Custom Parquet Attributes

Use ParquetColumnAttribute to control Parquet serialization:

public class MetricData : IParquetSerializable<MetricData>
{
    [ParquetColumn("metric_name")]
    public string Name { get; set; }
    
    [ParquetColumn("value", ParquetType = typeof(double))]
    public decimal Value { get; set; }
    
    [ParquetColumn("tags")]
    public Dictionary<string, string> Tags { get; set; }
}

Error Handling

try
{
    await dataLakeContext.ReadJsonFile<User>("users/user.json", "myfilesystem");
}
catch (FileNotFoundException ex)
{
    // Handle missing file
}
catch (InvalidOperationException ex)
{
    // Handle configuration issues
}
catch (Azure.RequestFailedException ex)
{
    // Handle Azure service errors
}

Best Practices

  1. Use Dependency Injection: Register AzureDataLakeContext as a singleton
  2. Handle Exceptions: Always wrap operations in try-catch blocks
  3. Validate File Paths: Ensure paths use forward slashes
  4. Dispose Properly: Call Dispose() when done to clean up clients
  5. Use Appropriate Formats:
    • JSON for flexible, human-readable data
    • Parquet for large datasets and columnar analytics
    • CSV for simple tabular data and Excel compatibility
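Best Practice 3 can be enforced with a small helper (hypothetical, not part of the library), since Data Lake paths use forward slashes:

```csharp
using System;

// Normalize Windows-style paths to the forward-slash form Data Lake expects.
static string NormalizePath(string path) =>
    path.Replace('\\', '/').TrimStart('/');

Console.WriteLine(NormalizePath(@"reports\orders\daily-orders.csv")); // reports/orders/daily-orders.csv
```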

Troubleshooting

Common Issues

  1. Connection String Not Found

    • Verify configuration keys: DataLakeConnectionString or DataLake:ConnectionString
    • Check environment variables and user secrets
  2. File Not Found Errors

    • Ensure file paths use forward slashes (/)
    • Verify the file system name is correct
    • Check permissions on the storage account
  3. Parquet Serialization Errors

    • Ensure class implements IParquetSerializable<T>
    • Verify ParquetColumnAttribute usage
    • Check that all properties have appropriate types

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Hamza Abdagic

Compatibility

Target framework: net8.0. Later frameworks (net9.0, net10.0) and their platform-specific variants (-android, -ios, -macos, -windows, and so on) are computed as compatible.


Version History

Version   Downloads   Last Updated
1.0.0.4   371         7/21/2025
1.0.0.3   365         7/21/2025
1.0.0.1   152         6/3/2025
1.0.0     166         6/2/2025