Parquet.FSharp 0.2.0

dotnet add package Parquet.FSharp --version 0.2.0
                    
NuGet\Install-Package Parquet.FSharp -Version 0.2.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Parquet.FSharp" Version="0.2.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="Parquet.FSharp" Version="0.2.0" />
                    
Directory.Packages.props
<PackageReference Include="Parquet.FSharp" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add Parquet.FSharp --version 0.2.0
                    
#r "nuget: Parquet.FSharp, 0.2.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package Parquet.FSharp@0.2.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=Parquet.FSharp&version=0.2.0
                    
Install as a Cake Addin
#tool nuget:?package=Parquet.FSharp&version=0.2.0
                    
Install as a Cake Tool

Parquet.FSharp

NuGet Version GitHub Actions Workflow Status

An F# serailization library for the Apache Parquet file format, built on top of the fantastic Parquet.Net library. Parquet.FSharp adds first-class support for F# types such as records, options, lists and discriminated unions, whilst maintaining the performance of Parquet.Net.

NOTE: Parquet.FSharp is in its initial development phase - the behaviour and public API may change between minor version increments. Feedback, ideas and feature requests are all welcome!

Quickstart

open Parquet.FSharp
open System.IO

type Shape =
    | Circle of radius:int
    | Square of sideLength:int
    | Rectangle of height:int * width:int

type Node = {
    Id: int
    Shape: Shape
    Scale: float option
    Children: int list }

let nodes = [|
    { Id = 0; Shape = Square 1        ; Scale = None    ; Children = [ 1; 2 ] }
    { Id = 1; Shape = Circle 2        ; Scale = Some 1.5; Children = [ 4 ]    }
    { Id = 2; Shape = Square 3        ; Scale = Some 0.5; Children = [ 3 ]    }
    { Id = 3; Shape = Rectangle (1, 2); Scale = None    ; Children = [ 4 ]    }
    { Id = 4; Shape = Circle 1        ; Scale = Some 2.0; Children = []       } |]

// Serialize to file
use file = File.OpenWrite("./nodes.parquet")
ParquetSerializer.Serialize(nodes, file)

// Deserialize from file
use file = File.OpenRead("./nodes.parquet")
let nodes = ParquetSerializer.Deserialize<Node>(file)

<sub>[Return to top]</sub>

Supported Types

<sub>[Return to top]</sub>

Booleans

Applies to: bool

Booleans are serialized as required values by default. They can be deserialized from either required or optional boolean values. When deserialized from optional values, any null values encountered will result in a SerializationException.

<sub>[Return to top]</sub>

Numeric Types

Applies to: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float[64], decimal

Numeric types are serialized as required values by default. They can be deserialized from either required or optional values. When deserialized from optional values, any null values encountered will result in a SerializationException.

For deserialization, the target .NET numeric type does not have to match the source Parquet numeric type. Numeric type compatibility is determined based on whether the source type is implicitly convertible to the target type, e.g. a field of type int32 can be deserialized from a field of type int16. The following compatibility table lists the possible combinations - largely derived from .NET Implicit Numerical Conversions:

Target Type Supported Source Types
int8 int8
int16 int16, int8, uint8
int32 int32, int16, int8, uint16, uint8
int64 int64, int32, int16, int8, uint32, uint16, uint8
uint8 uint8
uint16 uint16, uint8
uint32 uint32, uint16, uint8
uint64 uint64, uint32, uint16, uint8
float32 float32, int16, int8, uint16, uint8
float[64] float[64], float32, int32, int16, int8, uint32, uint16, uint8
decimal decimal, int64, int32, int16, int8, uint64, uint32, uint16, uint8

<sub>[Return to top]</sub>

GUIDs

Applies to: Guid

GUIDs are serialized as required values by default. They can be deserialized from either required or optional GUID values. When deserialized from optional values, any null values encountered will result in a SerializationException.

<sub>[Return to top]</sub>

Date Times

Applies to: DateTime, DateTimeOffset

Date times are serialized as required UTC values with microsecond precision by default. They can be deserialized from either required or optional date time values. When deserialized from optional values, any null values encountered will result in a SerializationException.

Since DateTime values have an associated DateTimeKind, which is one of Unspecified, Utc or Local, conversion to UTC can be ambiguous. Default serialization does not make any assumptions or do any implicit conversions, so any DateTime values that are not defined with DateTimeKind.Utc will result in a SerializationException.

DateTimeOffset values always map to a specific instant in time, so can always be converted to UTC in an unambiguous way. During serialization, DateTimeOffset values will be converted to their UTC equivalent. This means that the offset information is lost, but the serialized value is guaranteed to identify the same instant in time.

Both DateTime and DateTimeOffset use 'ticks' as their base unit, where each tick represents a 100 nanosecond period. Since the default precision is microseconds, serialization results in a slight truncation, equivalent to rounding the values down to the nearest 10 ticks.

<sub>[Return to top]</sub>

Strings

Applies to: string

Strings are serialized as required values by default, despite being reference types and having null as a valid value. In F#, nullable values are not an idiomatic way to represent optionality - the preferred alternative being option types. Treating strings as required provides a guarantee that any serialized values are not null. If a null value is encountered during serialization, a SerializationException will be raised.

Strings can be deserialized from optional values as well as required values, but the same null guarantee is provided, so any null values encountered during deserialization will still result in an exception.

<sub>[Return to top]</sub>

Byte Arrays

Applies to: byte[]

Byte arrays are not treated the same as other array types since Parquet has native support for them. This means that instead of being treated as repeated values they are treated as atomic values.

Byte arrays are serialized as required values by default, despite being reference types and having null as a valid value. In F#, nullable values are not an idiomatic way to represent optionality - the preferred alternative being option types. Treating byte arrays as required provides a guarantee that any serialized values are not null. If a null value is encountered during serialization, a SerializationException will be raised.

Byte arrays can be deserialized from optional values as well as required values, but the same null guarantee is provided, so any null values encountered during deserialization will still result in an exception.

<sub>[Return to top]</sub>

Lists & Arrays

Applies to: 'Element list, 'Element[], ResizeArray<'Element>

Sequences of values are stored as Parquet lists, which contain repeated elements, analagous to the 'Element seq or IEnumerable<'Element> .NET types.

Supported sequence types are serialized as required Parquet lists by default, even for sequences that allow null as a valid value. In F#, nullable values are not an idiomatic way to represent optionality - the preferred alternative being option types. Treating sequences as required provides a guarantee that any serialized sequences are not null. If a null sequence is encountered during serialization, a SerializationException will be raised.

Sequences can be deserialized from optional Parquet lists as well as required lists, but the same null guarantee is provided, so any null lists encountered during deserialization will still result in an exception.

<sub>[Return to top]</sub>

Records

Applies to: 'FSharpRecord

As well as allowing sequences of values, Parquet allows arbitrary nesting of fields through records (also called structs). During serialization, these records are deconstructed into columns - 'shredded' in Parquet terminology - one column for each field. Information about the record nesting structure is also captured and stored alongside the columnar values, which allows records to be re-constructed - 'assembled' in Parquet terminology - during deserialization. More information on how this works can be found in Google's Dremel paper, on which Parquet is based.

Support for arbitrarily nested data is made possible through serialization support for F# record types. Record types are not nullable in F#, so are serialized as required values by default. They can be deserialized from either optional or required records. Any null values encountered during deserialization from an optional record will result in a SerializationException. Records using the [<Struct>] attribute are supported in addition to standard (reference-type) records, and are serialized in exactly the same way.

One of the advantages of Parquet being a columnar data format is that it's possible and efficient to load only a subset of columns. When deserializing records, any fields in the Parquet file that are not specified in the target .NET record type will be skipped. This enables a degree of forwards compatability - if new fields are added to the schema, it will still possible to deserialize using the old .NET record types.

<sub>[Return to top]</sub>

Optional Types

Applies to: 'Value option, Nullable<'Value>

Parquet supports both optional and required values. In F#, nullable values are not an idiomatic way to represent optionality - the preferred alternative being option types 'Value option, or in some cases the Nullable<'Value> type. This makes optional values much more explicit and discoverable.

Due to the above, the default approach for serialization of other supported types is to treat the values as required, even if they are implicitly nullable through being a reference type. This helps prevent null values from appearing when they aren't expected. Values can instead be serialized as optional values by wrapping them in one of the supported optional types.

Optional types can be deserialized from both optional and required values. When deserailzied from required values, they are guaranteed to have an associated value and will therefore never be 'null'.

Note that Parquet does not support multiple levels of optionality, so nested optional types such as 'Value option option are not supported. Attempting to serialize nested optional values will result in a SerializationException. Instead, the recommended approach for handling nested optional values is to add another level of nesting using an optional record containing a single optional field, for example:

// Nested options are not allowed by the Parquet format.
type IntOptionOption = int option option

// Instead, they can be represented using an optional
// record with an optional field value.
type IntOption = { Value: int option }
type IntOptionOption = IntOption option

<sub>[Return to top]</sub>

Discriminated Unions

Applies to: 'FSharpUnion

Discriminated unions can be used to represent a range of different types with varying complexity. Complex unions require a more flexible - and therefore more complex - serialization approach. Even though all unions could be serialized using this same flexible approach, it becomes fairly cumbersome and verbose for simpler unions. For this reason, Parquet.FSharp defines several different categories of union, each of which has a distinct use-case and is serialized in a different way.

<sub>[Return to top]</sub>

Enumeration Unions

The simplest type of union is one in which there are no associated data fields for any of the cases. This gives an enum-like type, but without an explicit backing value. An example is as follows:

type Shape =
    | Circle
    | Triangle
    | Square
    | Rectangle

Since none of the cases has any associated data fields, there is no additional nesting required to represent these union values. Enumeration unions are serialized as required string values. They can be deserialized from either required or optional string values, but because unions are treated as non-nullable in F#, any null values encountered during deserialization will result in a SerializationException.

<sub>[Return to top]</sub>

Single-Case Unions

Unions with a single case are treated separately from unions with multiple cases because they don't distinguish between two equivalent types. They are often used to wrap primitive data values to create a richer set of domain types and to provide encapsulation - see F# For Fun and Profit - Designing with types: Single case union types. Some examples are shown below:

type Age = Age of age:int
type Name = Name of firstName:string * lastName:string
type EmailAddress = EmailAddress of string

Since there is only a single case, there is no need to store the case name so only the fields are serailized. Single-case unions are serialized as required records, with one field for each associated data field. For example, the types above are serialized as if they were the following equivalent record types:

type Age = { age: int }
type Name = { firstName: string; lastName: string }

// Default field name assigned by F# compiler used, as none specified in type definition.
type EmailAddress = { Item1: string }

Single-case unions can be deserialized from record values containing the correct field definitions. These records can be either required or optional values, but because unions are treated as non-nullable in F#, any null values encountered during deserialization will result in a SerializationException.

<sub>[Return to top]</sub>

Multi-Case Unions

Unions with multiple cases where at least one case as one or more associated data fields are the most complex category of union and therefore require the most flexible approach. Some examples are shown below, taken from Microsoft Learn - F# Discriminated Unions:

type Shape =
    | Rectangle of width:float * length:float
    | Circle of radius:float
    | Prism of width:float * height:float * length:float

type BinaryTree =
    | Leaf
    | Node of value:int * left:BinaryTree * right:BinaryTree

Since there are multiple cases, the case name is serailized alongside the data fields as a string value. Like single-case unions, the case data fields are serialized as a record. However since each case contains its own distinct set of fields, each is serialized into its own independent record structure. The complete schema for a multi-case union consists of an outer record containing a required string field Type for the case name and one or more case data fields. Each case data field is itself a record, containing any associated data fields. Since only one case will have data for any given union value, the case data fields are optional. The following demonstrates the equivalent serialization structure for the union types above:

type Rectangle = { width: float; length: float }
type Circle = { radius: float }
type Prism = { width: float; height: float; length: float }
type Shape = {
    Type: string
    Rectangle: Rectangle option
    Circle: Circle option
    Prism: Prism option }

// The 'Leaf' case has no data fields so no inner record is required.
type Node = { value: int; left: BinaryTree; right: BinaryTree }
type BinaryTree = {
    Type: string
    Node: Node option }

Multi-case unions can be deserialized from record values containing the correct structure. These records can be either required or optional values, but because unions are treated as non-nullable in F#, any null values encountered during deserialization will result in a SerializationException.

<sub>[Return to top]</sub>

Roadmap

The following features and improvements are on the roadmap and may be implemented in the future, in no particular order.

<sub>[Return to top]</sub>

Extend Supported Types

The following types are not currently supported but will likely be added in the future:

  • Enums
  • DateOnly
  • TimeOnly
  • TimeSpan
  • Interval (from Parquet.Net)
  • BigDecimal (from Parquet.Net)
  • BigInteger (from System.Numerics)
  • Tuples
  • 'Element seq
  • 'Value voption
  • Map<'Key, 'Value>
  • Dictionary<'Key, 'Value>
  • Classes

<sub>[Return to top]</sub>

Serialization Options

The ability to specify options when serializing and deserializing to allow finer-grained control of serialization behaviour. This will hopefully include the ability to apply options to either all fields that are being serialized or just to specific fields. Configuration options could include the ability to:

  • Override field names and union case names.
  • Serialize fields that would normally be serialized as required fields as optional instead.
  • Allow serialization and deserialization of null values for reference types.
  • Treat date time values as local rather than UTC.
  • Allow millisecond and nanosecond precision for date times in addition to the default microsecond precision.
  • Specify the precision and scale of decimal values.

<sub>[Return to top]</sub>

Serialization Attributes

Attributes that can be applied to types and fields to allow the above serialization options to be defined as part of type definitions rather than requiring configuration at the point of serialization.

<sub>[Return to top]</sub>

Custom Serializers

Allow the definition of custom serializers and deserializers that can be used to override the default serialization behaviour provided by the library for specific types, or allow serialization of types that aren't supported by the library. The default serialization behaviour defined in the library is already set up in this way, but the list of registered converters is not yet configurable and just contains a default converter for each supported type.

<sub>[Return to top]</sub>

Improved Error Handling

For performance reasons, serialization is implemented by generating, compiling and executing Expression Trees, in a similar fashion to Parquet.Net. This means that errors are not always easy to trace back to the code that caused them. To improve on this, extra exception handling could be added into the generated expression trees to provide more information about the expression that was being executed when the exception occurred, e.g. to identify the specific converter function that was being called.

<sub>[Return to top]</sub>

Schema Evolution

Sometimes schema changes are required, and in some cases this could be done in a backwards compatible way, i.e. without breaking deserialization of Parquet files using the old schema. The following backward compatible changes could be allowed in future:

  • Adding optional record fields
  • Adding union cases
  • Adding optional union case fields

<sub>[Return to top]</sub>

Product Compatible and additional computed target framework versions.
.NET net5.0 was computed.  net5.0-windows was computed.  net6.0 was computed.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
.NET Core netcoreapp2.0 was computed.  netcoreapp2.1 was computed.  netcoreapp2.2 was computed.  netcoreapp3.0 was computed.  netcoreapp3.1 was computed. 
.NET Standard netstandard2.0 is compatible.  netstandard2.1 is compatible. 
.NET Framework net461 was computed.  net462 was computed.  net463 was computed.  net47 was computed.  net471 was computed.  net472 was computed.  net48 was computed.  net481 was computed. 
MonoAndroid monoandroid was computed. 
MonoMac monomac was computed. 
MonoTouch monotouch was computed. 
Tizen tizen40 was computed.  tizen60 was computed. 
Xamarin.iOS xamarinios was computed. 
Xamarin.Mac xamarinmac was computed. 
Xamarin.TVOS xamarintvos was computed. 
Xamarin.WatchOS xamarinwatchos was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.2.0 89 1/27/2026
0.1.1 93 1/4/2026
0.1.0 89 1/3/2026