FSharp.Avro.Apache.Tools 0.1.21

There is a newer version of this package available.
See the version list below for details.
This package contains a .NET tool you can call from the shell/command line.

.NET CLI (global):
dotnet tool install --global FSharp.Avro.Apache.Tools --version 0.1.21

.NET CLI (local):
dotnet new tool-manifest # if you are setting up this repo
dotnet tool install --local FSharp.Avro.Apache.Tools --version 0.1.21

Cake:
#tool dotnet:?package=FSharp.Avro.Apache.Tools&version=0.1.21

NUKE:
nuke :add-package FSharp.Avro.Apache.Tools --version 0.1.21

F# Bindings for Apache Avro

This package provides a tool that generates F# types wrapping Apache Avro serialisation mechanics.

Usage

OPTIONS:

    --schema-file <file>  Path to .avsc file
    --output <file>       Output location
    --record-repr <repr>  Record representation, 'class' or 'record'
    --help                display this list of options.

Motivation

Avro already has an "official" codegen tool for .NET, but it comes with some disadvantages:

  • It does not support Avro unions in the generated types: an object type is used to unify the choices.

    For example, a field that is declared in Avro as

    {"name": "Foo", "type": ["string", "int"]}

    will be generated in C# as

    public object Foo { get; set; }

  • There is no support for optional types in the generated code. Properties generated for ["null", "string"] and for plain string are both of type string in C#.

  • There is no structural equality provided for the generated types.

  • Generated types are fully mutable.

  • Bugs like this one exist.

To make the developer experience a bit better, this tiny library was born.

Goals

The goal of this library is to keep using the "official" Apache Avro library for the actual encoding and decoding of Avro payloads, while providing developers with more structured and friendly types that mitigate the issues above (as much as possible).

Compared to building an F# Avro library from the ground up (which may be considered a next step), building on the Apache Avro library has its trade-offs:

Pros
  • It is a drop-in replacement: we stay within the Apache Avro ecosystem, so all existing codecs, libraries, etc. still work with the types generated by this tool.
  • We can address inconveniences and improve the developer experience by providing "better" types.
  • We can sometimes mitigate bugs and bad design choices found in Apache Avro.
  • It is simply easier to reuse, rather than rewrite, the complex parts of handling Avro, such as binary serialisation, schema parsing, deconflicting values, etc. 😉
Cons (compared to a hypothetical solution that does not use Apache Avro lib)
  • The generated code is still not entirely pure in places. We can mitigate much of this and conveniently hide most of it, but strictly speaking it is still there.

    For example, while the types generated for Avro records expose an immutable interface, they still need to implement ISpecificRecord and allow mutation (via the CLIMutable attribute) for the Apache serialiser to work.

  • We inherit bugs from Apache Avro library. Some of them we can mitigate, some we cannot.

Code generation

F# code is generated as follows:

Records

The tool offers two representations to choose from, selected with --record-repr: F# record and .NET class.

Consider this simple form of an Avro record:

{
    "type": "record",
    "name": "Person",
    "fields": [
        { "name": "name", "type": "string" },
        { "name": "age", "type": "int" }
    ]
}
Record representation

The generated type for the schema above is an F# record, marked with CLIMutable, that implements ISpecificRecord:

[<CLIMutable>]
type Person =
    { name: string
      age: int }

    static member _SCHEMA : Avro.Schema = ...

    interface Avro.Specific.ISpecificRecord with
        member this.Get(pos: int) = ...
        member this.Put(pos: int, value: obj) = ...
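Because the record representation is an ordinary F# record, consumers get structural equality and copy-and-update expressions for free. The sketch below is a hypothetical hand-written stand-in for the generated Person type, with CLIMutable, _SCHEMA and the ISpecificRecord members left out so that it runs without the Apache.Avro package; it illustrates consumer-side usage only, not the tool's output.

```fsharp
// Hypothetical stand-in for the generated record; the real generated type
// also carries [<CLIMutable>], _SCHEMA and an ISpecificRecord implementation.
type Person =
    { name: string
      age: int }

let alice = { name = "Alice"; age = 30 }

// Copy-and-update produces a new value; alice itself is left unchanged.
let olderAlice = { alice with age = 31 }

// Records compare structurally, field by field.
let sameAsAlice = alice = { name = "Alice"; age = 30 }
let changed = alice <> olderAlice

printfn "%b %b %d" sameAsAlice changed alice.age
```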

Class representation

The generated type for the schema above is a sealed .NET class that provides a constructor and structural equality.

It also has an unsafe default constructor (required by Apache Avro), but we make it inaccessible to F# developers via the CompilerMessage attribute.

[<Sealed>]
type Person(name: string, age: int) =
    let mutable __name = name
    let mutable __age = age

    [<CompilerMessage("This method is not intended for use from F#.", 10001, IsError = true, IsHidden = true)>]
    new () = Person(Unchecked.defaultof<string>, Unchecked.defaultof<int>)

    member this.name = __name
    member this.age = __age

    static member _SCHEMA : Avro.Schema = ...

    interface Avro.Specific.ISpecificRecord with
        member this.Get(pos: int) = ...
        member this.Put(pos: int, value: obj) = ...

    interface System.IEquatable<Person> with
        member this.Equals other = ...

    override this.Equals(other) = ...
    override this.GetHashCode() = ...
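As a rough, hypothetical illustration of what structural equality means for the class representation, the hand-written class below compares name and age rather than references. It is not the tool's generated code: the mutable backing fields, the hidden default constructor, _SCHEMA and ISpecificRecord are all omitted so that it compiles without Apache.Avro.

```fsharp
open System

// Hypothetical hand-written analogue of the generated class (Avro parts omitted).
[<Sealed>]
type Person(name: string, age: int) =
    member _.name = name
    member _.age = age

    // Typed equality: same field values means equal.
    interface IEquatable<Person> with
        member this.Equals(other: Person) =
            not (obj.ReferenceEquals(other, null))
            && this.name = other.name
            && this.age = other.age

    // Untyped equality delegates to the typed implementation.
    override this.Equals(other: obj) =
        match other with
        | :? Person as p -> (this :> IEquatable<Person>).Equals p
        | _ -> false

    // Hash code derived from the same fields used for equality.
    override this.GetHashCode() = hash (this.name, this.age)

let a = Person("Alice", 30)
let b = Person("Alice", 30)

printfn "%b" (a = b)                        // true: same field values
printfn "%b" (obj.ReferenceEquals(a, b))    // false: two distinct instances
```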

Enums

Unfortunately, the Apache Avro library requires an Avro enum to be represented as a .NET enum.

Because of that, we cannot generate a nice discriminated union and have to fall back to generating enums:

Avro:

{
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
}

F#:

type Suit =
    | SPADES = 0
    | HEARTS = 1
    | DIAMONDS = 2
    | CLUBS = 3
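Since Suit is a .NET enum rather than a discriminated union, any int can be cast to it, so matching only the named cases is never exhaustive and the F# compiler emits warning FS0104 unless a wildcard case is present. The sketch below repeats the enum so it is self-contained; colour is a hypothetical helper.

```fsharp
// Same shape as the generated enum above.
type Suit =
    | SPADES = 0
    | HEARTS = 1
    | DIAMONDS = 2
    | CLUBS = 3

// The wildcard case is required: an enum may hold undeclared values.
let colour (suit: Suit) =
    match suit with
    | Suit.SPADES | Suit.CLUBS -> "black"
    | Suit.HEARTS | Suit.DIAMONDS -> "red"
    | other -> failwithf "Unknown Suit value %d" (int other)

// Converting to and from the Avro symbol name.
let symbol = Suit.HEARTS.ToString()                            // "HEARTS"
let parsed = System.Enum.Parse(typeof<Suit>, "CLUBS") :?> Suit

// Compiles fine, but 7 is not one of the declared symbols.
let outOfRange = enum<Suit> 7
```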

Arrays

The official C# codegen tool uses IList<T> for arrays.

This tool simply uses the F# array type, 'T array.

Maps

The official C# codegen tool uses IDictionary<string, T> for maps.

This tool uses Map<string, 'T>.
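Code that previously worked with the official C# types (IList<T> and IDictionary<string, T>) can convert to 'T array and Map<string, 'T> with standard FSharp.Core functions. A minimal sketch, where the sample values are made up for illustration:

```fsharp
open System.Collections.Generic

// Sample values shaped like the official C# codegen output.
let csharpList : IList<int> = ResizeArray [ 1; 2; 3 ] :> IList<int>

let csharpDict : IDictionary<string, int> =
    let d = Dictionary<string, int>()
    d.["a"] <- 1
    d.["b"] <- 2
    d :> IDictionary<string, int>

// Converting to the shapes used by this tool.
let asArray : int array = Seq.toArray csharpList
let asMap : Map<string, int> =
    csharpDict |> Seq.map (fun kv -> kv.Key, kv.Value) |> Map.ofSeq

// Map is immutable: Map.add returns a new map and leaves asMap untouched.
let extended = asMap |> Map.add "c" 3

printfn "%A %A %d" asArray (Map.tryFind "b" asMap) extended.Count
```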

Unions

The official C# codegen tool uses object to represent union types.

This tool uses F#'s Choice and Option types.

Examples:

Avro type                       F# type
["null", "string"]              string option
["int", "string"]               Choice<int, string>
["string", "User", "Role"]      Choice<string, User, Role>
["null", "string", "int"]       Choice<string, int> option
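With Option and Choice, consumers handle every branch through pattern matching, and the compiler warns about any branch that is not handled, which is not possible with object. The helper functions below are hypothetical and only assume the type mapping shown in the table.

```fsharp
// ["null", "string"] -> string option
let describeOptional (value: string option) =
    match value with
    | Some s -> sprintf "string %s" s
    | None -> "null"

// ["int", "string"] -> Choice<int, string>
let describeChoice (value: Choice<int, string>) =
    match value with
    | Choice1Of2 i -> sprintf "int %d" i
    | Choice2Of2 s -> sprintf "string %s" s

// ["null", "string", "int"] -> Choice<string, int> option
let describeNullableChoice (value: Choice<string, int> option) =
    match value with
    | None -> "null"
    | Some (Choice1Of2 s) -> sprintf "string %s" s
    | Some (Choice2Of2 i) -> sprintf "int %d" i

printfn "%s" (describeOptional (Some "hello"))               // string hello
printfn "%s" (describeChoice (Choice1Of2 42))                // int 42
printfn "%s" (describeNullableChoice (Some (Choice2Of2 7)))  // int 7
printfn "%s" (describeNullableChoice None)                   // null
```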

Fixed

Consider this schema:

{
    "name": "md5",
    "type": { "type": "fixed", "size": 16, "name": "MD5" }
}

Unfortunately, Apache Avro relies heavily on fixed types inheriting from the SpecificFixed hierarchy, so we cannot have a simple type MD5 = MD5 of byte array.

But this tool tries to mitigate this inconvenience and provides a slightly better developer experience:

type MD5 private (value: byte[]) =
    inherit Avro.Specific.SpecificFixed(uint 16)

    override this.Schema = ...
    static member _SCHEMA = ...

    // smart constructor
    static member Create(value) : Result<MD5, string> =
        match Array.length (value) with
        | 16 -> Ok(MD5 value)
        | _ -> Error "Fixed size value Test.AvroMsg.MD5 is required have length 16"

[<AutoOpen>]
module MD5 =
    let (|MD5|) (value: MD5) = value.Value

The generated type hides its constructor and instead provides a "smart constructor" (the static Create method), which ensures that the declared size is respected and that values are correct by construction.

It also provides an active pattern to make pattern matching easier.
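The smart-constructor pattern can be tried out without Apache.Avro. The simplified, hypothetical MD5 type below does not inherit from SpecificFixed and defines the active pattern at the top level instead of in an [<AutoOpen>] module; its error message and the describe helper are also made up for illustration.

```fsharp
// Simplified stand-in: no SpecificFixed base class, no schema members.
type MD5 private (value: byte[]) =
    member _.Value = value

    // Smart constructor: the only way to obtain an MD5 from outside the type.
    static member Create(value: byte[]) : Result<MD5, string> =
        match Array.length value with
        | 16 -> Ok(MD5 value)
        | n -> Error(sprintf "MD5 requires 16 bytes, got %d" n)

// Single-case active pattern that unwraps the underlying bytes.
let (|MD5|) (value: MD5) = value.Value

let describe (bytes: byte[]) =
    match MD5.Create bytes with
    | Ok (MD5 raw) -> sprintf "valid, %d bytes" raw.Length
    | Error message -> message

printfn "%s" (describe (Array.zeroCreate 16))   // valid, 16 bytes
printfn "%s" (describe [| 1uy; 2uy |])          // MD5 requires 16 bytes, got 2
```

Because the constructor is private, code outside the type cannot write MD5 [| 1uy |] directly; every value has to pass the length check in Create.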

Primitive Types

Nothing changes compared to Apache Avro: all primitives map to the same .NET primitive types.
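Assuming the usual Apache Avro C# mapping (boolean to bool, int to int, long to int64, float to float32, double to float, bytes to byte[], string to string), note that F#'s float is a 64-bit double, so an Avro "float" field surfaces as float32 and an Avro "double" field as float. A hypothetical record with one field of each primitive type, Avro members omitted:

```fsharp
// Hypothetical record; each comment names the Avro primitive the field maps from.
type Primitives =
    { flag: bool        // "boolean"
      count: int        // "int"    (System.Int32)
      total: int64      // "long"   (System.Int64)
      ratio: float32    // "float"  (System.Single)
      amount: float     // "double" (System.Double; F# float is a double)
      payload: byte[]   // "bytes"
      label: string }   // "string"

let sample =
    { flag = true
      count = 1
      total = 10_000_000_000L
      ratio = 0.5f
      amount = 0.25
      payload = [| 0uy; 255uy |]
      label = "hello" }
```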

Logical Types

The Apache Avro library conveniently solves the logical types puzzle, and this tool relies on that solution without deviating from it.

Benchmarks and optimisations

For performance reasons, this tool may generate slightly trickier code than a "straightforward" implementation would, such as using CLIMutable or carefully cached reflection for implementing ISpecificRecord.

These tricks are typically internal (to the generated code) and are not exposed to developers using the result of this tool.

Populating a fairly complex Avro type (~15 properties, nested, has optionals and choices) yields these results:

Method        Mean        Error      StdDev     Ratio   RatioSD
'C# Classes'    775.3 ns  13.47 ns   17.04 ns   1.00    0.00
'F# Classes'  1,111.3 ns  21.64 ns   21.25 ns   1.43    0.04
'F# Records'  1,209.1 ns  17.10 ns   16.00 ns   1.80    0.03

F# types are slower than C# ones, possibly because they do more work, such as checking input types (C# classes just blindly cast values and leave unions as objects).

At this point, we do not consider "just above a microsecond" to be performance-critical (especially for a fairly complex data type), even though it is almost 2x slower than C#.

But optimisations and hints are always welcome 😃

Known bugs and problems

The biggest one known so far is AVRO-3671. C# code generated with the official codegen tool cannot handle it and either crashes or uses the wrong types. This tool makes a best effort to mitigate the issue: for example, in the case where the C# code crashes, the F# code works and uses the correct type. However, the issue cannot be fully eliminated until AVRO-3671 is addressed.

Other bug reports and suggestions are appreciated and welcome!

Target framework compatibility
.NET: net6.0 is compatible. Computed as compatible: net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.

This package has no dependencies.

Version   Downloads   Last updated
1.0.1     608         12/5/2022
1.0.0     326         12/5/2022
0.1.26    324         12/5/2022
0.1.23    338         11/24/2022
0.1.22    352         11/17/2022
0.1.21    330         11/15/2022
0.1.20    296         11/15/2022
0.1.19    328         11/15/2022