Nemesis.TextParsers 0.11.42

Contains various parsers optimized for speed and no allocation.
This package was built from the source at https://github.com/nemesissoft/Nemesis.TextParsers/tree/09da31e2f11d45826b5a828701d703d6af150a34

There is a newer version of this package available.
See the version list below for details.
Package Manager: Install-Package Nemesis.TextParsers -Version 0.11.42
.NET CLI: dotnet add package Nemesis.TextParsers --version 0.11.42
PackageReference: <PackageReference Include="Nemesis.TextParsers" Version="0.11.42" />
    For projects that support PackageReference, copy this XML node into the project file to reference the package.
Paket CLI: paket add Nemesis.TextParsers --version 0.11.42
    The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Nemesis.TextParsers

When stuck with the task of parsing various items from strings, we often opt for TypeConverter (https://docs.microsoft.com/en-us/dotnet/api/system.componentmodel.typeconverter).

We tend to create methods like:

public static T FromString<T>(string text) =>
    (T)TypeDescriptor.GetConverter(typeof(T))
        .ConvertFromInvariantString(text);

or even create similar constructs to be in line with object oriented design:

using System;
using System.ComponentModel;
using System.Globalization;

public abstract class TextTypeConverter : TypeConverter
{
    // Accept strings as a conversion source in addition to whatever the base class supports.
    public sealed override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType) =>
        sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);

    // Accept strings as a conversion target in addition to whatever the base class supports.
    public sealed override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType) =>
        destinationType == typeof(string) || base.CanConvertTo(context, destinationType);
}

public abstract class BaseTextConverter<TValue> : TextTypeConverter
{
    public sealed override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value) =>
        value is string text ? ParseString(text) : default;

    public abstract TValue ParseString(string text);

    public sealed override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture, object value, Type destinationType) =>
        destinationType == typeof(string) ?
            FormatToString((TValue)value) :
            base.ConvertTo(context, culture, value, destinationType);

    public abstract string FormatToString(TValue value);
}
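
To make the pattern above concrete, here is a minimal sketch of one converter written in that style. The GridPoint type, its "x;y" text format and the GridPointConverter class are hypothetical names invented for this illustration; only TypeConverter, TypeDescriptor and TypeConverterAttribute come from the framework. The base class is repeated in condensed form so that the sample compiles on its own.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

// Condensed copy of the base class from the text, repeated so the sketch compiles on its own.
public abstract class BaseTextConverter<TValue> : TypeConverter
{
    public sealed override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType) =>
        sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);

    public sealed override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value) =>
        value is string text ? ParseString(text) : default;

    public sealed override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture, object value, Type destinationType) =>
        destinationType == typeof(string)
            ? FormatToString((TValue)value)
            : base.ConvertTo(context, culture, value, destinationType);

    public abstract TValue ParseString(string text);
    public abstract string FormatToString(TValue value);
}

// Hypothetical value type, invented for this example; the attribute registers its converter.
[TypeConverter(typeof(GridPointConverter))]
public readonly struct GridPoint
{
    public GridPoint(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }
}

// Hypothetical converter using an "x;y" text format chosen only for illustration.
public sealed class GridPointConverter : BaseTextConverter<GridPoint>
{
    public override GridPoint ParseString(string text)
    {
        string[] parts = text.Split(';'); // allocates an array plus one string per part
        return new GridPoint(
            int.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture));
    }

    public override string FormatToString(GridPoint value) =>
        string.Format(CultureInfo.InvariantCulture, "{0};{1}", value.X, value.Y);
}

public static class Program
{
    public static void Main()
    {
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(GridPoint));

        // The converter API works with object, so the struct is boxed and then unboxed here.
        var point = (GridPoint)converter.ConvertFromInvariantString("3;4");

        Console.WriteLine(point.X + point.Y);                         // 7
        Console.WriteLine(converter.ConvertToInvariantString(point)); // 3;4
    }
}
```

Note how every round trip goes through object and string: the value type is boxed, and string.Split allocates an array and a substring per element.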

What is wrong with that? Well, nothing... except for performance.

TypeConverter was designed 15+ years ago, when processing power tended to double every now and then, and (in my opinion) it was better suited to creating GUI-like editors, where performance is usually not an issue.
But imagine a service application, such as an exchange trading suite, that has to perform many operations per second. In such cases the processor has more important things to do than parsing strings.

Parser/formatter features

  1. as concise as possible - both JSON and XML exist, but they are not well suited to being written by hand by human support staff
  2. works on various architectures, supporting .NET Core and .NET Standard, and is culture independent
  3. support for basic system types (C#-like type names):
    • string
    • bool
    • byte/sbyte
    • short/ushort
    • int/uint
    • long/ulong
    • float/double
    • decimal
    • BigInteger
    • TimeSpan
    • DateTime/DateTimeOffset
    • Guid
  4. supports pattern-based parsing/formatting via ToString/FromText methods placed inside the type or in a static/instance factory
  5. supports compound types:
    • KeyValuePair<,> and ValueTuple of arity 2-5 (1 is not a tuple, more than 5 warrants a dedicated type)
    • Enums (with numeric underlying types)
    • Nullables
    • Dictionaries (built-in ones, e.g. SortedDictionary/SortedList, and custom ones)
    • Arrays (including jagged arrays)
    • Standard collections and collection contracts (List vs IList vs IEnumerable)
    • User defined collections
    • everything mentioned above, combined, with inner elements properly escaped in the final string, e.g. SortedDictionary<char?, IList<float[]>>
  6. ability to fall back to TypeConverter if no parsing/formatting strategy was found
  7. parsing is fast while allocating as little memory as possible. The following benchmark illustrates this by parsing a 1000-element array

| Method                      | Mean        | Ratio | Gen 0    | Gen 1  | Allocated | Remarks |
|-----------------------------|-------------|-------|----------|--------|-----------|---------|
| RegEx parsing               | 4,528.99 us | 44.98 | 492.1875 | -      | 2089896 B | Regular expression with escaping support |
| StringSplitTest_KnownType   | 93.41 us    | 0.92  | 9.5215   | 0.1221 | 40032 B   | string.Split(..).Select(text=>int.Parse(text)) |
| StringSplitTest_DynamicType | 474.73 us   | 4.69  | 24.4141  | -      | 104032 B  | string.Split + TypeDescriptor.GetConverter |
| SpanSplitTest_NoAlloc       | 101.00 us   | 1.00  | -        | -      | -         | "1\|2\|3".AsSpan().Tokenize() |
| SpanSplitTest_Alloc         | 101.38 us   | 1.00  | 0.8545   | -      | 4024 B    | "1\|2\|3".AsSpan().Tokenize(); var array = new int[1000]; |

  8. provides basic building blocks so that the parser's callers can create their own transformers/factories
    • LeanCollection that can store 1, 2, 3 or more elements
    • a string.Split equivalent that accepts a faster representation of a string - ReadOnlySpan<char>. It supports both standard and custom escaping sequences
    • access to every implemented parser/formatter
  9. basic LINQ support
    var avg = SpanCollectionSerializer.DefaultInstance.ParseStream<double>("1|2|3".AsSpan()).Average();
  10. basic support for GUI editors for compound types like collections/dictionaries
  11. lean/frugal implementation of StringBuilder - ValueSequenceBuilder
    Span<char> initialBuffer = stackalloc char[32];
    var accumulator = new ValueSequenceBuilder<char>(initialBuffer);
    using (var enumerator = coll.GetEnumerator())
        while (enumerator.MoveNext())
            FormatElement(formatter, enumerator.Current, ref accumulator);
    var text = accumulator.AsSpanTo(accumulator.Length > 0 ? accumulator.Length - 1 : 0).ToString();
    accumulator.Dispose();
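
The idea behind the allocation-free rows of the benchmark above can be sketched with framework types only. This is not the library's Tokenize implementation, just an illustration of the technique under some assumptions: the SpanSplitSketch class and SumElements method are hypothetical names, escaping sequences are not handled at all, and the target framework must offer the int.Parse overload that takes ReadOnlySpan<char> (.NET Core 2.1+ or .NET Standard 2.1).

```csharp
using System;
using System.Globalization;

public static class SpanSplitSketch
{
    // Sums separator-delimited integers by slicing the input span
    // instead of creating a substring for every element.
    public static int SumElements(ReadOnlySpan<char> text, char separator = '|')
    {
        int sum = 0;
        while (!text.IsEmpty)
        {
            int index = text.IndexOf(separator);

            // The token is a view over the original characters; nothing is copied.
            ReadOnlySpan<char> token = index < 0 ? text : text.Slice(0, index);
            sum += int.Parse(token, NumberStyles.Integer, CultureInfo.InvariantCulture);

            // Continue after the separator, or stop when the last token was consumed.
            text = index < 0 ? ReadOnlySpan<char>.Empty : text.Slice(index + 1);
        }
        return sum;
    }

    public static void Main() =>
        Console.WriteLine(SumElements("1|2|3".AsSpan())); // 6
}
```

Because each token is only a slice of the input, the loop itself does not create garbage, which is the property the SpanSplitTest_NoAlloc row measures for the library's own tokenizer.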

This package is not used by any popular GitHub repositories.

Version History

Version Downloads Last updated
1.5.1 0 3/29/2020
1.5.0 0 3/28/2020
1.4.1 53 3/23/2020
1.3.2 49 3/19/2020
1.3.0 66 3/16/2020
1.2.0 99 3/15/2020
1.1.3 153 3/14/2020
1.1.2 134 2/27/2020
1.1.1 44 2/26/2020
1.1.0 116 2/26/2020
1.0.6 145 2/25/2020
1.0.4 42 2/25/2020
1.0.3 77 2/18/2020
1.0.2 91 11/8/2019
1.0.1 56 11/6/2019
1.0.0 74 9/25/2019
0.11.50 67 9/25/2019
0.11.47 70 9/25/2019
0.11.46 70 9/23/2019
0.11.42 75 9/18/2019
0.11.41 69 9/18/2019
0.11.40 69 9/18/2019
0.11.39 73 9/18/2019
0.11.38 76 9/18/2019
0.11.37 74 9/18/2019
0.11.36 73 9/17/2019
0.11.35 74 9/17/2019
0.11.34 75 9/17/2019
0.11.33 76 9/17/2019
0.9.32 73 9/17/2019
0.9.31 81 9/11/2019
0.9.30 84 9/9/2019
0.9.29 77 9/6/2019
0.9.28 97 8/3/2019
0.9.27 95 8/3/2019
0.9.26 92 8/1/2019
0.9.25 98 7/21/2019
0.9.24 92 7/19/2019
0.9.22 115 6/14/2019
0.9.21 99 6/13/2019
0.9.20 118 6/9/2019
0.9.19 117 6/7/2019
0.9.18 131 6/5/2019
0.9.15 111 5/29/2019
0.9.14 119 5/29/2019
0.9.13 115 5/28/2019
0.9.12 118 5/27/2019
0.9.10 121 5/21/2019
0.9.8 130 5/7/2019
0.9.7 137 5/5/2019
0.9.6 134 5/5/2019
0.9.5 123 5/5/2019