SplitDotNet 0.1.0

dotnet add package SplitDotNet --version 0.1.0
                    

Split.net

A more efficient splitter for bytes and strings, with a focus on zero allocation, in C#.

Usage


using Split.Extensions;

var example = "Hello, 🌏 world. 你好, 世界. ";

var splits = example.SplitOn(" ");

foreach (var split in splits)
{
    Console.WriteLine(split);
}

/*
Hello,
🌏
world.
你好,
世界.
*/

Performance

This package exists to save allocations on the hot path, for code that would otherwise call something like string.Split from the standard library. Benchmarks on ~100K of text:

| Method            | Mean      | Error    | StdDev   | Throughput | Gen0    | Gen1   | Gen2   | Allocated |
|------------------ |----------:|---------:|---------:|----------- |--------:|-------:|-------:|----------:|
| Split.net         |  91.68 us | 0.804 us | 0.712 us |  1.19 GB/s |       - |      - |      - |         - |

Standard library:

| Method            | Mean      | Error    | StdDev   | Throughput | Gen0    | Gen1   | Gen2   | Allocated |
|------------------ |----------:|---------:|---------:|----------- |--------:|-------:|-------:|----------:|
| string.Split      | 106.40 us | 0.138 us | 0.108 us |  1.02 GB/s | 49.3164 | 0.3662 | 0.1221 |  413352 B |
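To see the allocation difference on your own input without a full benchmark harness, a rough sketch like the following can help. It uses only the .NET base class library; the class and method names (AllocationSketch, AllocatedBy, CountWithStringSplit, CountWithSpanScan) are hypothetical and not part of Split.net, and the span loop is a stand-in for a span-based splitter, not the package's implementation.

```csharp
using System;

// Hypothetical helper, not part of Split.net.
public static class AllocationSketch
{
    // Returns the managed bytes allocated on the current thread while `action` runs.
    public static long AllocatedBy(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Baseline: string.Split builds an array plus one new string per piece.
    public static int CountWithStringSplit(string text) => text.Split(' ').Length;

    // Span-based stand-in: walks the text and slices views, creating no substrings.
    public static int CountWithSpanScan(string text)
    {
        ReadOnlySpan<char> remaining = text;
        int count = 1;
        while (true)
        {
            int index = remaining.IndexOf(' ');
            if (index < 0)
            {
                return count;
            }
            count++;
            remaining = remaining[(index + 1)..];
        }
    }
}
```

Calling AllocationSketch.AllocatedBy(() => AllocationSketch.CountWithStringSplit(text)) should report a figure that grows with the number of pieces, while the span version should stay near zero. For numbers worth publishing, a proper harness such as the one behind the tables above is the better tool.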

Techniques

This package does two things to achieve zero allocations. First, it lazily iterates over the splits, instead of collecting them into an array.

Second, each split is a Span: a stack-only "view" into the underlying string or byte[], so no substring is copied.
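The two techniques can be combined in a few lines. What follows is a minimal sketch of the general pattern, assuming a single string separator; SpanSplitter and SplitLazily are hypothetical names for illustration and are not Split.net's actual types or API.

```csharp
using System;

// Hypothetical extension method, not part of Split.net.
public static class SplitSketch
{
    public static SpanSplitter SplitLazily(this string source, string separator)
        => new SpanSplitter(source.AsSpan(), separator.AsSpan());
}

// A ref struct can only live on the stack, so the enumerator itself is never heap-allocated.
public ref struct SpanSplitter
{
    private ReadOnlySpan<char> _remaining;
    private readonly ReadOnlySpan<char> _separator;
    private bool _done;

    public SpanSplitter(ReadOnlySpan<char> source, ReadOnlySpan<char> separator)
    {
        _remaining = source;
        _separator = separator;
        _done = false;
        Current = default;
    }

    // The current split: a slice of the original text, not a copy.
    public ReadOnlySpan<char> Current { get; private set; }

    // Lets foreach consume the struct directly (pattern-based enumeration).
    public SpanSplitter GetEnumerator() => this;

    // Finds the next split on demand instead of collecting all splits up front.
    public bool MoveNext()
    {
        if (_done)
        {
            return false;
        }

        // An empty separator never matches, so the whole input becomes one split.
        int index = _separator.IsEmpty ? -1 : _remaining.IndexOf(_separator);
        if (index < 0)
        {
            Current = _remaining;
            _done = true;
            return true;
        }

        Current = _remaining[..index];
        _remaining = _remaining[(index + _separator.Length)..];
        return true;
    }
}
```

With this in scope, foreach (var part in text.SplitLazily(" ")) visits each piece as a ReadOnlySpan<char>. Like string.Split, the sketch yields a trailing empty split when the input ends with the separator.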

Data types

using Split;

You'll find Split.Bytes(), which can accept byte[], (ReadOnly)Span<byte> and Stream. If you want to split on a multi-byte rune (Unicode codepoint), you'll need to get its encoding first, using something like Encoding.UTF8.GetBytes().
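As an illustration of why the encoding step matters, the sketch below splits UTF-8 bytes on a multi-byte rune using only base class library calls. Utf8DelimiterSketch and CountSplits are hypothetical names, and the loop is a stand-in for a byte splitter; it does not show the signature of Split.Bytes().

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Split.net.
public static class Utf8DelimiterSketch
{
    // Counts the pieces produced by splitting UTF-8 `data` on `rune`.
    public static int CountSplits(ReadOnlySpan<byte> data, string rune)
    {
        // A rune outside ASCII encodes to several bytes, so the separator
        // must be the full byte sequence rather than a single byte.
        ReadOnlySpan<byte> separator = Encoding.UTF8.GetBytes(rune);
        if (separator.IsEmpty)
        {
            return 1;
        }

        int count = 1;
        while (true)
        {
            int index = data.IndexOf(separator);
            if (index < 0)
            {
                return count;
            }
            count++;
            // Skip past the whole multi-byte separator.
            data = data[(index + separator.Length)..];
        }
    }
}
```

Encoding.UTF8.GetBytes("🌏") returns four bytes, so the search has to match all four together. The GetBytes call allocates a small array; doing it once outside the hot path keeps the splitting loop itself allocation-free.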

You'll find Split.Chars(), which can accept string, char[], (ReadOnly)Span<char> and TextReader/StreamReader.

using Split.Extensions;

Each of the above types has .SplitOn() extension methods. I chose that name so as not to conflict with .Split().

Testing

We test that Split.net returns identical results to string.Split, including various edge cases.

Prior art

These are not original ideas! Here are a few other examples with a similar approach:

Each of the above is in the same ballpark of throughput and allocation as this package.

Why use Split.net, then?

You might like the UTF-8 support, SplitAny, streams & readers, or heck maybe you just like the API. Feedback welcome.

By the way

If you are splitting in order to get "words" from natural text, you may wish to use the Unicode definition of word boundaries, which I've implemented in this package.

I've also implemented these ideas in Go.

Target frameworks

Included in the package: net8.0. The net8.0 platform variants (android, browser, ios, maccatalyst, macos, tvos, windows), net9.0, net10.0, and their platform variants are computed as compatible.

Dependencies

net8.0: no dependencies.
Version Downloads Last Updated
0.1.0 2,996 8/3/2024