SplitDotNet 0.1.0
dotnet add package SplitDotNet --version 0.1.0
Split.net
A more efficient splitter for bytes and strings, with a focus on zero allocation, in C#.
Usage
using Split.Extensions;
var example = "Hello, 🌏 world. 你好, 世界. ";
var splits = example.SplitOn(" ");
foreach (var split in splits)
{
Console.WriteLine(split);
}
/*
Hello,
🌏
world.
你好,
世界.
*/
Performance
This package exists to save allocations on the hot path, for cases where you would otherwise call something like string.Split from the standard library. Benchmarks on ~100 KB of text:
| Method | Mean | Error | StdDev | Throughput | Gen0 | Gen1 | Gen2 | Allocated |
|------------------ |----------:|---------:|---------:|----------- |--------:|-------:|-------:|----------:|
| Split.net | 91.68 us | 0.804 us | 0.712 us | 1.19 GB/s | - | - | - | - |
Standard library:
| Method | Mean | Error | StdDev | Throughput | Gen0 | Gen1 | Gen2 | Allocated |
|------------------ |----------:|---------:|---------:|----------- |--------:|-------:|-------:|----------:|
| string.Split | 106.40 us | 0.138 us | 0.108 us | 1.02 GB/s | 49.3164 | 0.3662 | 0.1221 | 413352 B |
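For a rough feel of where the Allocated column comes from, here is a minimal sketch that measures the bytes allocated by a single string.Split call, using only the standard library. The input text is made up for illustration, and this is a quick sanity check rather than a substitute for a proper benchmark harness.

```csharp
using System;
using System.Linq;

// Made-up input: 10,000 repetitions of two words, each followed by a space.
string text = string.Concat(Enumerable.Repeat("lorem ipsum ", 10_000));

// Bytes allocated on this thread before and after one string.Split call.
long before = GC.GetAllocatedBytesForCurrentThread();
string[] parts = text.Split(' ');
long after = GC.GetAllocatedBytesForCurrentThread();

// The difference covers the result array plus one new string per part.
Console.WriteLine($"{parts.Length} parts, {after - before} bytes allocated");
```

The exact byte count varies by runtime and platform, so none is shown here.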
Techniques
This package does two things to achieve zero allocations. First, it lazily iterates over the splits, instead of collecting them into an array.
Second, each split is a Span, which is a "view" into the underlying string or byte[], and stays on the stack.
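To make those two techniques concrete, here is a minimal sketch of a lazy, span-based splitter. It is not this package's implementation: `SpanSplitter` is a hypothetical name, and the code assumes a non-empty separator and ordinal matching.

```csharp
using System;

// Usage: foreach works through the pattern-based GetEnumerator/MoveNext/Current.
string text = "Hello, 🌏 world. ";
foreach (ReadOnlySpan<char> part in new SpanSplitter(text, " "))
{
    // ToString() allocates; it is here only so the slice can be displayed.
    Console.WriteLine($"[{part.ToString()}]");
}

// Hypothetical sketch: yields one slice per MoveNext() call, never building an array.
public ref struct SpanSplitter
{
    private ReadOnlySpan<char> _remaining;
    private readonly ReadOnlySpan<char> _separator;
    private ReadOnlySpan<char> _current;
    private bool _done;

    public SpanSplitter(ReadOnlySpan<char> source, ReadOnlySpan<char> separator)
    {
        if (separator.IsEmpty)
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        _remaining = source;
        _separator = separator;
        _current = default;
        _done = false;
    }

    // Each split is a view into the original text; no characters are copied.
    public ReadOnlySpan<char> Current => _current;

    public SpanSplitter GetEnumerator() => this;

    public bool MoveNext()
    {
        if (_done) return false;

        int index = _remaining.IndexOf(_separator);
        if (index < 0)
        {
            // No separator left: the remainder is the final split.
            _current = _remaining;
            _done = true;
            return true;
        }

        _current = _remaining.Slice(0, index);
        _remaining = _remaining.Slice(index + _separator.Length);
        return true;
    }
}
```

Because the enumerator is a ref struct holding only spans and a flag, it lives on the stack, and the work of finding the next separator happens only when the loop asks for the next item.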
Data types
using Split;
You'll find Split.Bytes(), which can accept byte[], (ReadOnly)Span<byte> and Stream. If you want to split on a multi-byte rune (Unicode codepoint), you'll need to get its encoding first, using something like Encoding.UTF8.GetBytes().
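As a small illustration of the encoding step, this sketch uses only the standard library to turn a non-ASCII separator into its UTF-8 bytes and locate it in a byte span. It deliberately does not call Split.Bytes(), since the exact signature isn't given here; the sample strings are made up.

```csharp
using System;
using System.Text;

// A code point outside ASCII encodes to several bytes in UTF-8 (four for this emoji).
ReadOnlySpan<byte> separator = Encoding.UTF8.GetBytes("🌏");
ReadOnlySpan<byte> data = Encoding.UTF8.GetBytes("Hello, 🌏 world.");

// Search for the whole byte sequence, not a single byte.
int index = data.IndexOf(separator);

// Slice around the match; both slices are views over the same buffer.
ReadOnlySpan<byte> left = data[..index];
ReadOnlySpan<byte> right = data[(index + separator.Length)..];

Console.WriteLine(separator.Length);              // 4
Console.WriteLine(Encoding.UTF8.GetString(left));  // "Hello, "
Console.WriteLine(Encoding.UTF8.GetString(right)); // " world."
```

The encoded bytes are what you would then hand over as the separator when splitting bytes.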
You'll find Split.Chars(), which can accept string, char[], (ReadOnly)Span<char> and TextReader/StreamReader.
using Split.Extensions;
Importing this namespace adds .SplitOn() extension methods to each of the above types. I chose that name so as not to conflict with the built-in .Split().
Testing
We test that Split.net returns identical results to string.Split, including various edge cases.
Prior art
These are not original ideas! Here are a few other examples with a similar approach:
SpanSplitEnumerator (this Split.net package started as a fork of SpanSplitEnumerator)
Each of the above is in the same ballpark of throughput and allocation as this package.
Why use Split.net, then?
You might like the UTF-8 support, SplitAny, streams & readers, or heck maybe you just like the API. Feedback welcome.
By the way
If you are splitting in order to get "words" from natural text, you may wish to use the Unicode definition of word boundaries, which I've implemented in this package.
I've also implemented these ideas in Go.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0: No dependencies.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.1.0 | 2,996 | 8/3/2024 |