SpiseMisu.Text.Dstring
0.11.18
- .NET CLI: `dotnet add package SpiseMisu.Text.Dstring --version 0.11.18`
- Package Manager: `NuGet\Install-Package SpiseMisu.Text.Dstring -Version 0.11.18`
- PackageReference: `<PackageReference Include="SpiseMisu.Text.Dstring" Version="0.11.18" />`
- Central Package Management: `<PackageVersion Include="SpiseMisu.Text.Dstring" Version="0.11.18" />` (in `Directory.Packages.props`) and `<PackageReference Include="SpiseMisu.Text.Dstring" />` (in the project file)
- Paket CLI: `paket add SpiseMisu.Text.Dstring --version 0.11.18`
- Script & Interactive: `#r "nuget: SpiseMisu.Text.Dstring, 0.11.18"`
- File-based apps: `#:package SpiseMisu.Text.Dstring@0.11.18`
- Cake addin: `#addin nuget:?package=SpiseMisu.Text.Dstring&version=0.11.18`
- Cake tool: `#tool nuget:?package=SpiseMisu.Text.Dstring&version=0.11.18`
A Danish string (`dstring`) is a German-string-like implementation for .NET, optimized for managed memory.
A `dstring` consists of 16 `bytes` (128 `bits`) of contiguous memory, where:
The first `byte` stores a `bitmask` for the next seven `bytes`, as well as for the `byte []` pointer. It uses a 4-bit `bitmask` to store the length of the `dstring` prefix, and another 4-bit `bitmask` to store flags for `format-and-encoding`. Once the upper bound of the `dstring` prefix length (eight) is reached, a 3-bit `bitmask` with `compression` flags becomes available:
# Upper-bound length of eight (compression flags are available)
+--------+
|▭▭▭▭■□□□|
+--------+
# Length of five (compression flags are NOT available)
+--------+
|▭▭▭▭□■□■|
+--------+
and
# A byte[] (dbytes) aka Extended ASCII
+--------+
|□□□□▭▭▭▭| isExtASCII...: Encoded bytes in [0x00 - 0xFF]
+--------+
# Format
+--------+
|□□□■▭▭▭▭| isBin....: Ex: 1001010101… (log 02. / log 02. = 1.0-bit => 08 vals in 01-byte)
+--------+
|□□■□▭▭▭▭| isDig....: Ex: 0123456789… (log 10. / log 02. = 3.3-bit => 09 vals in 03-bytes)
+--------+
|□□■■▭▭▭▭| isHex....: Ex: AF332EC219… (log 16. / log 02. = 4.0-bit => 02 vals in 01-byte)
+--------+
|□■□□▭▭▭▭| isISO8601 (TBC)
+--------+
|□■□■▭▭▭▭| isUUID...: Ex: d6c3ff78-0546-42dd-abc8-24a9e74ccf90 => 36 vals in 16-byte
+--------+
|□■■□▭▭▭▭| isF064...: Ex: 1. / 3. = 0.3333333333 => 01 val in 08-bytes (fixed)
+--------+
|□■■■▭▭▭▭| isD128...: Ex: 1m / 3m = 0.3333333333333333333333333333M => 01 val in 16-bytes (fixed)
+--------+
|■□□□▭▭▭▭| isJSON...: Ex: [{"foo":42}]
+--------+
|■□□■▭▭▭▭| isJSONL..: Ex: [{"foo":42}]\n[{"bar":43}]
+--------+
# Format and Encoding placeholders
+--------+
|■□■□▭▭▭▭| PlaceholderF10 (placeholder for future formats/encodings)
+--------+
|■□■■▭▭▭▭| PlaceholderF11 (placeholder for future formats/encodings)
+--------+
|■■□□▭▭▭▭| PlaceholderF12 (placeholder for future formats/encodings)
+--------+
# Encoding. Default is multi-byte Unicode for optimal storage
+--------+
|■■□■▭▭▭▭| isASCII......: Encoded bytes in [0x00 - 0x7F]
+--------+
|■■■□▭▭▭▭| isUTF8.......: Encoded bytes as multiple UTF8 single-bytes
+--------+
|■■■■▭▭▭▭| isUnicode....: Encoded bytes as multi-byte Unicode
+--------+
 bit-mask
and
# Default is uncompressed
+--------+
|▭▭▭▭■□□□| Uncompressed
+--------+
# Compression algorithms, with streaming support
+--------+
|▭▭▭▭■□□■| Deflate
+--------+
|▭▭▭▭■□■□| GZip
+--------+
|▭▭▭▭■□■■| ZLib
+--------+
|▭▭▭▭■■□□| Brotli
+--------+
# Compression algorithms placeholders
+--------+
|▭▭▭▭■■□■| PlaceholderF05
+--------+
|▭▭▭▭■■■□| PlaceholderF06
+--------+
|▭▭▭▭■■■■| PlaceholderF07
+--------+
 bit-mask
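To make the bit-masks above concrete, here is a minimal F# sketch that decodes a header byte. It assumes the upper four bits carry the format-and-encoding flags and the lower four bits carry the prefix length, and that once bit 3 of the lower nibble is set (length eight) its three lowest bits are read as the compression flags. The names `Compression`, `Header` and `decodeHeader` are hypothetical and not part of the library's API.

```fsharp
// Hypothetical decoder for the first (header) byte of a dstring, derived
// only from the bit diagrams above. Not SpiseMisu.Text.Dstring's own API.
type Compression =
  | Uncompressed
  | Deflate
  | GZip
  | ZLib
  | Brotli
  | Placeholder of code : int // PlaceholderF05 .. PlaceholderF07

type Header =
  { FormatAndEncoding : int                 // upper 4 bits (0b0000 = isExtASCII)
    PrefixLength      : int                 // 0..8, where 8 is the upper bound
    Compression       : Compression option } // only present when length is 8

let decodeHeader (b : byte) : Header =
  let hi = int b >>> 4    // format-and-encoding flags
  let lo = int b &&& 0x0F // length nibble
  if (lo &&& 0b1000) <> 0 then
    // Upper bound reached: the remaining 3 bits hold the compression flags
    let comp =
      match lo &&& 0b0111 with
      | 0 -> Uncompressed
      | 1 -> Deflate
      | 2 -> GZip
      | 3 -> ZLib
      | 4 -> Brotli
      | n -> Placeholder n
    { FormatAndEncoding = hi; PrefixLength = 8; Compression = Some comp }
  else
    // Below the upper bound: the whole nibble is the prefix length
    { FormatAndEncoding = hi; PrefixLength = lo; Compression = None }
```

For example, `decodeHeader 0b00000100uy` (the header of the "test" example further down) gives a prefix length of four and no compression flags, while `decodeHeader 0b11101001uy` gives format-and-encoding `0b1110` (isUTF8), length eight and `Deflate`.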
The next seven `bytes` store the first seven `bytes` of the `dstring`. If the `dstring` is shorter than seven `bytes`, the remaining `bytes` are set to the `default` value of zero.
Finally, the last eight `bytes` contain an x64 pointer (8 `bytes`) to a `byte []` (on the `heap`) holding the rest of the `bytes` in the `dstring`. If the `dstring` is shorter than eight `bytes`, the `byte []` is not instantiated (`null` value).
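The layout just described can be sketched as an F# struct record, assuming the header and the seven inline bytes are plain `byte` fields next to a single `byte []` reference. `DstringSketch` and `ofBytes` are hypothetical names, and the sketch always uses format `0b0000` (isExtASCII) with no compression. On a 64-bit runtime such a value takes 16 bytes.

```fsharp
// Hypothetical 16-byte value type mirroring the layout described above:
// 1 header byte + 7 inline prefix bytes + 1 reference to a heap byte[].
// An illustrative sketch, not SpiseMisu.Text.Dstring's own type.
[<Struct>]
type DstringSketch =
  { Header : byte
    B0 : byte; B1 : byte; B2 : byte; B3 : byte; B4 : byte; B5 : byte; B6 : byte
    Rest : byte[] } // null when the dstring is shorter than eight bytes

/// Packs raw bytes (format bits 0b0000, i.e. isExtASCII, uncompressed).
let ofBytes (bs : byte[]) : DstringSketch =
  let at i = if i < bs.Length then bs.[i] else 0uy // pad with zeros
  { Header = byte (min bs.Length 8)                // 8 = upper bound
    B0 = at 0; B1 = at 1; B2 = at 2; B3 = at 3; B4 = at 4; B5 = at 5; B6 = at 6
    Rest = if bs.Length < 8 then null else bs.[7..] } // bytes b7 .. bn
```

With this sketch, `ofBytes "test"B` keeps all four bytes inline and leaves `Rest` as `null`, while `ofBytes "Danish string"B` keeps `"Danish "` inline (length nibble 8) and puts `"string"` in the heap array, as in the examples that follow.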
- Example of a 4-byte `dstring` ("test"). No heap allocation:
+--------+----+----+----+----+----+----+----+----------+
|□□□□□■□□|0x74|0x65|0x73|0x74|0x00|0x00|0x00| <NULL> |
+--------+----+----+----+----+----+----+----+----------+
bit-mask b0 b1 b2 b3 b4 b5 b6 pointer
—— —— —— ——
- Example of a `dstring` of eight or more bytes ("Danish string"), with heap allocation:
0x551A4290 (byte[] on heap)
|
v
+--------+----+----+---+----+----------+ +----+----+---+----+
|□□□□■□□□|0x44|0x61| … |0x20|0x551A4290| ---> |0x73|0x74| … |0x67|
+--------+----+----+---+----+----------+ +----+----+---+----+
bit-mask b0 b1 … b6 pointer b7 b8 … bn
—— —— —— ——————— —— —— ——
- Example of an array of nine `dstring`s:
extra allocated byte arrays on heap ----+------------+------------+
| | |
v | |
0x6796EE96 | |
+-+----+-----------------------+ | | |
|i|memo| continuous memory | v | |
+-+----+--------+---+----------+ +---+ v |
|0|0x00|□□□□■□□□| … |0x6796EE96| -----> | … | 0x53EB31F6 |
+-+----+--------+---+----------+ +---+ | |
|1|0x10|□□□□□□■□| … | <NULL> | v |
+-+----+--------+---+----------+ +---+ v
|2|0x20|□□□□■□□□| … |0x53EB31F6| ------------------> | … | 0x4A424B5E
+-+----+--------+---+----------+ +---+ |
|…|0x…0|□□□□□■□■| … | <NULL> | v
+-+----+--------+---+----------+ +---+
|8|0x80|□□□□■□□□| … |0x4A424B5E| -------------------------------> | … |
+-+----+--------+---+----------+ +---+
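Because the first seven bytes live inline in each 16-byte element, many comparisons (for example while sorting such an array) can be decided without following the heap pointer. Below is a minimal F# sketch of a German-string style, byte-wise lexicographic comparison, assuming a zero-padded seven-byte prefix, a total length and an optional heap remainder. `PrefixedBytes`, `ofBytes`, `compareBytes` and `comparePrefixed` are hypothetical names, and this is not necessarily how `Dstring.Array.sort` or `Dstring.Array.sortPrefix` are implemented.

```fsharp
// Hypothetical German-string style comparison (NOT the library's code).
// A value is modelled as a zero-padded 7-byte inline prefix, its total
// length and the heap-allocated rest (null when shorter than eight bytes).
type PrefixedBytes = { Prefix : byte[]; Length : int; Rest : byte[] }

let ofBytes (bs : byte[]) : PrefixedBytes =
  { Prefix = Array.init 7 (fun i -> if i < bs.Length then bs.[i] else 0uy)
    Length = bs.Length
    Rest   = if bs.Length < 8 then null else bs.[7..] }

// Byte-wise lexicographic comparison; a null array is treated as empty.
let compareBytes (a : byte[]) (b : byte[]) : int =
  let a = if isNull a then [||] else a
  let b = if isNull b then [||] else b
  let rec go i =
    if i = a.Length || i = b.Length then compare a.Length b.Length
    else
      let c = compare a.[i] b.[i]
      if c <> 0 then c else go (i + 1)
  go 0

let comparePrefixed (x : PrefixedBytes) (y : PrefixedBytes) : int =
  // 1) Inline prefixes first: no pointer is followed at this stage
  let c = compareBytes x.Prefix y.Prefix
  if c <> 0 then c
  else
    // 2) Only when the prefixes tie are the heap arrays dereferenced,
    //    and the total length settles any tie left by the zero padding
    let c = compareBytes x.Rest y.Rest
    if c <> 0 then c else compare x.Length y.Length
```

The zero padding is safe because equal padded prefixes always fall through to the remainder and then to the total length, so `"ab"` still sorts before `"ab"` followed by a `0x00` byte.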
Project structure
├── SpiseMisu.Text.Dstring
│ ├── lib
│ │ └── utils.fs
│ ├── SpiseMisu.Text.Dstring.fsproj
│ └── dstring.fs
├── SpiseMisu.Text.Dstring.Perfs
│ ├── SpiseMisu.Text.Dstring.Perfs.fsproj
│ └── program.fs
├── SpiseMisu.Text.Dstring.Tests
│ ├── SpiseMisu.Text.Dstring.Tests.fsproj
│ ├── program.fs
│ └── tests.fs
├── demo
│ └── dstring.fsx
├── imgs
│ ├── docs
│ ├── licenses
│ └── nuget
├── SpiseMisu.Text.Dstring.sln
├── global.json
├── license.txt
├── license_cil-bytecode_agpl-3.0-only.txt
├── license_knowhow_cc-by-nc-nd-40.txt
├── readme.md
└── todo.org
Memory layout
Heap dump with dotnet-dump (mini-guide)
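In `./SpiseMisu.Text.Dstring.Perfs/program.fs`, inside `x.GlobalCleanup () =`, uncomment `System.Threading.Thread.Sleep(15_000 (* 15 secs *))`, so the benchmark job pauses for 15 seconds during clean-up and there is time to take the dump.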
Execute
./dotnet-cli-pidof.sh
and you will see all the running `dotnet` apps. Look for the ones ending with `SpiseMisu.Text.Dstring.Perfs-Job-OVERNF-1/bin/Release/net8.0`. Now wait until the job you want to make the memory dump for reaches the clean-up section:
// AfterActualRun
Execute
dotnet-dump collect --type Heap --process-id 2456129
and you will see:
// AfterActualRun
WorkloadResult 1: 2 op, 507459083.00 ns, 253.7295 ms/op
// GC: 8 7 0 207217488 2
// Threading: 0 0 2
[createdump] Gathering state for process 2456129 dotnet
[createdump] Writing minidump with heap to file ~/…/SpiseMisu.Text.Dstring/core_20251004_170724
[createdump] Written 596156416 bytes (145546 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 306ms
Investigate by typing:
dotnet-dump analyze core_20251004_170724
In the tool, type:
dumpheap -stat
and you will see:
…
561d22bacde0 13,565 539,936 Free
7f54cec830c0 1 8,000,024 System.Int64[]
7f54cec82ee8 1 16,000,024 SpiseMisu.Text+Dstring[]
7f54cec82010 2 16,000,048 System.Byte[][]
7f54ce9aeb48 34 24,004,640 System.String[]
7f54ce90d7c8 3,000,708 158,772,680 System.String
7f54ceb75950 5,000,005 209,002,292 System.Byte[]
Total 8,015,865 objects, 432,486,422 bytes
- List the objects for a given method table (here the MT of `SpiseMisu.Text+Dstring[]`):
dumpheap -mt 7f54cec82ee8
Address MT Size
7f14ce800048 7f54cec82ee8 16,000,024
- You can now drill further by typing:
dumparray -length 5 7f14ce800048
Name: SpiseMisu.Text+Dstring[]
MethodTable: 00007f54cec82ee8
EEClass: 00007f54cec82e60
Size: 16000024(0xf42418) bytes
Array: Rank 1, Number of elements 1000000, Type VALUETYPE
Element Methodtable: 00007f54cec82db0
[0] 00007f14ce800058
[1] 00007f14ce800068
[2] 00007f14ce800078
[3] 00007f14ce800088
[4] 00007f14ce800098
- And now we can see the contents of some of the (struct) elements in our array by typing:
db -c 80 00007f14ce800058
(5 elements × 16 bytes = 80 bytes):
00007f14ce800058: 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba 0k"......s..7...
00007f14ce800068: 58 6b 22 ce 14 7f 00 00 08 53 d1 20 a4 46 a1 86 Xk"......S. .F..
00007f14ce800078: 80 6b 22 ce 14 7f 00 00 08 44 8f d6 ea 76 37 34 .k"......D...v74
00007f14ce800088: a8 6b 22 ce 14 7f 00 00 08 5b c1 41 f8 f9 bd 58 .k"......[.A...X
00007f14ce800098: d0 6b 22 ce 14 7f 00 00 08 50 72 ef 42 a5 6a 2a .k"......Pr.B.j*
which show a similar pattern to the output of the hex dumper (`Dstring.Memory.dump`):
0112748739DB99|00001000|↔|00007F536E755118|459055102CAE09F54B
01E606DBB4F6FA|00001000|↔|00007F536E754DD8|4BBC8ED0A25F0B8755
07BDEDF50B83AC|00001000|↔|00007F536E754DB0|43A0DFEEA191AEA2A3
0C5FB78013D42F|00001000|↔|00007F536E754CC0|41854A8815FE6E6A3C
1F3A8D9CC33F5E|00001000|↔|00007F536E7550F0|4BA36307910E82AB70
NOTE: In the performance benchmark, the GUIDs are `byte []`-reversed.
> 0112748739DB99|08|↔|00007F536E755118
(byte reversed becomes)
> 18 51 75 6E 53 7F 00 00|08|99 DB 39 87 74 12 01
(and compared to `dotnet-dump`)
< 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba
- Once you are done, delete the `core_[DATESTAMP]_[TIMESTAMP]` files.
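The numbers in this session can be cross-checked with a small F# script, assuming a 64-bit little-endian runtime like the one the dump was taken on: array elements start 16 bytes after the array's address, each `dstring` element is 16 bytes wide, and each element appears to be stored as the 8-byte heap pointer followed by the header byte and the seven inline bytes (the runtime is free to place a struct's reference field first). The `hex`, `elementAddress` and `arraySize` helpers are hypothetical; only `System.Convert` and `System.BitConverter` are used.

```fsharp
open System

// Illustrative checks of the dotnet-dump session above. Assumes a 64-bit,
// little-endian runtime (x64), like the one the dumps were taken on.
if not BitConverter.IsLittleEndian then failwith "sketch assumes a little-endian host"

// Upper-case, space-separated hex, e.g. "18 51 75 6E"
let hex (bs : byte[]) = BitConverter.ToString(bs).Replace("-", " ")

// 1) Element addresses reported by `dumparray -length 5 7f14ce800048`
let arrayAddress = 0x7f14ce800048UL
let dataOffset   = 16UL // method-table pointer (8) + length (4) + padding (4)
let elementSize  = 16UL // one dstring
let elementAddress (i : uint64) = arrayAddress + dataOffset + i * elementSize
let arraySize (n : uint64) = 8UL + dataOffset + n * elementSize // + object header word (8)

for i in 0UL .. 4UL do
  printfn "[%d] %016x" i (elementAddress i) // [0] 00007f14ce800058 … [4] 00007f14ce800098
printfn "Size: %d bytes" (arraySize 1_000_000UL) // Size: 16000024 bytes

// 2) First row printed by `db -c 80 00007f14ce800058`: pointer, header, prefix
let row     = Convert.FromHexString "306b22ce147f000008739aac37c9beba"
let pointer = BitConverter.ToUInt64(row, 0) // bytes 0..7 (little-endian)
let header  = row.[8]                       // byte 8: header bit-mask
let prefix  = row.[9..15]                   // bytes 9..15: inline bytes

printfn "pointer = %016X" pointer // pointer = 00007F14CE226B30
printfn "header  = %02X, length nibble = %d" header (int header &&& 0x0F) // header  = 08, length nibble = 8
printfn "prefix  = %s" (hex prefix) // prefix  = 73 9A AC 37 C9 BE BA

// 3) The hex-dumper line `0112748739DB99|08|↔|00007F536E755118`, byte-reversed
let pointerBytes   = BitConverter.GetBytes 0x00007F536E755118UL
let reversedPrefix = Convert.FromHexString "0112748739DB99" |> Array.rev
printfn "%s|08|%s" (hex pointerBytes) (hex reversedPrefix)
// 18 51 75 6E 53 7F 00 00|08|99 DB 39 87 74 12 01
```

The last line reproduces the "(byte reversed becomes)" row from the note, and the header byte `0x08` matches the `00001000` column of the hex dumper.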
Benchmarks
// * Summary *
BenchmarkDotNet v0.15.4, Linux NixOS 25.05 (Warbler)
12th Gen Intel Core i7-12800H 0.40GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.414
[Host] : .NET 8.0.20 (8.0.20, 8.0.2025.41914), X64 RyuJIT x86-64-v3 DEBUG
Job-OVERNF : .NET 8.0.20 (8.0.20, 8.0.2025.41914), X64 RyuJIT x86-64-v3
Job=Job-OVERNF Runtime=.NET 8.0 IterationCount=1
LaunchCount=0 WarmupCount=0 Error=NA
| Method | N | Mean | Ratio | Allocated | Alloc Ratio |
|--------------------------------------------------- |-------- |-----------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N' | 1000000 | 2.183 ms | 1.00 | 7.63 MB | 1.00 |
| 'Array.zeroCreate<dstring> x.N' | 1000000 | 5.296 ms | 2.43 | 15.26 MB | 2.00 |
| 'x.guids |> Array.map Encoding.ASCII.GetString' | 1000000 | 121.282 ms | 55.57 | 61.04 MB | 8.00 |
| 'x.guids |> Array.map Dstring.Bytes.toDstring' | 1000000 | 63.640 ms | 29.16 | 53.41 MB | 7.00 |
| 'x.sha256s |> Array.map Encoding.ASCII.GetString' | 1000000 | 215.073 ms | 98.54 | 91.55 MB | 12.00 |
| 'x.sha256s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 76.005 ms | 34.82 | 68.66 MB | 9.00 |
| 'x.strings |> Array.sort' | 1000000 | 264.986 ms | 121.41 | 7.63 MB | 1.00 |
| 'x.strings |> Array.sortDescending' | 1000000 | 288.462 ms | 132.17 | 7.63 MB | 1.00 |
| 'x.strings |> Array.map Dstring.UTF8.fromString' | 1000000 | 112.914 ms | 51.74 | 53.41 MB | 7.00 |
| 'x.dstrings |> Array.map Dstring.UTF8.toString' | 1000000 | 252.340 ms | 115.62 | 98.81 MB | 12.95 |
| 'x.dstrings |> Dstring.Array.sort' | 1000000 | 174.879 ms | 80.13 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstring.Array.sortDescending' | 1000000 | 180.526 ms | 82.71 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstring.Array.sortPrefix' | 1000000 | 155.760 ms | 71.37 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstring.Array.sortPrefixDescending' | 1000000 | 157.594 ms | 72.21 | 15.26 MB | 2.00 |
// * Hints *
HideColumnsAnalyser
Summary -> Hidden columns: Error
// * Legends *
N : Value of the 'N' parameter
Mean : Arithmetic mean of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
1 ms : 1 Millisecond (0.001 sec)
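The two `Array.zeroCreate` baselines can be checked by hand: with N = 1,000,000, a `string []` stores 8-byte references, whereas a `dstring []` stores the 16-byte values inline. A small F# sketch of that arithmetic, using 1 MB = 1024 × 1024 bytes as in the legend and ignoring the array's ~24-byte object header, reproduces the reported 7.63 MB, 15.26 MB and the 2.00 allocation ratio.

```fsharp
// Back-of-the-envelope check of the "Allocated" column for the two
// Array.zeroCreate baselines. Assumes 8-byte references (x64) and
// 16-byte dstrings; the ~24-byte array header is negligible and ignored.
let n = 1_000_000.0
let mb bytes = bytes / (1024.0 * 1024.0) // 1KB = 1024B, as in the legend

let stringArrayMB  = mb (n * 8.0)  // string[] holds references only
let dstringArrayMB = mb (n * 16.0) // dstring[] holds the 16-byte values inline

printfn "string[]  : %.2f MB" stringArrayMB              // 7.63 MB
printfn "dstring[] : %.2f MB" dstringArrayMB             // 15.26 MB
printfn "ratio     : %.2f" (dstringArrayMB / stringArrayMB) // 2.00
```

Note that the `string []` figure excludes the strings themselves, which live in separate heap objects, whereas the `dstring []` figure already includes the first seven bytes of every value.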
Licenses
Source code in this repository is ONLY covered by the Server Side Public License, v 1, while the rest (knowhow, text, media, …) is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
However, nuget.org does not permit publishing a package whose license is approved by neither the OSI nor the FSF:
Pushing SpiseMisu.Text.Dstring.0.11.0.nupkg to 'https://www.nuget.org/api/v2/package'...
PUT https://www.nuget.org/api/v2/package/
BadRequest https://www.nuget.org/api/v2/package/ 846ms
error: Response status code does not indicate success: 400 (License expression must only contain licenses that are approved by Open Source Initiative or Free Software Foundation. Unsupported licenses: SSPL-1.0.).
The nuget package is therefore dual-licensed: its CIL-bytecode content is covered by the GNU Affero General Public License v3.0 only, while the rest (knowhow, text, media, …) is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
For more info on which licenses are accepted for nuget packages, see the SPDX License List.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net8.0
  - FSharp.Core (>= 8.0.403)