QuickBench 0.1.0
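- .NET CLI: `dotnet add package QuickBench --version 0.1.0`
- Package Manager Console (Visual Studio): `NuGet\Install-Package QuickBench -Version 0.1.0`
- PackageReference (project file): `<PackageReference Include="QuickBench" Version="0.1.0" />`
- Central Package Management: `<PackageVersion Include="QuickBench" Version="0.1.0" />` in Directory.Packages.props, plus `<PackageReference Include="QuickBench" />` in the project file
- Paket CLI: `paket add QuickBench --version 0.1.0`
- Script & Interactive (F# scripts, .NET Interactive notebooks): `#r "nuget: QuickBench, 0.1.0"`
- File-based apps (.NET 10 SDK): `#:package QuickBench@0.1.0`
- Cake addin: `#addin nuget:?package=QuickBench&version=0.1.0`
- Cake tool: `#tool nuget:?package=QuickBench&version=0.1.0`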
QuickBench
Give it seconds and operations — get ±1% results with machine-independent relative units.
```csharp
QuickBenchBuilder.Create("MyBench")
    .Add("Sort", () => Array.Sort(data.Clone() as int[]))
    .Add("Search", () => Array.BinarySearch(sorted, target))
    .Run(seconds: 15)
    .PrintText();
```
```
QuickBench — MyBench
======================================================================
Baseline: 92.3 ± 0.2 μs → 1🍌 = 9.2 ns
Mode: Quick (15s)   Rounds: 152
✅ Baseline stable — results reliable

           |       μs |       🍌 |       KB
-----------+----------+----------+----------
Sort       |     12.4 |    1,345 |     0.10
Search     |     0.03 |      3.3 |     0.00
```
μs: wall-clock time per operation, comparable within one run. 🍌 bananas: CPU-normalized units, comparable across runs, power modes and machines. KB: memory allocated per operation.
API
Single operation
```csharp
QuickBenchBuilder.Create(() => Array.Sort(data.Clone() as int[]))
    .Run(seconds: 15).PrintText();
```
Multiple operations with tags
```csharp
QuickBenchBuilder.Create("Algo")
    .Add("BubbleSort", () => BubbleSort(data))
    .Add("QuickSort", () => QuickSort(data))
    .Add("MergeSort", () => MergeSort(data))
    .Run(seconds: 30).PrintText();
```
One tag → flat table: each tag = row.
Two tags (row × column)
```csharp
var builder = QuickBenchBuilder.Create("Database");
// Operations with the same tag combination share one slot, i.e. one table cell
foreach (var q in simpleQueries) {
    builder.Add("Simple", "Query", () => db.Execute(q));
    builder.Add("Simple", "Serialize", () => Serialize(q));
}
foreach (var q in complexQueries) {
    builder.Add("Complex", "Query", () => db.Execute(q));
    builder.Add("Complex", "Serialize", () => Serialize(q));
}
builder.Run(seconds: 60).PrintText();
```
```
Absolute (μs per op):
           |    Query | Serialize
-----------+----------+----------
Simple     |     0.84 |     2.10
Complex    |     45.2 |     12.8
```
Two tags → matrix: first tag = row, second tag = column.
Three tags (section × row × column)
```csharp
builder.Add("v1", "Simple", "Build", () => BuildV1(s));
builder.Add("v2", "Simple", "Build", () => BuildV2(s));
```
Three tags → grouped tables: first tag = section header.
Custom operation
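```csharp
using System.Net.Http;

public class HttpBenchOp : IBenchOperation {
    private readonly HttpClient _client;
    private readonly string _url;
    public HttpBenchOp(HttpClient client, string url) { _client = client; _url = url; }
    public string[] Tags => new[] { "API", "GET /users" };
    // Perform() is synchronous: block on the request and dispose the response each time
    public void Perform() => _client.GetAsync(_url).GetAwaiter().GetResult().Dispose();
}
builder.Add(new HttpBenchOp(new HttpClient(), url));
```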
JSON output
```csharp
var report = builder.Run(seconds: 30);
report.SaveJson("bench.json");
string json = report.ToJson();
```
Algorithm
Pipeline
| Step | Stage | What happens |
|---|---|---|
| 1 | Parallel warmup (2s) | All CPU cores run an arithmetic loop simultaneously. |
| 2 | Calibration | Measure each slot's duration, compute batch sizes. |
| 3 | JIT warmup | Run all operations for ~1s to trigger .NET tiered compilation. |
| 4 | Measurement | Shuffled round-robin, baseline interleaved every 10th round. |
| 5 | Memory measurement | GC.GetTotalAllocatedBytes before/after, median of 5 runs. |
| 6 | Statistics | Trimmed mean, block CI, practical CI (×4), propagation. |
1. Parallel warmup
All CPU cores run the baseline arithmetic loop simultaneously for 2 seconds. This:
- Brings the CPU toward thermal steady state (reducing frequency drift during measurement)
- Encourages the OS scheduler to place the benchmark thread on a performance core
- Lets turbo-boost transients settle before measurement begins
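The warmup step can be sketched as below. This is a minimal illustration, not QuickBench's code: `ParallelWarmup` is a hypothetical name, it simply starts one thread per logical core, and the multiply + XOR constants are assumptions standing in for the undocumented baseline loop.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Busy-runs an integer multiply + XOR loop on one thread per logical core until `duration` elapses.
// Returns the total number of 1,000-iteration bursts completed across all threads.
static long ParallelWarmup(TimeSpan duration)
{
    long deadline = Stopwatch.GetTimestamp() + (long)(duration.TotalSeconds * Stopwatch.Frequency);
    long totalBursts = 0;
    var threads = new Thread[Environment.ProcessorCount];
    for (int t = 0; t < threads.Length; t++)
    {
        threads[t] = new Thread(() =>
        {
            ulong x = 1;
            long bursts = 0;
            while (Stopwatch.GetTimestamp() < deadline)
            {
                for (int i = 0; i < 1_000; i++)
                    x = unchecked(x * 6364136223846793005UL) ^ (ulong)i;
                bursts++;
            }
            GC.KeepAlive(x); // keep the result observable so the loop is not optimized away
            Interlocked.Add(ref totalBursts, bursts);
        });
        threads[t].Start();
    }
    foreach (var thread in threads) thread.Join(); // wait until every thread passes the deadline
    return totalBursts;
}

var warmupClock = Stopwatch.StartNew();
long warmupBursts = ParallelWarmup(TimeSpan.FromSeconds(2));
Console.WriteLine($"{Environment.ProcessorCount} threads, {warmupClock.Elapsed.TotalSeconds:F1} s, {warmupBursts:N0} bursts");
```

Using dedicated threads rather than `Parallel.For` guarantees that all cores are loaded at the same time instead of depending on thread-pool scheduling.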
2. Calibration
For each slot, the engine runs the operation for ~0.5s and measures its average duration. From this it computes the batch size: how many times to call the operation per measurement so that one measurement takes ~10ms in total. A 10μs operation gets batch = 1000; a 100ms operation gets batch = 1 (the minimum).
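A minimal sketch of calibration, assuming the average is total elapsed time divided by the number of calls and that the batch size is rounded to the nearest integer. Both are assumptions, and `MeasureAverageSeconds` and `BatchSize` are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Calls `op` repeatedly for roughly `budget` and returns the average seconds per call.
static double MeasureAverageSeconds(Action op, TimeSpan budget)
{
    long calls = 0;
    var clock = Stopwatch.StartNew();
    do { op(); calls++; } while (clock.Elapsed < budget);
    return clock.Elapsed.TotalSeconds / calls;
}

// Calls per measurement so that one measurement lasts about `targetSeconds`; never below 1.
static long BatchSize(double avgSecondsPerCall, double targetSeconds = 0.010) =>
    Math.Max(1L, (long)Math.Round(targetSeconds / avgSecondsPerCall));

int[] input = new int[1_000];
var rng = new Random(42);
for (int i = 0; i < input.Length; i++) input[i] = rng.Next();

double avg = MeasureAverageSeconds(() => Array.Sort((int[])input.Clone()), TimeSpan.FromSeconds(0.5));
Console.WriteLine($"avg {avg * 1e6:F2} μs per call -> batch {BatchSize(avg)}");
```

With the 10ms target, `BatchSize(10e-6)` gives 1000 and `BatchSize(0.1)` gives 1, matching the examples above.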
3. JIT warmup
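All operations run for ~1 second in total (scaled by calibrated speed). This triggers .NET's tiered JIT to recompile hot code from Tier0 to optimized Tier1. Each operation runs at least 30 times, the default call-count threshold for Tier1 promotion; because promotion happens on a background thread after a short delay, the time budget matters as well as the call count.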
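One way to read "scaled by calibrated speed" is that the 1-second budget is split evenly across operations and each share is converted into a call count. The sketch below assumes exactly that, with the 30-call floor from the text; `JitWarmupCalls` and `RunJitWarmup` are hypothetical helpers.

```csharp
using System;

// Splits the warmup budget evenly across operations and converts each share into a call count
// using the calibrated seconds-per-call, never going below the Tier1 call-count threshold.
static long[] JitWarmupCalls(double[] avgSecondsPerCall, double totalBudgetSeconds = 1.0, long minCalls = 30)
{
    double share = totalBudgetSeconds / avgSecondsPerCall.Length;
    var calls = new long[avgSecondsPerCall.Length];
    for (int i = 0; i < calls.Length; i++)
        calls[i] = Math.Max(minCalls, (long)Math.Ceiling(share / avgSecondsPerCall[i]));
    return calls;
}

// Runs each operation its computed number of times.
static void RunJitWarmup(Action[] ops, long[] calls)
{
    for (int i = 0; i < ops.Length; i++)
        for (long n = 0; n < calls[i]; n++)
            ops[i]();
}

long[] plan = JitWarmupCalls(new[] { 10e-6, 0.100 }); // a 10 μs and a 100 ms operation
Console.WriteLine(string.Join(", ", plan));
```

For a 10μs and a 100ms operation this yields about 50,000 and 30 calls: the slow operation is held at the 30-call floor.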
4. Shuffled round-robin
Operations are grouped into slots by their tag combination. Each round measures all slots once. Between rounds, slot order is shuffled:
```
Round 1: [Parse] [Baseline] [Build] [Run]
Round 2: [Build] [Run] [Parse] [Baseline]
Round 3: [Baseline] [Parse] [Run] [Build]
```
Without shuffling, Build would always run right after Parse and always inherit the same warm cache state. Shuffling randomizes the order, so each slot sees a mix of warm and cold cache states and the result is a realistic average.
Baseline is measured every 10th round at a random position within the round. This tracks CPU frequency changes without polluting L1/L2 cache every round.
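The schedule can be sketched as a per-round Fisher-Yates shuffle with the baseline inserted at a random position in every 10th round. `BuildSchedule` is a hypothetical helper, and it assumes "every 10th round" means rounds 10, 20, 30 and so on; the example rounds above show the baseline more often, so QuickBench's exact cadence may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds the measurement order: every round measures each slot once in a freshly shuffled order,
// and every `baselineEvery`-th round also measures the baseline at a random position.
static List<List<string>> BuildSchedule(IReadOnlyList<string> slots, int rounds, Random rng,
                                        string baseline = "Baseline", int baselineEvery = 10)
{
    var schedule = new List<List<string>>(rounds);
    for (int round = 1; round <= rounds; round++)
    {
        var order = slots.ToList();
        for (int i = order.Count - 1; i > 0; i--) // Fisher-Yates shuffle
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }
        if (round % baselineEvery == 0)
            order.Insert(rng.Next(order.Count + 1), baseline);
        schedule.Add(order);
    }
    return schedule;
}

var plan = BuildSchedule(new[] { "Parse", "Build", "Run" }, 20, new Random(1));
for (int k = 0; k < 3; k++)
    Console.WriteLine($"Round {k + 1}: " + string.Join(" ", plan[k].Select(name => $"[{name}]")));
```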
5. Memory measurement
After timing completes, each slot runs once more, bracketed by two GC.GetTotalAllocatedBytes(true) calls. This is repeated 5 times and the median is taken. The result is reported as KB allocated per operation.
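The bracketing can be sketched with `GC.GetTotalAllocatedBytes(precise: true)`, available since .NET Core 3.0. `MedianAllocatedBytes` is a hypothetical helper, and the extra warm-up call is an assumption. The counter is process-wide, so allocations on other threads during a run inflate that sample; taking the median of 5 limits the effect.

```csharp
using System;

// Runs `op` once between two precise allocation counters, repeats `repeats` times,
// and returns the median number of bytes allocated per call.
static long MedianAllocatedBytes(Action op, int repeats = 5)
{
    op(); // warm-up call so one-time allocations (static initializers etc.) are not counted
    var samples = new long[repeats];
    for (int i = 0; i < repeats; i++)
    {
        long before = GC.GetTotalAllocatedBytes(precise: true);
        op();
        samples[i] = GC.GetTotalAllocatedBytes(precise: true) - before;
    }
    Array.Sort(samples);
    return samples[repeats / 2]; // middle element = median for an odd number of repeats
}

byte[] sink = Array.Empty<byte>();
Action allocate1K = () => sink = new byte[1000];
double kb = MedianAllocatedBytes(allocate1K) / 1024.0;
Console.WriteLine($"{kb:F2} KB per operation (last buffer: {sink.Length} bytes)");
```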
6. Statistics
Drop first 10% of each slot's samples: this removes residual JIT warmup effects.
Trimmed mean (15%): sort the samples, discard the bottom 15% and top 15%, and average the remaining 70%. This rejects GC spikes and OS interrupts while using more of the data than the median does.
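Both steps fit in a few lines. Truncating the 10% and 15% sample counts toward zero is an assumption, and `DropWarmup` and `TrimmedMean` are hypothetical names.

```csharp
using System;
using System.Linq;

// Discards the first `fraction` of a slot's samples (residual JIT warmup).
static double[] DropWarmup(double[] samples, double fraction = 0.10) =>
    samples.Skip((int)(samples.Length * fraction)).ToArray();

// Sorts the samples, removes `trim` of them from each end, and averages the rest.
static double TrimmedMean(double[] samples, double trim = 0.15)
{
    double[] sorted = samples.OrderBy(x => x).ToArray();
    int cut = (int)(sorted.Length * trim);
    return sorted.Skip(cut).Take(sorted.Length - 2 * cut).Average();
}

double[] timings = { 5.1, 5.0, 4.9, 5.2, 48.0, 5.0, 4.8, 5.1, 5.0, 5.0 }; // 48.0 = a GC spike
double[] kept = DropWarmup(timings);
Console.WriteLine($"plain mean {kept.Average():F2}, trimmed mean {TrimmedMean(kept):F2}");
```

Here the single spike pulls the plain mean up to about 9.8, while the trimmed mean stays at about 5.0.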
Block CI: the samples, in time order, are split into ~10 blocks of equal size, and each block's mean is computed. The standard confidence-interval formula is then applied to the block means, with the t-distribution correcting for the small number of blocks. Blocks account for temporal autocorrelation: consecutive samples within a block may be correlated (e.g. through thermal drift), but block means are approximately independent.
Practical CI (×4): the block CI multiplied by 4. This empirical correction was calibrated on an Apple M4 Pro and covers ~90% of run-to-run variance on thermally stable systems.
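A sketch of the block CI and the ×4 practical CI. The confidence level is not stated, so a two-sided 95% t-interval is assumed; block means are plain means, and leftover samples that do not fill a whole block are ignored. These are assumptions, and `BlockCiHalfWidth` and `PracticalCiHalfWidth` are hypothetical names.

```csharp
using System;
using System.Linq;

// Two-sided 95% Student t critical values for df = 1..10 (index 0 unused).
static double TCritical95(int df)
{
    double[] table = { double.NaN, 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228 };
    return df < table.Length ? table[df] : 1.96; // normal approximation beyond df = 10
}

// Splits time-ordered samples into `blocks` equal blocks, averages each block, and returns
// the CI half-width of the mean: t(df = blocks - 1) * sd(blockMeans) / sqrt(blocks).
static double BlockCiHalfWidth(double[] samples, int blocks = 10)
{
    if (blocks < 2) throw new ArgumentException("need at least 2 blocks");
    int size = samples.Length / blocks; // leftover samples that do not fill a block are ignored
    if (size == 0) throw new ArgumentException("need at least one sample per block");
    double[] means = Enumerable.Range(0, blocks)
        .Select(b => samples.Skip(b * size).Take(size).Average())
        .ToArray();
    double grand = means.Average();
    double sd = Math.Sqrt(means.Sum(m => (m - grand) * (m - grand)) / (blocks - 1));
    return TCritical95(blocks - 1) * sd / Math.Sqrt(blocks);
}

// Practical CI: the block CI widened by the empirical factor of 4.
static double PracticalCiHalfWidth(double[] samples, int blocks = 10) => 4 * BlockCiHalfWidth(samples, blocks);

var noise = new Random(3);
double[] demo = new double[400];
for (int k = 0; k < demo.Length; k++) demo[k] = 100 + noise.NextDouble();
Console.WriteLine($"block CI ±{BlockCiHalfWidth(demo):F3}, practical CI ±{PracticalCiHalfWidth(demo):F3}");
```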
Propagation for bananas: the banana CI combines the operation's and the baseline's relative uncertainties, where op and baseline are the mean times and δ_op and δ_baseline their CIs:
δ(🍌) = 🍌 × √( (δ_op / op)² + (δ_baseline / baseline)² )
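The banana conversion and the propagation formula translate directly into code. `ToBananas` is a hypothetical helper taking means and CI half-widths in any consistent time unit.

```csharp
using System;

// Converts an operation time to bananas (1 banana = baseline / 10000) and combines the
// relative CIs of the operation and the baseline in quadrature.
static (double Bananas, double Delta) ToBananas(double opMean, double opCi, double baselineMean, double baselineCi)
{
    double bananaUnit = baselineMean / 10_000;
    double bananas = opMean / bananaUnit;
    double relative = Math.Sqrt(Math.Pow(opCi / opMean, 2) + Math.Pow(baselineCi / baselineMean, 2));
    return (bananas, bananas * relative);
}

// All values in μs; opCi = 0.1 is an illustrative value
var (b, d) = ToBananas(opMean: 12.4, opCi: 0.1, baselineMean: 92.3, baselineCi: 0.2);
Console.WriteLine($"{b:N0} ± {d:N0} bananas");
```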
Stability detection
If baseline practical CI exceeds 3% of baseline mean → warning:
⚠️ Baseline unstable — results unreliable
Indicates background CPU load or thermal instability during measurement.
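The stability rule reduces to a single comparison. `IsBaselineStable` is a hypothetical name; whether a CI of exactly 3% counts as stable is not specified, and this sketch treats it as stable.

```csharp
using System;

// True when the baseline's practical CI is at most `threshold` (3%) of the baseline mean.
static bool IsBaselineStable(double baselineMean, double baselinePracticalCi, double threshold = 0.03) =>
    baselinePracticalCi <= threshold * baselineMean;

Console.WriteLine(IsBaselineStable(92.3, 0.2)
    ? "Baseline stable: results reliable"
    : "Baseline unstable: results unreliable");
```

For the sample run above (92.3 ± 0.2 μs, about 0.2%) it returns true.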
Baseline and bananas
A fixed arithmetic loop (80K iterations of integer multiply + XOR, [NoInlining]) runs as a regular slot. It defines bananas:
1 🍌 = baseline_time / 10000
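Bananas normalize away CPU frequency differences: when the clock slows down, the baseline slows down with it. On M4 Pro the baseline takes ~60μs in High Power mode and ~92μs in Low Power mode, yet the same operation measures ~2,560 🍌 in both.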
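A sketch of a baseline-style loop and the banana unit. The loop shape follows the description (80K iterations of integer multiply + XOR, not inlined), but the constants are assumptions, and `BaselineLoop` and `BananaNanoseconds` are hypothetical names. A single timed call like this is noisy; the library instead measures the baseline as a regular slot with the statistics described above.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Baseline-style loop: 80,000 iterations of integer multiply + XOR, kept out of line.
[MethodImpl(MethodImplOptions.NoInlining)]
static ulong BaselineLoop(ulong seed)
{
    ulong x = seed;
    for (int i = 0; i < 80_000; i++)
        x = unchecked(x * 6364136223846793005UL) ^ (ulong)i;
    return x; // returning the result keeps the loop from being eliminated as dead code
}

// 1 banana = baseline_time / 10000
static double BananaNanoseconds(double baselineNanoseconds) => baselineNanoseconds / 10_000;

BaselineLoop(1); // first call triggers JIT compilation
long start = Stopwatch.GetTimestamp();
ulong checksum = BaselineLoop(1);
double baselineNs = (Stopwatch.GetTimestamp() - start) * 1e9 / Stopwatch.Frequency;
Console.WriteLine($"baseline {baselineNs / 1000:F1} μs, 1 banana = {BananaNanoseconds(baselineNs):F2} ns (checksum {checksum})");
```

With the sample run's 92.3μs baseline, `BananaNanoseconds(92_300)` is 9.23 ns, matching the `1🍌 = 9.2 ns` header line.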
Accuracy
| Duration | Practical CI | Rounds | Use case |
|---|---|---|---|
| 7s | ±3% | ~80 | Smoke test |
| 15s | ±1.5% | ~160 | Quick check |
| 30s | ±1% | ~320 | Development |
| 60s | ±0.5% | ~650 | Pre-commit |
| 120s+ | ±0.2% | ~1300 | Release |
Requirements
.NET 6.0+. Zero dependencies.
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | Compatible: net6.0, net7.0, net8.0, net9.0. Computed: net10.0; the android, ios, maccatalyst, macos, tvos and windows variants of net6.0 through net10.0; the browser variants of net8.0 through net10.0. |
Dependencies
- net6.0: No dependencies.
- net7.0: No dependencies.
- net8.0: No dependencies.
- net9.0: No dependencies.
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.1.0 | 107 | 4/3/2026 |