DotCompute.Backends.CPU 0.6.2


DotCompute.Backends.CPU

Production-ready CPU compute backend with SIMD vectorization for .NET 9+.

Status: ✅ Production Ready

The CPU backend provides high-performance compute acceleration through:

  • SIMD Vectorization: AVX512/AVX2/NEON instruction sets
  • Multi-threading: Work-stealing thread pool
  • Memory Optimization: NUMA-aware allocation
  • Native AOT: Full compatibility with Native AOT compilation

Key Features

SIMD Acceleration

  • AVX512: Best performance on Intel Ice Lake+ and AMD Zen 4+
  • AVX2: Wide compatibility on modern Intel/AMD processors
  • NEON: ARM64 support for Apple Silicon and ARM servers
  • Automatic Detection: Runtime detection of optimal instruction set
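Runtime detection of this kind can be sketched with the standard .NET hardware intrinsics flags. This is a minimal illustration, not the backend's actual detection code: `DetectSimdLevel` is a hypothetical helper name, and the widest-first probing order is an assumption based on the list above.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper: probe the widest instruction set first and
// fall back step by step. The IsSupported flags are evaluated at run time
// and simply return false on platforms where the set does not exist.
string DetectSimdLevel()
{
    if (Avx512F.IsSupported) return "AVX512";
    if (Avx2.IsSupported) return "AVX2";
    if (AdvSimd.IsSupported) return "NEON";
    return "Scalar";
}

Console.WriteLine($"Selected instruction set: {DetectSimdLevel()}");
```

The printed value depends on the machine, so the same binary can report AVX512 on one host and NEON on another.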

Performance

  • 8-23x Speedup: Achieved on vectorizable operations
  • Memory Bandwidth: 95%+ of theoretical peak utilization
  • Thread Scaling: Near-linear scaling to CPU core count
  • Low Overhead: Sub-microsecond kernel launch latency

Installation

dotnet add package DotCompute.Backends.CPU --version 0.6.2

Usage

Basic Setup

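using DotCompute.Backends.CPU;
using Microsoft.Extensions.Logging;

// AddConsole() comes from the Microsoft.Extensions.Logging.Console package.
using var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<CpuAccelerator>();

// Create the accelerator and initialize it before compiling or running kernels.
var accelerator = new CpuAccelerator(logger);
await accelerator.InitializeAsync();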

Service Registration

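// Option 1: register the accelerator type directly.
services.AddSingleton<IAccelerator, CpuAccelerator>();
// Option 2 (use instead of option 1, not in addition to it): the extension method.
services.AddCpuBackend();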

Kernel Execution

var kernelDef = new KernelDefinition
{
    Name = "VectorAdd",
    Source = "/* OpenCL C kernel source */",
    EntryPoint = "vector_add"
};

// Compile once, then execute; `parameters` holds the kernel arguments prepared by the caller.
var compiledKernel = await accelerator.CompileKernelAsync(kernelDef);
await compiledKernel.ExecuteAsync(parameters);

Architecture

SIMD Dispatcher

Automatically selects the best available SIMD instruction set:

  1. Detection: Runtime CPU capability detection
  2. Dispatch: Function pointer selection to optimized kernels
  3. Fallback: Scalar implementation for unsupported hardware
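The detect, dispatch, and fallback steps above can be sketched with `System.Numerics.Vector<T>`. This is an illustration under stated assumptions rather than the backend's dispatcher: `AddVectorized`, `AddScalar`, and `SelectAddKernel` are hypothetical names, and a delegate stands in for the function pointer mentioned in step 2.

```csharp
using System;
using System.Numerics;

// Vectorized path: handles Vector<float>.Count lanes per iteration,
// then finishes any leftover elements one at a time.
void AddVectorized(float[] a, float[] b, float[] result)
{
    int width = Vector<float>.Count;
    int i = 0;
    for (; i <= a.Length - width; i += width)
    {
        var sum = new Vector<float>(a, i) + new Vector<float>(b, i);
        sum.CopyTo(result, i);
    }
    for (; i < a.Length; i++) result[i] = a[i] + b[i];
}

// Scalar fallback for hardware without SIMD acceleration.
void AddScalar(float[] a, float[] b, float[] result)
{
    for (int i = 0; i < a.Length; i++) result[i] = a[i] + b[i];
}

// Detection and dispatch: choose the implementation once, then reuse it.
Action<float[], float[], float[]> SelectAddKernel()
{
    if (Vector.IsHardwareAccelerated) return AddVectorized;
    return AddScalar;
}

var left = new float[] { 1f, 2f, 3f, 4f, 5f };
var right = new float[] { 10f, 20f, 30f, 40f, 50f };
var output = new float[left.Length];

var kernel = SelectAddKernel();
kernel(left, right, output);
Console.WriteLine(string.Join(", ", output));
```

Selecting the kernel once keeps the capability check out of the hot loop, which is the usual reason for dispatching through a pointer or delegate.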

Thread Pool

  • Work-Stealing: Efficient load balancing across cores
  • Thread-Local Storage: Minimizes synchronization overhead
  • Adaptive Sizing: Scales with workload and system load
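The thread-local accumulation idea can be illustrated with the standard Task Parallel Library. This is a sketch, not the backend's own pool: `ParallelSum` is a hypothetical helper, and it relies on the general-purpose .NET thread pool rather than anything DotCompute-specific.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: each worker sums its ranges into a private local
// value, so shared state is touched only once per worker at the end.
long ParallelSum(int[] data)
{
    if (data.Length == 0) return 0; // Partitioner.Create rejects empty ranges
    long total = 0;
    Parallel.ForEach(
        Partitioner.Create(0, data.Length),      // split the index space into ranges
        () => 0L,                                // per-worker accumulator
        (range, _, local) =>
        {
            for (int i = range.Item1; i < range.Item2; i++) local += data[i];
            return local;
        },
        local => Interlocked.Add(ref total, local)); // single synchronized merge
    return total;
}

int[] values = Enumerable.Range(1, 1_000_000).ToArray();
Console.WriteLine(ParallelSum(values)); // 500000500000
```

Range partitioning keeps each worker on a contiguous block of memory, and the only synchronization is the final `Interlocked.Add` per worker.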

Memory Management

  • NUMA Awareness: Memory allocation respects CPU topology
  • Cache Optimization: Data layout for optimal cache usage
  • Memory Pooling: Reuse allocations to reduce overhead
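Memory pooling in general can be demonstrated with .NET's built-in `ArrayPool<T>`. The sketch below shows only the rent and return pattern; it makes no claim about how the backend's pool is implemented and does not cover NUMA placement. `SumOfSquares` is a hypothetical helper.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: rents a scratch buffer instead of allocating a new
// array on every call, and always returns it to the pool.
float SumOfSquares(ReadOnlySpan<float> input)
{
    // Rent may hand back a larger array than requested, so slice it.
    float[] scratch = ArrayPool<float>.Shared.Rent(input.Length);
    try
    {
        Span<float> squares = scratch.AsSpan(0, input.Length);
        for (int i = 0; i < input.Length; i++) squares[i] = input[i] * input[i];

        float sum = 0f;
        foreach (float value in squares) sum += value;
        return sum;
    }
    finally
    {
        ArrayPool<float>.Shared.Return(scratch);
    }
}

Console.WriteLine(SumOfSquares(new float[] { 1f, 2f, 3f })); // 14
```

The `try`/`finally` ensures the buffer goes back to the pool even if the computation throws.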

Performance Benchmarks

Tested on Intel Core Ultra 7 165H with 16 threads:

Operation    Elements    CPU Time    SIMD Time   Speedup
Vector Add   1M floats   4.33 ms     187 μs      23x
Matrix Mult  512x512     2,340 ms    89 ms       26x
Dot Product  1M floats   2.1 ms      156 μs      13.4x
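The Speedup column is the baseline time divided by the SIMD time, after converting both to the same unit. The snippet below reproduces that arithmetic from the figures in the table; `Speedup` is a hypothetical helper used only for illustration.

```csharp
using System;

// Ratio of baseline time to SIMD time; both arguments are in milliseconds.
double Speedup(double baselineMs, double simdMs) => baselineMs / simdMs;

// Microsecond figures from the table are converted to milliseconds first.
Console.WriteLine($"Vector Add:  {Speedup(4.33, 0.187):F1}x");
Console.WriteLine($"Matrix Mult: {Speedup(2340, 89):F1}x");
Console.WriteLine($"Dot Product: {Speedup(2.1, 0.156):F1}x");
```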

System Requirements

Minimum

  • .NET 9.0 or later
  • x64 or ARM64 processor
  • 2GB RAM

Recommended

  • Modern CPU with AVX2+ (Intel Haswell+ / AMD Excavator+)
  • 8+ CPU cores for optimal threading performance
  • 16GB+ RAM for large datasets

Supported Platforms

  • Windows: x64, ARM64
  • Linux: x64, ARM64
  • macOS: x64 (Intel), ARM64 (Apple Silicon)

Build Configuration

The CPU backend automatically configures itself based on the target platform:

<PropertyGroup Condition="'$(TargetArchitecture)' == 'x64'">
  <DefineConstants>$(DefineConstants);ENABLE_AVX2;ENABLE_AVX512</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition="'$(TargetArchitecture)' == 'arm64'">
  <DefineConstants>$(DefineConstants);ENABLE_NEON</DefineConstants>
</PropertyGroup>

Troubleshooting

Performance Issues

  1. Check SIMD Support: Verify that the CPU supports AVX2/AVX512 (or NEON on ARM64)
  2. Memory Alignment: Ensure data is properly aligned for SIMD
  3. Thread Count: Match thread count to physical cores
  4. Memory Bandwidth: Monitor memory utilization during execution
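A few of these checks can be run with standard .NET APIs before digging deeper. This is a diagnostic sketch, not a DotCompute tool: `IsAligned` is a hypothetical helper, and the 32-byte boundary is an assumption chosen to match the width of an AVX2 register.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical helper: true when the address is a multiple of the alignment.
bool IsAligned(IntPtr address, int alignmentBytes) =>
    address.ToInt64() % alignmentBytes == 0;

// 1. SIMD support as seen by the runtime.
Console.WriteLine($"Hardware-accelerated vectors: {Vector.IsHardwareAccelerated}");
Console.WriteLine($"Vector<float> lanes: {Vector<float>.Count}");

// 3. Thread count: this reports logical processors, which can exceed the
// number of physical cores when SMT/Hyper-Threading is enabled.
Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");

// 2. Alignment of a managed buffer, checked while it is pinned.
float[] buffer = new float[1024];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    Console.WriteLine($"32-byte aligned: {IsAligned(handle.AddrOfPinnedObject(), 32)}");
}
finally
{
    handle.Free();
}
```

Managed arrays are not guaranteed to start on a 32-byte boundary, so a `False` result here is common and indicates where explicitly aligned allocation may help.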

Compatibility Issues

  1. Native AOT: Ensure all types are AOT-compatible
  2. Platform Support: Verify target platform support
  3. Dependencies: Check for missing runtime dependencies

Documentation & Resources

Comprehensive documentation is available for DotCompute:

  • Architecture Documentation
  • Developer Guides
  • Examples
  • API Documentation
  • Support

Contributing

The CPU backend welcomes contributions in:

  • New SIMD instruction set support (e.g., AVX-512 variants)
  • Platform-specific optimizations
  • Kernel compilation improvements
  • Performance benchmarks and analysis

See CONTRIBUTING.md for guidelines.

Target Frameworks

  • Compatible: net9.0
  • Computed: net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

NuGet packages (3)

Showing the top 3 NuGet packages that depend on DotCompute.Backends.CPU:

  • DotCompute.Linq: GPU-accelerated LINQ extensions for DotCompute. Transparent GPU execution for LINQ queries with automatic kernel generation, fusion optimization, and Reactive Extensions support.
  • Orleans.GpuBridge.Grains: Orleans grain implementations for GPU Bridge, covering GPU-accelerated batch, stream, and resident grains.
  • Orleans.GpuBridge.Backends.DotCompute: DotCompute backend provider for Orleans.GpuBridge.Core. Enables GPU acceleration via CUDA, OpenCL, Metal, and CPU with attribute-based kernel definition.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version      Downloads   Last Updated
0.6.2        94          2/9/2026
0.5.3        221         2/2/2026
0.5.2        626         12/8/2025
0.5.1        540         11/28/2025
0.5.0        201         11/27/2025
0.4.2-rc2    385         11/11/2025
0.4.1-rc2    324         11/6/2025