DotCompute.Backends.CPU 0.6.2

.NET 9.0

DotCompute.Backends.CPU

Production-ready CPU compute backend with SIMD vectorization for .NET 9+.

Status: ✅ Production Ready

The CPU backend provides high-performance compute acceleration through:

SIMD Vectorization: AVX512/AVX2/NEON instruction sets
Multi-threading: Work-stealing thread pool
Memory Optimization: NUMA-aware allocation
Native AOT: Full compatibility with Native AOT compilation

Key Features

SIMD Acceleration

AVX512: Best performance on Intel Ice Lake+ and AMD Zen 4+
AVX2: Wide compatibility on modern Intel/AMD processors
NEON: ARM64 support for Apple Silicon and ARM servers
Automatic Detection: Runtime detection of optimal instruction set

Performance

8-23x Speedup: Achieved on vectorizable operations
Memory Bandwidth: 95%+ of theoretical peak utilization
Thread Scaling: Near-linear scaling to CPU core count
Low Overhead: Sub-microsecond kernel launch latency

Installation

dotnet add package DotCompute.Backends.CPU --version 0.6.2

Usage

Basic Setup

using DotCompute.Backends.CPU;
using Microsoft.Extensions.Logging;

var logger = LoggerFactory.Create(builder => builder.AddConsole())
    .CreateLogger<CpuAccelerator>();

var accelerator = new CpuAccelerator(logger);
await accelerator.InitializeAsync();

Service Registration

services.AddSingleton<IAccelerator, CpuAccelerator>();
// OR
services.AddCpuBackend();

Kernel Execution

var kernelDef = new KernelDefinition
{
    Name = "VectorAdd",
    Source = "/* OpenCL C kernel source */",
    EntryPoint = "vector_add"
};

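// Compile the definition, then execute the compiled kernel.
var compiledKernel = await accelerator.CompileKernelAsync(kernelDef);
// `parameters` stands for the kernel arguments; this snippet does not define them.
await compiledKernel.ExecuteAsync(parameters);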

Architecture

SIMD Dispatcher

Automatically selects the best available SIMD instruction set:

Detection: Runtime CPU capability detection
Dispatch: Function pointer selection to optimized kernels
Fallback: Scalar implementation for unsupported hardware
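
The detect/dispatch/fallback pattern described above can be sketched with the portable System.Numerics API. This is a minimal illustration, not the backend's actual dispatcher: the class VectorAddSketch and its methods are hypothetical names, and the real implementation may select between AVX512, AVX2 and NEON paths explicitly rather than relying on Vector<T>.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of the DotCompute API.
public static class VectorAddSketch
{
    // Dispatch: the implementation is chosen once, when the class is first used.
    private static readonly Action<float[], float[], float[]> s_add = SelectKernel();

    // Detection: ask the runtime whether Vector<T> maps to hardware SIMD.
    private static Action<float[], float[], float[]> SelectKernel()
    {
        if (Vector.IsHardwareAccelerated)
        {
            return AddSimd;
        }
        return AddScalar; // Fallback for hardware without SIMD support.
    }

    public static void Add(float[] a, float[] b, float[] result)
    {
        if (a.Length != b.Length || a.Length != result.Length)
        {
            throw new ArgumentException("All arrays must have the same length.");
        }
        s_add(a, b, result);
    }

    public static void AddSimd(float[] a, float[] b, float[] result)
    {
        int width = Vector<float>.Count; // Lanes per vector, e.g. 8 with 256-bit registers.
        int i = 0;
        for (; i <= a.Length - width; i += width)
        {
            Vector<float> sum = new Vector<float>(a, i) + new Vector<float>(b, i);
            sum.CopyTo(result, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }

    public static void AddScalar(float[] a, float[] b, float[] result)
    {
        for (int i = 0; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }
}
```

Calling VectorAddSketch.Add(a, b, result) then pays the selection cost only once; every later call goes straight through the stored delegate.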

Thread Pool

Work-Stealing: Efficient load balancing across cores
Thread-Local Storage: Minimizes synchronization overhead
Adaptive Sizing: Scales with workload and system load
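
As a rough illustration of why thread-local state reduces synchronization, the sketch below computes a dot product with the standard Parallel.For overload that carries a per-worker accumulator. It runs on the .NET thread pool rather than on the backend's own pool, and ParallelDot is a hypothetical name used only for this example.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the DotCompute API.
public static class ParallelDot
{
    public static double Dot(double[] a, double[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Arrays must have the same length.");
        }

        double total = 0.0;
        object gate = new object();

        Parallel.For(
            0, a.Length,
            () => 0.0,                                  // Per-worker accumulator, no sharing.
            (i, state, local) => local + a[i] * b[i],   // Hot loop touches only local state.
            local =>
            {
                // One lock per worker, taken after its share of the range is done.
                lock (gate)
                {
                    total += local;
                }
            });

        return total;
    }
}
```

The lock is taken once per worker instead of once per element, which is the effect the thread-local storage bullet describes. The summation order varies between runs, so floating-point results can differ in the last bits.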

Memory Management

NUMA Awareness: Memory allocation respects CPU topology
Cache Optimization: Data layout for optimal cache usage
Memory Pooling: Reuse allocations to reduce overhead
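
The idea behind memory pooling can be shown with the standard ArrayPool<T> type. This is only a sketch of the general technique under the assumption that scratch buffers are short-lived; it says nothing about how the backend's own pool is implemented, and PooledScratch is a hypothetical name.

```csharp
using System;
using System.Buffers;

// Hypothetical helper, not part of the DotCompute API.
public static class PooledScratch
{
    public static float SumOfSquares(ReadOnlySpan<float> input)
    {
        // Rent may return a larger array than requested, so slice to the needed length.
        float[] rented = ArrayPool<float>.Shared.Rent(input.Length);
        try
        {
            Span<float> squares = rented.AsSpan(0, input.Length);
            for (int i = 0; i < input.Length; i++)
            {
                squares[i] = input[i] * input[i];
            }

            float total = 0f;
            foreach (float value in squares)
            {
                total += value;
            }
            return total;
        }
        finally
        {
            // Return the buffer so later calls can reuse it instead of allocating.
            ArrayPool<float>.Shared.Return(rented);
        }
    }
}
```

Repeated calls reuse the same underlying arrays, which avoids a fresh allocation and later garbage collection for every invocation.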

Performance Benchmarks

Tested on Intel Core Ultra 7 165H with 16 threads:

Operation	Elements	CPU Time	SIMD Time	Speedup
Vector Add	1M floats	4.33ms	187μs	23x
Matrix Mult	512x512	2,340ms	89ms	26x
Dot Product	1M floats	2.1ms	156μs	13.4x

System Requirements

Minimum

.NET 9.0 or later
x64 or ARM64 processor
2GB RAM

Supported Platforms

Windows: x64, ARM64
Linux: x64, ARM64
macOS: x64 (Intel), ARM64 (Apple Silicon)

Build Configuration

The CPU backend automatically configures itself based on the target platform:

<PropertyGroup Condition="'$(TargetArchitecture)' == 'x64'">
  <DefineConstants>$(DefineConstants);ENABLE_AVX2;ENABLE_AVX512</DefineConstants>
</PropertyGroup>

<PropertyGroup Condition="'$(TargetArchitecture)' == 'arm64'">
  <DefineConstants>$(DefineConstants);ENABLE_NEON</DefineConstants>
</PropertyGroup>

Troubleshooting

Performance Issues

Check SIMD Support: Verify CPU supports AVX2/AVX512
Memory Alignment: Ensure data is properly aligned for SIMD
Thread Count: Match thread count to physical cores
Memory Bandwidth: Monitor memory utilization during execution
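
One way to run the first and third checks is to ask the .NET runtime directly. The sketch below uses only standard System.Runtime.Intrinsics and System.Numerics properties; SimdReport is a hypothetical helper and is not part of this package.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of the DotCompute API.
public static class SimdReport
{
    public static string Describe()
    {
        return string.Join(Environment.NewLine, new[]
        {
            // Each IsSupported property is false on hardware that lacks the instruction set.
            $"AVX2={Avx2.IsSupported}",
            $"AVX-512F={Avx512F.IsSupported}",
            $"NEON(AdvSimd)={AdvSimd.IsSupported}",
            $"Vector<T> accelerated={Vector.IsHardwareAccelerated}",
            $"Vector<float> lanes={Vector<float>.Count}",
            // ProcessorCount reports logical processors, which can exceed the physical core count.
            $"Logical processors={Environment.ProcessorCount}",
        });
    }
}
```

Printing SimdReport.Describe() on the target machine shows which instruction sets the runtime will use there. If AVX2, AVX-512F and NEON all report False, vectorized code paths are likely falling back to scalar execution.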

Compatibility Issues

Native AOT: Ensure all types are AOT-compatible
Platform Support: Verify target platform support
Dependencies: Check for missing runtime dependencies

Documentation & Resources

Comprehensive documentation is available for DotCompute:

Architecture Documentation

Backend Integration - CPU SIMD implementation details
System Overview - Architecture and design principles

Developer Guides

Getting Started - Installation and quick start
Backend Selection - When to use CPU vs GPU
Performance Tuning - SIMD optimization techniques (3.7x measured speedup)
Kernel Development - Writing efficient kernels

Examples

Basic Vector Operations - CPU SIMD examples
Matrix Operations - Optimized CPU implementations

API Documentation

API Reference - Complete API documentation

Support

Documentation: Comprehensive Guides
Issues: GitHub Issues
Discussions: GitHub Discussions

Contributing

Contributions to the CPU backend are welcome in the following areas:

New SIMD instruction set support (e.g., AVX-512 variants)
Platform-specific optimizations
Kernel compilation improvements
Performance benchmarks and analysis

See CONTRIBUTING.md for guidelines.

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net9.0
- DotCompute.Abstractions (>= 0.6.2)
- DotCompute.Core (>= 0.6.2)
- DotCompute.Memory (>= 0.6.2)
- DotCompute.Plugins (>= 0.6.2)
- Microsoft.CodeAnalysis.CSharp (>= 5.0.0)
- Microsoft.Extensions.Configuration.Abstractions (>= 10.0.2)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.2)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.2)
- Microsoft.Extensions.Options (>= 10.0.2)
- Microsoft.NET.ILLink.Tasks (>= 9.0.12)
- System.Diagnostics.PerformanceCounter (>= 10.0.2)
- System.Management (>= 10.0.2)
- System.Threading.Channels (>= 10.0.2)

NuGet packages (3)

Showing the top 3 NuGet packages that depend on DotCompute.Backends.CPU:

Package	Downloads
DotCompute.Linq GPU-accelerated LINQ extensions for DotCompute. Transparent GPU execution for LINQ queries with automatic kernel generation, fusion optimization, and Reactive Extensions support.	2.3K
Orleans.GpuBridge.Grains Orleans grain implementations for GPU Bridge - GPU-accelerated batch, stream, and resident grains	1.1K
Orleans.GpuBridge.Backends.DotCompute DotCompute backend provider for Orleans.GpuBridge.Core - Enables GPU acceleration via CUDA, OpenCL, Metal, and CPU with attribute-based kernel definition.	1.1K


Version	Downloads	Last Updated
0.6.2	94	2/9/2026
0.5.3	221	2/2/2026
0.5.2	626	12/8/2025
0.5.1	540	11/28/2025
0.5.0	201	11/27/2025
0.4.2-rc2	385	11/11/2025
0.4.1-rc2	324	11/6/2025

Total 3.6K

Current version 94

Per day average 21

dotcompute cpu simd avx avx2 avx512 neon vectorization parallel native-aot ring-kernel production-ready