Cgml 1.1.0

.NET 6.0

dotnet add package Cgml --version 1.1.0

NuGet\Install-Package Cgml -Version 1.1.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Cgml" Version="1.1.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add Cgml --version 1.1.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Cgml, 1.1.0"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

// Install Cgml as a Cake Addin
#addin nuget:?package=Cgml&version=1.1.0

// Install Cgml as a Cake Tool
#tool nuget:?package=Cgml&version=1.1.0


GPU-targeted vendor-agnostic AI library for Windows.

For unmanaged interop, it depends on the ComLightInterop library.

This library doesn’t include any compute shaders, and is not specific to any ML model. ML models are expected to be implemented at a higher level of the stack, in a project which consumes this DLL.

Instead, this project only contains low-level utilities to initialize Direct3D 11, create the set of compute shaders implementing a model, move tensor data between system memory and VRAM, and dispatch compute shaders, passing them the tensors to read and write plus a single constant buffer.

It also implements a serializer which stores multiple tensors in a single ZIP archive, plus a few more utility functions and classes.

Because the underlying C++ DLL, Cgml.dll, is only built for the Win64 platform, this library only works when used from a 64-bit process.

Tensor Conventions

This library uses a programmer-friendly approach to tensor dimensions: they are counted from the innermost one, so the first element of a tensor's size is the length of a row. This differs from Python libraries like PyTorch and NumPy, which were designed around mathematical conventions; in practice, though, code written with them almost exclusively uses negative numbers to index tensor dimensions, counting them from the right.

By default, all tensors are row-major. A matrix with 3 columns and 2 rows is represented as a tensor of size [ 3, 2, 1, 1 ].
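A minimal C++ sketch of how an element's position in memory follows from this convention, assuming a dense tensor with no padding between rows. `TensorSize`, `linearIndex` and `matrixLayoutExample` are hypothetical names for illustration, not part of the CGML API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical size type: element 0 is the innermost (fastest-changing) dimension
using TensorSize = std::array<uint32_t, 4>;

// Offset of element [x, y, z, w] in a dense row-major tensor, assuming no
// padding: each dimension's stride is the product of all sizes before it
uint32_t linearIndex( const TensorSize& size, uint32_t x, uint32_t y = 0, uint32_t z = 0, uint32_t w = 0 )
{
	assert( x < size[ 0 ] && y < size[ 1 ] && z < size[ 2 ] && w < size[ 3 ] );
	return x + size[ 0 ] * ( y + size[ 1 ] * ( z + size[ 2 ] * w ) );
}

// A matrix with 3 columns and 2 rows: each row is 3 consecutive elements
void matrixLayoutExample()
{
	const TensorSize size = { 3, 2, 1, 1 };
	assert( linearIndex( size, 0, 0 ) == 0 );  // first row starts at 0
	assert( linearIndex( size, 2, 0 ) == 2 );  // last column of the first row
	assert( linearIndex( size, 0, 1 ) == 3 );  // second row follows immediately
	assert( linearIndex( size, 2, 1 ) == 5 );  // last element
}
```

Because size[ 0 ] is the innermost dimension, the columns of a row are adjacent in memory, and moving to the next row skips size[ 0 ] elements.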

CGML does not track the length of the shape, i.e. the number of dimensions: a matrix with 7 columns and 1 row is indistinguishable from a vector of length 7. This behaviour is by design.

A tensor has at most 4 dimensions. With 32-bit sizes, this allows a tensor's size to fit in a single 128-bit SIMD vector. It also eliminates dynamic memory allocations and pointer chasing while manipulating tensor shapes.
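To illustrate the last two paragraphs, here's a C++ sketch of a fixed-size shape: four 32-bit sizes occupy exactly 128 bits, live on the stack, and compare equal regardless of how many dimensions the caller spelled out. `Shape`, `makeShape`, `elementsCount` and `shapeExample` are hypothetical; this is not how CGML declares its own shape type.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical shape type: exactly four 32-bit sizes, 16 bytes in total,
// so a shape fits in one 128-bit SIMD register and needs no heap allocation
struct alignas( 16 ) Shape
{
	std::array<uint32_t, 4> size;
};
static_assert( sizeof( Shape ) == 16, "A shape should fit in 128 bits" );

// Build a shape from up to 4 sizes; missing trailing dimensions become 1
Shape makeShape( uint32_t x, uint32_t y = 1, uint32_t z = 1, uint32_t w = 1 )
{
	return Shape{ { x, y, z, w } };
}

// Compare all four sizes; a real implementation could use a single SIMD compare
bool operator==( const Shape& a, const Shape& b )
{
	return a.size == b.size;
}

// Total number of elements, computed in 64 bits to avoid overflow
uint64_t elementsCount( const Shape& s )
{
	return (uint64_t)s.size[ 0 ] * s.size[ 1 ] * s.size[ 2 ] * s.size[ 3 ];
}

void shapeExample()
{
	// A vector of length 7 and a matrix with 7 columns and 1 row are the same shape
	assert( makeShape( 7 ) == makeShape( 7, 1 ) );
	// A matrix with 3 columns and 2 rows is [ 3, 2, 1, 1 ]
	assert( elementsCount( makeShape( 3, 2 ) ) == 6 );
}
```

Padding unused dimensions with 1 is what makes the rank irrelevant: there is nothing to distinguish a 7-element vector from a 7×1 matrix.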

For hardware compatibility reasons, there’s no support for FP64 floats or int64 integers in the tensors. Luckily, for ML applications FP32 floats and 32-bit integers are sufficient.

FP16 Conventions

The library has some support for both 16-bit floating-point formats, IEEE 754 half precision (FP16) and BF16. However, both need special handling in your HLSL shaders.

Half-precision floating-point

IEEE 754 FP16 tensors are exposed to the shaders as Buffer<float> objects for inputs, or RWBuffer<float> objects for outputs.

The problem is that RWBuffer<float> unordered access views always round towards zero when storing values into FP16 buffers. This behaviour is documented by Microsoft, but it’s probably not what you want.

Instead, your shaders should round to the nearest FP16 when storing values into output FP16 tensors. Here’s a function for that, which I carefully unit-tested against the vcvtps2ph CPU instruction on the full range of floats, excluding NaN values.

// This function rounds an FP32 value to the nearest FP16, using banker's rounding (ties to even)
// When GPUs convert FP32 to FP16, they always truncate towards 0, as documented here:
// https://learn.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-programming-guide-resources-data-conversion#conververting-from-a-higher-range-representation-to-a-lower-range-representation
inline float roundFp16Nearest( float src )
{
	[branch]
	if( abs( src ) < 65520.0f )
	{
		const uint truncatedFp16 = f32tof16( src );
		const float truncated = f16tof32( truncatedFp16 );
		const float next = f16tof32( truncatedFp16 + 1 );

		const float errTrunc = abs( src - truncated );
		const float errNext = abs( src - next );

		if( errTrunc < errNext )
		{
			// Truncated was closer to the source
			return truncated;
		}
		else if( errTrunc > errNext )
		{
			// Truncated + 1 was closer to the source
			return next;
		}
		else
		{
			// Exactly half, doing banker's rounding to nearest even
			return ( 0 == ( truncatedFp16 & 1 ) ) ? truncated : next;
		}
	}
	else
	{
		// Return +inf or -inf depending on the sign bit of the input
		// Note this destroys NAN values, converting them to inf as well
		uint u = asuint( src );
		u &= 0x80000000u;
		u |= 0x7f800000u;
		return asfloat( u );
	}
}
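To check shader output on the CPU without depending on the vcvtps2ph instruction, a portable reference can be handy. The C++ sketch below uses a common bit-manipulation technique for FP32 to FP16 conversion with round-to-nearest, ties to even. It returns raw FP16 bits rather than a float, assumes FP32 arithmetic in the default rounding mode (as with SSE on x64), and, unlike the shader function, keeps NaN as NaN. `fp32ToFp16Nearest` and `fp16SelfTest` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU reference: converts FP32 to FP16 bits, rounding to nearest, ties to even
uint16_t fp32ToFp16Nearest( float src )
{
	uint32_t u;
	std::memcpy( &u, &src, sizeof( u ) );
	const uint32_t sign = ( u >> 16 ) & 0x8000u;
	uint32_t a = u & 0x7FFFFFFFu;  // bits of the absolute value

	if( a > 0x7F800000u )
		return (uint16_t)( sign | 0x7E00u );  // NaN becomes a quiet FP16 NaN
	if( a >= 0x477FF000u )
		return (uint16_t)( sign | 0x7C00u );  // |src| >= 65520 rounds to infinity

	if( a < 0x38800000u )
	{
		// Below the smallest normal FP16 (2^-14): adding 0.5 makes the FPU round
		// the value to a multiple of 2^-24, the FP16 subnormal step, ties to even
		float absVal;
		std::memcpy( &absVal, &a, sizeof( absVal ) );
		const float shifted = absVal + 0.5f;
		uint32_t bits;
		std::memcpy( &bits, &shifted, sizeof( bits ) );
		return (uint16_t)( sign | ( bits - 0x3F000000u ) );
	}

	// Normal range: rebias the exponent from 127 to 15, then add 0xFFF plus the
	// lowest kept mantissa bit, so exact ties round to the even neighbour
	const uint32_t mantOdd = ( a >> 13 ) & 1u;
	a += 0xC8000FFFu + mantOdd;  // 0xC8000000 == -( 112 << 23 ) modulo 2^32
	return (uint16_t)( sign | ( a >> 13 ) );
}

void fp16SelfTest()
{
	assert( fp32ToFp16Nearest( 1.0f ) == 0x3C00 );
	assert( fp32ToFp16Nearest( -2.0f ) == 0xC000 );
	assert( fp32ToFp16Nearest( 65504.0f ) == 0x7BFF );  // largest finite FP16
	assert( fp32ToFp16Nearest( 65520.0f ) == 0x7C00 );  // exact tie, rounds to infinity
	// 1 + 2^-11 is exactly halfway between 0x3C00 and 0x3C01: ties to even
	assert( fp32ToFp16Nearest( 1.00048828125f ) == 0x3C00 );
}
```

Comparing the bits of the shader's roundFp16Nearest result, as stored in an FP16 buffer, against this function over many inputs is one way to catch regressions.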

bfloat16 floating point

BF16 tensors are exposed to the shaders as Buffer<uint> objects for inputs, or RWBuffer<uint> objects for outputs. You should convert these elements to/from floats yourself.

Converting BF16 to float only takes a single bitwise shift instruction: asfloat( bf << 16 )
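The same conversion on the CPU, as a C++ sketch that can be used when checking shader output; `bf16ToFloat` and `bf16ToFloatExample` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// BF16 is the upper half of an FP32 value, so widening is a 16-bit left shift,
// the CPU equivalent of asfloat( bf << 16 ) in HLSL
float bf16ToFloat( uint16_t bf )
{
	const uint32_t u = (uint32_t)bf << 16;
	float f;
	std::memcpy( &f, &u, sizeof( f ) );
	return f;
}

void bf16ToFloatExample()
{
	assert( bf16ToFloat( 0x3F80 ) == 1.0f );
	assert( bf16ToFloat( 0xC040 ) == -3.0f );
}
```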

Downcasting is harder due to rounding. Here’s one possible HLSL implementation.

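inline uint roundBf16Nearest( float f )
{
	// Rounds FP32 to the nearest BF16, ties to even. Scalar C++ equivalent:
	// uint32_t roundingBias = ( ( u32 >> 16 ) & 1 ) + UINT32_C( 0x7FFF );
	// uint16_t bf16 = static_cast<uint16_t>( ( u32 + roundingBias ) >> 16 );
	// NaN inputs are not preserved: some NaN bit patterns become infinity or zero
	const uint u = asuint( f );
	// An odd lowest kept bit adds 1 to the bias, so exact ties round to even
	const uint bias = ( u & 0x10000u ) ? 0x8000u : 0x7FFFu;
	return ( u + bias ) >> 16;
}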
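A matching CPU reference, sketched in C++ with the same rounding rule. One difference is deliberate: the shader code above can turn some NaN inputs into infinity (for example, FP32 bits 0x7F800001 become BF16 0x7F80), so this sketch checks for NaN first and returns a quiet NaN instead. `floatToBf16Nearest` and `bf16SelfTest` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU reference: rounds FP32 to the nearest BF16, ties to even, keeping NaN as NaN
uint16_t floatToBf16Nearest( float f )
{
	uint32_t u;
	std::memcpy( &u, &f, sizeof( u ) );
	if( ( u & 0x7FFFFFFFu ) > 0x7F800000u )
		return (uint16_t)( ( u >> 16 ) | 0x0040u );  // NaN: keep sign, set the quiet bit
	// Adding 0x7FFF rounds up anything above the halfway point; the extra 1 for
	// an odd lowest kept bit makes exact ties round to the even neighbour
	const uint32_t roundingBias = 0x7FFFu + ( ( u >> 16 ) & 1u );
	return (uint16_t)( ( u + roundingBias ) >> 16 );
}

void bf16SelfTest()
{
	assert( floatToBf16Nearest( 1.0f ) == 0x3F80 );
	// 1 + 2^-8 is halfway between 0x3F80 and 0x3F81: ties to even
	assert( floatToBf16Nearest( 1.00390625f ) == 0x3F80 );
	// 1 + 3 * 2^-8 is halfway between 0x3F81 and 0x3F82: ties to even
	assert( floatToBf16Nearest( 1.01171875f ) == 0x3F82 );
}
```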

Product	Compatible and additional computed target framework versions
.NET	Compatible: net6.0. Computed: net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.


Dependencies

net6.0
- ComLightInterop (>= 1.3.8)

NuGet packages (2)

Showing the top 2 NuGet packages that depend on Cgml:

Package	Description	Downloads
Cgml.TorchLoader	PyTorch model importer for Cgml library	293
Cgml.MistralModel	Inference of Mistral language model	289

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last updated
1.1.0	177	1/7/2024
1.0.0	141	12/11/2023

New API to decode images and process them into CGML tensors.

Integrated RenderDoc debugger: launch your app from RenderDoc and hold F12 to capture GPU compute calls.

Downloads: 318 total, 177 for the current version, 2 per day on average.

Tags: gpgpu AI ML