SharpInference.Cli 0.7.1

dotnet tool install --global SharpInference.Cli --version 0.7.1

This package contains a .NET tool you can call from the shell/command line.

dotnet new tool-manifest   # if you are setting up this repo

dotnet tool install --local SharpInference.Cli --version 0.7.1

#tool dotnet:?package=SharpInference.Cli&version=0.7.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

nuke :add-package SharpInference.Cli --version 0.7.1

SharpInference.Cli

sharpi-cli is a command-line tool for LLM inference and image generation, powered by SharpInference. It reads GGUF models and runs transformer inference on CPU (AVX2/AVX-512 SIMD) or GPU (Vulkan / CUDA).

Install

dotnet tool install -g SharpInference.Cli

Or update:

dotnet tool update -g SharpInference.Cli

Usage

# Text generation (CPU)
sharpi-cli -m models/SmolLM2-1.7B-Instruct-Q4_K_M.gguf -p "Once upon a time" --temp 0.7

# All layers on GPU (Vulkan or CUDA, auto-selected)
sharpi-cli -m models/Qwen3-8B-Q4_K_M.gguf -p "Explain mmap" -g -1

# Interactive chat (omit -p to enter chat mode)
sharpi-cli -m models/Qwen3-8B-Q4_K_M.gguf

# Image generation (Z-Image-Turbo, requires CUDA)
sharpi-cli image \
  -m models/z_image_turbo-Q5_K_M.gguf \
  --vae models/z-image-turbo/vae \
  --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
  --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
  -p "a serene mountain lake at sunrise" -W 512 -H 512 --steps 4 -o out.png

Flag names are intentionally compatible with llama.cpp / llama-cli.

Flag	Default	Description
`-m, --model`	auto-detect	Path to GGUF model file
`-p, --prompt`	(interactive)	Input prompt; omit to enter chat
`-n, --n-predict`	`512`	Maximum tokens to generate
`--temp`	`0.7`	Sampling temperature (`0` = greedy)
`--top-k`	`40`	Top-k sampling
`--top-p`	`0.95`	Top-p nucleus sampling
`--min-p`	`0.05`	Min-p sampling
`-g, --n-gpu-layers`	`0`	Layers on GPU (`0` = CPU only, `-1` = all)
`-c, --ctx-size`	model default	Context / max sequence length
`--tq`	off	TurboQuant KV cache compression (3-bit, ~5× VRAM reduction)

Run sharpi-cli --help for the full reference.
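
To illustrate how the flags in the table combine, here is a minimal sketch that assembles a greedy-decoding command line. The helper name build_sharpi_cmd and the fixed -n 128 are assumptions made for illustration and are not part of sharpi-cli; only flags listed in the table are used. The helper prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with sharpi-cli): prints a command line
# built only from flags documented in the table above.
build_sharpi_cmd() {
  model=$1             # path to a GGUF model file
  prompt=$2            # input prompt
  gpu_layers=${3:-0}   # 0 = CPU only, -1 = all layers on GPU
  # --temp 0 selects greedy decoding; -n caps the number of generated tokens.
  printf 'sharpi-cli -m %s -p "%s" -n 128 --temp 0 -g %s\n' \
    "$model" "$prompt" "$gpu_layers"
}

# Example: all layers on GPU.
build_sharpi_cmd models/Qwen3-8B-Q4_K_M.gguf "Explain mmap" -1
```

Omitting the third argument leaves -g at 0, which matches the table's CPU-only default.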

Requirements

.NET 10 runtime (the tool is installed as a framework-dependent app, so the runtime must already be present)
x86-64 CPU with AVX2 support
For GPU inference: Vulkan-capable GPU (any vendor) or NVIDIA GPU with CUDA 11.x / 12.x
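
The AVX2 requirement can be checked before installing. The sketch below assumes a Linux system that exposes CPU flags in /proc/cpuinfo; the function name has_avx2 and its optional file argument are illustrative and are not part of the tool.

```shell
#!/bin/sh
# Sketch: report whether a Linux cpuinfo file advertises AVX2.
# has_avx2 is a hypothetical helper; the file argument exists so the
# check can be pointed at a saved copy of /proc/cpuinfo.
has_avx2() {
  cpuinfo=${1:-/proc/cpuinfo}
  # -w matches "avx2" as a whole word, so "avx" alone does not count.
  grep -qw 'avx2' "$cpuinfo" 2>/dev/null
}

if has_avx2; then
  echo "AVX2: yes"
else
  echo "AVX2: no (or /proc/cpuinfo is unavailable)"
fi
```

For the runtime requirement, dotnet --list-runtimes prints the .NET runtimes installed on the machine.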

License

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

This package has no dependencies.