SharpInference.Cli 0.7.1

There is a newer prerelease version of this package available.
See the version list below for details.
This package contains a .NET tool you can call from the shell/command line.

As a global tool:
dotnet tool install --global SharpInference.Cli --version 0.7.1

As a local tool (the first command is only needed if you are setting up this repo):
dotnet new tool-manifest
dotnet tool install --local SharpInference.Cli --version 0.7.1

Cake:
#tool dotnet:?package=SharpInference.Cli&version=0.7.1

NUKE:
nuke :add-package SharpInference.Cli --version 0.7.1

SharpInference.Cli

sharpi-cli — a command-line tool for LLM inference and image generation, powered by SharpInference. Reads GGUF models and runs transformer inference on CPU (AVX2/AVX-512 SIMD) or GPU (Vulkan / CUDA).
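GGUF files begin with a small fixed header, so it is easy to sanity-check a file before passing it to -m. The sketch below reads that header as laid out in the public GGUF specification from the ggml project: a 4-byte magic "GGUF", a uint32 version, then uint64 tensor and metadata key/value counts, all little-endian (this layout applies to version 2 and later). It illustrates the file format only and is not SharpInference's own loader. The function names are hypothetical.

```python
import struct
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the fixed-size GGUF header (v2+, little-endian) from raw bytes."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to be a GGUF file")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic bytes")
    (version,) = struct.unpack_from("<I", data, 4)
    if version < 2:
        # Version 1 used 32-bit counts; this sketch does not handle it.
        raise ValueError(f"unsupported GGUF version {version}")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return GGUFHeader(version, tensor_count, kv_count)


def read_gguf_header_from_file(path: str) -> GGUFHeader:
    """Read only the first HEADER_SIZE bytes of a model file and parse them."""
    with open(path, "rb") as f:
        return read_gguf_header(f.read(HEADER_SIZE))
```

For example, read_gguf_header_from_file("models/SmolLM2-1.7B-Instruct-Q4_K_M.gguf") should raise ValueError if the path does not point at a GGUF file.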

Install

dotnet tool install -g SharpInference.Cli

Or update:

dotnet tool update -g SharpInference.Cli

Usage

# Text generation (CPU)
sharpi-cli -m models/SmolLM2-1.7B-Instruct-Q4_K_M.gguf -p "Once upon a time" --temp 0.7

# All layers on GPU (Vulkan or CUDA, auto-selected)
sharpi-cli -m models/Qwen3-8B-Q4_K_M.gguf -p "Explain mmap" -g -1

# Interactive chat (omit -p to enter chat mode)
sharpi-cli -m models/Qwen3-8B-Q4_K_M.gguf

# Image generation (Z-Image-Turbo, requires CUDA)
sharpi-cli image \
  -m models/z_image_turbo-Q5_K_M.gguf \
  --vae models/z-image-turbo/vae \
  --qwen-encoder models/Z-Image-AbliteratedV1.Q5_K_M.gguf \
  --qwen-tokenizer models/z-image-turbo/tokenizer/tokenizer.json \
  -p "a serene mountain lake at sunrise" -W 512 -H 512 --steps 4 -o out.png
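The -g -1 example offloads every layer. The helper below is a hypothetical sketch of how an --n-gpu-layers value could be turned into a per-layer CPU/GPU placement, using the semantics from the flag table (0 = CPU only, -1 = all). Clamping values larger than the model's layer count, and placing the last k layers on the GPU, are assumptions made for illustration; SharpInference may split layers differently.

```python
def resolve_gpu_layers(n_gpu_layers: int, n_layers: int) -> int:
    """Map a -g / --n-gpu-layers value to a concrete number of GPU layers.

    0 = CPU only and -1 = all layers (from the flag table); larger values
    are clamped to the model's layer count (an assumption).
    """
    if n_layers < 0:
        raise ValueError("n_layers must be non-negative")
    if n_gpu_layers < -1:
        raise ValueError("n_gpu_layers must be >= -1")
    if n_gpu_layers == -1:
        return n_layers
    return min(n_gpu_layers, n_layers)


def layer_devices(n_gpu_layers: int, n_layers: int) -> list[str]:
    """Assign each transformer layer to "cpu" or "gpu".

    Hypothetical placement: the last k layers go to the GPU, the rest stay on CPU.
    """
    k = resolve_gpu_layers(n_gpu_layers, n_layers)
    return ["cpu"] * (n_layers - k) + ["gpu"] * k
```

With a 32-layer model, layer_devices(-1, 32) puts all 32 layers on the GPU, while layer_devices(0, 32) keeps everything on the CPU.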

Flag names are intentionally compatible with llama.cpp / llama-cli.

Flag                Default        Description
-m, --model         auto-detect    Path to GGUF model file
-p, --prompt        (interactive)  Input prompt; omit to enter chat mode
-n, --n-predict     512            Maximum number of tokens to generate
--temp              0.7            Sampling temperature (0 = greedy)
--top-k             40             Top-k sampling
--top-p             0.95           Top-p (nucleus) sampling
--min-p             0.05           Min-p sampling
-g, --n-gpu-layers  0              Layers on GPU (0 = CPU only, -1 = all)
-c, --ctx-size      model default  Context size / maximum sequence length
--tq                off            TurboQuant KV-cache compression (3-bit, ~5× VRAM reduction)
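The four sampling flags describe a standard token-selection pipeline. The sketch below shows one common way to combine them on a vector of logits, with defaults copied from the table. The filter order (top-k, then top-p, then min-p) and the function name are assumptions; this is not SharpInference's actual sampler.

```python
import math
import random


def sample_token(logits, temp=0.7, top_k=40, top_p=0.95, min_p=0.05, rng=random):
    """Pick a token id from raw logits with temperature, top-k, top-p and min-p.

    Defaults mirror the flag table. The filter order is an assumption for
    illustration. Expects 0 <= min_p <= 1.
    """
    if temp <= 0:
        # temp 0 means greedy decoding: return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax; subtracting the max keeps exp() from overflowing.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    cands = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-k: keep the k most probable tokens (k <= 0 disables this filter).
    if top_k > 0:
        cands = cands[:top_k]

    # Top-p: keep the smallest prefix holding top_p of the remaining mass.
    mass = sum(p for p, _ in cands)
    kept, cum = [], 0.0
    for p, i in cands:
        kept.append((p, i))
        cum += p
        if cum >= top_p * mass:
            break
    cands = kept

    # Min-p: drop tokens less likely than min_p times the best candidate.
    floor = min_p * cands[0][0]
    cands = [(p, i) for p, i in cands if p >= floor]

    # Draw one token in proportion to the surviving (unnormalised) probabilities.
    r = rng.random() * sum(p for p, _ in cands)
    for p, i in cands:
        r -= p
        if r <= 0:
            return i
    return cands[-1][1]  # guard against floating-point round-off
```

Passing temp=0 reproduces the greedy behaviour of --temp 0, and top_k=1 likewise always returns the most likely token, whatever the random draw.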

Run sharpi-cli --help for the full reference.
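A rough estimate of the KV cache size shows where a figure like "~5×" for --tq can come from. Each layer stores one K and one V vector per KV head for every position in the context. The sketch assumes a 16-bit (FP16) uncompressed baseline and ignores any per-block scales or metadata a real 3-bit format adds. The model shape is hypothetical. Under those assumptions the ratio is 16/3 ≈ 5.3, which is consistent with the table.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_elem):
    """Approximate KV-cache size in bytes for a full context.

    Ignores per-block scale/metadata overhead that real quantized formats add.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = one K and one V
    return elems * bits_per_elem / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

fp16 = kv_cache_bytes(**shape, bits_per_elem=16)  # assumed uncompressed baseline
tq3 = kv_cache_bytes(**shape, bits_per_elem=3)    # 3-bit compressed cache

print(f"16-bit KV cache: {fp16 / 2**30:.2f} GiB")  # 16-bit KV cache: 4.00 GiB
print(f"3-bit KV cache:  {tq3 / 2**30:.2f} GiB")   # 3-bit KV cache:  0.75 GiB
print(f"reduction:       {fp16 / tq3:.2f}x")       # reduction:       5.33x
```

Because the cache grows linearly with -c / --ctx-size, the savings matter most for long contexts.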

Requirements

  • .NET 10 runtime (the tool is installed as a framework-dependent app, so the runtime must already be present)
  • x86-64 CPU with AVX2 support
  • For GPU inference: Vulkan-capable GPU (any vendor) or NVIDIA GPU with CUDA 11.x / 12.x
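The CPU requirements above can be checked on Linux by reading the feature flags the kernel exposes in /proc/cpuinfo. The sketch below does this. It is Linux-only, covers only the CPU items, and uses function names that are hypothetical. The installed .NET runtimes can be listed separately with dotnet --list-runtimes, and GPU support is not covered here.

```python
import platform
from pathlib import Path


def cpu_flags_from_cpuinfo(text: str) -> set[str]:
    """Collect the CPU feature flags from /proc/cpuinfo contents."""
    flags: set[str] = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def check_requirements(cpuinfo_path: str = "/proc/cpuinfo") -> dict[str, bool]:
    """Linux-only sketch of the CPU requirements listed above."""
    arch = platform.machine().lower()
    flags = cpu_flags_from_cpuinfo(Path(cpuinfo_path).read_text())
    return {
        "x86_64": arch in ("x86_64", "amd64"),
        "avx2": "avx2" in flags,        # required
        "avx512f": "avx512f" in flags,  # not required; AVX-512 SIMD is also mentioned above
    }
```

On a suitable machine, check_requirements() should report True for both "x86_64" and "avx2".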

License

MIT. Copyright (c) 2026 Pekka Heikura.

Target framework: net10.0 is compatible. NuGet also lists net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos and net10.0-windows as computed-compatible.

This package has no dependencies.

Version            Downloads  Last Updated
0.7.2-alpha.0.2    0          6/5/2026
0.7.2-alpha.0.1    0          6/5/2026
0.7.1              38         6/4/2026
0.7.1-alpha.0.1    26         6/4/2026
0.7.0              31         6/4/2026
0.6.1-alpha.0.9    23         6/4/2026
0.6.1-alpha.0.8    28         6/4/2026
0.6.1-alpha.0.7    39         6/4/2026
0.6.1-alpha.0.6    32         6/4/2026
0.6.1-alpha.0.5    43         6/4/2026
0.6.1-alpha.0.4    40         6/3/2026
0.6.1-alpha.0.3    44         6/3/2026
0.6.1-alpha.0.2    35         6/3/2026
0.6.1-alpha.0.1    33         6/3/2026
0.6.0              89         6/3/2026
0.5.1-alpha.0.44   36         6/3/2026
0.5.1-alpha.0.43   41         6/2/2026
0.5.1-alpha.0.42   37         6/2/2026
0.5.1-alpha.0.23   53         5/31/2026
0.5.1-alpha.0.22   50         5/31/2026
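The note at the top about a newer prerelease follows from Semantic Versioning 2.0.0 precedence, which NuGet versions are based on. 0.7.2-alpha.0.2 outranks 0.7.1 because its core version is higher. 0.7.1-alpha.0.1 ranks below 0.7.1 because a prerelease precedes the release it leads up to. The sketch below implements the SemVer 2.0.0 comparison rules. NuGet's own comparer may differ in details such as label casing, and the function names are hypothetical.

```python
from functools import cmp_to_key


def _parse(version: str):
    """Split 'MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]' into comparable parts."""
    version = version.split("+", 1)[0]  # build metadata does not affect precedence
    core, _, pre = version.partition("-")
    major, minor, patch = (int(x) for x in core.split("."))
    return (major, minor, patch), (pre.split(".") if pre else [])


def _cmp_ident(a: str, b: str) -> int:
    """Compare one prerelease identifier pair per SemVer 2.0.0."""
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))  # numeric: compare as integers
    if a_num != b_num:
        return -1 if a_num else 1  # numeric identifiers sort before alphanumeric ones
    return (a > b) - (a < b)  # otherwise ASCII order


def compare_semver(a: str, b: str) -> int:
    """Return -1, 0 or 1 following SemVer 2.0.0 precedence."""
    core_a, pre_a = _parse(a)
    core_b, pre_b = _parse(b)
    if core_a != core_b:
        return (core_a > core_b) - (core_a < core_b)
    if not pre_a or not pre_b:
        # A release outranks any prerelease of the same core version.
        return (not pre_a) - (not pre_b)
    for x, y in zip(pre_a, pre_b):
        c = _cmp_ident(x, y)
        if c:
            return c
    # All shared identifiers equal: the longer prerelease list ranks higher.
    return (len(pre_a) > len(pre_b)) - (len(pre_a) < len(pre_b))


versions = ["0.7.1", "0.7.2-alpha.0.2", "0.7.1-alpha.0.1", "0.6.1-alpha.0.9", "0.7.0"]
print(sorted(versions, key=cmp_to_key(compare_semver))[-1])  # 0.7.2-alpha.0.2
```

Numeric identifiers compare as numbers, so 0.5.1-alpha.0.44 correctly ranks above 0.5.1-alpha.0.23, and 0.6.1-alpha.0.10 would rank above 0.6.1-alpha.0.9.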