VoxCli 0.1.2
dotnet tool install --global VoxCli --version 0.1.2
dotnet new tool-manifest
dotnet tool install --local VoxCli --version 0.1.2
#tool dotnet:?package=VoxCli&version=0.1.2
nuke :add-package VoxCli --version 0.1.2
vox
A small CLI for transcribing audio files locally with Whisper, built around recording workflows (notably Apple Voice Memos). Distributed as a .NET global tool.
vox orchestrates two existing CLIs — whisper-cli (whisper.cpp) and ffmpeg — rather than bundling its own engine. It also ships an MCP mode so it can be used directly from Claude Code and other MCP clients.
See PLAN.md for design decisions, the full CLI shape, configuration, model handling, and the index.
Prerequisites
brew install whisper-cpp ffmpeg
Both must be in PATH. vox refuses to run with a clear error if either is missing.
Install
dotnet tool install --global VoxCli
Quick start
First-time setup picks source directories (defaults to the Apple Voice Memos shared container on macOS), an output directory, and a default model:
vox setup
Then, the recording-workflow commands:
vox listen # pick a recording from configured sources
vox listen --latest # transcribe the most recent recording (same as 'vox latest')
vox latest # transcribe the most recent recording
vox list # list recordings, marking transcribed/untranscribed
vox find <query> # find transcripts whose preview matches <query>
vox find --latest # print the path of the most recent transcript
(Running plain vox with no arguments prints help.)
Or transcribe a specific file:
vox path/to/recording.m4a # transcribe → prints output path
vox path/to/recording.m4a --model small --format txt --language en
vox help # full usage
On success, vox prints the absolute output path on stdout and nothing else. Errors and progress go to stderr. This makes it pipe-friendly:
TRANSCRIPT=$(vox recording.m4a) && cat "$TRANSCRIPT"
Models
The default model is large-v3 (OpenAI's released large checkpoint, ~2.9 GB). It is downloaded lazily on the first transcription that needs it; subsequent runs reuse the cached file. Change the default with vox config set-default-model <name> (e.g. small, medium, large-v3-turbo).
vox models list # list models present locally
vox models download <name> # pre-fetch a model
vox models remove <name> # delete a model from the vox cache
vox models path # print the vox model cache directory
Vox first looks in its own cache (~/.cache/voxcli/models), then in known reuse paths (~/.cache/openwhispr/whisper-models, ~/.cache/whisper.cpp, $WHISPER_CPP_MODEL_DIR), and downloads from HuggingFace only as a last resort.
Voice Activity Detection
Whisper occasionally falls into hallucination loops on audio with long silent stretches — repeating the same sentence hundreds of times, or emitting boilerplate phrases it learned from YouTube training data. To prevent this, vox passes the Silero VAD model to whisper-cli to trim silence before transcription. VAD is on by default.
The VAD model (~2 MB) is downloaded once on first use and cached at ~/.cache/voxcli/vad/.
vox <file> --no-vad # disable VAD for a single run
vox <file> --vad # force VAD on even if disabled in config
vox config set-vad off # disable persistently
vox config set-vad on # re-enable persistently
MCP mode
claude mcp add vox -- vox --mcp
vox exposes tools so Claude can list recordings, transcribe, and look up existing transcripts without being told about the CLI.
Build from source
dotnet build # build
dotnet test # run tests
dotnet run --project VoxCli.Cli -- <args> # run from source
dotnet pack VoxCli.Cli/VoxCli.Cli.csproj --configuration Release # build the global-tool nupkg
License
MIT.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
This package has no dependencies.