Ocr.Vellum.Cli 0.2.6

.NET 9.0

dotnet tool install --global Ocr.Vellum.Cli --version 0.2.6

This package contains a .NET tool you can call from the shell/command line.

dotnet new tool-manifest   # if you are setting up this repo

dotnet tool install --local Ocr.Vellum.Cli --version 0.2.6

Vellum

Chrome screen-ai OCR for .NET 9. Loads Google Chrome's built-in OCR engine (chrome_screen_ai.dll / libchromescreenai.so) directly from managed code and extracts text from PDFs and images — no browser window, no Tesseract, no cloud API.

A .NET port of sergiocorreia/clv-locro. NuGet: Ocr.Vellum (library) and Ocr.Vellum.Cli (global tool).

Note — the native Chrome OCR library is not redistributed. You need either a local Chrome install (the wrapper finds it automatically) or a one-time vellum download copy step. See Screen AI binaries below.

Features

Fast, local OCR via the same on-device engine Chrome ships for accessibility.
PDF and image input — .pdf, .jpg, .jpeg, .png, .webp, .bmp, .tiff, .tif, .gif.
Structured results — pages → blocks → lines → words, each with bounding box and confidence.
Searchable PDF output — overlay an invisible text layer on the original pages.
Interactive HTML report — single self-contained file with hoverable word boxes and a synchronised sidebar (AWS Textract-style viewer).
ASP.NET Core ready — services.AddVellumOcr() with lazy initialisation; your app starts even before the binaries are installed.
Cross-platform — win-x64 and linux-x64.

Install

Library

dotnet add package Ocr.Vellum

The namespace is Vellum — using Vellum; — the Ocr. prefix is just the NuGet package id (the plain Vellum id was already taken).

CLI (global tool)

dotnet tool install -g Ocr.Vellum.Cli

Exposes a vellum command with three subcommands: ocr, download, export.

Screen AI binaries — one-time setup

The Chrome OCR library (~33 MB DLL plus ~100–250 MB of TFLite models) is licensed to ship only with Chrome. Vellum does not redistribute it. Pick one of:

Option	Setup
A. Use Chrome in place	Install Chrome, open `chrome://components`, trigger Screen AI download. Vellum finds it.
B. Copy out of Chrome	`vellum download` — copies into `%LOCALAPPDATA%\vellum\<version>\` (Windows) or `~/.local/share/vellum/<version>/` (Linux).
C. Ship a portable zip	On a machine with the component installed, run `vellum export -o bundle.zip`. On target machines, drop it at `~/Dropbox/bin/screen-ai-{windows,linux}.zip` or pass `--zip`.

Discovery order used at first OCR call: Chrome → Vellum cache → Dropbox zip → Omaha server (currently restricted to Chrome).
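To check whether option B has already populated the cache, a small script can list what sits under the cache root. This is a minimal sketch, not Vellum's own discovery code: it assumes the cache root is the platform's local application data folder plus `vellum`, which matches the two paths in the table above on a default setup but may differ if `XDG_DATA_HOME` is customised.

```csharp
using System;
using System.IO;

// LocalApplicationData resolves to %LOCALAPPDATA% on Windows and to
// $XDG_DATA_HOME (default ~/.local/share) on Linux.
// DoNotVerify returns the path even when the folder does not exist yet.
string cacheRoot = Path.Combine(
    Environment.GetFolderPath(
        Environment.SpecialFolder.LocalApplicationData,
        Environment.SpecialFolderOption.DoNotVerify),
    "vellum");

if (!Directory.Exists(cacheRoot))
{
    Console.WriteLine($"No cache at {cacheRoot}; run 'vellum download' or rely on Chrome's copy.");
}
else
{
    // Assumption: each subdirectory corresponds to one component version.
    foreach (var versionDir in Directory.GetDirectories(cacheRoot))
        Console.WriteLine($"cached component: {Path.GetFileName(versionDir)}");
}
```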

Quick start

Library

using Vellum;

using var ai = new ScreenAI();               // auto-discovers the component
var result = ai.Ocr("invoice.pdf");

Console.WriteLine(result.ToText());           // plain text, pages joined by \f
File.WriteAllText("invoice.json", result.ToJson());

CLI

vellum ocr document.pdf                    # writes document_ocr.txt + document_ocr.json
vellum ocr photo.jpg --text                # pipe-friendly text to stdout
vellum ocr big.pdf -p 1-5,10                # specific pages
vellum ocr scan.png --light                 # smaller/faster model
vellum ocr doc.pdf -s doc_searchable.pdf    # invisible text overlay
vellum ocr doc.pdf --html doc.html          # interactive HTML report

Full page-spec grammar: 1, 1-10, 1,3,5, 1-5,10-12 — ranges and single pages, comma-separated.
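If you need to validate or pre-compute a page spec before invoking the CLI, the grammar is small enough to expand by hand. The following is an illustrative sketch, not Vellum's parser; `ParsePageSpec` is a hypothetical helper, and rejecting descending ranges such as `5-1` is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Expands a page spec such as "1-5,10-12" into a sorted set of 1-based page numbers.
static SortedSet<int> ParsePageSpec(string spec)
{
    var pages = new SortedSet<int>();
    var parts = spec.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    foreach (var part in parts)
    {
        var bounds = part.Split('-', StringSplitOptions.TrimEntries);
        if (bounds.Length > 2)
            throw new FormatException($"Invalid page range: '{part}'");

        // NumberStyles.None accepts digits only, so "", "+3" or "-3" are rejected.
        int first = int.Parse(bounds[0], NumberStyles.None, CultureInfo.InvariantCulture);
        int last = bounds.Length == 2
            ? int.Parse(bounds[1], NumberStyles.None, CultureInfo.InvariantCulture)
            : first;
        if (first < 1 || last < first)
            throw new FormatException($"Invalid page range: '{part}'");

        for (int p = first; p <= last; p++)
            pages.Add(p);
    }
    return pages;
}

Console.WriteLine(string.Join(" ", ParsePageSpec("1-5,10-12")));
// prints: 1 2 3 4 5 10 11 12
```

Using a sorted set means overlapping entries such as `1-3,2-4` collapse to `1 2 3 4`.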

ASP.NET Core / generic host

AddVellumOcr registers an IOcrEngine singleton. Startup succeeds even when the native binaries aren't installed yet — the DLL is only loaded on the first OCR call, which makes it safe to wire up and deploy to machines that will pick up Chrome / the model cache later.

using Vellum;

builder.Services.AddVellumOcr(options =>
{
    options.LightMode       = false;   // true = smaller, faster model
    options.AutoDownload    = true;    // copy from Chrome on first use if missing
    options.SerializeCalls  = true;    // (default) lock around PerformOCR for thread safety
    // options.ModelDir     = "/opt/screen_ai/140.20";   // pin explicitly if you want
});

Inject and use anywhere:

app.MapPost("/ocr", async (IOcrEngine ocr, IFormFile file, CancellationToken ct) =>
{
    await ocr.EnsureReadyAsync(ct);            // optional warm-up

    var tmp = Path.GetTempFileName();
    try
    {
        // Copy the upload inside the try block so the temp file is removed
        // even if the copy fails or the request is cancelled.
        await using (var fs = File.Create(tmp))
            await file.CopyToAsync(fs, ct);

        // OcrAsync awaits the serialisation semaphore non-blockingly and
        // runs the CPU-bound OCR work on the thread pool, so the request
        // thread is returned while the native DLL is busy.
        var result = await ocr.OcrAsync(tmp, ct: ct);
        return Results.Text(result.ToText());
    }
    finally { File.Delete(tmp); }
});

IsReady tells you whether the library is loaded; EnsureReadyAsync(ct) is a good fit for a health check or a BackgroundService warm-up.

Output formats

`OcrResult` (model)

OcrResult
└─ Pages: IReadOnlyList<OcrPage>
   └─ OcrPage { PageNumber, Width, Height, Blocks }
      └─ OcrBlock { BlockType, Lines }
         └─ OcrLine { Text, BoundingBox?, Words }
            └─ OcrWord { Text, Confidence?, BoundingBox? }

result.ToText() — plain text; pages separated by \f (form feed).
result.ToJson(indented: true) — structured JSON.
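Because pages are joined with a form feed, the plain-text output can be split back into per-page strings. A minimal sketch, using a hard-coded stand-in string in place of a real `result.ToText()` value:

```csharp
using System;

// Stand-in for result.ToText(): two pages joined by a form feed.
string text = "Page one, line one\nPage one, line two\fPage two";

string[] pages = text.Split('\f');
for (int i = 0; i < pages.Length; i++)
    Console.WriteLine($"page {i + 1}: {pages[i].Split('\n').Length} line(s)");
// prints:
// page 1: 2 line(s)
// page 2: 1 line(s)
```

The index here is only the position in the output. When a page subset is requested, use `OcrPage.PageNumber` from the model if you need the page number within the source document.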

JSON

{
  "pages": [
    {
      "pageNumber": 1,
      "width": 1377,
      "height": 2048,
      "blocks": [
        {
          "blockType": "paragraph",
          "lines": [
            {
              "text": "Hello world",
              "boundingBox": { "x": 50, "y": 100, "width": 300, "height": 30 },
              "words": [
                { "text": "Hello", "confidence": 0.98, "boundingBox": { "x": 50, "y": 100, "width": 120, "height": 30 } }
              ]
            }
          ]
        }
      ]
    }
  ]
}
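The JSON shape above can be consumed without referencing the Vellum types, for example from a separate post-processing tool. The sketch below walks pages → blocks → lines → words with `System.Text.Json` and collects low-confidence words. The second word and its confidence value are made up for illustration, and the 0.5 threshold is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed copy of the sample document; the "world" entry is hypothetical.
string json = """
{ "pages": [ { "pageNumber": 1, "blocks": [ { "blockType": "paragraph", "lines": [ {
    "text": "Hello world",
    "words": [
      { "text": "Hello", "confidence": 0.98 },
      { "text": "world", "confidence": 0.41 }
    ]
} ] } ] } ] }
""";

const double threshold = 0.5; // arbitrary cut-off for this sketch
var lowConfidence = new List<string>();

using var doc = JsonDocument.Parse(json);
foreach (var page in doc.RootElement.GetProperty("pages").EnumerateArray())
foreach (var block in page.GetProperty("blocks").EnumerateArray())
foreach (var line in block.GetProperty("lines").EnumerateArray())
foreach (var word in line.GetProperty("words").EnumerateArray())
{
    // The model marks Confidence as optional, so tolerate a missing or null value.
    if (word.TryGetProperty("confidence", out var confidence)
        && confidence.ValueKind == JsonValueKind.Number
        && confidence.GetDouble() < threshold)
    {
        int pageNumber = page.GetProperty("pageNumber").GetInt32();
        var wordText = word.GetProperty("text").GetString();
        lowConfidence.Add($"p{pageNumber}: {wordText}");
    }
}

Console.WriteLine(string.Join(", ", lowConfidence));
// prints: p1: world
```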

Coordinates are in pixels of the image actually OCR'd (which may have been downscaled to fit the library's 2048 px maximum).
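If you need boxes in the coordinate space of the original image, scale them back up. This sketch assumes the downscale preserved the aspect ratio and that the page's width field reports the size that was actually OCR'd. `ToOriginalPixels` is a hypothetical helper, and the original image size is something you measure yourself.

```csharp
using System;

// Maps a box from OCR'd-image pixels back to the original image's pixels.
// Assumes a uniform scale factor, i.e. the aspect ratio was preserved.
static (double X, double Y, double Width, double Height) ToOriginalPixels(
    (double X, double Y, double Width, double Height) box, int ocrWidth, int originalWidth)
{
    double scale = (double)originalWidth / ocrWidth;
    return (box.X * scale, box.Y * scale, box.Width * scale, box.Height * scale);
}

// Hypothetical 2754 x 4096 scan that was OCR'd at 1377 x 2048
// (the page size shown in the sample JSON).
var original = ToOriginalPixels((50, 100, 300, 30), ocrWidth: 1377, originalWidth: 2754);
Console.WriteLine(original);
// prints: (100, 200, 600, 60)
```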

HTML report

vellum ocr file --html out.html produces a single self-contained HTML file:

Original page image on the left (inlined as base64 PNG).
Transparent SVG rectangles over every word; hovering highlights the matching sidebar entry and scrolls it into view.
Sidebar on the right with blocks → lines → words; hovering in either direction highlights the other.
Top-bar filter input that dims non-matching words everywhere.

Caveat: size is roughly page_count × image_size × 1.33 (base64 overhead). Fine for up to a few dozen pages; large PDFs can produce very large HTML files.
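To turn that rule of thumb into a number before generating a report, the base64 arithmetic can be done directly. The sketch below counts only the inlined images and ignores the markup, SVG overlay and sidebar text, so treat it as a lower bound; the page count and average PNG size are hypothetical inputs.

```csharp
using System;

// Base64 encodes every 3 input bytes as 4 output characters (with padding),
// which is where the ~1.33 factor comes from.
static long Base64Length(long byteCount) => 4 * ((byteCount + 2) / 3);

static long EstimateReportBytes(int pageCount, long averagePngBytes) =>
    pageCount * Base64Length(averagePngBytes);

// Hypothetical input: 40 pages whose PNG renderings average 1.5 MB each.
long estimate = EstimateReportBytes(40, 1_500_000);
Console.WriteLine($"{estimate / 1_000_000.0:F0} MB");
// prints: 80 MB
```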

Searchable PDF

vellum ocr scanned.pdf -s scanned_searchable.pdf
vellum ocr scanned.pdf -s scanned_searchable.pdf -p 1-10

Writes a copy of the input PDF with invisible (fully-transparent) text layered over each word, so the file is selectable and searchable in any PDF viewer while still rendering visually identical to the original.

CLI reference

`vellum ocr <file> [options]`

Option	Description
`-o`, `--output-dir <dir>`	Directory for `_ocr.txt` / `_ocr.json` (default: same as input)
`--text`	Print extracted text to stdout instead of writing files
`-p`, `--pages <spec>`	Pages to OCR (PDF only): `1`, `1-10`, `1,3,5`, `1-5,10-12`
`--light`	Use the smaller / faster model
`-s`, `--searchable-pdf <p>`	Write a searchable PDF to the given path (PDF input only)
`--html <path>`	Write a self-contained interactive HTML report
`-v`, `--verbose`	Debug-level logging, including native library output

`vellum download [options]`

Copies the screen-ai component from Chrome's user-data directory into Vellum's cache. Falls back to Dropbox zip and then Omaha if Chrome isn't installed.

Option	Description
`--model-dir <dir>`	Override the cache location (default `%LOCALAPPDATA%\vellum` or `~/.local/share/vellum`)
`-v`, `--verbose`	Verbose logging

`vellum export [options]`

Packages the installed component as a portable zip for other machines.

Option	Description
`-o`, `--output <path>`	Output zip path (default `~/Dropbox/bin/screen-ai-{platform}.zip`)
`-v`, `--verbose`	Verbose logging

Threading and lifecycle

The native library spawns its own worker threads but does not document concurrent PerformOCR safety. Vellum's LazyOcrEngine serialises calls under a SemaphoreSlim by default (VellumOptions.SerializeCalls = true). Concurrent requests queue up rather than crash, but throughput is effectively one OCR at a time per process; note that SemaphoreSlim does not formally guarantee FIFO ordering among waiters. The async methods (OcrAsync, OcrBitmapAsync, …) use WaitAsync + Task.Run, so queued requests do not tie up thread-pool threads. Scale horizontally (multiple processes behind a load balancer) when one process isn't enough.
The native library keeps process-global state. Do not create and dispose multiple ScreenAI instances in the same process — the second InitOCR will crash. Use AddVellumOcr() (singleton) in DI.
Max supported image dimension is 2048 px. Larger images are downscaled before OCR.
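The WaitAsync + Task.Run pattern described in the first point can be sketched in isolation. This is not LazyOcrEngine's code: `RunSerializedAsync` and `FakeOcr` are hypothetical names, and `FakeOcr` only sleeps to stand in for a native call while recording how many calls overlap.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(1, 1);
int running = 0, maxRunning = 0;

// Waits for the gate without blocking a thread, then runs the CPU-bound
// work on the thread pool and releases the gate afterwards.
async Task<T> RunSerializedAsync<T>(Func<T> work, CancellationToken ct = default)
{
    await gate.WaitAsync(ct);
    try { return await Task.Run(work, ct); }
    finally { gate.Release(); }
}

// Stand-in for a native OCR call: records how many calls overlap.
int FakeOcr(int id)
{
    int now = Interlocked.Increment(ref running);
    maxRunning = Math.Max(maxRunning, now);   // only ever written while holding the gate
    Thread.Sleep(20);
    Interlocked.Decrement(ref running);
    return id;
}

int[] results = await Task.WhenAll(
    Enumerable.Range(1, 5).Select(i => RunSerializedAsync(() => FakeOcr(i))));
Console.WriteLine($"completed {results.Length} calls, max concurrency {maxRunning}");
// prints: completed 5 calls, max concurrency 1
```

All five calls are started at once, yet only one runs at a time; the other four wait as pending tasks instead of holding blocked threads.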

Building from source

git clone https://github.com/<you>/vellum.git
cd vellum
dotnet build
dotnet test
dotnet pack -c Release           # → Ocr.Vellum.*.nupkg + Ocr.Vellum.Cli.*.nupkg

Run the CLI from source:

dotnet run --project src/Vellum.Cli -- ocr sample.pdf

PDF rasterisation comes from the PDF2SVG.PopplerCairo.Bindings NuGet package (Poppler/Cairo-backed, ships prebuilt Win-x64 / Linux-x64 natives). It's pulled in as a transitive dependency — nothing to clone, nothing to build.

How it works

Interop — NativeLibrary.Load resolves the exported C functions; delegate* unmanaged[Cdecl] function pointers call them without marshalling.
SkBitmap layout — PerformOCR accepts a C++ SkBitmap&; Vellum rebuilds its 56-byte memory layout (and a 104-byte fake SkPixelRef to pass the null check) as [StructLayout(LayoutKind.Explicit)] structs. The exact offsets were reverse-engineered empirically by the Python project; see CHROME_SCREEN_AI_DLL.md upstream.
Model files — the DLL reads models via host-provided callbacks; Vellum supplies them from whatever directory the component lives in.
Protobuf — PerformOCR returns a serialised chrome_screen_ai.VisualAnnotation message; Vellum decodes it directly with a ~150-line wire-format parser, no .proto compilation step.
PDF rasterisation — pages are rendered to PNG via pdf2svg_poppler_cairo then decoded into SKBitmap before OCR.
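To give a feel for what a hand-written wire-format parser involves, here is a generic sketch of protobuf tag and varint decoding. It is not Vellum's VisualAnnotationParser and knows nothing about the VisualAnnotation schema; the sample bytes are the two standard examples from the protobuf encoding guide, and `ReadVarint` is a hypothetical helper name.

```csharp
using System;
using System.Text;

// Reads a base-128 varint starting at offset and advances offset past it.
static ulong ReadVarint(byte[] data, ref int offset)
{
    ulong value = 0;
    for (int shift = 0; shift < 64; shift += 7)
    {
        byte b = data[offset++];
        value |= (ulong)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return value;   // high bit clear: last byte
    }
    throw new FormatException("Varint longer than 10 bytes");
}

// Field 1 = varint 150, field 2 = length-delimited "testing".
byte[] message = { 0x08, 0x96, 0x01, 0x12, 0x07, 0x74, 0x65, 0x73, 0x74, 0x69, 0x6E, 0x67 };

int pos = 0;
while (pos < message.Length)
{
    ulong tag = ReadVarint(message, ref pos);
    int field = (int)(tag >> 3);      // upper bits: field number
    int wireType = (int)(tag & 7);    // lower 3 bits: wire type
    switch (wireType)
    {
        case 0: // varint
        {
            ulong number = ReadVarint(message, ref pos);
            Console.WriteLine($"field {field}: varint {number}");
            break;
        }
        case 2: // length-delimited: strings, bytes, nested messages
        {
            int length = (int)ReadVarint(message, ref pos);
            string payload = Encoding.UTF8.GetString(message, pos, length);
            Console.WriteLine($"field {field}: \"{payload}\"");
            pos += length;
            break;
        }
        case 1: pos += 8; break; // fixed 64-bit, skipped
        case 5: pos += 4; break; // fixed 32-bit, skipped
        default: throw new FormatException($"Unsupported wire type {wireType}");
    }
}
// prints:
// field 1: varint 150
// field 2: "testing"
```

A nested message is just another length-delimited field, so a real parser would recurse on the payload slice for the field numbers it knows to be sub-messages.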

Project layout

ocr_playground/
├── src/
│   ├── Vellum/                       # library (NuGet: Ocr.Vellum)
│   │   ├── ScreenAI.cs               # public facade
│   │   ├── LazyOcrEngine.cs          # DI-friendly lazy wrapper
│   │   ├── IOcrEngine.cs             # shared abstraction
│   │   ├── VellumOptions.cs          # options for AddVellumOcr
│   │   ├── ServiceCollectionExtensions.cs
│   │   ├── Models/OcrModels.cs       # records
│   │   ├── Protobuf/VisualAnnotationParser.cs
│   │   ├── Interop/                  # SkBitmap structs, native bindings, Linux stubs
│   │   ├── Imaging/                  # SkiaSharp + pdf2svg wrappers
│   │   ├── Download/ComponentDownloader.cs
│   │   ├── Reporting/HtmlReportBuilder.cs
│   │   └── Platform/PlatformPaths.cs
│   ├── Vellum.Cli/                   # `vellum` global tool (NuGet: Ocr.Vellum.Cli)
│   └── native/chromium_stubs/        # C source for Linux link-time stubs
└── tests/Vellum.Tests/               # xUnit tests (no native DLL required)

License and attribution

Vellum is MIT-licensed, matching the upstream Python project.

Chrome screen-ai library (chrome_screen_ai.dll / libchromescreenai.so): © Google, distributed as a Chrome component. Not redistributed by this package. See Chromium's licenses for terms of use.
pdf2svg_poppler_cairo: Forevka/pdf2svg_poppler_cairo, bundling Poppler (GPL) and Cairo (LGPL) natives.

This package has no dependencies.

Version	Downloads	Last Updated
0.2.6	233	4/24/2026
0.2.5	105	4/24/2026
0.2.4	91	4/24/2026
0.2.3	100	4/23/2026
0.2.2	112	4/23/2026
0.2.1	102	4/23/2026
0.2.0	111	4/23/2026
0.1.3	105	4/23/2026
0.1.2	107	4/23/2026
0.1.0	120	4/23/2026

Original Python implementation (c) Sergio Correia, MIT. .NET port follows the MIT license.
