Ocr.Vellum.Cli 0.2.6


Vellum

Chrome screen-ai OCR for .NET 9. Loads Google Chrome's built-in OCR engine (chrome_screen_ai.dll / libchromescreenai.so) directly from managed code and extracts text from PDFs and images — no browser window, no Tesseract, no cloud API.

A .NET port of sergiocorreia/clv-locro. NuGet: Ocr.Vellum (library) and Ocr.Vellum.Cli (global tool).

Note — the native Chrome OCR library is not redistributed. You need either a local Chrome install (the wrapper finds it automatically) or a one-time vellum download copy step. See Screen AI binaries below.


Features

  • Fast, local OCR via the same on-device engine Chrome ships for accessibility.
  • PDF and image input: .pdf, .jpg, .jpeg, .png, .webp, .bmp, .tiff, .tif, .gif.
  • Structured results: pages → blocks → lines → words, each with bounding box and confidence.
  • Searchable PDF output: overlay an invisible text layer on the original pages.
  • Interactive HTML report: a single self-contained file with hoverable word boxes and a synchronised sidebar (AWS Textract-style viewer).
  • ASP.NET Core ready: services.AddVellumOcr() with lazy initialisation; your app starts even before the binaries are installed.
  • Cross-platform: win-x64 and linux-x64.

Install

Library

dotnet add package Ocr.Vellum

The namespace is Vellum (using Vellum;). The Ocr. prefix is just the NuGet package id (the plain Vellum id was already taken).

CLI (global tool)

dotnet tool install -g Ocr.Vellum.Cli

Exposes a vellum command with three subcommands: ocr, download, export.


Screen AI binaries — one-time setup

The Chrome OCR library (~33 MB DLL plus ~100–250 MB of TFLite models) is licensed to ship only with Chrome. Vellum does not redistribute it. Pick one of:

Option | Setup
A. Use Chrome in place | Install Chrome, open chrome://components, and trigger the Screen AI download. Vellum finds it.
B. Copy out of Chrome | Run vellum download, which copies the component into %LOCALAPPDATA%\vellum\<version>\ (Windows) or ~/.local/share/vellum/<version>/ (Linux).
C. Ship a portable zip | On a machine with the component installed, run vellum export -o bundle.zip. On target machines, drop it at ~/Dropbox/bin/screen-ai-{windows,linux}.zip or pass --zip.

Discovery order used at first OCR call: Chrome → Vellum cache → Dropbox zip → Omaha server (currently restricted to Chrome).


Quick start

Library

using Vellum;

using var ai = new ScreenAI();               // auto-discovers the component
var result = ai.Ocr("invoice.pdf");

Console.WriteLine(result.ToText());           // plain text, pages joined by \f
File.WriteAllText("invoice.json", result.ToJson());

CLI

vellum ocr document.pdf                    # writes document_ocr.txt + document_ocr.json
vellum ocr photo.jpg --text                # pipe-friendly text to stdout
vellum ocr big.pdf -p 1-5,10                # specific pages
vellum ocr scan.png --light                 # smaller/faster model
vellum ocr doc.pdf -s doc_searchable.pdf    # invisible text overlay
vellum ocr doc.pdf --html doc.html          # interactive HTML report

Full page-spec grammar: 1, 1-10, 1,3,5, 1-5,10-12 — ranges and single pages, comma-separated.
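
To make the grammar concrete, here is a minimal sketch of how such a spec could be expanded into page numbers. PageSpec and its members are hypothetical names for illustration; this is not Vellum's own parser, and details such as rejecting descending ranges are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of Vellum's public API.
public static class PageSpec
{
    // Expands a spec such as "1-5,10" into ascending, de-duplicated 1-based page numbers.
    public static IReadOnlyList<int> Parse(string spec)
    {
        var pages = new SortedSet<int>();
        foreach (var part in spec.Split(',', StringSplitOptions.TrimEntries))
        {
            if (part.Length == 0)
                throw new FormatException($"Empty entry in page spec '{spec}'.");

            int first, last;
            var dash = part.IndexOf('-');
            if (dash < 0)
            {
                // Single page, e.g. "10".
                first = last = ParsePage(part, spec);
            }
            else
            {
                // Inclusive range, e.g. "1-5".
                first = ParsePage(part[..dash], spec);
                last = ParsePage(part[(dash + 1)..], spec);
            }

            // Assumption: a descending range such as "5-1" is treated as an error.
            if (first > last)
                throw new FormatException($"Descending range '{part}' in page spec '{spec}'.");

            for (var page = first; page <= last; page++)
                pages.Add(page);
        }
        return new List<int>(pages);
    }

    private static int ParsePage(string text, string spec)
    {
        // NumberStyles.None accepts digits only, so signs and separators are rejected.
        if (!int.TryParse(text.Trim(), NumberStyles.None, CultureInfo.InvariantCulture, out var page) || page < 1)
            throw new FormatException($"Invalid page '{text}' in page spec '{spec}'.");
        return page;
    }
}
```

With this sketch, PageSpec.Parse("1-5,10") yields pages 1 through 5 followed by 10, and overlapping entries such as "1-3,2-4" collapse to 1 through 4.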


ASP.NET Core / generic host

AddVellumOcr registers an IOcrEngine singleton. Startup succeeds even when the native binaries aren't installed yet — the DLL is only loaded on the first OCR call, which makes it safe to wire up and deploy to machines that will pick up Chrome / the model cache later.

using Vellum;

builder.Services.AddVellumOcr(options =>
{
    options.LightMode       = false;   // true = smaller, faster model
    options.AutoDownload    = true;    // copy from Chrome on first use if missing
    options.SerializeCalls  = true;    // (default) lock around PerformOCR for thread safety
    // options.ModelDir     = "/opt/screen_ai/140.20";   // pin explicitly if you want
});

Inject and use anywhere:

app.MapPost("/ocr", async (IOcrEngine ocr, IFormFile file, CancellationToken ct) =>
{
    await ocr.EnsureReadyAsync(ct);            // optional warm-up

    var tmp = Path.GetTempFileName();
    await using (var fs = File.Create(tmp))
        await file.CopyToAsync(fs, ct);

    try
    {
        // OcrAsync awaits the serialisation semaphore without blocking and
        // runs the CPU-bound OCR work on the thread pool, so the request
        // thread is released while the native DLL is busy.
        var result = await ocr.OcrAsync(tmp, ct: ct);
        return Results.Text(result.ToText());
    }
    finally { File.Delete(tmp); }
});

IsReady tells you whether the library is loaded; EnsureReadyAsync(ct) is a good fit for a health check or a BackgroundService warm-up.


Output formats

OcrResult (model)

OcrResult
└─ Pages: IReadOnlyList<OcrPage>
   └─ OcrPage { PageNumber, Width, Height, Blocks }
      └─ OcrBlock { BlockType, Lines }
         └─ OcrLine { Text, BoundingBox?, Words }
            └─ OcrWord { Text, Confidence?, BoundingBox? }
  • result.ToText() — plain text; pages separated by \f (form feed).
  • result.ToJson(indented: true) — structured JSON.

JSON

{
  "pages": [
    {
      "pageNumber": 1,
      "width": 1377,
      "height": 2048,
      "blocks": [
        {
          "blockType": "paragraph",
          "lines": [
            {
              "text": "Hello world",
              "boundingBox": { "x": 50, "y": 100, "width": 300, "height": 30 },
              "words": [
                { "text": "Hello", "confidence": 0.98, "boundingBox": { "x": 50, "y": 100, "width": 120, "height": 30 } }
              ]
            }
          ]
        }
      ]
    }
  ]
}
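
If another tool needs to consume this JSON, it can be read back with System.Text.Json. The sketch below is one possible approach: the *Dto records and the OcrJson helper are hypothetical types that mirror the JSON shape shown above, not Vellum's own model classes, and it assumes every array in the document is present.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical DTOs mirroring the JSON shape above; these are not Vellum's model types.
public sealed record BoxDto(double X, double Y, double Width, double Height);
public sealed record WordDto(string Text, double? Confidence, BoxDto? BoundingBox);
public sealed record LineDto(string Text, BoxDto? BoundingBox, List<WordDto> Words);
public sealed record BlockDto(string BlockType, List<LineDto> Lines);
public sealed record PageDto(int PageNumber, int Width, int Height, List<BlockDto> Blocks);
public sealed record DocumentDto(List<PageDto> Pages);

public static class OcrJson
{
    // Web defaults map camelCase JSON names onto PascalCase record properties.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static DocumentDto Load(string json) =>
        JsonSerializer.Deserialize<DocumentDto>(json, Options)
        ?? throw new JsonException("The JSON document was null.");

    // Returns every word that reports a confidence below the threshold.
    // Words without a confidence value are skipped.
    public static IEnumerable<WordDto> LowConfidenceWords(DocumentDto doc, double threshold) =>
        doc.Pages
           .SelectMany(page => page.Blocks)
           .SelectMany(block => block.Lines)
           .SelectMany(line => line.Words)
           .Where(word => word.Confidence is double c && c < threshold);
}
```

A typical use would be to load a *_ocr.json file with OcrJson.Load(File.ReadAllText(path)) and flag the words returned by LowConfidenceWords for manual review.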

Coordinates are in pixels of the image actually OCR'd (which may have been downscaled to fit the library's 2048 px maximum).
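
When a box has to be drawn on the original, full-resolution image, the coordinates need to be scaled back up. The following is a minimal sketch of that arithmetic. PixelBox and OcrCoordinates are hypothetical names, and it assumes that the page Width and Height report the size of the image that was actually OCR'd and that you know the original pixel size yourself.

```csharp
using System;

// Hypothetical value type for illustration; not Vellum's BoundingBox type.
public readonly record struct PixelBox(double X, double Y, double Width, double Height);

public static class OcrCoordinates
{
    // Maps a box from OCR'd-image pixels to original-image pixels.
    // ocrWidth/ocrHeight: size of the (possibly downscaled) image that was OCR'd.
    // originalWidth/originalHeight: size of the source image before downscaling.
    public static PixelBox ToOriginal(
        PixelBox box, int ocrWidth, int ocrHeight, int originalWidth, int originalHeight)
    {
        if (ocrWidth <= 0) throw new ArgumentOutOfRangeException(nameof(ocrWidth));
        if (ocrHeight <= 0) throw new ArgumentOutOfRangeException(nameof(ocrHeight));

        // Separate factors per axis, in case rounding made them differ slightly.
        double scaleX = (double)originalWidth / ocrWidth;
        double scaleY = (double)originalHeight / ocrHeight;

        return new PixelBox(box.X * scaleX, box.Y * scaleY, box.Width * scaleX, box.Height * scaleY);
    }
}
```

For example, if a 2754 × 4096 scan was OCR'd at 1377 × 2048, both factors are 2, so the line box at (50, 100) with size 300 × 30 maps to (100, 200) with size 600 × 60.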

HTML report

vellum ocr file --html out.html produces a single self-contained HTML file:

  • Original page image on the left (inlined as base64 PNG).
  • Transparent SVG rectangles over every word; hovering highlights the matching sidebar entry and scrolls it into view.
  • Sidebar on the right with blocks → lines → words; hovering in either direction highlights the other.
  • Top-bar filter input that dims non-matching words everywhere.

Caveat: size is roughly page_count × image_size × 1.33 (base64 overhead). Fine for up to a few dozen pages; large PDFs can produce very large HTML files.
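
The 1.33 factor comes from base64 encoding every 3 bytes as 4 characters. As a rough sketch of the estimate, assuming you already know the PNG size of each page (HtmlReportSize is a hypothetical helper, and the surrounding markup, SVG and script are ignored):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Vellum.
public static class HtmlReportSize
{
    // Length in characters of the padded base64 encoding of byteCount bytes.
    public static long Base64Length(long byteCount)
    {
        if (byteCount < 0) throw new ArgumentOutOfRangeException(nameof(byteCount));
        return (byteCount + 2) / 3 * 4;
    }

    // Sums the base64-encoded size of every inlined page image.
    // This is a lower bound on the report size, since markup is not counted.
    public static long EstimateImageBytes(IEnumerable<long> pageImageSizes)
    {
        long total = 0;
        foreach (var size in pageImageSizes)
            total += Base64Length(size);
        return total;
    }
}
```

Under these assumptions, 40 pages of 1,500,000 bytes each contribute 80,000,000 bytes of image data to the report.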

Searchable PDF

vellum ocr scanned.pdf -s scanned_searchable.pdf
vellum ocr scanned.pdf -s scanned_searchable.pdf -p 1-10

Writes a copy of the input PDF with invisible (fully-transparent) text layered over each word, so the file is selectable and searchable in any PDF viewer while still rendering visually identical to the original.


CLI reference

vellum ocr <file> [options]

Option | Description
-o, --output-dir <dir> | Directory for *_ocr.txt / *_ocr.json (default: same as input)
--text | Print extracted text to stdout instead of writing files
-p, --pages <spec> | Pages to OCR (PDF only): 1, 1-10, 1,3,5, 1-5,10-12
--light | Use the smaller / faster model
-s, --searchable-pdf <p> | Write a searchable PDF to the given path (PDF input only)
--html <path> | Write a self-contained interactive HTML report
-v, --verbose | Debug-level logging, including native library output

vellum download [options]

Copies the screen-ai component from Chrome's user-data directory into Vellum's cache. Falls back to the Dropbox zip and then to Omaha if Chrome isn't installed.

Option | Description
--model-dir <dir> | Override the cache location (default %LOCALAPPDATA%/vellum or ~/.local/share/vellum)
-v, --verbose | Verbose logging

vellum export [options]

Packages the installed component as a portable zip for other machines.

Option | Description
-o, --output <path> | Output zip path (default ~/Dropbox/bin/screen-ai-{platform}.zip)
-v, --verbose | Verbose logging

Threading and lifecycle

  • The native library spawns its own worker threads but does not document concurrent PerformOCR safety. Vellum's LazyOcrEngine serialises calls under a SemaphoreSlim by default (VellumOptions.SerializeCalls = true). Concurrent requests queue up in FIFO order; they do not crash, but throughput is effectively one OCR at a time per process. The async methods (OcrAsync, OcrBitmapAsync, …) use WaitAsync + Task.Run, so queued requests do not tie up thread-pool threads. Scale horizontally (multiple processes behind a load balancer) when one process isn't enough.
  • The native library keeps process-global state. Do not create and dispose multiple ScreenAI instances in the same process — the second InitOCR will crash. Use AddVellumOcr() (singleton) in DI.
  • Max supported image dimension is 2048 px. Larger images are downscaled before OCR.
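
The WaitAsync + Task.Run pattern described in the first bullet can be sketched in a few lines. SerializedRunner below is an illustrative stand-in written for this README, not Vellum's LazyOcrEngine, and the work delegate stands for the blocking native call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative stand-in for the serialisation pattern; not Vellum's LazyOcrEngine.
public sealed class SerializedRunner : IDisposable
{
    // One permit: at most one piece of work runs at any time.
    private readonly SemaphoreSlim _gate = new(1, 1);

    public async Task<T> RunAsync<T>(Func<T> work, CancellationToken ct = default)
    {
        // Waiting callers are parked as continuations, so no thread is blocked here.
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // The blocking, CPU-bound work runs on a thread-pool thread.
            return await Task.Run(work, ct).ConfigureAwait(false);
        }
        finally
        {
            // Always release, even if the work throws or is cancelled.
            _gate.Release();
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

Because the permit is released in a finally block, a failed call does not wedge the queue. The trade-off is the one stated above: callers never overlap, so throughput stays at one operation at a time per process.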

Building from source

git clone https://github.com/<you>/vellum.git
cd vellum
dotnet build
dotnet test
dotnet pack -c Release           # → Ocr.Vellum.*.nupkg + Ocr.Vellum.Cli.*.nupkg

Run the CLI from source:

dotnet run --project src/Vellum.Cli -- ocr sample.pdf

PDF rasterisation comes from the PDF2SVG.PopplerCairo.Bindings NuGet package (Poppler/Cairo-backed, ships prebuilt Win-x64 / Linux-x64 natives). It's pulled in as a transitive dependency — nothing to clone, nothing to build.


How it works

  • Interop: NativeLibrary.Load resolves the exported C functions; delegate* unmanaged[Cdecl] function pointers call them without marshalling.
  • SkBitmap layout: PerformOCR accepts a C++ SkBitmap&; Vellum rebuilds its 56-byte memory layout (and a 104-byte fake SkPixelRef to pass the null check) as [StructLayout(LayoutKind.Explicit)] structs. The exact offsets were reverse-engineered empirically by the Python project; see CHROME_SCREEN_AI_DLL.md upstream.
  • Model files: the DLL reads models via host-provided callbacks; Vellum supplies them from whatever directory the component lives in.
  • Protobuf: PerformOCR returns a serialised chrome_screen_ai.VisualAnnotation message; Vellum decodes it directly with a ~150-line wire-format parser, no .proto compilation step.
  • PDF rasterisation: pages are rendered to PNG via pdf2svg_poppler_cairo then decoded into SKBitmap before OCR.
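
To give a feel for what decoding protobuf without a .proto step involves, here is a generic sketch of a wire-format reader. It is not Vellum's parser: WireFormat and Field are hypothetical names, and it makes no assumptions about the field numbers or meaning of VisualAnnotation. It only splits a message into raw fields, which a real parser would then interpret and recurse into.

```csharp
using System;
using System.Collections.Generic;

// Generic protobuf wire-format reader for illustration; not Vellum's parser.
public static class WireFormat
{
    // Varint holds the value for wire type 0; Bytes holds the payload for types 1, 2 and 5.
    public readonly record struct Field(int Number, int WireType, ulong Varint, ReadOnlyMemory<byte> Bytes);

    // Reads a base-128 varint: 7 payload bits per byte, high bit set means "more bytes follow".
    public static ulong ReadVarint(ReadOnlySpan<byte> data, ref int pos)
    {
        ulong value = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            if (pos >= data.Length) throw new FormatException("Truncated varint.");
            byte b = data[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
        throw new FormatException("Varint is longer than 10 bytes.");
    }

    // Splits one message into its top-level fields without interpreting them.
    public static List<Field> ReadFields(ReadOnlyMemory<byte> message)
    {
        var fields = new List<Field>();
        var span = message.Span;
        int pos = 0;
        while (pos < span.Length)
        {
            // Each field starts with a tag: (field number << 3) | wire type.
            ulong tag = ReadVarint(span, ref pos);
            int number = (int)(tag >> 3);
            int wireType = (int)(tag & 7);
            switch (wireType)
            {
                case 0: // varint
                    fields.Add(new Field(number, wireType, ReadVarint(span, ref pos), default));
                    break;
                case 1: // fixed 64-bit
                    fields.Add(new Field(number, wireType, 0, Take(message, ref pos, 8)));
                    break;
                case 2: // length-delimited: string, bytes or nested message
                {
                    ulong length = ReadVarint(span, ref pos);
                    if (length > int.MaxValue) throw new FormatException("Field is too large.");
                    fields.Add(new Field(number, wireType, 0, Take(message, ref pos, (int)length)));
                    break;
                }
                case 5: // fixed 32-bit
                    fields.Add(new Field(number, wireType, 0, Take(message, ref pos, 4)));
                    break;
                default:
                    throw new FormatException($"Unsupported wire type {wireType}.");
            }
        }
        return fields;
    }

    private static ReadOnlyMemory<byte> Take(ReadOnlyMemory<byte> message, ref int pos, int length)
    {
        if (length > message.Length - pos) throw new FormatException("Truncated field.");
        var slice = message.Slice(pos, length);
        pos += length;
        return slice;
    }
}
```

Nested messages arrive as length-delimited fields, so a decoder built on this sketch would call ReadFields again on the Bytes of each field it knows to be a sub-message.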

Project layout

ocr_playground/
├── src/
│   ├── Vellum/                       # library (NuGet: Ocr.Vellum)
│   │   ├── ScreenAI.cs               # public facade
│   │   ├── LazyOcrEngine.cs          # DI-friendly lazy wrapper
│   │   ├── IOcrEngine.cs             # shared abstraction
│   │   ├── VellumOptions.cs          # options for AddVellumOcr
│   │   ├── ServiceCollectionExtensions.cs
│   │   ├── Models/OcrModels.cs       # records
│   │   ├── Protobuf/VisualAnnotationParser.cs
│   │   ├── Interop/                  # SkBitmap structs, native bindings, Linux stubs
│   │   ├── Imaging/                  # SkiaSharp + pdf2svg wrappers
│   │   ├── Download/ComponentDownloader.cs
│   │   ├── Reporting/HtmlReportBuilder.cs
│   │   └── Platform/PlatformPaths.cs
│   ├── Vellum.Cli/                   # `vellum` global tool (NuGet: Ocr.Vellum.Cli)
│   └── native/chromium_stubs/        # C source for Linux link-time stubs
└── tests/Vellum.Tests/               # xUnit tests (no native DLL required)

License and attribution

Vellum is MIT-licensed, matching the upstream Python project.

  • Python implementation and reverse-engineering notes: © Sergio Correia, sergiocorreia/clv-locro, MIT.
  • Chrome screen-ai library (chrome_screen_ai.dll / libchromescreenai.so): © Google, distributed as a Chrome component. Not redistributed by this package. See Chromium's licenses for terms of use.
  • pdf2svg_poppler_cairo: Forevka/pdf2svg_poppler_cairo, bundling Poppler (GPL) and Cairo (LGPL) natives.

Version | Downloads | Last Updated
0.2.6 | 233 | 4/24/2026
0.2.5 | 105 | 4/24/2026
0.2.4 | 91 | 4/24/2026
0.2.3 | 100 | 4/23/2026
0.2.2 | 112 | 4/23/2026
0.2.1 | 102 | 4/23/2026
0.2.0 | 111 | 4/23/2026
0.1.3 | 105 | 4/23/2026
0.1.2 | 107 | 4/23/2026
0.1.0 | 120 | 4/23/2026