Ocr.Vellum 0.2.6

dotnet add package Ocr.Vellum --version 0.2.6

NuGet\Install-Package Ocr.Vellum -Version 0.2.6

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Ocr.Vellum" Version="0.2.6" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Directory.Packages.props:
<PackageVersion Include="Ocr.Vellum" Version="0.2.6" />

Project file:
<PackageReference Include="Ocr.Vellum" />

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution Directory.Packages.props file to version the package, and the version-less PackageReference node into the project file.

paket add Ocr.Vellum --version 0.2.6

#r "nuget: Ocr.Vellum, 0.2.6"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Ocr.Vellum@0.2.6

The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=Ocr.Vellum&version=0.2.6

Install as a Cake Addin

#tool nuget:?package=Ocr.Vellum&version=0.2.6

Install as a Cake Tool

Vellum

Chrome screen-ai OCR for .NET 9. Loads Google Chrome's built-in OCR engine (chrome_screen_ai.dll / libchromescreenai.so) directly from managed code and extracts text from PDFs and images — no browser window, no Tesseract, no cloud API.

A .NET port of sergiocorreia/clv-locro. NuGet: Ocr.Vellum (library) and Ocr.Vellum.Cli (global tool).

Note — the native Chrome OCR library is not redistributed. You need either a local Chrome install (the wrapper finds it automatically) or a one-time vellum download copy step. See Screen AI binaries below.


Features

  • Fast, local OCR via the same on-device engine Chrome ships for accessibility.
  • PDF and image input: .pdf, .jpg, .jpeg, .png, .webp, .bmp, .tiff, .tif, .gif.
  • Structured results — pages → blocks → lines → words, each with bounding box and confidence.
  • Searchable PDF output — overlay an invisible text layer on the original pages.
  • Interactive HTML report — single self-contained file with hoverable word boxes and a synchronised sidebar (AWS Textract-style viewer).
  • ASP.NET Core ready: services.AddVellumOcr() with lazy initialisation; your app starts even before the binaries are installed.
  • Cross-platform: win-x64 and linux-x64.

Install

Library

dotnet add package Ocr.Vellum

The namespace is Vellum (using Vellum;). The Ocr. prefix is just the NuGet package id (the plain Vellum id was already taken).

CLI (global tool)

dotnet tool install -g Ocr.Vellum.Cli

Exposes a vellum command with three subcommands: ocr, download, export.


Screen AI binaries — one-time setup

The Chrome OCR library (~33 MB DLL plus ~100–250 MB of TFLite models) is licensed to ship only with Chrome. Vellum does not redistribute it. Pick one of:

  • A. Use Chrome in place: install Chrome, open chrome://components and trigger the Screen AI download. Vellum finds it.
  • B. Copy out of Chrome: vellum download copies the component into %LOCALAPPDATA%\vellum\<version>\ (Windows) or ~/.local/share/vellum/<version>/ (Linux).
  • C. Ship a portable zip: on a machine with the component installed, run vellum export -o bundle.zip. On target machines, drop it at ~/Dropbox/bin/screen-ai-{windows,linux}.zip or pass --zip.

Discovery order used at first OCR call: Chrome → Vellum cache → Dropbox zip → Omaha server (currently restricted to Chrome).
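The discovery order is a first-match chain: each source is probed in turn and the first one that yields a location wins. Below is a minimal sketch of that pattern. DiscoverySketch and FirstAvailable are hypothetical names, not the Vellum API. The two path helpers mirror the Vellum cache and Dropbox locations given in this README. The Chrome and Omaha probes are left to the caller, because their exact locations are not documented here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

#nullable enable

// Hypothetical helpers illustrating an ordered, first-match discovery chain.
// Not Vellum's PlatformPaths / ComponentDownloader.
public static class DiscoverySketch
{
    // %LOCALAPPDATA%\vellum on Windows; $XDG_DATA_HOME/vellum or ~/.local/share/vellum on Linux.
    // DoNotVerify returns the path even if the folder does not exist yet.
    public static string VellumCacheRoot() =>
        Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData,
                                      Environment.SpecialFolderOption.DoNotVerify),
            "vellum");

    // ~/Dropbox/bin/screen-ai-windows.zip or ~/Dropbox/bin/screen-ai-linux.zip
    public static string DropboxZipPath() =>
        Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.UserProfile,
                                      Environment.SpecialFolderOption.DoNotVerify),
            "Dropbox", "bin",
            $"screen-ai-{(OperatingSystem.IsWindows() ? "windows" : "linux")}.zip");

    // Runs the probes in order and returns the first source that reports a location,
    // or null when every probe comes back empty.
    public static (string Source, string Location)? FirstAvailable(
        IEnumerable<(string Name, Func<string?> Probe)> probes)
    {
        foreach (var (name, probe) in probes)
        {
            if (probe() is string location)
                return (name, location);
        }
        return null;
    }
}
```

Because probes are plain delegates, later sources such as the Omaha download are only attempted if every earlier probe returns null.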


Quick start

Library

using Vellum;

using var ai = new ScreenAI();               // auto-discovers the component
var result = ai.Ocr("invoice.pdf");

Console.WriteLine(result.ToText());           // plain text, pages joined by \f
File.WriteAllText("invoice.json", result.ToJson());

CLI

vellum ocr document.pdf                    # writes document_ocr.txt + document_ocr.json
vellum ocr photo.jpg --text                # pipe-friendly text to stdout
vellum ocr big.pdf -p 1-5,10                # specific pages
vellum ocr scan.png --light                 # smaller/faster model
vellum ocr doc.pdf -s doc_searchable.pdf    # invisible text overlay
vellum ocr doc.pdf --html doc.html          # interactive HTML report

Full page-spec grammar: 1, 1-10, 1,3,5, 1-5,10-12 — ranges and single pages, comma-separated.
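Below is a hypothetical parser for this grammar. It is a sketch, not the CLI's own implementation, and PageSpecSketch is an illustrative name. It assumes 1-based page numbers. As a design choice specific to this sketch, it returns the pages sorted and de-duplicated, and it rejects descending ranges such as 5-2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Vellum: parses "1", "1-10", "1,3,5", "1-5,10-12".
public static class PageSpecSketch
{
    public static IReadOnlyList<int> Parse(string spec)
    {
        var pages = new SortedSet<int>(); // sorted, duplicates dropped
        foreach (var part in spec.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
        {
            int dash = part.IndexOf('-');
            if (dash < 0)
            {
                pages.Add(ParsePage(part)); // single page
                continue;
            }

            int start = ParsePage(part[..dash]);
            int end = ParsePage(part[(dash + 1)..]);
            if (end < start)
                throw new FormatException($"Descending range '{part}'.");
            for (int p = start; p <= end; p++)
                pages.Add(p);
        }
        return new List<int>(pages);
    }

    private static int ParsePage(string s)
    {
        if (!int.TryParse(s.Trim(), out int page) || page < 1)
            throw new FormatException($"Invalid page number '{s}'.");
        return page;
    }
}
```

For example, PageSpecSketch.Parse("1-5,10") yields 1, 2, 3, 4, 5, 10.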


ASP.NET Core / generic host

AddVellumOcr registers an IOcrEngine singleton. Startup succeeds even when the native binaries aren't installed yet — the DLL is only loaded on the first OCR call, which makes it safe to wire up and deploy to machines that will pick up Chrome / the model cache later.

using Vellum;

builder.Services.AddVellumOcr(options =>
{
    options.LightMode       = false;   // true = smaller, faster model
    options.AutoDownload    = true;    // copy from Chrome on first use if missing
    options.SerializeCalls  = true;    // (default) lock around PerformOCR for thread safety
    // options.ModelDir     = "/opt/screen_ai/140.20";   // pin explicitly if you want
});

Inject and use anywhere:

app.MapPost("/ocr", async (IOcrEngine ocr, IFormFile file, CancellationToken ct) =>
{
    await ocr.EnsureReadyAsync(ct);            // optional warm-up

    // Keep the upload's extension, in case the input type (.pdf, .png, ...) is detected from it.
    var tmp = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + Path.GetExtension(file.FileName));

    try
    {
        await using (var fs = File.Create(tmp))
            await file.CopyToAsync(fs, ct);

        // OcrAsync awaits the serialisation semaphore non-blockingly and
        // runs the CPU-bound OCR work on the thread pool; the request
        // thread is returned while the native DLL is busy.
        var result = await ocr.OcrAsync(tmp, ct: ct);
        return Results.Text(result.ToText());
    }
    finally { File.Delete(tmp); } // also runs if the upload copy fails
});

IsReady tells you whether the library is loaded; EnsureReadyAsync(ct) is a good fit for a health check or a BackgroundService warm-up.
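The deferred loading described above can be sketched as a lock-guarded lazy value. This sketch illustrates the idea under stated assumptions and is not LazyOcrEngine's code. RetryingLazy is a hypothetical name, and its factory stands in for loading the native library. One design point is worth noting. Lazy<T> in LazyThreadSafetyMode.ExecutionAndPublication caches an exception thrown by its factory, so it would keep failing even after the binaries are installed. For that reason, this sketch deliberately does not cache failures.

```csharp
using System;

#nullable enable

// Hypothetical sketch of deferred initialisation, not Vellum's LazyOcrEngine.
// The factory stands in for "load the native library and initialise OCR".
public sealed class RetryingLazy<T> where T : class
{
    private readonly Func<T> _factory;
    private readonly object _sync = new();
    private volatile T? _value;

    public RetryingLazy(Func<T> factory) => _factory = factory;

    // True once a factory call has succeeded.
    public bool IsValueCreated => _value is not null;

    public T Value
    {
        get
        {
            // Fast path: no lock once initialised.
            var current = _value;
            if (current is not null) return current;

            lock (_sync)
            {
                // If the factory throws, nothing is stored and the next caller retries.
                return _value ??= _factory();
            }
        }
    }
}
```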


Output formats

OcrResult (model)

OcrResult
└─ Pages: IReadOnlyList<OcrPage>
   └─ OcrPage { PageNumber, Width, Height, Blocks }
      └─ OcrBlock { BlockType, Lines }
         └─ OcrLine { Text, BoundingBox?, Words }
            └─ OcrWord { Text, Confidence?, BoundingBox? }
  • result.ToText() — plain text; pages separated by \f (form feed).
  • result.ToJson(indented: true) — structured JSON.
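Because ToJson produces plain JSON, downstream code can read it without referencing Vellum. The sketch below uses System.Text.Json. It assumes the camelCase property names of the JSON sample in this section, and that a missing confidence (the model's Confidence? is optional) may be written as null or omitted. It also assumes the CLI's *_ocr.json files use the same shape. OcrJsonSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch, not part of Vellum: reads the JSON written by
// result.ToJson() and lists words whose confidence is below a threshold.
public static class OcrJsonSketch
{
    public static List<(int Page, string Word, double Confidence)> LowConfidenceWords(string json, double threshold)
    {
        var hits = new List<(int Page, string Word, double Confidence)>();
        using var doc = JsonDocument.Parse(json);

        // pages -> blocks -> lines -> words, as in the OcrResult tree.
        foreach (var page in doc.RootElement.GetProperty("pages").EnumerateArray())
        {
            int pageNumber = page.GetProperty("pageNumber").GetInt32();
            foreach (var block in page.GetProperty("blocks").EnumerateArray())
            foreach (var line in block.GetProperty("lines").EnumerateArray())
            foreach (var word in line.GetProperty("words").EnumerateArray())
            {
                // Skip words whose confidence is absent or null.
                if (word.TryGetProperty("confidence", out var conf)
                    && conf.ValueKind == JsonValueKind.Number
                    && conf.GetDouble() < threshold)
                {
                    hits.Add((pageNumber, word.GetProperty("text").GetString() ?? "", conf.GetDouble()));
                }
            }
        }
        return hits;
    }
}
```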

JSON

{
  "pages": [
    {
      "pageNumber": 1,
      "width": 1377,
      "height": 2048,
      "blocks": [
        {
          "blockType": "paragraph",
          "lines": [
            {
              "text": "Hello world",
              "boundingBox": { "x": 50, "y": 100, "width": 300, "height": 30 },
              "words": [
                { "text": "Hello", "confidence": 0.98, "boundingBox": { "x": 50, "y": 100, "width": 120, "height": 30 } }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Coordinates are in pixels of the image actually OCR'd (which may have been downscaled to fit the library's 2048 px maximum).
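To overlay results on the original image, boxes must be mapped back from OCR'd-image pixels. The sketch below assumes the downscale preserves aspect ratio and fits the longer side to 2048 px. That is consistent with the 1377 × 2048 page in the sample above, but the exact resize rule is an assumption. OcrScaleSketch and PixelBox are hypothetical names.

```csharp
using System;

// Hypothetical helpers, not Vellum APIs. Assumes an aspect-preserving
// downscale that fits the longer side to 2048 px; smaller images are not upscaled.
public readonly record struct PixelBox(double X, double Y, double Width, double Height);

public static class OcrScaleSketch
{
    public const int MaxDimension = 2048;

    // Factor applied to the source image before OCR (1.0 when it already fits).
    public static double ScaleFor(int width, int height) =>
        Math.Min(1.0, (double)MaxDimension / Math.Max(width, height));

    // Maps a box reported in OCR'd-image pixels back to source-image pixels.
    public static PixelBox ToSource(PixelBox ocrBox, double scale) =>
        new(ocrBox.X / scale, ocrBox.Y / scale, ocrBox.Width / scale, ocrBox.Height / scale);
}
```

Under that assumption, a 2754 × 4096 scan would be OCR'd at 1377 × 2048 (scale 0.5). A word box reported at x = 50, y = 100 would then map back to x = 100, y = 200 in the source image.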

HTML report

vellum ocr file --html out.html produces a single self-contained HTML file:

  • Original page image on the left (inlined as base64 PNG).
  • Transparent SVG rectangles over every word; hovering highlights the matching sidebar entry and scrolls it into view.
  • Sidebar on the right with blocks → lines → words; hovering in either direction highlights the other.
  • Top-bar filter input that dims non-matching words everywhere.

Caveat: the file size is roughly page_count × image_size × 1.33, where image_size is the encoded PNG size of one page and 1.33 is the base64 overhead. That is fine for up to a few dozen pages, but large PDFs can produce very large HTML files.
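The 1.33 factor comes from base64's 4/3 expansion. Every 3 input bytes become 4 output characters, rounded up to a whole 4-character group. Below is a hypothetical back-of-the-envelope estimator. It assumes the inlined page images dominate the file and ignores the HTML, SVG and sidebar markup. HtmlSizeSketch is an illustrative name.

```csharp
using System;

// Hypothetical estimator, not part of Vellum. Ignores markup overhead.
public static class HtmlSizeSketch
{
    // Exact length of the base64 encoding of n bytes (with '=' padding).
    public static long Base64Length(long bytes) => 4 * ((bytes + 2) / 3);

    // Approximate report size when every page's PNG is inlined as base64.
    public static long EstimateReportBytes(int pageCount, long averagePngBytes) =>
        pageCount * Base64Length(averagePngBytes);
}
```

For example, 30 pages whose PNGs average 1,500,000 bytes come to 30 × 2,000,000 = 60,000,000 bytes, so roughly 60 MB of HTML.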

Searchable PDF

vellum ocr scanned.pdf -s scanned_searchable.pdf
vellum ocr scanned.pdf -s scanned_searchable.pdf -p 1-10

Writes a copy of the input PDF with invisible (fully transparent) text layered over each word, so the text can be selected and searched in any PDF viewer while the pages look identical to the original.
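Placing that invisible text requires converting each word box from image pixels to PDF user space. Image pixels have their origin at the top left, with y pointing down. PDF user space has its origin at the bottom left, with y pointing up, and is measured in points. Below is a hypothetical sketch of that conversion. It assumes the page image fills the whole PDF page and that the MediaBox starts at (0, 0). It is not Vellum's implementation, and PdfCoordsSketch and PdfRect are illustrative names.

```csharp
using System;

// Hypothetical sketch, not Vellum's searchable-PDF writer.
public readonly record struct PdfRect(double X, double Y, double Width, double Height);

public static class PdfCoordsSketch
{
    // imageW/imageH: size of the OCR'd image in pixels.
    // pageW/pageH: PDF page size in points (1/72 inch), MediaBox assumed to start at (0,0).
    public static PdfRect FromImageBox(
        double x, double y, double w, double h,
        double imageW, double imageH, double pageW, double pageH)
    {
        double sx = pageW / imageW;
        double sy = pageH / imageH;
        // Flip the y axis: the box's bottom edge in image space becomes its PDF y.
        return new PdfRect(x * sx, pageH - (y + h) * sy, w * sx, h * sy);
    }
}
```

The resulting rectangle is where a text-showing operator would place the word. How the text is then hidden is left out here: PDF offers text rendering mode 3 (neither fill nor stroke), or the fill can be made fully transparent.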


CLI reference

vellum ocr <file> [options]

  Option                      Description
  -o, --output-dir <dir>      Directory for *_ocr.txt / *_ocr.json (default: same as input)
  --text                      Print extracted text to stdout instead of writing files
  -p, --pages <spec>          Pages to OCR (PDF only): 1, 1-10, 1,3,5, 1-5,10-12
  --light                     Use the smaller / faster model
  -s, --searchable-pdf <p>    Write a searchable PDF to the given path (PDF input only)
  --html <path>               Write a self-contained interactive HTML report
  -v, --verbose               Debug-level logging, including native library output

vellum download [options]

Copies the screen-ai component from Chrome's user-data directory into Vellum's cache. Falls back to Dropbox zip and then Omaha if Chrome isn't installed.

  Option                   Description
  --model-dir <dir>        Override the cache location (default: %LOCALAPPDATA%/vellum or ~/.local/share/vellum)
  -v, --verbose            Verbose logging

vellum export [options]

Packages the installed component as a portable zip for other machines.

  Option                   Description
  -o, --output <path>      Output zip path (default: ~/Dropbox/bin/screen-ai-{platform}.zip)
  -v, --verbose            Verbose logging

Threading and lifecycle

  • The native library spawns its own worker threads but does not document whether concurrent PerformOCR calls are safe. Vellum's LazyOcrEngine therefore serialises calls under a SemaphoreSlim by default (VellumOptions.SerializeCalls = true). Concurrent requests queue up and run one at a time (SemaphoreSlim does not guarantee strict FIFO ordering). They do not crash, but throughput is effectively one OCR at a time per process. The async methods (OcrAsync, OcrBitmapAsync, …) use WaitAsync + Task.Run, so queued requests do not tie up thread-pool threads. Scale horizontally (multiple processes behind a load balancer) when one process isn't enough.
  • The native library keeps process-global state. Do not create and dispose multiple ScreenAI instances in the same process — the second InitOCR will crash. Use AddVellumOcr() (singleton) in DI.
  • Max supported image dimension is 2048 px. Larger images are downscaled before OCR.
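The serialisation pattern in the first bullet is an asynchronous semaphore wait followed by Task.Run, and it can be sketched generically. SerializedRunner is a hypothetical name, and the delegate stands in for a blocking native call. This is not LazyOcrEngine's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not LazyOcrEngine: serialises blocking work behind an
// async gate so waiting callers do not hold a thread-pool thread.
public sealed class SerializedRunner : IDisposable
{
    // One permit: at most one piece of work runs at a time.
    private readonly SemaphoreSlim _gate = new(1, 1);

    public async Task<T> RunAsync<T>(Func<T> blockingWork, CancellationToken ct = default)
    {
        // Queued callers wait asynchronously here instead of blocking a thread.
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Offload the CPU-bound / native call to the thread pool.
            return await Task.Run(blockingWork, ct).ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

The semaphore is what limits throughput to one call at a time. Task.Run only keeps the blocking call off the awaiting caller's context.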

Building from source

git clone https://github.com/<you>/vellum.git
cd vellum
dotnet build
dotnet test
dotnet pack -c Release           # → Ocr.Vellum.*.nupkg + Ocr.Vellum.Cli.*.nupkg

Run the CLI from source:

dotnet run --project src/Vellum.Cli -- ocr sample.pdf

PDF rasterisation comes from the PDF2SVG.PopplerCairo.Bindings NuGet package (Poppler/Cairo-backed, ships prebuilt Win-x64 / Linux-x64 natives). It's pulled in as a transitive dependency — nothing to clone, nothing to build.


How it works

  • Interop: NativeLibrary.Load resolves the exported C functions; delegate* unmanaged[Cdecl] function pointers call them without marshalling.
  • SkBitmap layout: PerformOCR accepts a C++ SkBitmap&; Vellum rebuilds its 56-byte memory layout (and a 104-byte fake SkPixelRef to pass the null check) as [StructLayout(LayoutKind.Explicit)] structs. The exact offsets were reverse-engineered empirically by the Python project; see CHROME_SCREEN_AI_DLL.md upstream.
  • Model files: the DLL reads models via host-provided callbacks; Vellum supplies them from whatever directory the component lives in.
  • Protobuf: PerformOCR returns a serialised chrome_screen_ai.VisualAnnotation message; Vellum decodes it directly with a ~150-line wire-format parser, with no .proto compilation step.
  • PDF rasterisation: pages are rendered to PNG via pdf2svg_poppler_cairo, then decoded into SKBitmap before OCR.
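A hand-written protobuf parser only has to handle a small wire format. Each field starts with a varint tag: the low 3 bits are the wire type and the remaining bits are the field number. The tag is followed by a varint, a fixed 32- or 64-bit value, or a length-prefixed byte run. A minimal reader sketch follows. WireReader is a hypothetical name. It is not Vellum's VisualAnnotationParser, and it assumes no VisualAnnotation field numbers.

```csharp
using System;

// Minimal protobuf wire-format reader: a sketch of the technique, not Vellum's parser.
public ref struct WireReader
{
    private readonly ReadOnlySpan<byte> _data;
    private int _pos;

    public WireReader(ReadOnlySpan<byte> data) { _data = data; _pos = 0; }

    public bool AtEnd => _pos >= _data.Length;

    // Base-128 varint: 7 payload bits per byte, high bit set means "more bytes follow".
    public ulong ReadVarint()
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            byte b = _data[_pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new FormatException("Varint longer than 10 bytes.");
    }

    // Returns (field number, wire type) decoded from the next tag.
    public (int Field, int WireType) ReadTag()
    {
        ulong tag = ReadVarint();
        return ((int)(tag >> 3), (int)(tag & 0x7));
    }

    // Wire type 2: strings, bytes and nested messages.
    public ReadOnlySpan<byte> ReadLengthDelimited()
    {
        int len = checked((int)ReadVarint());
        var slice = _data.Slice(_pos, len);
        _pos += len;
        return slice;
    }

    // Skips a field the caller does not recognise.
    public void Skip(int wireType)
    {
        switch (wireType)
        {
            case 0: ReadVarint(); break;          // varint
            case 1: Advance(8); break;            // fixed 64-bit
            case 2: ReadLengthDelimited(); break; // length-delimited
            case 5: Advance(4); break;            // fixed 32-bit
            default: throw new FormatException($"Unsupported wire type {wireType}.");
        }
    }

    private void Advance(int n)
    {
        if (_pos + n > _data.Length) throw new FormatException("Truncated message.");
        _pos += n;
    }
}
```

For example, the bytes 08 96 01 decode as field 1, wire type 0, value 150. A nested message (wire type 2) can be parsed by constructing a new WireReader over the span that ReadLengthDelimited returns.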

Project layout

ocr_playground/
├── src/
│   ├── Vellum/                       # library (NuGet: Ocr.Vellum)
│   │   ├── ScreenAI.cs               # public facade
│   │   ├── LazyOcrEngine.cs          # DI-friendly lazy wrapper
│   │   ├── IOcrEngine.cs             # shared abstraction
│   │   ├── VellumOptions.cs          # options for AddVellumOcr
│   │   ├── ServiceCollectionExtensions.cs
│   │   ├── Models/OcrModels.cs       # records
│   │   ├── Protobuf/VisualAnnotationParser.cs
│   │   ├── Interop/                  # SkBitmap structs, native bindings, Linux stubs
│   │   ├── Imaging/                  # SkiaSharp + pdf2svg wrappers
│   │   ├── Download/ComponentDownloader.cs
│   │   ├── Reporting/HtmlReportBuilder.cs
│   │   └── Platform/PlatformPaths.cs
│   ├── Vellum.Cli/                   # `vellum` global tool (NuGet: Ocr.Vellum.Cli)
│   └── native/chromium_stubs/        # C source for Linux link-time stubs
└── tests/Vellum.Tests/               # xUnit tests (no native DLL required)

License and attribution

Vellum is MIT-licensed, matching the upstream Python project.

  • Python implementation and reverse-engineering notes: © Sergio Correia, sergiocorreia/clv-locro, MIT.
  • Chrome screen-ai library (chrome_screen_ai.dll / libchromescreenai.so): © Google, distributed as a Chrome component. Not redistributed by this package. See Chromium's licenses for terms of use.
  • pdf2svg_poppler_cairo: Forevka/pdf2svg_poppler_cairo, bundling Poppler (GPL) and Cairo (LGPL) natives.
Product: compatible and additional computed target framework versions
.NET: net9.0 is compatible. Computed: net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.2.6 387 4/24/2026
0.2.5 108 4/24/2026
0.2.4 106 4/24/2026
0.2.3 112 4/23/2026
0.2.2 126 4/23/2026
0.2.1 135 4/23/2026
0.2.0 107 4/23/2026
0.1.3 108 4/23/2026
0.1.2 115 4/23/2026
0.1.0 117 4/23/2026