Ocr.Vellum 0.2.6

.NET 9.0

Vellum

Chrome screen-ai OCR for .NET 9. Loads Google Chrome's built-in OCR engine (chrome_screen_ai.dll / libchromescreenai.so) directly from managed code and extracts text from PDFs and images — no browser window, no Tesseract, no cloud API.

A .NET port of sergiocorreia/clv-locro. NuGet: Ocr.Vellum (library) and Ocr.Vellum.Cli (global tool).

Note — the native Chrome OCR library is not redistributed. You need either a local Chrome install (the wrapper finds it automatically) or a one-time vellum download copy step. See Screen AI binaries below.

Features

Fast, local OCR via the same on-device engine Chrome ships for accessibility.
PDF and image input — .pdf, .jpg, .jpeg, .png, .webp, .bmp, .tiff, .tif, .gif.
Structured results — pages → blocks → lines → words, each with bounding box and confidence.
Searchable PDF output — overlay an invisible text layer on the original pages.
Interactive HTML report — single self-contained file with hoverable word boxes and a synchronised sidebar (AWS Textract-style viewer).
ASP.NET Core ready — services.AddVellumOcr() with lazy initialisation; your app starts even before the binaries are installed.
Cross-platform — win-x64 and linux-x64.

Install

Library

dotnet add package Ocr.Vellum

The namespace is Vellum (using Vellum;). The Ocr. prefix is only the NuGet package id, because the plain Vellum id was already taken.

CLI (global tool)

dotnet tool install -g Ocr.Vellum.Cli

Exposes a vellum command with three subcommands: ocr, download, export.

Screen AI binaries — one-time setup

The Chrome OCR library (~33 MB DLL plus ~100–250 MB of TFLite models) is licensed to ship only with Chrome. Vellum does not redistribute it. Pick one of:

Option	Setup
A. Use Chrome in place	Install Chrome, open `chrome://components`, trigger Screen AI download. Vellum finds it.
B. Copy out of Chrome	`vellum download` — copies into `%LOCALAPPDATA%\vellum\<version>\` (Windows) or `~/.local/share/vellum/<version>/` (Linux).
C. Ship a portable zip	On a machine with the component installed, run `vellum export -o bundle.zip`. On target machines, drop it at `~/Dropbox/bin/screen-ai-{windows,linux}.zip` or pass `--zip`.

Discovery order used at first OCR call: Chrome → Vellum cache → Dropbox zip → Omaha server (currently restricted to Chrome).

Quick start

Library

using Vellum;

using var ai = new ScreenAI();               // auto-discovers the component
var result = ai.Ocr("invoice.pdf");

Console.WriteLine(result.ToText());           // plain text, pages joined by \f
File.WriteAllText("invoice.json", result.ToJson());

CLI

vellum ocr document.pdf                    # writes document_ocr.txt + document_ocr.json
vellum ocr photo.jpg --text                # pipe-friendly text to stdout
vellum ocr big.pdf -p 1-5,10                # specific pages
vellum ocr scan.png --light                 # smaller/faster model
vellum ocr doc.pdf -s doc_searchable.pdf    # invisible text overlay
vellum ocr doc.pdf --html doc.html          # interactive HTML report

Full page-spec grammar: 1, 1-10, 1,3,5, 1-5,10-12 — ranges and single pages, comma-separated.
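The grammar above is small enough to sketch in a few lines. The following is an illustrative parser, not the one the CLI ships; the PageSpec class and its Parse method are hypothetical names, and rejecting page numbers below 1 or reversed ranges is an assumption about reasonable behaviour rather than documented behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class PageSpec
{
    // Expands a spec such as "1-5,10-12" into a sorted set of 1-based page numbers.
    public static SortedSet<int> Parse(string spec)
    {
        var pages = new SortedSet<int>();
        var parts = spec.Split(',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        foreach (var part in parts)
        {
            // "7" is a single page, "3-9" is an inclusive range.
            var bounds = part.Split('-', 2);
            int first = int.Parse(bounds[0]);
            int last = bounds.Length == 2 ? int.Parse(bounds[1]) : first;

            if (first < 1 || last < first)
                throw new FormatException($"Invalid page range: '{part}'");

            for (int p = first; p <= last; p++)
                pages.Add(p);
        }
        return pages;
    }
}
```

With this sketch, PageSpec.Parse("1-5,10") yields pages 1, 2, 3, 4, 5 and 10; overlapping ranges collapse because the result is a set.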

ASP.NET Core / generic host

AddVellumOcr registers an IOcrEngine singleton. Startup succeeds even when the native binaries aren't installed yet — the DLL is only loaded on the first OCR call, which makes it safe to wire up and deploy to machines that will pick up Chrome / the model cache later.

using Vellum;

builder.Services.AddVellumOcr(options =>
{
    options.LightMode       = false;   // true = smaller, faster model
    options.AutoDownload    = true;    // copy from Chrome on first use if missing
    options.SerializeCalls  = true;    // (default) lock around PerformOCR for thread safety
    // options.ModelDir     = "/opt/screen_ai/140.20";   // pin explicitly if you want
});

Inject and use anywhere:

app.MapPost("/ocr", async (IOcrEngine ocr, IFormFile file, CancellationToken ct) =>
{
    await ocr.EnsureReadyAsync(ct);            // optional warm-up

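    // Assumption: the input type may be inferred from the file name, so keep
    // the upload's extension instead of the ".tmp" that GetTempFileName() uses.
    var tmp = Path.Combine(Path.GetTempPath(),
        Path.GetRandomFileName() + Path.GetExtension(file.FileName));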
    await using (var fs = File.Create(tmp))
        await file.CopyToAsync(fs, ct);

    try
    {
        // OcrAsync awaits the serialisation semaphore non-blockingly and
        // runs the CPU-bound OCR work on the thread pool — the request
        // thread is returned while the native DLL is busy.
        var result = await ocr.OcrAsync(tmp, ct: ct);
        return Results.Text(result.ToText());
    }
    finally { File.Delete(tmp); }
});

IsReady tells you whether the library is loaded; EnsureReadyAsync(ct) is a good fit for a health check or a BackgroundService warm-up.

Output formats

`OcrResult` (model)

OcrResult
└─ Pages: IReadOnlyList<OcrPage>
   └─ OcrPage { PageNumber, Width, Height, Blocks }
      └─ OcrBlock { BlockType, Lines }
         └─ OcrLine { Text, BoundingBox?, Words }
            └─ OcrWord { Text, Confidence?, BoundingBox? }

result.ToText() — plain text; pages separated by \f (form feed).
result.ToJson(indented: true) — structured JSON.
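Walking the pages → blocks → lines → words tree is mostly nested iteration. The sketch below flags low-confidence words for manual review. To stay self-contained it uses stand-in records (Word, Line, Block, Page) that copy only the member names from the tree above; they are not the library's types, and treating Confidence as a nullable double is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in records mirroring the member names in the tree above (not the real Vellum types).
public record Word(string Text, double? Confidence);
public record Line(string Text, IReadOnlyList<Word> Words);
public record Block(string BlockType, IReadOnlyList<Line> Lines);
public record Page(int PageNumber, IReadOnlyList<Block> Blocks);

public static class LowConfidence
{
    // Returns every word whose confidence is known and below the threshold.
    // Words without a confidence value are skipped: comparing null with "<" is false.
    public static IEnumerable<(int PageNumber, string Text, double Confidence)> Find(
        IEnumerable<Page> pages, double threshold) =>
        from page in pages
        from block in page.Blocks
        from line in block.Lines
        from word in line.Words
        where word.Confidence < threshold
        select (page.PageNumber, word.Text, word.Confidence.Value);
}
```

The same query shape should carry over to the real model by swapping the stand-in records for OcrPage, OcrBlock, OcrLine and OcrWord.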

JSON

{
  "pages": [
    {
      "pageNumber": 1,
      "width": 1377,
      "height": 2048,
      "blocks": [
        {
          "blockType": "paragraph",
          "lines": [
            {
              "text": "Hello world",
              "boundingBox": { "x": 50, "y": 100, "width": 300, "height": 30 },
              "words": [
                { "text": "Hello", "confidence": 0.98, "boundingBox": { "x": 50, "y": 100, "width": 120, "height": 30 } }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Coordinates are in pixels of the image actually OCR'd (which may have been downscaled to fit the library's 2048 px maximum).
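If boxes need to be drawn on the original, full-resolution image, they have to be scaled back up. A minimal sketch, assuming the caller knows the original dimensions and reads the OCR'd dimensions from the page's width and height; the Box record and BoxScaler class are hypothetical helpers, not part of the library.

```csharp
// Hypothetical value type with the same four fields as the JSON boundingBox.
public readonly record struct Box(double X, double Y, double Width, double Height);

public static class BoxScaler
{
    // Maps a box from the coordinate space of the OCR'd (possibly downscaled)
    // image back to the coordinate space of the original image.
    public static Box ToOriginal(
        Box box, int ocrWidth, int ocrHeight, int originalWidth, int originalHeight)
    {
        double sx = (double)originalWidth / ocrWidth;
        double sy = (double)originalHeight / ocrHeight;
        return new Box(box.X * sx, box.Y * sy, box.Width * sx, box.Height * sy);
    }
}
```

For example, if a 2754 × 4096 scan was OCR'd at 1377 × 2048, both scale factors are 2, so the box (50, 100, 300, 30) maps to (100, 200, 600, 60).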

HTML report

vellum ocr file --html out.html produces a single self-contained HTML file:

Original page image on the left (inlined as base64 PNG).
Transparent SVG rectangles over every word; hovering highlights the matching sidebar entry and scrolls it into view.
Sidebar on the right with blocks → lines → words; hovering in either direction highlights the other.
Top-bar filter input that dims non-matching words everywhere.

Caveat: size is roughly page_count × image_size × 1.33 (base64 overhead). Fine for up to a few dozen pages; large PDFs can produce very large HTML files.
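The 1.33 factor comes from base64 turning every 3 bytes into 4 characters. A rough estimator, with hypothetical names, that ignores the markup, SVG rectangles and sidebar text and counts only the inlined images:

```csharp
public static class HtmlReportSize
{
    // Base64 encodes every 3 input bytes as 4 output characters (padding included).
    public static long Base64Length(long byteCount) => 4 * ((byteCount + 2) / 3);

    // Lower bound on report size: one base64-encoded PNG per page.
    public static long EstimateBytes(int pageCount, long averagePngBytes) =>
        pageCount * Base64Length(averagePngBytes);
}
```

For 40 pages averaging 1,500,000 bytes of PNG each, the estimate is 80,000,000 bytes, so roughly 80 MB of HTML before any markup is counted.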

Searchable PDF

vellum ocr scanned.pdf -s scanned_searchable.pdf
vellum ocr scanned.pdf -s scanned_searchable.pdf -p 1-10

Writes a copy of the input PDF with invisible (fully transparent) text layered over each word, so the text is selectable and searchable in any PDF viewer while the pages still look identical to the original.

CLI reference

`vellum ocr <file> [options]`

Option	Description
`-o`, `--output-dir <dir>`	Directory for `_ocr.txt` / `_ocr.json` (default: same as input)
`--text`	Print extracted text to stdout instead of writing files
`-p`, `--pages <spec>`	Pages to OCR (PDF only): `1`, `1-10`, `1,3,5`, `1-5,10-12`
`--light`	Use the smaller / faster model
`-s`, `--searchable-pdf <p>`	Write a searchable PDF to the given path (PDF input only)
`--html <path>`	Write a self-contained interactive HTML report
`-v`, `--verbose`	Debug-level logging, including native library output

`vellum download [options]`

Copies the screen-ai component from Chrome's user-data directory into Vellum's cache. Falls back to the Dropbox zip and then the Omaha server if Chrome isn't installed.

Option	Description
`--model-dir <dir>`	Override the cache location (default `%LOCALAPPDATA%\vellum` on Windows or `~/.local/share/vellum` on Linux)
`-v`, `--verbose`	Verbose logging

`vellum export [options]`

Packages the installed component as a portable zip for other machines.

Option	Description
`-o`, `--output <path>`	Output zip path (default `~/Dropbox/bin/screen-ai-{platform}.zip`)
`-v`, `--verbose`	Verbose logging

Threading and lifecycle

The native library spawns its own worker threads but does not document whether concurrent PerformOCR calls are safe. Vellum's LazyOcrEngine therefore serialises calls under a SemaphoreSlim by default (VellumOptions.SerializeCalls = true). Concurrent requests queue up rather than crash, but throughput is effectively one OCR at a time per process; note that SemaphoreSlim does not guarantee strict FIFO ordering of waiters. The async methods (OcrAsync, OcrBitmapAsync, …) use WaitAsync + Task.Run, so queued requests do not tie up thread-pool threads. Scale horizontally (multiple processes behind a load balancer) when one process isn't enough.
The native library keeps process-global state. Do not create and dispose multiple ScreenAI instances in the same process — the second InitOCR will crash. Use AddVellumOcr() (singleton) in DI.
Max supported image dimension is 2048 px. Larger images are downscaled before OCR.
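The WaitAsync + Task.Run pattern described above can be sketched generically. This is not the LazyOcrEngine source; SerializedRunner is a hypothetical stand-in that guards any non-thread-safe, CPU-bound call with a single-slot semaphore.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: at most one work item runs at a time.
public sealed class SerializedRunner : IDisposable
{
    private readonly SemaphoreSlim _gate = new(1, 1);

    public async Task<T> RunAsync<T>(Func<T> work, CancellationToken ct = default)
    {
        // Waiting on the semaphore is asynchronous, so queued callers hold no thread.
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // The blocking, CPU-bound call runs on a thread-pool thread.
            return await Task.Run(work, ct).ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

Cancelling while queued throws before the semaphore is acquired, so the finally block never releases a slot that was not taken.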

Building from source

git clone https://github.com/<you>/vellum.git
cd vellum
dotnet build
dotnet test
dotnet pack -c Release           # → Ocr.Vellum.*.nupkg + Ocr.Vellum.Cli.*.nupkg

Run the CLI from source:

dotnet run --project src/Vellum.Cli -- ocr sample.pdf

PDF rasterisation comes from the PDF2SVG.PopplerCairo.Bindings NuGet package (Poppler/Cairo-backed, ships prebuilt Win-x64 / Linux-x64 natives). It's pulled in as a transitive dependency — nothing to clone, nothing to build.

How it works

Interop — NativeLibrary.Load resolves the exported C functions; delegate* unmanaged[Cdecl] function pointers call them without marshalling.
SkBitmap layout — PerformOCR accepts a C++ SkBitmap&; Vellum rebuilds its 56-byte memory layout (and a 104-byte fake SkPixelRef to pass the null check) as [StructLayout(LayoutKind.Explicit)] structs. The exact offsets were reverse-engineered empirically by the Python project; see CHROME_SCREEN_AI_DLL.md upstream.
Model files — the DLL reads models via host-provided callbacks; Vellum supplies them from whatever directory the component lives in.
Protobuf — PerformOCR returns a serialised chrome_screen_ai.VisualAnnotation message; Vellum decodes it directly with a ~150-line wire-format parser, no .proto compilation step.
PDF rasterisation — pages are rendered to PNG via pdf2svg_poppler_cairo then decoded into SKBitmap before OCR.
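To illustrate why a hand-written wire-format parser can stay small, here is a sketch of the two core operations: reading a varint and splitting a message into raw fields. It is not Vellum's parser, the WireFormat class is a hypothetical name, and it knows nothing about the VisualAnnotation schema; nested messages would be decoded by calling ReadFields again on a length-delimited payload.

```csharp
using System;
using System.Collections.Generic;

public static class WireFormat
{
    // Reads a base-128 varint starting at pos and advances pos past it.
    public static ulong ReadVarint(ReadOnlySpan<byte> data, ref int pos)
    {
        ulong value = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            byte b = data[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;   // high bit clear: last byte
        }
        throw new FormatException("Varint is longer than 10 bytes.");
    }

    // Splits one message into (field number, wire type, raw payload) triples.
    public static List<(int Field, int WireType, byte[] Payload)> ReadFields(byte[] message)
    {
        var fields = new List<(int Field, int WireType, byte[] Payload)>();
        ReadOnlySpan<byte> data = message;
        int pos = 0;
        while (pos < data.Length)
        {
            ulong tag = ReadVarint(data, ref pos);
            int field = (int)(tag >> 3);
            int wireType = (int)(tag & 7);
            int start = pos;
            int length;
            switch (wireType)
            {
                case 0: ReadVarint(data, ref pos); length = pos - start; break; // varint
                case 1: length = 8; pos += 8; break;                             // fixed64
                case 2:                                                          // length-delimited
                    length = checked((int)ReadVarint(data, ref pos));
                    start = pos;
                    pos += length;
                    break;
                case 5: length = 4; pos += 4; break;                             // fixed32
                default: throw new FormatException($"Unsupported wire type {wireType}.");
            }
            fields.Add((field, wireType, data.Slice(start, length).ToArray()));
        }
        return fields;
    }
}
```

Feeding it the bytes 08 96 01 12 02 68 69 yields field 1 as a varint (which decodes to 150) and field 2 as a length-delimited payload containing the ASCII bytes for "hi".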

Project layout

ocr_playground/
├── src/
│   ├── Vellum/                       # library (NuGet: Ocr.Vellum)
│   │   ├── ScreenAI.cs               # public facade
│   │   ├── LazyOcrEngine.cs          # DI-friendly lazy wrapper
│   │   ├── IOcrEngine.cs             # shared abstraction
│   │   ├── VellumOptions.cs          # options for AddVellumOcr
│   │   ├── ServiceCollectionExtensions.cs
│   │   ├── Models/OcrModels.cs       # records
│   │   ├── Protobuf/VisualAnnotationParser.cs
│   │   ├── Interop/                  # SkBitmap structs, native bindings, Linux stubs
│   │   ├── Imaging/                  # SkiaSharp + pdf2svg wrappers
│   │   ├── Download/ComponentDownloader.cs
│   │   ├── Reporting/HtmlReportBuilder.cs
│   │   └── Platform/PlatformPaths.cs
│   ├── Vellum.Cli/                   # `vellum` global tool (NuGet: Ocr.Vellum.Cli)
│   └── native/chromium_stubs/        # C source for Linux link-time stubs
└── tests/Vellum.Tests/               # xUnit tests (no native DLL required)

License and attribution

Vellum is MIT-licensed, matching the upstream Python project.

Chrome screen-ai library (chrome_screen_ai.dll / libchromescreenai.so): © Google, distributed as a Chrome component. Not redistributed by this package. See Chromium's licenses for terms of use.
pdf2svg_poppler_cairo: Forevka/pdf2svg_poppler_cairo, bundling Poppler (GPL) and Cairo (LGPL) natives.

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net9.0
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.0)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.0)
- PDF2SVG.PopplerCairo.Bindings (>= 1.0.2)
- PdfSharpCore (>= 1.3.65)
- SkiaSharp (>= 3.116.1)
- SkiaSharp.NativeAssets.Linux (>= 3.116.1)
- SkiaSharp.NativeAssets.Win32 (>= 3.116.1)


Version	Downloads	Last Updated
0.2.6	387	4/24/2026
0.2.5	108	4/24/2026
0.2.4	106	4/24/2026
0.2.3	112	4/23/2026
0.2.2	126	4/23/2026
0.2.1	135	4/23/2026
0.2.0	107	4/23/2026
0.1.3	108	4/23/2026
0.1.2	115	4/23/2026
0.1.0	117	4/23/2026

Total 1.4K

Current version 387

Per day average 29

ocr chrome screen-ai skia pdf tesseract-alternative

Original Python implementation (c) Sergio Correia, MIT. .NET port follows the MIT license.
