IronOcr 2021.9.0

Package Manager:
Install-Package IronOcr -Version 2021.9.0

.NET CLI:
dotnet add package IronOcr --version 2021.9.0

PackageReference:
<PackageReference Include="IronOcr" Version="2021.9.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add IronOcr --version 2021.9.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Script & Interactive:
#r "nuget: IronOcr, 2021.9.0"
The #r directive can be used in F# Interactive, C# scripting and .NET Interactive. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
// Install IronOcr as a Cake Addin
#addin nuget:?package=IronOcr&version=2021.9.0

// Install IronOcr as a Cake Tool
#tool nuget:?package=IronOcr&version=2021.9.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

IronOCR is an advanced OCR (Optical Character Recognition) library for C# and .NET

IronOCR provides Tesseract OCR on Mac, Windows, Linux, Azure and Docker for:

  • .NET Framework 4.0+
  • .NET Standard 2.0+
  • .NET Core 2.0+
  • .NET 5
  • Mono for macOS and Linux
  • Xamarin for macOS

IronOCR reads text, barcodes and QR codes from all major image and PDF formats using the latest Tesseract 5 engine. The library adds OCR functionality to Desktop, Console and Web applications in minutes.

IronOCR's Unique Features

  • Pure .NET OCR API
  • All OCR tasks run locally (no SaaS)
  • 125 languages
  • Barcode & QR code reading
  • Corrects low-quality, noisy and distorted scans
  • Performance tuned beyond other known builds of Tesseract OCR
  • Reads PDFs and multi-page TIFFs
  • Can save any OCR scan to a searchable PDF document or XHTML

Data Output Options Include

Output is available as plain text, barcode data, and an OCR result class containing paragraphs, lines, words, and characters.
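
As a rough illustration of the structured output described above, the sketch below walks the result from pages down to individual words. The member names used here (Pages, PageNumber, Paragraphs, Lines, Words, Text) are assumed to match the IronOCR result object model for this release and should be checked against the API reference. The image path is hypothetical.

```csharp
using System;
using IronOcr;

class StructuredResultSketch
{
    static void Main()
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            // Hypothetical sample image; replace with a real file path
            input.AddImage("images/sample.jpeg");
            var result = ocr.Read(input);

            // Assumed hierarchy: page -> paragraph -> line -> word
            foreach (var page in result.Pages)
            {
                Console.WriteLine($"Page {page.PageNumber}");
                foreach (var paragraph in page.Paragraphs)
                {
                    foreach (var line in paragraph.Lines)
                    {
                        foreach (var word in line.Words)
                        {
                            Console.WriteLine($"  word: {word.Text}");
                        }
                    }
                }
            }
        }
    }
}
```

When only the full text is needed, Result.Text (as used in the examples further down) is the simpler route; the nested loops are useful mainly when positions or per-word handling matter.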

International Language Support

125 languages are supported, including Arabic, Chinese, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Portuguese, Russian and Spanish. Custom language packs can also be created.

Licensing & Support available

For code examples, documentation & more visit http://ironsoftware.com/csharp/ocr/ Email: developers@ironsoftware.com

Get Started Code Example

string Result = new IronOcr.IronTesseract().Read("scan.pdf").Text;

Why C# developers choose IronOCR over Vanilla Tesseract:

  • Achieve 99.8%+ OCR accuracy without using external web services
  • Includes Tesseract 5, 4 and 3 engines out of the box
  • Blazing speed and multithreading support
  • MVC, WebApp, Desktop, Console & Server Application compatible
  • No EXEs or C++ code to work with
  • Full PDF OCR support
  • Performs OCR on almost any image file or PDF
  • Full .NET Core, .NET Standard and .NET Framework support
  • Deploy on Windows, Mac, Linux, Azure, Docker, Lambda, AWS
  • Read barcodes and QR codes
  • Export OCR results to XHTML
  • Export OCR results to searchable PDF documents
  • 125 international languages, all managed via NuGet or OcrData files
  • Extract images, coordinates, statistics and fonts, not just text
  • Can be used to redistribute Tesseract OCR inside commercial & proprietary applications

IronOCR shines when working with real-world images and imperfect documents such as photographs or low-resolution scans, which may contain digital noise or other imperfections. Other free OCR libraries for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.

OCR with Tesseract 5 - Start Coding in C#

The code examples below show how easy it is to read text from an image using C# or VB.NET.

Configurable Hello World

using System;
using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Add an image to read; any number of images can be added
    Input.AddImage("images/sample.jpeg");
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}

 

C# PDF OCR

The same approach can be used to extract text from any PDF document.

using System;
using System.Linq;
using IronOcr;

var Ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // Load a password-protected PDF
    input.AddPdf("example.pdf", "password");
    // We can also select specific PDF page numbers to OCR

    var Result = Ocr.Read(input);

    Console.WriteLine(Result.Text);
    // One result page for every page of the PDF
    Console.WriteLine($"{Result.Pages.Count()} Pages");
}
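
The feature list above mentions saving an OCR scan as a searchable PDF. A minimal sketch of that step follows, assuming the result object exposes a SaveAsSearchablePdf method as in the IronOCR documentation; confirm the name against the API reference for the installed version. Both file names are hypothetical.

```csharp
using System;
using IronOcr;

class SearchablePdfSketch
{
    static void Main()
    {
        var ocr = new IronTesseract();
        using (var input = new OcrInput())
        {
            // Hypothetical scanned PDF protected by a password
            input.AddPdf("example.pdf", "password");
            var result = ocr.Read(input);

            // Assumed API: writes a PDF with a selectable, searchable text layer
            result.SaveAsSearchablePdf("searchable.pdf");
            Console.WriteLine("Saved searchable.pdf");
        }
    }
}
```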

 

C# OCR Working Code Examples

NuGet packages (131)

Showing the top 5 NuGet packages that depend on IronOcr:

Package Downloads
IronOcr.Languages.Hebrew

The IronOCR engine adds OCR (Optical Character Recognition) functionality to Web, Desktop, and Console applications. IronOCR reads barcodes and QR codes.

OCR dictionaries in this package: Hebrew, HebrewBest, HebrewFast, HebrewAlphabet, HebrewAlphabetBest, HebrewAlphabetFast

Hebrew OCR in C# and .NET. Optimized C# Tesseract 5 OCR in a standalone .NET OCR API. Converts scanner documents, images and PDFs to text. C# and VB examples: https://ironsoftware.com/csharp/ocr/languages/

This package installs IronOCR and also Hebrew support, including:

  • Hebrew (also known as עברית) OCR for screenshots, cameras, image files, TIFFs and PDFs in .NET
  • Custom OCR that can significantly outperform Tesseract CLI on real-world documents
  • Can read scans with distortion, skewing, low resolution and contrast, and digital noise
  • Also supports Tesseract 3, 4 and 5 in Hebrew
  • Support for 125 total international languages available

Additional features include:

  • Barcode & QR reading
  • Output of searchable, search-engine indexable PDF documents
  • Inspect fonts, headings, paragraphs, lines, words, and characters as structured data

Supports: .NET Framework (4.5+), .NET Core (2.0+), .NET Standard (2.0+)

Works on: Windows, macOS, Linux, Docker, Azure and other cloud hosting platforms, Web, Console, WinForms, WPF and Services

Reads: images, TIFFs, PDFs, screenshots, scans, barcodes, QR codes

Commercial support available. Email: developers@ironsoftware.com
C# & VB examples: https://ironsoftware.com/csharp/ocr/languages/

IronOcr.Languages.ChineseTraditional

Simplified Chinese Language Pack for the IronOCR C# & VB.NET library. The OCR engine adds OCR functionality to Desktop, Console and Web applications. IronOCR reads barcodes and QR codes. IronOCR supports Console Applications, ASP.NET Web Applications, MVC, and Desktop Applications written in all .NET languages. The library preprocesses images to help read scans with low resolution and contrast, distortion, and heavy background noise. Output can be in plain text or through the advanced object model to extract headings, paragraphs, lines, words, and characters from a page's content. Other language packs and C# / VB.NET code examples are available at http://ironsoftware.com/csharp/ocr/ Product & licensing support is available by email at developers@ironsoftware.com

IronOcr.Languages.Arabic

The IronOCR engine adds OCR (Optical Character Recognition) functionality to Web, Desktop, and Console applications. IronOCR reads barcodes and QR codes.

OCR dictionaries in this package: Arabic, ArabicBest, ArabicFast, ArabicAlphabet, ArabicAlphabetBest, ArabicAlphabetFast

Arabic OCR in C# and .NET. Optimized C# Tesseract 5 OCR in a standalone .NET OCR API. Converts scanner documents, images and PDFs to text. C# and VB examples: https://ironsoftware.com/csharp/ocr/languages/

This package installs IronOCR and also Arabic support, including:

  • Arabic (also known as العربية) OCR for screenshots, cameras, image files, TIFFs and PDFs in .NET
  • Custom OCR that can significantly outperform Tesseract CLI on real-world documents
  • Can read scans with distortion, skewing, low resolution and contrast, and digital noise
  • Also supports Tesseract 3, 4 and 5 in Arabic
  • Support for 125 total international languages available

Additional features include:

  • Barcode & QR reading
  • Output of searchable, search-engine indexable PDF documents
  • Inspect fonts, headings, paragraphs, lines, words, and characters as structured data

Supports: .NET Framework (4.5+), .NET Core (2.0+), .NET Standard (2.0+)

Works on: Windows, macOS, Linux, Docker, Azure and other cloud hosting platforms, Web, Console, WinForms, WPF and Services

Reads: images, TIFFs, PDFs, screenshots, scans, barcodes, QR codes

Commercial support available. Email: developers@ironsoftware.com
C# & VB examples: https://ironsoftware.com/csharp/ocr/languages/

IronOcr.Languages.Japanese

The IronOCR engine adds OCR (Optical Character Recognition) functionality to Web, Desktop, and Console applications. IronOCR reads barcodes and QR codes.

OCR dictionaries in this package: JapaneseAlphabet, JapaneseAlphabetBest, JapaneseAlphabetFast, JapaneseVerticalAlphabet, JapaneseVerticalAlphabetBest, JapaneseVerticalAlphabetFast, Japanese, JapaneseBest, JapaneseFast, JapaneseVertical, JapaneseVerticalBest, JapaneseVerticalFast

Japanese OCR for C# and .NET. Optimized C# Tesseract 5 OCR in a standalone .NET OCR API. Converts scanner documents, images and PDFs to text. C# and VB examples: https://ironsoftware.com/csharp/ocr/languages/

This package installs IronOCR and also Japanese support, including:

  • Japanese (also known as 日本語 (にほんご)) OCR for screenshots, cameras, image files, TIFFs and PDFs in .NET
  • Custom OCR that can significantly outperform Tesseract CLI on real-world documents
  • Can read scans with distortion, skewing, low resolution and contrast, and digital noise
  • Also supports Tesseract 3, 4 and 5 in Japanese
  • Support for 125 total international languages available

Additional features include:

  • Barcode & QR reading
  • Output of searchable, search-engine indexable PDF documents
  • Inspect fonts, headings, paragraphs, lines, words, and characters as structured data

Supports: .NET Framework (4.5+), .NET Core (2.0+), .NET Standard (2.0+)

Works on: Windows, macOS, Linux, Docker, Azure and other cloud hosting platforms, Web, Console, WinForms, WPF and Services

Reads: images, TIFFs, PDFs, screenshots, scans, barcodes, QR codes

Commercial support available. Email: developers@ironsoftware.com
C# & VB examples: https://ironsoftware.com/csharp/ocr/languages/

IronOcr.Languages.Portuguese

The IronOCR engine adds OCR (Optical Character Recognition) functionality to Web, Desktop, and Console applications. IronOCR reads barcodes and QR codes.

OCR dictionaries in this package: Portuguese, PortugueseBest, PortugueseFast

Portuguese OCR in C# and .NET. Optimized C# Tesseract 5 OCR in a standalone .NET OCR API. Converts scanner documents, images and PDFs to text. C# and VB examples: https://ironsoftware.com/csharp/ocr/languages/

This package installs IronOCR and also Portuguese support, including:

  • Portuguese (also known as Português) OCR for screenshots, cameras, image files, TIFFs and PDFs in .NET
  • Custom OCR that can significantly outperform Tesseract CLI on real-world documents
  • Can read scans with distortion, skewing, low resolution and contrast, and digital noise
  • Also supports Tesseract 3, 4 and 5 in Portuguese
  • Support for 125 total international languages available

Additional features include:

  • Barcode & QR reading
  • Output of searchable, search-engine indexable PDF documents
  • Inspect fonts, headings, paragraphs, lines, words, and characters as structured data

Supports: .NET Framework (4.5+), .NET Core (2.0+), .NET Standard (2.0+)

Works on: Windows, macOS, Linux, Docker, Azure and other cloud hosting platforms, Web, Console, WinForms, WPF and Services

Reads: images, TIFFs, PDFs, screenshots, scans, barcodes, QR codes

Commercial support available. Email: developers@ironsoftware.com
C# & VB examples: https://ironsoftware.com/csharp/ocr/languages/

GitHub repositories

This package is not used by any popular GitHub repositories.

Version     Downloads   Last updated
2021.9.0    7,348       8/24/2021
2021.6.0    8,991       6/24/2021
2021.2.1    19,657      2/24/2021
2020.12.2   6,578       12/14/2020
2020.11.2   12,871      11/13/2020
4.4.0       163,362     6/21/2018
4.3.0.1     17,362      4/9/2018
4.2.2.51    3,072       1/22/2018
4.2.2.1     2,073       12/1/2017
4.2.1.5     2,575       9/9/2017
4.1.1       2,834       8/4/2017
4.0.10      1,577       1/12/2017
4.0.9       932         12/20/2016

* Bug Fixed: Azure Function compatibility
* Bug Fixed: Works with read-only OCR dictionaries
* Bug Fixed: Now works with .NET 5.07
* Bug Fixed: License keys reading from project configuration files (edge cases)
* Feature: Color replacement methods added to OcrInput
* Feature: Load specific frames from TIFF and PDF files
* Improved: Updated System.Drawing.Common