TikaOnDotnet.TextExtractor 1.17.1

Classes for running Apache Tika through **TikaOnDotNet**. Just use `TextExtractor.Extract()` and you'll be on your way.

Install with the NuGet Package Manager Console:

`Install-Package TikaOnDotnet.TextExtractor -Version 1.17.1`

Release Notes

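- Add new overloads to `TextExtractor.Extract` that let users supply their own extraction result assembler. An assembler is a function that receives the extracted text and the Tika `Metadata` and returns a result of the caller's own type. Example:

```csharp
using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using org.apache.tika.metadata;
using TikaOnDotNet.TextExtraction;

public class CustomResult
{
    public string Text { get; set; }
    public IDictionary<string, string[]> Metadata { get; set; }
}

// Assembler passed to Extract (declared in the test class): maps each metadata name to all of its values.
public static CustomResult CreateCustomResult(string text, Metadata metadata)
{
    var metaDataDictionary = metadata.names().ToDictionary(name => name, metadata.getValues);
    return new CustomResult { Metadata = metaDataDictionary, Text = text };
}

public void should_extract_author_list_from_pdf()
{
    var textExtractionResult = new TextExtractor().Extract("file_with_authors.pdf", CreateCustomResult);
    textExtractionResult.Metadata["meta:author"].Should().ContainInOrder("Fred Jones, M. D.", "Donald Evans D. M.");
}
```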


