TikaOnDotnet.TextExtractor 1.17.1

Classes for running Apache Tika through **TikaOnDotNet**. Just use `TextExtractor.Extract()` and you'll be on your way.
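As a rough illustration of the `TextExtractor.Extract()` call mentioned above, here is a minimal sketch. It assumes the `TikaOnDotNet.TextExtraction` namespace and a result object exposing `Text`, `ContentType`, and a string-to-string `Metadata` dictionary; check these against the installed version. The `ExtractDemo` class, the `ExtractText` helper, and the temp-file name are made up for this example.

```cs
using System;
using System.IO;
using TikaOnDotNet.TextExtraction; // assumed namespace of TextExtractor

public static class ExtractDemo // hypothetical name, for illustration only
{
    // Returns the plain text Tika extracts from the file at `path`.
    public static string ExtractText(string path)
    {
        var result = new TextExtractor().Extract(path);
        return result.Text;
    }

    public static void Main()
    {
        // Write a throwaway text file so the sketch needs no sample document.
        var path = Path.Combine(Path.GetTempPath(), "tika-demo.txt");
        File.WriteAllText(path, "hello tika");

        var result = new TextExtractor().Extract(path);

        // Detected MIME type, extracted text, then each metadata entry.
        Console.WriteLine(result.ContentType);
        Console.WriteLine(result.Text.Trim());
        foreach (var entry in result.Metadata)
        {
            Console.WriteLine($"{entry.Key} = {entry.Value}");
        }
    }
}
```

The same call should work for other formats Tika recognizes (PDF, Word documents, and so on) by passing a different file path.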

- Package Manager: `Install-Package TikaOnDotnet.TextExtractor -Version 1.17.1`
- .NET CLI: `dotnet add package TikaOnDotnet.TextExtractor --version 1.17.1`
- Paket CLI: `paket add TikaOnDotnet.TextExtractor --version 1.17.1`

Release Notes

- Add new overloads to `TextExtractor.Extract` that let users provide their own extraction result assemblers. Example:
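```cs
using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using NUnit.Framework;
using org.apache.tika.metadata;      // Metadata (Tika type exposed through IKVM)
using TikaOnDotNet.TextExtraction;   // TextExtractor

// Caller-defined result type; the assembler below populates it.
public class CustomResult
{
    public string Text { get; set; }
    public IDictionary<string, string[]> Metadata { get; set; }
}

[TestFixture]
public class CustomResultTests // fixture name is illustrative
{
    // Assembler: keeps every value of each metadata key rather than a single string.
    public static CustomResult CreateCustomResult(string text, Metadata metadata)
    {
        var metaDataDictionary = metadata.names().ToDictionary(name => name, metadata.getValues);
        return new CustomResult
        {
            Metadata = metaDataDictionary,
            Text = text,
        };
    }

    [Test]
    public void should_extract_author_list_from_pdf()
    {
        var textExtractionResult = new TextExtractor().Extract("file_with_authors.pdf", CreateCustomResult);
        textExtractionResult.Metadata["meta:author"].Should().ContainInOrder("Fred Jones, M. D.", "Donald Evans D. M.");
    }
}
```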

Version History

| Version | Downloads | Last updated |
|---|---|---|
| 1.17.1 (current) | 1,569 | 4/3/2018 |
| 1.17.0 | 2,349 | 2/15/2018 |
| 1.16.0 | 6,000 | 7/30/2017 |
| 1.15.0 | 111 | 7/30/2017 |
| 1.14.2 | 2,724 | 4/22/2017 |
| 1.14.2-pre | 101 | 4/15/2017 |
| 1.14.1 | 4,688 | 1/13/2017 |
| 1.14.0 | 758 | 12/8/2016 |
| 1.13.1 | 1,680 | 8/16/2016 |
| 1.13.0 | 696 | 6/30/2016 |
| 1.12.2 | 722 | 4/12/2016 |
| 1.12.1 | 164 | 4/12/2016 |
| 1.12.0 | 193 | 4/11/2016 |