Toxy is a .NET data and text extraction framework, similar to Apache Tika for Java. It supports many popular formats, including docx, xlsx, xls, pdf, csv, txt, epub, and html.

Install-Package Toxy
dotnet add package Toxy
<PackageReference Include="Toxy" Version="" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Toxy

Release Notes

1. Update the PDF extraction license to a commercial one (iTextSharp license)
2. Support .msg file extraction (Windows only)
3. Support RTF extraction with HTML content

1. Fix a PDF extraction issue
2. Fix some Word extraction issues
3. Fix Excel and Word document streams not being closed after being opened by WorkbookFactory

Top GitHub repository that depends on Toxy: a .NET-based web crawler.

Version History

Version  Downloads  Last updated
         14,331     3/5/2016
1.6.1    676        3/5/2016
1.4.0    2,300      3/9/2015