Toxy is a .NET data/text extraction framework similar to Apache Tika for Java. It supports many popular formats, including docx, xlsx, xls, pdf, csv, txt, epub, and html.
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. Boilerpipe.Net is a port of the Java boilerpipe library.
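boilerpipe's classifiers decide which text blocks are content by looking at shallow text features. Two commonly cited features are a block's word count and its link density, which is the fraction of its words that sit inside hyperlinks. The sketch below illustrates only that general idea. It is not Boilerpipe.Net's API or its actual rule set: `TextBlock`, `is_content`, `extract_main_text`, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A hypothetical text block segmented from a web page."""
    text: str
    linked_words: int  # number of words inside <a> elements

    @property
    def num_words(self) -> int:
        return len(self.text.split())

    @property
    def link_density(self) -> float:
        # Share of the block's words that are link text (0.0 for empty blocks).
        return self.linked_words / self.num_words if self.num_words else 0.0


def is_content(block: TextBlock, min_words: int = 10, max_link_density: float = 0.33) -> bool:
    # Hypothetical thresholds: long, mostly unlinked blocks count as main content;
    # short or link-heavy blocks (menus, footers) count as boilerplate.
    return block.num_words >= min_words and block.link_density <= max_link_density


def extract_main_text(blocks: list) -> str:
    # Keep only the blocks classified as content, in their original order.
    return "\n".join(b.text for b in blocks if is_content(b))
```

A navigation bar such as `TextBlock("Home About Contact", 3)` is dropped because it is short and fully linked, while a long paragraph with no links is kept. The real library combines features like these in trained decision rules rather than two fixed cut-offs.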
Find and extract translatable strings from C# and F# sources
Finds localizable messages in *.fs and *.cs files by looking for calls such as I18n.Translate("message") in those sources. Puts the unique messages into a specified JSON file, updating it if necessary. The class name, method name, and other settings are configurable.
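The scanning step can be approximated with a regular expression over each source file. This is a minimal sketch, not the tool's implementation. `find_messages` and `update_catalog` are hypothetical names. The JSON layout, which maps each message to an empty translation, is an assumption. The regex handles plain string literals only and does not handle verbatim, interpolated, or triple-quoted strings.

```python
import json
import re
from pathlib import Path


def find_messages(source, class_name="I18n", method_name="Translate"):
    # Match Class.Method("literal") (C#) or Class.Method "literal" (F#);
    # escape sequences such as \" are kept as-is in the captured text.
    pattern = re.compile(
        re.escape(class_name) + r"\s*\.\s*" + re.escape(method_name)
        + r'\s*\(?\s*"((?:[^"\\]|\\.)*)"'
    )
    return pattern.findall(source)


def update_catalog(catalog_path, root, class_name="I18n", method_name="Translate"):
    # Assumed catalog format: {"message": "translation"}; existing entries are kept.
    catalog_path, root = Path(catalog_path), Path(root)
    catalog = {}
    if catalog_path.exists():
        catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in (".cs", ".fs"):
            text = path.read_text(encoding="utf-8")
            for message in find_messages(text, class_name, method_name):
                catalog.setdefault(message, "")  # add only messages not seen before
    catalog_path.write_text(
        json.dumps(catalog, ensure_ascii=False, indent=2, sort_keys=True),
        encoding="utf-8",
    )
    return catalog
```

Because `setdefault` leaves existing keys untouched, rerunning the scan adds new messages without overwriting translations already present in the file.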