MinerUSharp 0.1.2
MinerUSharp
A C# client library for the self-hosted MinerU API.
Requires a self-hosted MinerU API instance.
See the official MinerU page for instructions on self-hosting, or use the MinerUHost wrapper for automatic setup.
Installation
The easiest way to install and use the MinerUSharp library is through NuGet.
dotnet add package MinerUSharp
That's it! You've now installed MinerUSharp.
Don't forget to also install some way of hosting the MinerU API.
The official page has instructions. You can also use MinerUHost, which automates both setup and process management for the underlying MinerU Python service. It is available as a NuGet package and as a prebuilt standalone application through GitHub Releases.
Usage
Basic Usage
Import the following namespaces:
using MinerUSharp;
using MinerUSharp.Models;
Then use it like this:
using MineruClient client = new MineruClient("http://localhost:8080");
using FileStream fileStream = File.OpenRead("document.pdf");
MineruRequest request = new MineruRequest
{
    Files = new[] { fileStream },
    LanguageList = new[] { "en", "ch" },
    StartPageId = 1,
    EndPageId = 10,
    ReturnMarkdown = true,
};
using MineruResponse response = await client.ParseFileAsync(request);
string markdown = await response.ReadAsMarkdownAsync();
Fluent API
You can also build the request with the fluent API. This example does the same thing as the example above:
using MineruClient client = new MineruClient("http://localhost:8080");
using FileStream fileStream = File.OpenRead("document.pdf");
MineruRequest request = MineruRequest.Create(fileStream)
    .WithLanguages("en", "ch")
    .WithMarkdownResponse()
    .WithPageRange(startPage: 1, endPage: 10)
    .Build();
using MineruResponse response = await client.ParseFileAsync(request);
string markdown = await response.ReadAsMarkdownAsync();
"Advanced Usage"
The quotes are because it's not really that "advanced", but here are some more detailed code snippets:
Dependency Injection
// Program.cs or Startup.cs
services.AddMineruClient("http://localhost:8080");
// In your service
public class DocumentService
{
    private readonly IMineruClient _client;

    public DocumentService(IMineruClient client)
    {
        _client = client;
    }

    public async Task<string> ParseDocumentAsync(Stream documentStream)
    {
        MineruRequest request = MineruRequest.Create(documentStream)
            .WithMarkdownResponse()
            .Build();
        using MineruResponse response = await _client.ParseFileAsync(request);
        return await response.ReadAsMarkdownAsync();
    }
}
Response Options
using MineruResponse response = await client.ParseFileAsync(request);
// Read as markdown (extracts md_content from the JSON response)
string markdown = await response.ReadAsMarkdownAsync();
// Read as strongly-typed response body
MineruResponseBody body = await response.ReadAsResponseBodyAsync();
string markdownFromFirstFile = body.Results["file0"].MarkdownContent;
// Read as raw JSON element (JsonElement lives in the System.Text.Json namespace)
JsonElement json = await response.ReadAsJsonAsync();
// Read as bytes
byte[] bytes = await response.ReadAsBytesAsync();
// Save to file
await response.SaveToFileAsync("output.md");
// Get raw stream for custom processing
Stream stream = response.GetContentStream();
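When a request contains several files, it can be handy to pull the markdown of each file out of the raw JSON. The following is a minimal sketch that works on a JsonElement such as the one returned by ReadAsJsonAsync. The payload layout (a "results" object keyed by names like "file0", each holding "md_content") is an assumption inferred from the snippet above, and ExtractMarkdownPerFile is a hypothetical helper, not part of MinerUSharp. A hard-coded sample payload stands in for a real response so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sample payload. The "results" / "file0" / "md_content" layout is an
// assumption inferred from the snippet above, not a documented contract.
string sampleJson = """
{
  "results": {
    "file0": { "md_content": "# First document" },
    "file1": { "md_content": "# Second document" }
  }
}
""";

// Hypothetical helper: collects the markdown of every file, keyed by its result name.
// Entries without a string "md_content" property are skipped.
static Dictionary<string, string> ExtractMarkdownPerFile(JsonElement root)
{
    var result = new Dictionary<string, string>();
    if (root.ValueKind != JsonValueKind.Object ||
        !root.TryGetProperty("results", out JsonElement results) ||
        results.ValueKind != JsonValueKind.Object)
    {
        return result;
    }

    foreach (JsonProperty fileEntry in results.EnumerateObject())
    {
        if (fileEntry.Value.ValueKind == JsonValueKind.Object &&
            fileEntry.Value.TryGetProperty("md_content", out JsonElement md) &&
            md.ValueKind == JsonValueKind.String)
        {
            result[fileEntry.Name] = md.GetString()!;
        }
    }
    return result;
}

using JsonDocument document = JsonDocument.Parse(sampleJson);
Dictionary<string, string> markdownByFile = ExtractMarkdownPerFile(document.RootElement);

foreach (KeyValuePair<string, string> entry in markdownByFile)
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```

With a real response, the JsonElement from ReadAsJsonAsync would be passed to the helper in place of document.RootElement. If the strongly-typed MineruResponseBody already covers your needs, it is likely the simpler choice.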
Additional Options
- The MineruClient constructor accepts an optional HttpClient parameter for custom HTTP client configuration.
- The ParseFileAsync method accepts an optional CancellationToken parameter for cancellation support.
Note: the underlying MinerU Python API does not appear to handle cancelled requests correctly. At the time of writing, it keeps processing a cancelled request until it finishes. Cancellation still frees up your own thread that is waiting on the Python process, but the Python process continues in the background until it is done.
using HttpClient httpClient = new HttpClient();
using MineruClient client = new MineruClient("http://localhost:8080", httpClient);
// The 5000 ms timeout below is far too short for real scenarios. It only demonstrates
// that a cancellation token can be passed; in practice it should probably come from a user-controlled source.
using CancellationTokenSource cts = new CancellationTokenSource(millisecondsDelay: 5000);
using MineruResponse response = await client.ParseFileAsync(request, cts.Token);
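Two practical details may be worth handling around these options. HttpClient.Timeout defaults to 100 seconds, which a large document could exceed, and a cancelled token surfaces as an OperationCanceledException on the caller's side. The sketch below shows one way to deal with both. RunOrNullIfCancelledAsync and SimulatedSlowParseAsync are hypothetical names introduced for illustration, and the simulated operation stands in for a real ParseFileAsync call so the example runs without a MinerU server.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// HttpClient.Timeout defaults to 100 seconds. The 10 minute value is an arbitrary
// illustration; this client would be passed as the second MineruClient constructor argument.
using HttpClient httpClient = new HttpClient { Timeout = TimeSpan.FromMinutes(10) };

// Hypothetical helper: awaits an operation and returns null if the token is cancelled first.
static async Task<T?> RunOrNullIfCancelledAsync<T>(
    Func<CancellationToken, Task<T>> operation,
    CancellationToken cancellationToken) where T : class
{
    try
    {
        return await operation(cancellationToken);
    }
    catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
    {
        // Only the caller stops waiting; per the note above, the server may keep processing.
        return null;
    }
}

// Stand-in for a slow parse call (assumption: a real call would use ParseFileAsync).
static async Task<string> SimulatedSlowParseAsync(CancellationToken token)
{
    await Task.Delay(TimeSpan.FromSeconds(30), token);
    return "# parsed markdown";
}

// Cancel after 200 ms, long before the simulated 30 second operation completes.
using CancellationTokenSource cts = new CancellationTokenSource(millisecondsDelay: 200);
string? markdown = await RunOrNullIfCancelledAsync<string>(SimulatedSlowParseAsync, cts.Token);

Console.WriteLine(markdown is null ? "Cancelled before completion." : markdown);
```

The exception filter only swallows cancellations that came from the caller's own token. A cancellation raised for another reason, such as the HttpClient timeout elapsing, is left to propagate so it can be told apart from a user-initiated cancel.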
Requirements
- .NET 8.0 or later
- A MinerU API server that is reachable from your application
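To confirm the second requirement before sending documents, a quick connectivity probe can help. This is a rough sketch using only HttpClient. IsServerReachableAsync is a hypothetical helper, and the URL is the one used in the examples above. It deliberately assumes no particular endpoint on the server: any HTTP answer, even an error status, is treated as "reachable".

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: returns true if anything answers HTTP at the base URL.
// Any status code (even 404) counts as reachable, because no specific endpoint is assumed.
static async Task<bool> IsServerReachableAsync(string baseUrl, TimeSpan timeout)
{
    using HttpClient probe = new HttpClient { Timeout = timeout };
    try
    {
        using HttpResponseMessage response = await probe.GetAsync(
            baseUrl, HttpCompletionOption.ResponseHeadersRead);
        return true;
    }
    catch (HttpRequestException)
    {
        return false; // connection refused, DNS failure, and similar
    }
    catch (TaskCanceledException)
    {
        return false; // no answer within the timeout
    }
}

bool reachable = await IsServerReachableAsync("http://localhost:8080", TimeSpan.FromSeconds(3));
Console.WriteLine(reachable ? "MinerU API is reachable." : "MinerU API is not reachable.");
```

A positive result only shows that something is listening at that address. It does not verify that the service is actually MinerU or that it is ready to process documents.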
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net8.0
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 8.0.0)
  - Microsoft.Extensions.Http (>= 8.0.0)