WebReaper.AI
11.1.1
dotnet add package WebReaper.AI --version 11.1.1
NuGet\Install-Package WebReaper.AI -Version 11.1.1
<PackageReference Include="WebReaper.AI" Version="11.1.1" />
<PackageVersion Include="WebReaper.AI" Version="11.1.1" />
<PackageReference Include="WebReaper.AI" />
paket add WebReaper.AI --version 11.1.1
#r "nuget: WebReaper.AI, 11.1.1"
#:package WebReaper.AI@11.1.1
#addin nuget:?package=WebReaper.AI&version=11.1.1
#tool nuget:?package=WebReaper.AI&version=11.1.1
WebReaper.AI
LLM extraction satellite for WebReaper. Bring your own model: OpenAI, Anthropic, Ollama, Azure OpenAI, llamafile, anything implementing IChatClient (Microsoft.Extensions.AI).
Install
dotnet add package WebReaper.AI
What's in this package
Per ADR-0044, ADR-0046, ADR-0047, ADR-0050, ADR-0051, ADR-0064, ADR-0067:
| Builder call | Purpose |
|---|---|
.WithLlmExtractor(chatClient) |
Pure-LLM IContentExtractor; bypasses the deterministic fold |
.WithLlmFallback(chatClient) |
Deterministic primary, LLM only when a field returns empty |
.WithLlmSelfHealing(chatClient) |
LLM proposes a repaired CSS selector once per (Schema, field), cached |
.WithLlmSchemaInferrer(chatClient) |
Synthesize a Schema from a URL with no hand-authored selectors |
.WithLlmAgentBrain(chatClient) |
LLM-powered IAgentBrain for the autonomous agent driver |
.WithLlmActionResolver(chatClient) |
Resolves PageAction.SemanticAct("click 'sign in'") to concrete arms |
.UseAi(chatClient) |
One-line aggregator wiring the recommended subset of the above |
Quick start
using Microsoft.Extensions.AI;
using OpenAI;
using WebReaper.AI;
using WebReaper.Builders;
IChatClient chatClient = new OpenAIClient("sk-...").AsChatClient("gpt-4o-mini");
var engine = await ScraperEngineBuilder
.Crawl("https://example.com")
.Extract(schema)
.WithLlmFallback(chatClient)
.WriteToJsonFile("out.jsonl")
.BuildAsync();
await engine.RunAsync();
The deterministic fold runs first. The LLM fires only when a field returns empty (a selector drifted, the page changed). Stable pages cost zero LLM calls.
Design
WebReaper.AI is a satellite package per ADR-0009: the LLM dependencies (Microsoft.Extensions.AI.Abstractions) stay quarantined off the dependency-light, Native-AOT-clean WebReaper core. The consumer's IChatClient makes the actual LLM call.
LlmCall<T> (the shared mechanism, ADR-0059) handles JSON-mode parsing with bounded retry, system-prompt caching (ADR-0065), and cost telemetry (ADR-0066).
See also
- Main repo: github.com/pavlovtech/WebReaper
- All AI features showcased in
Examples/WebReaper.AiNativeShowcase - Schema inference demos in
Examples/WebReaper.SchemaInferenceShowcase - License: MIT (ADR-0017)
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net10.0
- Microsoft.Extensions.AI.Abstractions (>= 10.3.0)
- WebReaper (>= 11.1.1)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on WebReaper.AI:
| Package | Downloads |
|---|---|
|
WebReaper.Mcp
MCP (Model Context Protocol) server satellite for WebReaper. Exposes scrape / map / extract as MCP tools over stdio for Cursor / Claude Desktop / Copilot Studio. Interop adapter; the primary agent surface remains the WebReaper CLI (ADR-0043). Satellite (ADR-0009); heavy MCP SDK deps quarantined; core stays AOT-clean. |
GitHub repositories
This package is not used by any popular GitHub repositories.
10.1.0 (minor): new public LlmToolArguments helper (TryGetString / TryGetInt for tool-call argument extraction). The Microsoft.Extensions.AI.Abstractions floor moves off preview to stable 10.3.0. The brain (13) and resolver (9) tool registries plus their parse dispatch are now derived views of one PageActionTools.Arms list (ADR-0078); the resolver gains the three new form-action tools (Fill, Press, ScrollIntoView). 10.0.1: NuGet metadata polish. Adds PackageReadmeFile so the README shows on the package page. Removes em-dashes from Description and release notes. No code changes. 10.0.0: initial release. LLM-backed IContentExtractor satellite (ADR-0044) bound to Microsoft.Extensions.AI.Abstractions (IChatClient). Bring your own model: OpenAI, Anthropic, Ollama, anything implementing IChatClient. Ships LlmContentExtractor (Markdown pre-clean by default, deterministic Temperature=0, MaxTokens=4096), a Schema-to-JSON-Schema bridge, and ScraperEngineBuilder.WithLlmExtractor. Also ships LlmSelectorRepairer + WithLlmSelfHealing for the SelfHealingContentExtractor (ADR-0047): LLM proposes selectors on a fold failure, deterministic re-run validates, per-crawl cache persists the patch. New satellite package per ADR-0009: the consumer's IChatClient makes the LLM call, so the satellite quarantines AI deps off the dependency-light, Native-AOT-clean core. Requires WebReaper 10.0.0.