Zaiets.WebScraper.Lite 1.0.0

.NET 10.0

Zaiets.WebScraper.Lite

A lightweight, production-ready .NET 10 web scraping library with CSS selectors, XPath querying, rate limiting, automatic retry with exponential back-off, and proxy rotation.

Installation

dotnet add package Zaiets.WebScraper.Lite

Quick Start

using Zaiets.WebScraper.Lite;

// Create with defaults
using var scraper = new WebScraper();

var result = await scraper.FetchAsync("https://example.com");
if (result.IsSuccess)
{
    var parser = new HtmlParser(result.Document);
    Console.WriteLine(parser.Title());
    Console.WriteLine(parser.Text("h1"));
}

Fluent Builder

using var scraper = new ScraperBuilder()
    .WithRateLimit(TimeSpan.FromSeconds(2))   // min 2s between requests to same host
    .WithConcurrency(8)                        // up to 8 parallel requests
    .WithRetry(retryCount: 3, baseDelay: TimeSpan.FromSeconds(1))
    .WithProxies(
        "http://proxy1:8080",
        "http://proxy2:8080"
    )
    .WithUserAgent("MyBot/1.0")
    .WithHeader("Accept-Language", "en-US,en;q=0.9")
    .WithTimeout(TimeSpan.FromSeconds(15))
    .Build();

CSS Selectors

var result = await scraper.FetchAsync("https://news.ycombinator.com");
var parser = new HtmlParser(result.Document);

// Single element
string? title  = parser.Text("span.titleline > a");
string? href   = parser.Attr("span.titleline > a", "href");

// All matching elements
IReadOnlyList<string> headlines = parser.Texts("span.titleline > a");
IReadOnlyList<string> links     = parser.Attrs("a[href]", "href");

// Raw inner HTML
string? html = parser.Html(".comment");

// Page meta
string? description = parser.Meta("description");
string? ogTitle     = parser.Meta("og:title");

XPath Queries

// Single node
string? price = parser.XText("//span[@class='price']");

// All matching nodes
IReadOnlyList<string> prices = parser.XTexts("//span[@class='price']");

// Raw HtmlAgilityPack node for advanced use (requires `using HtmlAgilityPack;`)
HtmlNode? node = parser.SelectSingleNode("//table[@id='results']//tr[1]");

Structured Extraction

Single page → one item

var item = await scraper.ExtractAsync(
    new Uri("https://example.com/product/123"),
    new Dictionary<string, string>
    {
        ["title"]       = "h1.product-title",
        ["price"]       = "span.price",
        ["description"] = "div.product-description",
        ["sku"]         = "span[itemprop='sku']",
    });

Console.WriteLine(item["title"]);
Console.WriteLine(item["price"]);

Single page → list of items

var products = await scraper.ExtractListAsync(
    new Uri("https://example.com/category/laptops"),
    rowSelector: "div.product-card",
    fieldSelectors: new Dictionary<string, string>
    {
        ["name"]  = "h2.name",
        ["price"] = "span.price",
        ["url"]   = "a[href]",
    });

foreach (var p in products)
    Console.WriteLine($"{p["name"]} — {p["price"]}");

Bulk Fetch

var urls = Enumerable.Range(1, 20)
    .Select(i => new Uri($"https://example.com/page/{i}"))
    .ToList();

BulkScrapeResult bulk = await scraper.FetchAllAsync(urls);

Console.WriteLine($"Scraped {bulk.Succeeded.Count}/{urls.Count} pages");
Console.WriteLine($"Success rate: {bulk.SuccessRate:P0}");
Console.WriteLine($"Total time: {bulk.TotalElapsed.TotalSeconds:F1}s");

foreach (var failed in bulk.Failed)
    Console.WriteLine($"FAILED {failed.Url} — {failed.Reason}");

Link & Image Extraction

var parser = new HtmlParser(result.Document);

// Absolute URIs, relative links resolved against page URL
IReadOnlyList<Uri> links  = parser.ExtractLinks(result.Url);
IReadOnlyList<Uri> images = parser.ExtractImages(result.Url);
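
How relative links become absolute can be illustrated with the base class library alone. The sketch below does not show the library's internals; `ResolveHttpLink` is a hypothetical helper and the sample hrefs are made up. It assumes that resolution follows standard `System.Uri` rules and that non-HTTP schemes such as `mailto:` are skipped.

```csharp
using System;

// Hypothetical helper (not part of Zaiets.WebScraper.Lite): resolve an href
// against the page URL and keep only http/https results.
static Uri? ResolveHttpLink(Uri pageUrl, string href)
{
    // Uri.TryCreate(baseUri, relativeUri, out result) applies standard URL resolution.
    if (!Uri.TryCreate(pageUrl, href, out var absolute)) return null;
    return absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps
        ? absolute
        : null;
}

var page = new Uri("https://example.com/docs/guide/index.html");
string[] hrefs =
{
    "../api/", "/pricing", "setup.html#install",
    "https://other.example/x", "mailto:hi@example.com",
};

foreach (var href in hrefs)
{
    var resolved = ResolveHttpLink(page, href);
    Console.WriteLine(resolved is null
        ? $"{href} -> skipped"
        : $"{href} -> {resolved.AbsoluteUri}");
}
```

Absolute hrefs pass through unchanged, while path-relative ones are resolved against the directory of the page URL.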

Table Parsing

// Parse the first table matching the selector into rows of string[]
IReadOnlyList<string[]> rows = parser.ExtractTable("table.data-table", hasHeaders: true);

foreach (var row in rows)
    Console.WriteLine(string.Join(" | ", row));

BFS Web Crawler

using var scraper = new ScraperBuilder()
    .WithRateLimit(TimeSpan.FromSeconds(1))
    .Build();

var crawler = new PageCrawler(scraper);

var options = new CrawlOptions
{
    MaxDepth        = 3,
    MaxPages        = 200,
    MaxConcurrency  = 4,
    StayOnSameHost  = true,
    // optional custom filter
    LinkFilter = (link, seed) => !link.AbsolutePath.Contains("/login"),
};

await crawler.CrawlAsync(
    new Uri("https://example.com"),
    options,
    async page =>
    {
        var p = new HtmlParser(page.Document);
        Console.WriteLine($"[{(int)page.StatusCode}] {page.Url}  — {p.Title()}");
        await Task.CompletedTask;
    });
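
To make the effect of MaxDepth and MaxPages concrete, here is a minimal breadth-first traversal over an in-memory link graph. It is a sketch, not the crawler's implementation: the `site` dictionary stands in for fetched pages, `Crawl` is a hypothetical function, and it assumes the seed counts as depth 0 and that links are not followed once the depth limit is reached.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory "site": page -> outgoing links (stands in for fetched pages).
var site = new Dictionary<string, string[]>
{
    ["/"]      = new[] { "/a", "/b" },
    ["/a"]     = new[] { "/a/1", "/" },
    ["/b"]     = new[] { "/b/1" },
    ["/a/1"]   = new[] { "/a/1/x" },
    ["/b/1"]   = Array.Empty<string>(),
    ["/a/1/x"] = Array.Empty<string>(),
};

// Breadth-first crawl: a FIFO queue visits pages level by level.
List<string> Crawl(string seed, int maxDepth, int maxPages)
{
    var visited = new HashSet<string> { seed };
    var queue = new Queue<(string Url, int Depth)>();
    queue.Enqueue((seed, 0));
    var order = new List<string>();

    while (queue.Count > 0 && order.Count < maxPages)
    {
        var (url, depth) = queue.Dequeue();
        order.Add(url);                               // "visit" the page
        if (depth == maxDepth) continue;              // do not follow links past the limit
        if (!site.TryGetValue(url, out var links)) continue;
        foreach (var link in links)
            if (visited.Add(link))                    // skip URLs already seen
                queue.Enqueue((link, depth + 1));
    }
    return order;
}

Console.WriteLine(string.Join(", ", Crawl("/", maxDepth: 2, maxPages: 10)));
// prints: /, /a, /b, /a/1, /b/1
```

With a depth limit of 2, `/a/1/x` (depth 3) is never visited, and the visited set keeps the back-link from `/a` to `/` from causing a loop.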

Dependency Injection (ASP.NET Core / Generic Host)

// Program.cs
builder.Services.AddWebScraper(o =>
{
    o.RateLimitDelay = TimeSpan.FromSeconds(1);
    o.RetryCount     = 3;
    o.MaxConcurrency = 8;
});

// Or with the fluent builder:
builder.Services.AddWebScraper(b => b
    .WithRateLimit(TimeSpan.FromSeconds(1))
    .WithRetry(3)
    .WithConcurrency(8));

Then inject WebScraper or PageCrawler through the constructor:

public class ProductService(WebScraper scraper)
{
    public async Task<string?> GetTitleAsync(string url)
    {
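        var result = await scraper.FetchAsync(url);
        // Return null when the fetch failed instead of parsing a missing document
        return result.IsSuccess ? new HtmlParser(result.Document).Title() : null;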
    }
}

Proxy Rotation

using var scraper = new ScraperBuilder()
    .WithProxies(
        "http://p1.example.com:3128",
        "http://p2.example.com:3128",
        "socks5://p3.example.com:1080"
    )
    .Build();
// Proxies are rotated round-robin; failed proxies back off exponentially.
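
Round-robin selection itself is simple to sketch. The snippet below is illustrative only and is not taken from the library: `NextProxy` is a hypothetical helper, and it covers just the rotation, not the exponential back-off for failed proxies mentioned above.

```csharp
using System;
using System.Threading;

string[] proxies =
{
    "http://p1.example.com:3128",
    "http://p2.example.com:3128",
    "socks5://p3.example.com:1080",
};
int counter = -1;

// Hypothetical helper: pick the next proxy in round-robin order.
string NextProxy()
{
    // Interlocked.Increment keeps the counter consistent under concurrent callers;
    // the uint cast keeps the index non-negative after the int wraps around.
    uint n = unchecked((uint)Interlocked.Increment(ref counter));
    return proxies[n % (uint)proxies.Length];
}

for (int i = 0; i < 4; i++)
    Console.WriteLine(NextProxy());   // p1, p2, p3, then p1 again
```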

Retry Policy

By default the scraper retries up to 3 times with exponential back-off on:

- HttpRequestException (network errors)
- TaskCanceledException (timeouts)
- HTTP 429, 500, 502, 503, 504

Customise:

var options = new ScraperOptions
{
    RetryCount     = 5,
    RetryBaseDelay = TimeSpan.FromSeconds(1),
    RetryableStatusCodes = new HashSet<int> { 429, 503 },
    HonorRetryAfterHeader = true,   // respects Retry-After response header
};
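
As a rough illustration of what exponential back-off means for the waiting time, the sketch below doubles the delay on each retry and caps it. The exact formula the library uses (for example, whether it adds jitter) is not stated here, so `BackoffDelay` and the 30-second cap are assumptions made for the example.

```csharp
using System;

// Hypothetical helper: delay before retry number `attempt` (1-based),
// doubling each time and capped at `maxDelay`.
static TimeSpan BackoffDelay(TimeSpan baseDelay, int attempt, TimeSpan maxDelay)
{
    if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
    // Growth factor 2^(attempt-1); the exponent is clamped to avoid huge values.
    double factor = Math.Pow(2, Math.Min(attempt - 1, 30));
    double ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
    return TimeSpan.FromMilliseconds(ms);
}

var initialDelay = TimeSpan.FromSeconds(1);
var cap = TimeSpan.FromSeconds(30);

for (int attempt = 1; attempt <= 5; attempt++)
    Console.WriteLine($"retry {attempt}: wait {BackoffDelay(initialDelay, attempt, cap).TotalSeconds:F0}s");
// waits of 1s, 2s, 4s, 8s, 16s
```

With RetryCount = 5 and a base delay of one second, a schedule like this would wait about 31 seconds in total before giving up.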

Rate Limiting

The rate limiter operates per host, so requests to different hosts never delay one another.

// 500 ms between requests to the same host
var options = new ScraperOptions { RateLimitDelay = TimeSpan.FromMilliseconds(500) };
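
The per-host behaviour can be pictured as one "next allowed time" per host name. The following is a minimal single-threaded sketch under that assumption, not the library's implementation; `Reserve` is a hypothetical helper that returns how long a request would have to wait.

```csharp
using System;
using System.Collections.Generic;

var minDelay = TimeSpan.FromMilliseconds(500);
// Earliest time at which each host may be contacted again (host names are case-insensitive).
var nextAllowed = new Dictionary<string, DateTimeOffset>(StringComparer.OrdinalIgnoreCase);

// Hypothetical helper: returns the wait for a request to `url` arriving at `now`
// and reserves the next slot for that host. Not thread-safe; a real limiter needs locking.
TimeSpan Reserve(Uri url, DateTimeOffset now)
{
    var earliest = nextAllowed.TryGetValue(url.Host, out var t) && t > now ? t : now;
    nextAllowed[url.Host] = earliest + minDelay;
    return earliest - now;
}

var start = DateTimeOffset.UtcNow;
Console.WriteLine(Reserve(new Uri("https://a.example/1"), start).TotalMilliseconds); // 0
Console.WriteLine(Reserve(new Uri("https://a.example/2"), start).TotalMilliseconds); // 500
Console.WriteLine(Reserve(new Uri("https://b.example/1"), start).TotalMilliseconds); // 0
```

The second request to `a.example` waits 500 ms, while the first request to `b.example` goes out immediately because each host has its own slot.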

API Reference

Type	Description
`WebScraper`	Main scraper. `FetchAsync`, `FetchAllAsync`, `ExtractAsync`, `ExtractListAsync`.
`HtmlParser`	CSS (`QuerySelector`, `Text`, `Texts`, `Attr`, `Attrs`, `Html`) and XPath (`XText`, `XTexts`, `SelectSingleNode`, `SelectNodes`) queries. Also `ExtractLinks`, `ExtractImages`, `ExtractTable`, `Title`, `Meta`.
`ScraperBuilder`	Fluent builder for `WebScraper`.
`ScraperOptions`	All configuration properties.
`PageCrawler`	BFS crawler with depth/page limits and link filtering.
`CrawlOptions`	`MaxDepth`, `MaxPages`, `MaxConcurrency`, `StayOnSameHost`, `LinkFilter`.
`ScrapeResult`	URL, status code, raw HTML, parsed `HtmlDocument`, headers, elapsed time.
`ScrapeItem`	Key/value fields extracted from a page.
`BulkScrapeResult`	Succeeded / Failed lists + success rate.
`ScraperException`	Thrown by `EnsureSuccess()` or on unrecoverable errors.

License

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

Dependencies (net10.0)
- Fizzler.Systems.HtmlAgilityPack (>= 1.2.1)
- HtmlAgilityPack (>= 1.11.74)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.7)
- Polly (>= 8.6.6)

Version	Downloads	Last Updated
1.0.0	102	5/3/2026