Zaiets.WebScraper.Lite 1.0.0

Zaiets.WebScraper.Lite

A lightweight, production-ready .NET 10 web scraping library with CSS selectors, XPath querying, rate limiting, automatic retry with exponential back-off, and proxy rotation.


Installation

dotnet add package Zaiets.WebScraper.Lite

Quick Start

using Zaiets.WebScraper.Lite;

// Create with defaults
using var scraper = new WebScraper();

var result = await scraper.FetchAsync("https://example.com");
if (result.IsSuccess)
{
    var parser = new HtmlParser(result.Document);
    Console.WriteLine(parser.Title());
    Console.WriteLine(parser.Text("h1"));
}

Fluent Builder

using var scraper = new ScraperBuilder()
    .WithRateLimit(TimeSpan.FromSeconds(2))   // min 2s between requests to same host
    .WithConcurrency(8)                        // up to 8 parallel requests
    .WithRetry(retryCount: 3, baseDelay: TimeSpan.FromSeconds(1))
    .WithProxies(
        "http://proxy1:8080",
        "http://proxy2:8080"
    )
    .WithUserAgent("MyBot/1.0")
    .WithHeader("Accept-Language", "en-US,en;q=0.9")
    .WithTimeout(TimeSpan.FromSeconds(15))
    .Build();

CSS Selectors

Powered by HtmlAgilityPack + Fizzler.

var result = await scraper.FetchAsync("https://news.ycombinator.com");
var parser = new HtmlParser(result.Document);

// Single element
string? title  = parser.Text("span.titleline > a");
string? href   = parser.Attr("span.titleline > a", "href");

// All matching elements
IReadOnlyList<string> headlines = parser.Texts("span.titleline > a");
IReadOnlyList<string> links     = parser.Attrs("a[href]", "href");

// Raw inner HTML
string? html = parser.Html(".comment");

// Page meta
string? description = parser.Meta("description");
string? ogTitle     = parser.Meta("og:title");

XPath Queries

// Single node
string? price = parser.XText("//span[@class='price']");

// All matching nodes
IReadOnlyList<string> prices = parser.XTexts("//span[@class='price']");

// Raw HtmlAgilityPack (HAP) node, for advanced use
HtmlNode? node = parser.SelectSingleNode("//table[@id='results']//tr[1]");

Structured Extraction

Single page → one item

var item = await scraper.ExtractAsync(
    new Uri("https://example.com/product/123"),
    new Dictionary<string, string>
    {
        ["title"]       = "h1.product-title",
        ["price"]       = "span.price",
        ["description"] = "div.product-description",
        ["sku"]         = "span[itemprop='sku']",
    });

Console.WriteLine(item["title"]);
Console.WriteLine(item["price"]);

Single page → list of items

var products = await scraper.ExtractListAsync(
    new Uri("https://example.com/category/laptops"),
    rowSelector: "div.product-card",
    fieldSelectors: new Dictionary<string, string>
    {
        ["name"]  = "h2.name",
        ["price"] = "span.price",
        ["url"]   = "a[href]",
    });

foreach (var p in products)
    Console.WriteLine($"{p["name"]} — {p["price"]}");

Bulk Fetch

var urls = Enumerable.Range(1, 20)
    .Select(i => new Uri($"https://example.com/page/{i}"))
    .ToList();

BulkScrapeResult bulk = await scraper.FetchAllAsync(urls);

Console.WriteLine($"Scraped {bulk.Succeeded.Count}/{urls.Count} pages");
Console.WriteLine($"Success rate: {bulk.SuccessRate:P0}");
Console.WriteLine($"Total time: {bulk.TotalElapsed.TotalSeconds:F1}s");

foreach (var failed in bulk.Failed)
    Console.WriteLine($"FAILED {failed.Url} — {failed.Reason}");

Link and Image Extraction

var parser = new HtmlParser(result.Document);

// Absolute URIs, relative links resolved against page URL
IReadOnlyList<Uri> links  = parser.ExtractLinks(result.Url);
IReadOnlyList<Uri> images = parser.ExtractImages(result.Url);
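
How relative links resolve against a page URL can be illustrated with the base class library alone. The sketch below is not the library's implementation: `ResolveLinks` is a hypothetical helper, and the choice to drop non-HTTP schemes such as `mailto:` is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Zaiets.WebScraper.Lite): resolve raw href
// values against the page URL and keep only http/https results.
static List<Uri> ResolveLinks(Uri pageUrl, IEnumerable<string> hrefs)
{
    var resolved = new List<Uri>();
    foreach (var href in hrefs)
    {
        // Uri.TryCreate(baseUri, relative, out result) applies standard
        // relative-reference resolution; absolute inputs pass through as-is.
        if (Uri.TryCreate(pageUrl, href, out var absolute)
            && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
        {
            resolved.Add(absolute);
        }
    }
    return resolved;
}

var pageUrl = new Uri("https://example.com/blog/post-1");
var resolvedLinks = ResolveLinks(pageUrl, new[]
{
    "/about",                   // root-relative
    "../img/logo.png",          // path-relative
    "https://other.example/x",  // already absolute
    "mailto:me@example.com",    // filtered out: not http(s)
});

foreach (var link in resolvedLinks)
    Console.WriteLine(link.AbsoluteUri);
```

The three web links come back as absolute URIs and the `mailto:` entry is skipped.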

Table Parsing

// Parse the first table matching the selector into rows of string[]
IReadOnlyList<string[]> rows = parser.ExtractTable("table.data-table", hasHeaders: true);

foreach (var row in rows)
    Console.WriteLine(string.Join(" | ", row));

BFS Web Crawler

using var scraper = new ScraperBuilder()
    .WithRateLimit(TimeSpan.FromSeconds(1))
    .Build();

var crawler = new PageCrawler(scraper);

var options = new CrawlOptions
{
    MaxDepth        = 3,
    MaxPages        = 200,
    MaxConcurrency  = 4,
    StayOnSameHost  = true,
    // optional custom filter
    LinkFilter = (link, seed) => !link.AbsolutePath.Contains("/login"),
};

await crawler.CrawlAsync(
    new Uri("https://example.com"),
    options,
    async page =>
    {
        var p = new HtmlParser(page.Document);
        Console.WriteLine($"[{(int)page.StatusCode}] {page.Url}  — {p.Title()}");
        await Task.CompletedTask;
    });
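
To show how breadth-first traversal interacts with depth and page limits like `MaxDepth` and `MaxPages`, here is a minimal sketch over an in-memory link graph. It is an illustration only: `CrawlBfs` and the `site` dictionary are hypothetical, it does no networking or concurrency, and it may differ from how `PageCrawler` works internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical BFS over a link graph, bounded by depth and page count.
static List<(string Page, int Depth)> CrawlBfs(
    IReadOnlyDictionary<string, string[]> graph, string seed, int maxDepth, int maxPages)
{
    var seen = new HashSet<string> { seed };
    var queue = new Queue<(string Page, int Depth)>();
    var order = new List<(string Page, int Depth)>();
    queue.Enqueue((seed, 0));

    while (queue.Count > 0 && order.Count < maxPages)
    {
        var (current, currentDepth) = queue.Dequeue();
        order.Add((current, currentDepth));

        // Pages at the depth limit are visited but their links are not followed.
        if (currentDepth == maxDepth) continue;
        if (!graph.TryGetValue(current, out var outgoing)) continue;

        foreach (var next in outgoing)
            if (seen.Add(next))                 // Add returns false for duplicates
                queue.Enqueue((next, currentDepth + 1));
    }
    return order;
}

// Hypothetical "site": each page maps to the pages it links to.
var site = new Dictionary<string, string[]>
{
    ["/"]  = new[] { "/a", "/b" },
    ["/a"] = new[] { "/c", "/" },
    ["/b"] = new[] { "/c", "/d" },
    ["/c"] = new[] { "/e" },
    ["/d"] = Array.Empty<string>(),
    ["/e"] = Array.Empty<string>(),
};

var crawlOrder = CrawlBfs(site, "/", maxDepth: 2, maxPages: 10);
foreach (var entry in crawlOrder)
    Console.WriteLine($"depth {entry.Depth}: {entry.Page}");
```

With `maxDepth: 2` the traversal visits `/`, `/a`, `/b`, `/c`, `/d` in that order and never reaches `/e`, which sits at depth 3.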

Dependency Injection (ASP.NET Core / Generic Host)

// Program.cs
builder.Services.AddWebScraper(o =>
{
    o.RateLimitDelay = TimeSpan.FromSeconds(1);
    o.RetryCount     = 3;
    o.MaxConcurrency = 8;
});

// Or with the fluent builder:
builder.Services.AddWebScraper(b => b
    .WithRateLimit(TimeSpan.FromSeconds(1))
    .WithRetry(3)
    .WithConcurrency(8));

Then inject WebScraper or PageCrawler normally:

public class ProductService(WebScraper scraper)
{
    public async Task<string?> GetTitleAsync(string url)
    {
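        var result = await scraper.FetchAsync(url);
        // Return null instead of parsing a missing document when the fetch fails.
        return result.IsSuccess ? new HtmlParser(result.Document).Title() : null;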
    }
}

Proxy Rotation

using var scraper = new ScraperBuilder()
    .WithProxies(
        "http://p1.example.com:3128",
        "http://p2.example.com:3128",
        "socks5://p3.example.com:1080"
    )
    .Build();
// Proxies are rotated round-robin; failed proxies back off exponentially.
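
Round-robin selection can be sketched in a few lines. This is an illustration under assumptions, not the library's code: `NextProxy` is a hypothetical helper, and the exponential back-off for failed proxies is left out.

```csharp
using System;
using System.Threading;

var proxyPool = new[]
{
    "http://p1.example.com:3128",
    "http://p2.example.com:3128",
    "socks5://p3.example.com:1080",
};
int counter = -1;

// Hypothetical thread-safe round-robin picker.
string NextProxy()
{
    // Interlocked.Increment keeps the counter consistent under concurrent
    // callers; the uint cast keeps the index non-negative if the int wraps.
    var n = unchecked((uint)Interlocked.Increment(ref counter));
    return proxyPool[n % (uint)proxyPool.Length];
}

var picks = new[] { NextProxy(), NextProxy(), NextProxy(), NextProxy() };
foreach (var pick in picks)
    Console.WriteLine(pick);
```

The fourth pick wraps around to the first proxy again.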

Retry Policy

By default the scraper retries up to 3 times with exponential back-off on:

  • HttpRequestException (network errors)
  • TaskCanceledException (timeouts)
  • HTTP 429, 500, 502, 503, 504

Customise:

var options = new ScraperOptions
{
    RetryCount     = 5,
    RetryBaseDelay = TimeSpan.FromSeconds(1),
    RetryableStatusCodes = new HashSet<int> { 429, 503 },
    HonorRetryAfterHeader = true,   // respects Retry-After response header
};
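
For intuition about what exponential back-off means for wait times, here is a minimal sketch of one common schedule: double the base delay on each attempt, up to a cap. The README does not state the library's exact formula, so the doubling factor, the cap, and the `BackoffDelay` helper are all assumptions.

```csharp
using System;

// Hypothetical schedule: attempt 1 waits baseDelay, attempt 2 waits 2x, then 4x, ...
// never exceeding maxDelay.
static TimeSpan BackoffDelay(TimeSpan baseDelay, int attempt, TimeSpan maxDelay)
{
    var factor = Math.Pow(2, attempt - 1);
    var ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
    return TimeSpan.FromMilliseconds(ms);
}

var baseDelay = TimeSpan.FromSeconds(1);
var maxDelay = TimeSpan.FromSeconds(10);

for (int attempt = 1; attempt <= 5; attempt++)
{
    var wait = BackoffDelay(baseDelay, attempt, maxDelay);
    Console.WriteLine($"attempt {attempt}: wait {wait.TotalSeconds}s");
}
```

With a 1-second base and a 10-second cap, the five waits are 1, 2, 4, 8, and 10 seconds. Production schedules often add random jitter so that many clients do not retry in lockstep.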

Rate Limiting

The rate limiter operates per host, so concurrent requests to different domains are not affected by each other.

// 500 ms between requests to the same host
var options = new ScraperOptions { RateLimitDelay = TimeSpan.FromMilliseconds(500) };
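
Per-host limiting comes down to tracking one "next allowed time" per host. The sketch below illustrates that idea and is not the library's implementation: `Reserve` is a hypothetical helper, the clock is passed in so the behaviour is deterministic, and the code is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

var minDelay = TimeSpan.FromMilliseconds(500);
var nextAllowed = new Dictionary<string, DateTimeOffset>(StringComparer.OrdinalIgnoreCase);

// Hypothetical helper: returns how long a request to this URL should wait
// and reserves the following slot for the same host.
TimeSpan Reserve(Uri url, DateTimeOffset now)
{
    var host = url.Host;
    var start = nextAllowed.TryGetValue(host, out var earliest) && earliest > now
        ? earliest
        : now;
    nextAllowed[host] = start + minDelay;
    return start - now;
}

var t0 = DateTimeOffset.UnixEpoch;
var firstA  = Reserve(new Uri("https://a.example/1"), t0);  // no wait
var secondA = Reserve(new Uri("https://a.example/2"), t0);  // waits 500 ms
var firstB  = Reserve(new Uri("https://b.example/1"), t0);  // different host, no wait
var laterA  = Reserve(new Uri("https://a.example/3"), t0 + TimeSpan.FromSeconds(2));

Console.WriteLine($"{firstA.TotalMilliseconds} {secondA.TotalMilliseconds} {firstB.TotalMilliseconds} {laterA.TotalMilliseconds}");
```

The second request to `a.example` is delayed while the request to `b.example` goes out immediately, which is the isolation between hosts described above.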

API Reference

Type              Description
WebScraper        Main scraper. FetchAsync, FetchAllAsync, ExtractAsync, ExtractListAsync.
HtmlParser        CSS (QuerySelector, Text, Texts, Attr, Attrs, Html) and XPath (XText, XTexts, SelectSingleNode, SelectNodes) queries. Also ExtractLinks, ExtractImages, ExtractTable, Title, Meta.
ScraperBuilder    Fluent builder for WebScraper.
ScraperOptions    All configuration properties.
PageCrawler       BFS crawler with depth/page limits and link filtering.
CrawlOptions      MaxDepth, MaxPages, MaxConcurrency, StayOnSameHost, LinkFilter.
ScrapeResult      URL, status code, raw HTML, parsed HtmlDocument, headers, elapsed time.
ScrapeItem        Key/value fields extracted from a page.
BulkScrapeResult  Succeeded / Failed lists, success rate, and total elapsed time.
ScraperException  Thrown by EnsureSuccess() or on unrecoverable errors.

License

MIT © 2025 Vladyslav Zaiets

Target Frameworks

net10.0 is compatible. The following target frameworks were computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

Version Downloads Last Updated
1.0.0 102 5/3/2026