Zaiets.WebScraper.Lite
Version 1.0.0
Install with any of the following clients:

- .NET CLI: `dotnet add package Zaiets.WebScraper.Lite --version 1.0.0`
- Package Manager Console in Visual Studio (uses the NuGet module's version of `Install-Package`): `NuGet\Install-Package Zaiets.WebScraper.Lite -Version 1.0.0`
- PackageReference (for projects that support it, copy this node into the project file):
  `<PackageReference Include="Zaiets.WebScraper.Lite" Version="1.0.0" />`
- Central Package Management (CPM): version the package in the solution's Directory.Packages.props file and add a version-less reference to the project file:
  `<PackageVersion Include="Zaiets.WebScraper.Lite" Version="1.0.0" />` (Directory.Packages.props)
  `<PackageReference Include="Zaiets.WebScraper.Lite" />` (project file)
- Paket CLI: `paket add Zaiets.WebScraper.Lite --version 1.0.0`
- F# Interactive and Polyglot Notebooks (copy into the interactive tool or the script source): `#r "nuget: Zaiets.WebScraper.Lite, 1.0.0"`
- C# file-based apps (.NET 10 preview 4 and later; place before any lines of code in the .cs file): `#:package Zaiets.WebScraper.Lite@1.0.0`
- Cake: `#addin nuget:?package=Zaiets.WebScraper.Lite&version=1.0.0` or `#tool nuget:?package=Zaiets.WebScraper.Lite&version=1.0.0`

The NuGet Team does not provide support for the Paket and Cake clients. Please contact their maintainers for support.
Zaiets.WebScraper.Lite
A lightweight, production-ready .NET 10 web scraping library with CSS selectors, XPath querying, rate limiting, automatic retry with exponential back-off, and proxy rotation.
Installation
dotnet add package Zaiets.WebScraper.Lite
Quick Start
```csharp
using Zaiets.WebScraper.Lite;

// Create with defaults
using var scraper = new WebScraper();
var result = await scraper.FetchAsync("https://example.com");

if (result.IsSuccess)
{
    var parser = new HtmlParser(result.Document);
    Console.WriteLine(parser.Title());
    Console.WriteLine(parser.Text("h1"));
}
```
Fluent Builder
```csharp
using var scraper = new ScraperBuilder()
    .WithRateLimit(TimeSpan.FromSeconds(2))   // at least 2 s between requests to the same host
    .WithConcurrency(8)                       // up to 8 parallel requests
    .WithRetry(retryCount: 3, baseDelay: TimeSpan.FromSeconds(1))
    .WithProxies(
        "http://proxy1:8080",
        "http://proxy2:8080"
    )
    .WithUserAgent("MyBot/1.0")
    .WithHeader("Accept-Language", "en-US,en;q=0.9")
    .WithTimeout(TimeSpan.FromSeconds(15))
    .Build();
```
CSS Selectors
Powered by HtmlAgilityPack + Fizzler.
```csharp
var result = await scraper.FetchAsync("https://news.ycombinator.com");
var parser = new HtmlParser(result.Document);

// Single element
string? title = parser.Text("span.titleline > a");
string? href  = parser.Attr("span.titleline > a", "href");

// All matching elements
IReadOnlyList<string> headlines = parser.Texts("span.titleline > a");
IReadOnlyList<string> links     = parser.Attrs("a[href]", "href");

// Raw inner HTML
string? html = parser.Html(".comment");

// Page meta
string? description = parser.Meta("description");
string? ogTitle     = parser.Meta("og:title");
```
XPath Queries
```csharp
using HtmlAgilityPack; // for HtmlNode

// `parser` is an HtmlParser, created as in the CSS example above.

// Single node
string? price = parser.XText("//span[@class='price']");

// All matching nodes
IReadOnlyList<string> prices = parser.XTexts("//span[@class='price']");

// Raw HAP node (for advanced use)
HtmlNode? node = parser.SelectSingleNode("//table[@id='results']//tr[1]");
```
Structured Extraction
Single page → one item
```csharp
var item = await scraper.ExtractAsync(
    new Uri("https://example.com/product/123"),
    new Dictionary<string, string>
    {
        ["title"]       = "h1.product-title",
        ["price"]       = "span.price",
        ["description"] = "div.product-description",
        ["sku"]         = "span[itemprop='sku']",
    });

Console.WriteLine(item["title"]);
Console.WriteLine(item["price"]);
```
Single page → list of items
```csharp
var products = await scraper.ExtractListAsync(
    new Uri("https://example.com/category/laptops"),
    rowSelector: "div.product-card",
    fieldSelectors: new Dictionary<string, string>
    {
        ["name"]  = "h2.name",
        ["price"] = "span.price",
        ["url"]   = "a[href]",
    });

foreach (var p in products)
    Console.WriteLine($"{p["name"]} - {p["price"]}");
```
Bulk Fetch
```csharp
var urls = Enumerable.Range(1, 20)
    .Select(i => new Uri($"https://example.com/page/{i}"))
    .ToList();

BulkScrapeResult bulk = await scraper.FetchAllAsync(urls);

Console.WriteLine($"Scraped {bulk.Succeeded.Count}/{urls.Count} pages");
Console.WriteLine($"Success rate: {bulk.SuccessRate:P0}");
Console.WriteLine($"Total time: {bulk.TotalElapsed.TotalSeconds:F1}s");

foreach (var failed in bulk.Failed)
    Console.WriteLine($"FAILED {failed.Url} - {failed.Reason}");
```
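For readers curious how a bounded bulk fetch can be organised, here is a minimal sketch under stated assumptions; it is not `FetchAllAsync`'s actual code. It assumes a `SemaphoreSlim` enforces the concurrency cap and that each failure is recorded rather than thrown, so one bad page never aborts the batch. `FetchAllSketch` and the simulated `fetch` delegate are hypothetical stand-ins for real HTTP calls.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch (not the library's code): a semaphore caps in-flight fetches,
// and every URL ends up in exactly one of two buckets.
async Task<(List<Uri> Succeeded, List<(Uri Url, string Reason)> Failed, double SuccessRate)>
    FetchAllSketch(IReadOnlyList<Uri> urls, int maxConcurrency, Func<Uri, Task> fetch)
{
    using var gate = new SemaphoreSlim(maxConcurrency);
    var succeeded = new ConcurrentBag<Uri>();
    var failed = new ConcurrentBag<(Uri Url, string Reason)>();

    await Task.WhenAll(urls.Select(async url =>
    {
        await gate.WaitAsync();                 // wait for a free concurrency slot
        try
        {
            await fetch(url);
            succeeded.Add(url);
        }
        catch (Exception ex)
        {
            failed.Add((url, ex.Message));      // record the failure instead of aborting the batch
        }
        finally
        {
            gate.Release();
        }
    }));

    double rate = urls.Count == 0 ? 0 : (double)succeeded.Count / urls.Count;
    return (succeeded.ToList(), failed.ToList(), rate);
}

// Demo: 20 simulated pages, page 7 fails, at most 8 requests in flight.
var pages = Enumerable.Range(1, 20).Select(i => new Uri($"https://example.com/page/{i}")).ToList();
int inFlight = 0, peak = 0;
var peakLock = new object();

var bulk = await FetchAllSketch(pages, maxConcurrency: 8, fetch: async u =>
{
    int current = Interlocked.Increment(ref inFlight);
    lock (peakLock) peak = Math.Max(peak, current);
    await Task.Delay(10);                       // simulated network latency
    Interlocked.Decrement(ref inFlight);
    if (u.AbsolutePath == "/page/7")
        throw new InvalidOperationException("HTTP 503 (simulated)");
});

Console.WriteLine($"Scraped {bulk.Succeeded.Count}/{pages.Count} pages, peak concurrency {peak}");
```

The semaphore is acquired inside each task rather than before starting it, so all tasks are created up front but only `maxConcurrency` of them do work at any moment.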
Link & Image Extraction
```csharp
var parser = new HtmlParser(result.Document);

// Both return absolute URIs; relative links are resolved against the page URL.
IReadOnlyList<Uri> links  = parser.ExtractLinks(result.Url);
IReadOnlyList<Uri> images = parser.ExtractImages(result.Url);
```
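Resolving a relative link against the page URL is what .NET's `Uri.TryCreate(Uri, string, out Uri)` overload does. The sketch below is a hypothetical helper (`ResolveLinks`) that assumes links arrive as raw `href` strings; dropping non-HTTP(S) schemes and duplicates are choices of this sketch, not documented `ExtractLinks` behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not the library's ExtractLinks implementation.
List<Uri> ResolveLinks(Uri pageUrl, IEnumerable<string> hrefs)
{
    var resolved = new List<Uri>();
    var seen = new HashSet<Uri>();
    foreach (var href in hrefs)
    {
        if (string.IsNullOrWhiteSpace(href))
            continue;
        // Standard relative-reference resolution: "post-2" and "../index.html"
        // are interpreted relative to the directory of the page URL.
        if (!Uri.TryCreate(pageUrl, href.Trim(), out var absolute))
            continue;                                                   // skip malformed values
        if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
            continue;                                                   // drop mailto:, javascript:, etc.
        if (seen.Add(absolute))
            resolved.Add(absolute);                                     // keep the first occurrence only
    }
    return resolved;
}

var page = new Uri("https://example.com/blog/post-1");
var hrefs = new[] { "/about", "post-2", "../index.html", "https://other.org/", "mailto:team@example.com" };
foreach (var link in ResolveLinks(page, hrefs))
    Console.WriteLine(link);
```

With the page at `/blog/post-1`, `post-2` resolves to `/blog/post-2` and `../index.html` to `/index.html`.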
Table Parsing
```csharp
// Parse the first table matching the selector into rows of string[]
IReadOnlyList<string[]> rows = parser.ExtractTable("table.data-table", hasHeaders: true);

foreach (var row in rows)
    Console.WriteLine(string.Join(" | ", row));
```
BFS Web Crawler
```csharp
using var scraper = new ScraperBuilder()
    .WithRateLimit(TimeSpan.FromSeconds(1))
    .Build();

var crawler = new PageCrawler(scraper);

var options = new CrawlOptions
{
    MaxDepth       = 3,
    MaxPages       = 200,
    MaxConcurrency = 4,
    StayOnSameHost = true,
    // Optional custom filter
    LinkFilter = (link, seed) => !link.AbsolutePath.Contains("/login"),
};

await crawler.CrawlAsync(
    new Uri("https://example.com"),
    options,
    async page =>
    {
        var p = new HtmlParser(page.Document);
        Console.WriteLine($"[{(int)page.StatusCode}] {page.Url} - {p.Title()}");
        await Task.CompletedTask;
    });
```
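The breadth-first strategy behind these options can be sketched over an in-memory link graph, so it runs without network access. This is a hypothetical illustration, not `PageCrawler`'s implementation: it assumes the seed is depth 0, that pages at `MaxDepth` are visited but not expanded, and that a URL is queued at most once. `Crawl` and the graph are invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical BFS sketch over an in-memory link graph; not PageCrawler's implementation.
List<Uri> Crawl(Uri seed, IReadOnlyDictionary<Uri, Uri[]> links, int maxDepth, int maxPages,
                bool stayOnSameHost, Func<Uri, Uri, bool>? linkFilter = null)
{
    var visited = new HashSet<Uri> { seed };              // URLs already queued
    var queue = new Queue<(Uri Url, int Depth)>();
    queue.Enqueue((seed, 0));
    var order = new List<Uri>();

    while (queue.Count > 0 && order.Count < maxPages)     // MaxPages cap
    {
        var (url, depth) = queue.Dequeue();
        order.Add(url);                                   // a real crawler would fetch the page here
        if (depth >= maxDepth) continue;                  // visit, but do not expand, pages at MaxDepth

        var outgoing = links.TryGetValue(url, out var found) ? found : Array.Empty<Uri>();
        foreach (var link in outgoing)
        {
            if (stayOnSameHost && !string.Equals(link.Host, seed.Host, StringComparison.OrdinalIgnoreCase))
                continue;
            if (linkFilter is not null && !linkFilter(link, seed))
                continue;
            if (visited.Add(link))
                queue.Enqueue((link, depth + 1));
        }
    }
    return order;
}

var root = new Uri("https://example.com/");
var graph = new Dictionary<Uri, Uri[]>
{
    [root] = new[]
    {
        new Uri("https://example.com/a"),
        new Uri("https://other.org/x"),        // different host: skipped
        new Uri("https://example.com/login"),  // rejected by the filter
    },
    [new Uri("https://example.com/a")] = new[] { new Uri("https://example.com/b") },
};

var visitedPages = Crawl(root, graph, maxDepth: 3, maxPages: 200, stayOnSameHost: true,
                         linkFilter: (link, seed) => !link.AbsolutePath.Contains("/login"));
foreach (var u in visitedPages)
    Console.WriteLine(u);   // https://example.com/, https://example.com/a, https://example.com/b
```

Marking a URL as visited when it is queued, rather than when it is dequeued, keeps the same page from entering the queue twice when several pages link to it.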
Dependency Injection (ASP.NET Core / Generic Host)
```csharp
// Program.cs
builder.Services.AddWebScraper(o =>
{
    o.RateLimitDelay = TimeSpan.FromSeconds(1);
    o.RetryCount     = 3;
    o.MaxConcurrency = 8;
});

// Or with the fluent builder:
builder.Services.AddWebScraper(b => b
    .WithRateLimit(TimeSpan.FromSeconds(1))
    .WithRetry(3)
    .WithConcurrency(8));
```
Then inject WebScraper or PageCrawler normally:
```csharp
public class ProductService(WebScraper scraper)
{
    public async Task<string?> GetTitleAsync(string url)
    {
        var result = await scraper.FetchAsync(url);
        if (!result.IsSuccess)
            return null; // no document to parse
        return new HtmlParser(result.Document).Title();
    }
}
```
Proxy Rotation
```csharp
using var scraper = new ScraperBuilder()
    .WithProxies(
        "http://p1.example.com:3128",
        "http://p2.example.com:3128",
        "socks5://p3.example.com:1080"
    )
    .Build();

// Proxies are rotated round-robin; failed proxies back off exponentially.
```
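A minimal sketch of round-robin rotation with exponential back-off for failing proxies follows. The helper names (`NextProxy`, `ReportFailure`, `ReportSuccess`), the 5-second base window and the cap are assumptions for illustration only; the README does not document the library's actual back-off parameters. The current time is passed in so the behaviour can be checked deterministically.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the library's code): round-robin selection where a failing proxy
// is skipped for a window that doubles on each consecutive failure (5 s, 10 s, 20 s, ...).
var proxies = new List<string>
{
    "http://p1.example.com:3128",
    "http://p2.example.com:3128",
    "socks5://p3.example.com:1080",
};
var failures = new int[proxies.Count];
var blockedUntil = new DateTimeOffset[proxies.Count];   // default(DateTimeOffset): never blocked
var baseBackoff = TimeSpan.FromSeconds(5);
int cursor = 0;

// Next proxy in rotation order that is not backing off, or null if every proxy is.
string? NextProxy(DateTimeOffset now)
{
    for (int i = 0; i < proxies.Count; i++)
    {
        int idx = (cursor + i) % proxies.Count;
        if (blockedUntil[idx] <= now)
        {
            cursor = (idx + 1) % proxies.Count;   // resume after this proxy next time
            return proxies[idx];
        }
    }
    return null;
}

// Block the proxy for baseBackoff * 2^(failures - 1), with the exponent capped at 10.
void ReportFailure(string proxy, DateTimeOffset now)
{
    int idx = proxies.IndexOf(proxy);
    failures[idx]++;
    int exponent = Math.Min(failures[idx] - 1, 10);
    blockedUntil[idx] = now + TimeSpan.FromTicks(baseBackoff.Ticks << exponent);
}

// A success resets the consecutive-failure count, so the next failure starts at 5 s again.
void ReportSuccess(string proxy) => failures[proxies.IndexOf(proxy)] = 0;
```

A caller would pick a proxy with `NextProxy(DateTimeOffset.UtcNow)`, send the request through it, then call `ReportFailure` or `ReportSuccess` with the outcome.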
Retry Policy
By default the scraper retries up to 3 times, with exponential back-off, on:
- `HttpRequestException` (network errors)
- `TaskCanceledException` (timeouts)
- HTTP 429, 500, 502, 503, 504
Customise:
```csharp
var options = new ScraperOptions
{
    RetryCount            = 5,
    RetryBaseDelay        = TimeSpan.FromSeconds(1),
    RetryableStatusCodes  = new HashSet<int> { 429, 503 },
    HonorRetryAfterHeader = true,   // respects the Retry-After response header
};
```
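The retry decision and back-off schedule can be sketched as below. The doubling formula `baseDelay * 2^(retry - 1)`, the absence of jitter and the helper names (`IsTransient`, `ShouldRetry`, `RetryDelay`) are assumptions of this sketch; the package depends on Polly, whose built-in retry strategies may compute delays differently.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical sketch of the documented defaults; not the library's Polly pipeline.
var retryableStatusCodes = new HashSet<int> { 429, 500, 502, 503, 504 };

// Network errors and timeouts are treated as transient.
bool IsTransient(Exception ex) => ex is HttpRequestException or TaskCanceledException;

// Only listed status codes are retried, and only while retries remain.
bool ShouldRetry(int statusCode, int retry, int retryCount) =>
    retry <= retryCount && retryableStatusCodes.Contains(statusCode);

// Retry N (1-based) waits baseDelay * 2^(N - 1), unless a Retry-After value is present and honored.
TimeSpan RetryDelay(int retry, TimeSpan baseDelay, TimeSpan? retryAfter, bool honorRetryAfter)
{
    if (honorRetryAfter && retryAfter is TimeSpan serverDelay)
        return serverDelay;
    return TimeSpan.FromTicks(baseDelay.Ticks << (retry - 1));
}

for (int n = 1; n <= 3; n++)
    Console.WriteLine($"retry {n}: wait {RetryDelay(n, TimeSpan.FromSeconds(1), null, honorRetryAfter: true).TotalSeconds} s");
// 1 s, 2 s, 4 s
```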
Rate Limiting
The rate limiter operates per host, so concurrent requests to different domains are not affected by each other.
```csharp
// 500 ms between requests to the same host
var options = new ScraperOptions { RateLimitDelay = TimeSpan.FromMilliseconds(500) };
```
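One way to get per-host spacing is to track the next free time slot for each host. The sketch below is hypothetical, not the library's limiter: `Reserve` returns how long the caller should wait before sending, and it takes the current time as a parameter so it can be checked deterministically. Case-insensitive host matching is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-host limiter sketch; not the library's implementation.
var minInterval = TimeSpan.FromMilliseconds(500);
var nextFreeSlot = new Dictionary<string, DateTimeOffset>(StringComparer.OrdinalIgnoreCase);
var sync = new object();

// Reserves the next slot for `host` and returns how long the caller must wait before sending.
// Each host has its own entry, so requests to one host never delay another host.
TimeSpan Reserve(string host, DateTimeOffset now)
{
    lock (sync)
    {
        var slot = nextFreeSlot.TryGetValue(host, out var free) && free > now ? free : now;
        nextFreeSlot[host] = slot + minInterval;   // the following request to this host must wait minInterval more
        return slot - now;
    }
}

var t0 = DateTimeOffset.UnixEpoch;
Console.WriteLine(Reserve("example.com", t0));   // 00:00:00
Console.WriteLine(Reserve("example.com", t0));   // 00:00:00.5000000
Console.WriteLine(Reserve("example.org", t0));   // 00:00:00 (different host, no wait)
```

Because the slot is reserved under the lock before the caller sleeps, concurrent callers for the same host are spaced out instead of all waking at the same moment.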
API Reference
| Type | Description |
|---|---|
| `WebScraper` | Main scraper: `FetchAsync`, `FetchAllAsync`, `ExtractAsync`, `ExtractListAsync`. |
| `HtmlParser` | CSS (`QuerySelector`, `Text`, `Texts`, `Attr`, `Attrs`, `Html`) and XPath (`XText`, `XTexts`, `SelectSingleNode`, `SelectNodes`) queries, plus `ExtractLinks`, `ExtractImages`, `ExtractTable`, `Title` and `Meta`. |
| `ScraperBuilder` | Fluent builder for `WebScraper`. |
| `ScraperOptions` | All configuration properties. |
| `PageCrawler` | BFS crawler with depth/page limits and link filtering. |
| `CrawlOptions` | `MaxDepth`, `MaxPages`, `MaxConcurrency`, `StayOnSameHost`, `LinkFilter`. |
| `ScrapeResult` | URL, status code, raw HTML, parsed `HtmlDocument`, headers and elapsed time. |
| `ScrapeItem` | Key/value fields extracted from a page. |
| `BulkScrapeResult` | `Succeeded` / `Failed` lists, success rate and total elapsed time. |
| `ScraperException` | Thrown by `EnsureSuccess()` or on unrecoverable errors. |
License
MIT © 2025 Vladyslav Zaiets
Target frameworks

| Product | Compatible and computed target framework versions |
|---|---|
| .NET | net10.0 is compatible (included in the package). net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos and net10.0-windows were computed. |

Dependencies (net10.0)
- Fizzler.Systems.HtmlAgilityPack (>= 1.2.1)
- HtmlAgilityPack (>= 1.11.74)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.7)
- Polly (>= 8.6.6)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0 | 102 | 5/3/2026 |