Spidey 5.0.33

Spidey

A library that helps with crawling web content. Compatible with .NET Core and .NET Framework.

Setting up the Library

Spidey relies on Canister to wire itself up. For this to work, register the Canister modules at startup:

new ServiceCollection().AddCanisterModules();

AddCanisterModules is an extension method that registers Spidey with the IoC container. Once it has run, Spidey is ready to use.

Basic Usage

Spidey really boils down to one class, Crawler, which is configured through an Options object:

ServiceCollection.AddSingleton(new Options
{
    ItemFound = FoundFile => { },                                                     //The callback method used when a new page is discovered.
    Allow = new List<string> { "http://mywebsite", "http://mywebsite2" },             //Regexes of what sites/pages are allowed to be crawled.
    FollowOnly = new List<string> { "..." },                                          //Regexes of pages to only follow links that are found on them.
    Ignore = new List<string> { "..." },                                              //Regexes that the system will ignore when they are encountered.
    StartLocations = new List<string> { "http://mywebsite", "http://mywebsite2" },    //Starting URLs for the crawler.
    UrlReplacements = new Dictionary<string,string> {...}                             //When the system hits one of the keys in the dictionary, it will replace it with the value.
});
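
To make the roles of Allow, FollowOnly, Ignore, and UrlReplacements more concrete, the following self-contained sketch shows one way such lists could be applied to a URL. It does not use Spidey and is not its implementation: the UrlRules class, its method names, and the precedence rules (replacements first, Ignore wins over Allow, FollowOnly pages are crawled but not reported) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Spidey: models the option lists described above.
public sealed class UrlRules
{
    public List<string> Allow { get; } = new List<string>();
    public List<string> FollowOnly { get; } = new List<string>();
    public List<string> Ignore { get; } = new List<string>();
    public Dictionary<string, string> UrlReplacements { get; } = new Dictionary<string, string>();

    // Assumption: replacements are plain substring substitutions.
    public string Normalize(string url)
    {
        foreach (var pair in UrlReplacements)
            url = url.Replace(pair.Key, pair.Value);
        return url;
    }

    // Assumption: a URL is crawled when it matches an Allow regex and no Ignore regex.
    public bool ShouldCrawl(string url) =>
        Allow.Any(p => Regex.IsMatch(url, p)) && !Ignore.Any(p => Regex.IsMatch(url, p));

    // Assumption: FollowOnly pages are fetched for their links but never reported.
    public bool ShouldReport(string url) =>
        ShouldCrawl(url) && !FollowOnly.Any(p => Regex.IsMatch(url, p));
}

public static class UrlRulesDemo
{
    public static void Main()
    {
        var rules = new UrlRules();
        rules.Allow.Add(@"^http://mywebsite");
        rules.Ignore.Add(@"\.pdf$");
        rules.FollowOnly.Add(@"/index$");
        rules.UrlReplacements["http://www.mywebsite"] = "http://mywebsite";

        var samples = new[] { "http://www.mywebsite/index", "http://mywebsite/a.pdf", "http://mywebsite/page" };
        foreach (var raw in samples)
        {
            var url = rules.Normalize(raw);
            Console.WriteLine($"{url}: crawl={rules.ShouldCrawl(url)}, report={rules.ShouldReport(url)}");
        }
    }
}
```

Because the entries are regexes, an unanchored pattern such as "http://mywebsite" also matches "http://mywebsite2"; anchoring or escaping the pattern narrows the match.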
			

It is recommended, though not required, that you register the Options object in your ServiceCollection and resolve the Crawler object from the service provider. You can also simply new up an instance of Crawler. The Options class has further properties that are not shown above, such as NetworkCredentials, UseDefaultCredentials, and Proxy. The callback method is called by the system once a link's info has been received and looks like this:

void CallbackMethod(ResultFile obj) { ... }

The library handles most of the remaining work, such as downloading the content and parsing the links found within each page. At this point all you have to do is call the StartCrawl method:

MyCrawler.StartCrawl();
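
Putting the pieces above together, a minimal end-to-end sketch might look like the following. It uses only the calls shown in this README plus the standard Microsoft.Extensions.DependencyInjection API. The using directives for the Spidey namespaces are assumptions about where Crawler, Options, and ResultFile live, and the README does not say whether StartCrawl blocks until the crawl has finished, so treat this as a starting point rather than a definitive setup.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
// Assumption: Crawler, Options, and ResultFile live in one of these namespaces.
using Spidey;
using Spidey.Engines;
using Spidey.Engines.Interfaces;

public static class CrawlExample
{
    public static void Main()
    {
        var services = new ServiceCollection();

        // Register the crawl settings; the values are the placeholders used in this README.
        services.AddSingleton(new Options
        {
            ItemFound = file => Console.WriteLine(file),   // called for each discovered page
            Allow = new List<string> { "http://mywebsite" },
            StartLocations = new List<string> { "http://mywebsite" }
        });

        // Let Canister register Spidey's components with the container.
        services.AddCanisterModules();

        using var provider = services.BuildServiceProvider();

        // Resolve the crawler from the provider, as recommended above.
        var crawler = provider.GetRequiredService<Crawler>();
        crawler.StartCrawl();
    }
}
```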

Customization

The crawler's parts can be customized individually. The system is divided into the following sections:

  1. Content Parser (IContentParser) - This parses the resulting data and converts it to the ResultFile object.
  2. Engine (IEngine) - This downloads the content from the server.
  3. Link Discoverer (ILinkDiscoverer) - Takes the content from the engine and looks for links to other resources.
  4. Processor (IProcessor) - Takes the parsed content and hands it off to your code. The default one simply calls the method provided in the options.
  5. Scheduler (IScheduler) - Handles handing out work to the various workers.
  6. Pipeline (IPipeline) - Manages the various parts of the process by feeding the content to the next bit of the process.

These subsystems all implement interfaces found in the Spidey.Engines.Interfaces namespace. To replace the default for any of them, create a class that implements the corresponding interface. If the Crawler is resolved from the service provider, the system picks up your class automatically. If you new up a Crawler object instead, you will need to compose the Pipeline object yourself.
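
The division of labour between the stages can be illustrated with a small, self-contained sketch. The types below are hypothetical stand-ins written for this example, not Spidey's interfaces (whose members are not listed in this README): each stage is a delegate, the "engine" reads from an in-memory dictionary instead of the network, and a simple single-threaded queue plays the role of scheduler and pipeline.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical parsed result, standing in for the ResultFile idea.
public sealed record Page(string Url, string Content);

// Hypothetical stand-in for the stages listed above; not Spidey's API.
public sealed class MiniCrawler
{
    private readonly Func<string, string?> _engine;                       // Engine: URL -> raw content, null if missing
    private readonly Func<string, string, Page> _parser;                  // Content Parser: raw content -> Page
    private readonly Func<string, IEnumerable<string>> _linkDiscoverer;   // Link Discoverer: content -> URLs
    private readonly Action<Page> _processor;                             // Processor: hands the page to user code

    public MiniCrawler(
        Func<string, string?> engine,
        Func<string, string, Page> parser,
        Func<string, IEnumerable<string>> linkDiscoverer,
        Action<Page> processor)
    {
        _engine = engine;
        _parser = parser;
        _linkDiscoverer = linkDiscoverer;
        _processor = processor;
    }

    // Scheduler and Pipeline: a FIFO queue feeds each URL through the stages in turn.
    public int Crawl(string start)
    {
        var pending = new Queue<string>();
        var seen = new HashSet<string> { start };
        pending.Enqueue(start);
        var processed = 0;

        while (pending.Count > 0)
        {
            var url = pending.Dequeue();
            var content = _engine(url);
            if (content is null) continue;          // nothing to download for this URL

            _processor(_parser(url, content));
            processed++;

            foreach (var link in _linkDiscoverer(content))
                if (seen.Add(link)) pending.Enqueue(link);   // schedule each URL only once
        }
        return processed;
    }
}

public static class MiniCrawlerDemo
{
    // A fake three-page site used instead of real HTTP requests.
    public static readonly Dictionary<string, string> Site = new()
    {
        ["/a"] = "<a href=\"/b\">b</a> <a href=\"/c\">c</a>",
        ["/b"] = "<a href=\"/a\">back</a>",
        ["/c"] = "no links here",
    };

    public static IEnumerable<string> FindLinks(string html)
    {
        foreach (Match m in Regex.Matches(html, "href=\"([^\"]+)\""))
            yield return m.Groups[1].Value;
    }

    public static void Main()
    {
        var crawler = new MiniCrawler(
            engine: url => Site.TryGetValue(url, out var body) ? body : null,
            parser: (url, body) => new Page(url, body),
            linkDiscoverer: FindLinks,
            processor: page => Console.WriteLine($"{page.Url}: {page.Content.Length} chars"));

        crawler.Crawl("/a");
    }
}
```

Swapping one delegate, for example a processor that writes to a file, changes that stage without touching the others, which is the same idea as replacing a single interface implementation in Spidey.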

Installation

The library is available via NuGet with the package name "Spidey". To install it, run the following command in the Package Manager Console:

Install-Package Spidey

FAQ

  1. Is it possible to run the crawler using multiple nodes?

The default scheduler assumes that the crawler runs from a single location and does not talk to other instances of the application. It is possible, however, to replace the scheduler with one that coordinates work between instances through a shared mechanism such as a database, and this is recommended for more complex setups.
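
As a rough illustration of that idea, the sketch below shows how several nodes could avoid crawling the same URL twice by claiming work in a shared store. Everything here is hypothetical: ISharedWorkStore, InMemoryWorkStore, and CoordinatedScheduler are names invented for this example and do not implement Spidey's IScheduler, whose members are not listed in this README. The in-memory dictionary stands in for a database table with a unique constraint on the URL.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical shared store; a real setup might use a database table
// with a unique constraint on the URL column.
public interface ISharedWorkStore
{
    // Returns true only for the first node that claims the URL.
    bool TryClaim(string url, string nodeId);
}

public sealed class InMemoryWorkStore : ISharedWorkStore
{
    private readonly ConcurrentDictionary<string, string> _claims =
        new ConcurrentDictionary<string, string>();

    // TryAdd is atomic, so exactly one caller wins each URL.
    public bool TryClaim(string url, string nodeId) => _claims.TryAdd(url, nodeId);

    public string OwnerOf(string url) =>
        _claims.TryGetValue(url, out var owner) ? owner : "";
}

// Hypothetical per-node scheduler: it queues only the work this node claimed first.
public sealed class CoordinatedScheduler
{
    private readonly ISharedWorkStore _store;
    private readonly string _nodeId;
    private readonly Queue<string> _local = new Queue<string>();

    public CoordinatedScheduler(ISharedWorkStore store, string nodeId)
    {
        _store = store;
        _nodeId = nodeId;
    }

    // Offers a discovered URL; returns false when another node already owns it.
    public bool Offer(string url)
    {
        if (!_store.TryClaim(url, _nodeId)) return false;
        _local.Enqueue(url);
        return true;
    }

    // Hands out the next locally owned URL, if any.
    public bool TryNext(out string url) => _local.TryDequeue(out url);
}

public static class CoordinatedSchedulerDemo
{
    public static void Main()
    {
        var store = new InMemoryWorkStore();
        var nodeA = new CoordinatedScheduler(store, "node-a");
        var nodeB = new CoordinatedScheduler(store, "node-b");

        // Both nodes discover the same link; only the first claim succeeds.
        Console.WriteLine($"node-a accepted: {nodeA.Offer("http://mywebsite/page")}");
        Console.WriteLine($"node-b accepted: {nodeB.Offer("http://mywebsite/page")}");
        Console.WriteLine($"owner: {store.OwnerOf("http://mywebsite/page")}");
    }
}
```

A production version would also need to handle claims held by nodes that crash, for instance by expiring them after a timeout; that is left out here to keep the sketch short.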

Build Process

In order to build the library you will require the following:

  1. Visual Studio 2019

Other than that, just clone the project and you should be able to load the solution and build without too much effort.

Compatible and additional computed target framework versions

.NET: net6.0 is compatible. net7.0, net8.0, net9.0, and net10.0 were computed, as were the platform-specific variants: -android, -ios, -maccatalyst, -macos, -tvos, and -windows for net6.0 through net10.0, and -browser for net8.0 through net10.0.

Version Downloads Last Updated
6.0.4 130 7/16/2025
6.0.3 138 6/27/2025
6.0.2 145 6/25/2025
6.0.1 165 12/9/2024
6.0.0 148 11/25/2024
5.0.131 145 11/12/2024
5.0.130 127 11/11/2024
5.0.129 127 11/6/2024
5.0.128 123 11/5/2024
5.0.127 123 11/4/2024
5.0.126 126 10/31/2024
5.0.125 119 10/30/2024
5.0.124 132 10/29/2024
5.0.123 140 10/11/2024
5.0.122 132 10/10/2024
5.0.121 123 10/9/2024
5.0.120 134 10/2/2024
5.0.119 131 10/1/2024
5.0.118 144 9/24/2024
5.0.117 143 9/17/2024
5.0.116 180 9/10/2024
5.0.115 141 9/3/2024
5.0.114 133 8/30/2024
5.0.113 146 8/27/2024
5.0.112 151 8/26/2024
5.0.111 161 8/23/2024
5.0.110 160 8/21/2024
5.0.109 152 8/20/2024
5.0.108 159 8/16/2024
5.0.107 159 8/15/2024
5.0.106 137 8/5/2024
5.0.105 117 8/2/2024
5.0.104 128 8/1/2024
5.0.103 130 7/26/2024
5.0.102 151 7/11/2024
5.0.101 148 7/2/2024
5.0.100 148 6/27/2024
5.0.99 135 6/26/2024
5.0.98 157 6/19/2024
5.0.97 148 6/18/2024
5.0.96 156 6/17/2024
5.0.95 149 6/14/2024
5.0.94 138 6/13/2024
5.0.93 137 6/12/2024
5.0.92 145 5/31/2024
5.0.91 140 5/30/2024
5.0.90 147 5/17/2024
5.0.89 153 5/16/2024
5.0.88 166 5/8/2024
5.0.87 164 5/7/2024
5.0.86 180 5/6/2024
5.0.85 120 5/3/2024
5.0.84 138 5/2/2024
5.0.83 147 5/1/2024
5.0.82 161 4/30/2024
5.0.81 151 4/16/2024
5.0.80 153 4/12/2024
5.0.79 151 4/11/2024
5.0.78 166 4/1/2024
5.0.77 156 3/29/2024
5.0.76 162 3/18/2024
5.0.75 148 3/15/2024
5.0.74 156 3/14/2024
5.0.73 152 3/11/2024
5.0.72 147 3/8/2024
5.0.71 151 3/7/2024
5.0.70 173 3/6/2024
5.0.69 162 3/5/2024
5.0.68 158 3/4/2024
5.0.67 165 2/29/2024
5.0.66 151 2/28/2024
5.0.65 161 2/26/2024
5.0.64 158 2/23/2024
5.0.63 167 2/22/2024
5.0.62 165 2/21/2024
5.0.61 157 2/16/2024
5.0.60 157 2/15/2024
5.0.59 164 2/12/2024
5.0.58 155 2/8/2024
5.0.57 148 2/7/2024
5.0.56 152 2/6/2024
5.0.55 137 2/1/2024
5.0.54 157 1/31/2024
5.0.53 147 1/30/2024
5.0.52 141 1/24/2024
5.0.51 155 1/23/2024
5.0.50 157 1/12/2024
5.0.49 156 1/11/2024
5.0.48 165 12/26/2023
5.0.47 152 12/22/2023
5.0.46 146 12/18/2023
5.0.45 134 12/15/2023
5.0.44 136 12/14/2023
5.0.43 147 12/13/2023
5.0.42 143 12/12/2023
5.0.41 170 11/24/2023
5.0.40 163 11/21/2023
5.0.39 138 11/20/2023
5.0.38 140 11/17/2023
5.0.37 143 11/16/2023
5.0.36 141 11/14/2023
5.0.35 138 11/8/2023
5.0.34 129 11/7/2023
5.0.33 145 11/6/2023
5.0.32 142 11/1/2023
5.0.31 136 10/31/2023
5.0.30 160 10/30/2023
5.0.29 146 10/26/2023
5.0.28 179 10/12/2023
5.0.27 178 10/5/2023
5.0.26 163 9/26/2023
5.0.25 140 9/20/2023
5.0.24 139 9/19/2023
5.0.23 181 9/18/2023
5.0.22 157 9/14/2023
5.0.21 142 9/13/2023
5.0.20 168 9/11/2023
5.0.19 163 9/7/2023
5.0.18 160 9/6/2023
5.0.17 158 9/5/2023
5.0.16 171 9/4/2023
5.0.15 158 9/1/2023
5.0.14 168 8/31/2023
5.0.13 181 8/30/2023
5.0.12 181 8/29/2023
5.0.11 169 8/25/2023
5.0.10 183 8/23/2023
5.0.9 168 8/18/2023
5.0.8 192 8/10/2023
5.0.7 193 8/8/2023
5.0.6 198 8/8/2023
5.0.5 218 8/7/2023
5.0.4 196 8/3/2023
5.0.3 213 7/26/2023
5.0.2 188 7/20/2023
5.0.1 200 7/14/2023
5.0.0 361 12/12/2022
4.0.5 601 6/10/2022
4.0.2 562 1/11/2022
4.0.1 530 7/19/2021
3.0.9 592 1/7/2021
3.0.7 677 9/13/2020
3.0.6 650 6/26/2020
3.0.5 627 6/26/2020
3.0.3 637 3/25/2020
3.0.2 694 3/1/2020
3.0.1 712 1/1/2020
3.0.0 683 12/23/2019
2.0.15 718 11/22/2019
2.0.14 674 11/22/2019
2.0.13 663 11/22/2019
2.0.12 653 11/21/2019
2.0.11 649 11/21/2019
2.0.10 661 11/21/2019
2.0.9 650 11/21/2019
2.0.8 832 3/3/2019
2.0.7 758 3/3/2019
2.0.6 774 3/3/2019
2.0.5 759 3/3/2019
2.0.4 819 2/7/2019
2.0.3 1,292 6/1/2018
2.0.2 1,283 5/22/2018
2.0.1 1,419 1/2/2018
1.0.12 1,195 11/2/2017
1.0.11 1,192 10/30/2017
1.0.10 1,196 10/26/2017
1.0.9 1,193 10/26/2017
1.0.8 1,223 10/26/2017
1.0.7 1,178 10/25/2017
1.0.6 1,167 10/25/2017
1.0.5 1,207 10/24/2017
1.0.4 1,142 10/24/2017
1.0.3 1,138 10/19/2017
1.0.2 1,272 10/18/2017
1.0.1 1,192 9/29/2017