JonDavis.Majestic12.HTMLparser 3.1.4.1

.NET CLI:
dotnet add package JonDavis.Majestic12.HTMLparser --version 3.1.4.1

Package Manager:
NuGet\Install-Package JonDavis.Majestic12.HTMLparser -Version 3.1.4.1
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="JonDavis.Majestic12.HTMLparser" Version="3.1.4.1" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add JonDavis.Majestic12.HTMLparser --version 3.1.4.1

Script & Interactive:
#r "nuget: JonDavis.Majestic12.HTMLparser, 3.1.4.1"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or into the source code of the script to reference the package.

Cake:
// Install JonDavis.Majestic12.HTMLparser as a Cake Addin
#addin nuget:?package=JonDavis.Majestic12.HTMLparser&version=3.1.4.1

// Install JonDavis.Majestic12.HTMLparser as a Cake Tool
#tool nuget:?package=JonDavis.Majestic12.HTMLparser&version=3.1.4.1
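In an SDK-style project file, the PackageReference node goes inside an <ItemGroup>. The sketch below is a minimal class-library project showing where it sits; the project itself and the net8.0 target are illustrative assumptions, not part of this package. Because the package targets .NET Standard 2.0, any framework in the compatibility list that follows should also resolve.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <!-- Illustrative target; no OutputType, so this builds a class library -->
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- The node from this page, pinned to the 3.1.4.1 release -->
    <PackageReference Include="JonDavis.Majestic12.HTMLparser" Version="3.1.4.1" />
  </ItemGroup>

</Project>
```

Running `dotnet restore` (or any build) after editing the file downloads the package, the same as using the .NET CLI command.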

This is a .NET Standard 2.0 port of Majestic12.HTMLparser v3.1.4.
Original source: https://www.majestic12.co.uk/projects/html_parser.php

Compatible and additional computed target framework versions
(netstandard2.0 is the target framework included in the package; every other entry was computed by NuGet as compatible.)

.NET: net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows
.NET Core: netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1
.NET Standard: netstandard2.0 (included in package), netstandard2.1
.NET Framework: net461, net462, net463, net47, net471, net472, net48, net481
MonoAndroid: monoandroid
MonoMac: monomac
MonoTouch: monotouch
Tizen: tizen40, tizen60
Xamarin.iOS: xamarinios
Xamarin.Mac: xamarinmac
Xamarin.TVOS: xamarintvos
Xamarin.WatchOS: xamarinwatchos
Dependencies

  • .NETStandard 2.0: no dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version    Downloads    Last updated
3.1.4.1    511          2/23/2021
3.1.4      274          2/23/2021

From the original author:

Free .NET HTML parser (C#) is an open-source, high-performance C# module that was created to parse HTML for links, indexing, and other purposes. The full source code (~5k lines) is available under the BSD license, which means you can use it in your commercial applications. This cross-platform code has been verified to run very well under Mono. The parser is 100% self-contained managed code that does not depend on any external DLLs apart from the core .NET libraries. We use this parser to process well over 3 TB of HTML every day.

I created this module for use in our Distributed Search Engine, which required processing terabytes of HTML on a daily basis, so naturally it had to be done very fast. High performance was therefore the main focus of this project. I've spent countless hours making sure it's fast, and you can benchmark it on your own hardware: Majestic-12's homepage snapshot (20 KB) is parsed in 0.47 ms with v3.0 (down from under 2 ms with v1.0) on an Athlon X2 3800 (2 GHz) PC, using a single core and dual-channel DDR400 memory.
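As a rough sanity check, the quoted figures can be turned into throughput numbers. This assumes decimal units (1 MB = 1000 KB, 1 TB = 10^6 MB) and treats the 20 KB homepage as a representative page, which the text does not claim.

```latex
% Single-core v3.0 parse rate on the 20 KB homepage snapshot
\[
  \frac{20\ \text{KB}}{0.47\ \text{ms}} \approx 42.6\ \text{KB/ms} \approx 42.6\ \text{MB/s}
\]
% Sustained rate implied by the "3 TB of HTML every day" figure
\[
  \frac{3\ \text{TB}}{1\ \text{day}} = \frac{3 \times 10^{6}\ \text{MB}}{86\,400\ \text{s}} \approx 34.7\ \text{MB/s}
\]
```

On paper, the benchmark parse rate on one core is in the same range as the sustained rate needed for the daily volume. Since the volume is "well over" 3 TB, 34.7 MB/s is a lower bound, and real throughput would also depend on I/O, page mix, and hardware.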

The current version is about 4 (!) times faster than the one released last year. It also supports non-English text via encodings (see Main.cs for details) as well as Unicode characters written as entities, and it should also be more suitable for XML parsing.

There are NUnit tests that cover approximately 71% of the code, including 91% of TagParser.cs, the key file that deals with tag parsing. You can help by adding to the existing tests. TestDriven.NET works best for this, as it makes it easy to run the tests and see how much of the code they cover.

I would be very interested to know how this module compares to others, so if you have done some testing, please email me the results. It would also be nice to get a few NUnit test cases for automated testing, as it is very easy to break a parser in a subtle way that won't be immediately apparent.

Finally, if you manage to squeeze more speed out of it, it would be nice if you shared the changes with me. This would help you too: I am certainly going to keep trying to make it faster, so if you share your changes, you won't have to merge my later changes into yours.