DAWG (Directed Acyclic Word Graph) is a data structure for storing and searching large word lists while keeping your memory footprint small and lookups fast. DawgSharp is an open-source C# implementation featuring a linear time graph reduction algorithm and out-of-the-box persistence support.
The two main objects in the library are Dawg and DawgBuilder. Dawg is immutable, you must use DawgBuilder to build a Dawg and then save it to a stream. Then use Dawg.Load to rehydrate the data. Once reloaded, Dawg re-emerges as a completely different data structure (but, oddly, the same class) that is nearly as fast as a HashSet for lookups and is much, much more memory-efficient (factors of 30x - 40x are not uncommon). Please note that the Save/Load step is necessary to get the full potential out of the Dawg object. Use a MemoryStream if disk interaction is not desired.
The Dawg class can be thought of as a read-only Dictionary <string, Value> type. It has the [""] indexer and implements IEnumerable <KeyValuePair <string, Value>>.
One other very useful feature of Dawg (not found in Dictionary) is the ability to quickly find all words that start with a particular substring: dawg.MatchPrefix ("star") could possibly yield "star", "starch", "start", "starting", etc.
This package is provided under the terms of the GNU GPL v3. Source code and documentation are available on GitHub: https://github.com/bzaar/DawgSharp. Commercial licenses are also available.
See the version list below for details.
Install-Package DawgSharp -Version 1.2.0
dotnet add package DawgSharp --version 1.2.0
<PackageReference Include="DawgSharp" Version="1.2.0" />
paket add DawgSharp --version 1.2.0
This version has been optimized to use EIGHT times less RAM than the previous version. In a typical benchmark test, it used 2.5M RAM to store 2.5 million words (yes, one byte per word) while maintaining a lookup speed of around one million words per second. The .NET Dictionary object uses 87M RAM under the same conditions.
The new version is fully compatible with the previous version on both source-code and binary levels and it will happily read files produced by its predecessor.
The assembly is now CLS-compliant which ensures it can be used from VB.NET and F#.
The assembly has been signed to allow side-by-side installations.
The SaveTo method has been changed slightly not to close the stream after it’s done.
The license was changed to GPL for this release. If you need a commercial license, please contact the author.
This package has no dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on DawgSharp:
.NET Morphological library for Russian language
.NET Vector words library for Russian language
This package is not used by any popular GitHub repositories.