Luthor 2.3.0

.NET Standard 2.1

.NET CLI:
dotnet add package Luthor --version 2.3.0

Package Manager Console:
NuGet\Install-Package Luthor -Version 2.3.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="Luthor" Version="2.3.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add Luthor --version 2.3.0

F# Interactive and Polyglot Notebooks:
#r "nuget: Luthor, 2.3.0"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy it into the interactive tool or the source code of the script to reference the package.

Cake:
// Install Luthor as a Cake Addin
#addin nuget:?package=Luthor&version=2.3.0

// Install Luthor as a Cake Tool
#tool nuget:?package=Luthor&version=2.3.0

The NuGet Team does not provide support for the Paket and Cake clients. Please contact their maintainers for support.

Luthor

Extract structure from any text using a tokenising lexer.

Using Luthor you can convert any single-line or multi-line text into a collection of tokens, where each token is a run of characters of one token type together with its content. This gives you access to the content at a higher level of abstraction, so further processing does not have to deal with the specifics of the raw text.

For each token you get the offset, the line number, the column within the line, and the content.

For example:

Sample text.
Across 3 lines.
With a "multi 'word' string".

Lexing this text gives a list of tokens like the following (each token also carries its offset, line number, and column):

Letters    : "Sample"
Whitespace : " "
Letters    : "text"
Symbols    : "."
EOL        : \n
Letters    : "Across"
Whitespace : " "
Digits     : "3"
Whitespace : " "
Letters    : "lines"
Symbols    : "."
EOL        : \n
Letters    : "With"
Whitespace : " "
Letters    : "a"
Whitespace : " "
String     : ""multi 'word' string""
Symbols    : "."
EOF        : ""

Note the difference between Letters and String: a String is enclosed in quotes (single, double, or backtick) and can have other quote characters embedded within it.

Usage

To get the tokens from a given source text:

var tokens = new Lexer(sourceAsString).GetTokens();
tokens.ForEach(x => Console.WriteLine($"{x.Location.Offset,3}: {x.TokenType} => {x.Content}"));

To do the same, but with each whitespace run compressed to a single space:

var tokens = new Lexer(sourceAsString).GetTokens(true);
tokens.ForEach(x => Console.WriteLine($"{x.Location.Offset,3}: {x.TokenType} => {x.Content}"));

To get the tokens from a given source text as a collection of lines:

var lines = new Lexer(sourceAsString).GetTokensAsLines();
foreach (var line in lines)
{
    Console.WriteLine($"Line: {line.Key}");
    line.Value.ForEach(x => Console.WriteLine($" {x.Location.Column,3}: {x.TokenType} => {x.Content}"));
}

GetTokensAsLines() also accepts the optional whitespace compression argument.
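As a minimal sketch of the line-grouped call with whitespace compression switched on, the snippet below passes true to GetTokensAsLines(). It uses only members that appear in the examples above. The `using Luthor;` namespace and the sample text are assumptions, so adjust them to match the package if they differ.

```csharp
using System;
using Luthor; // assumed namespace for Lexer; not confirmed by this page

class Program
{
    static void Main()
    {
        // Hypothetical sample input containing runs of several spaces.
        var sourceAsString = "Sample    text.\nAcross   3 lines.";

        // true requests whitespace compression, as with GetTokens(true).
        var lines = new Lexer(sourceAsString).GetTokensAsLines(true);
        foreach (var line in lines)
        {
            Console.WriteLine($"Line: {line.Key}");
            // Brackets make the width of each whitespace token visible.
            line.Value.ForEach(x => Console.WriteLine($" {x.Location.Column,3}: {x.TokenType} => [{x.Content}]"));
        }
    }
}
```

With compression on, each Whitespace token should contain a single space regardless of how many spaces the source had.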

The output tokens

Token types

These are the default definitions of the available tokens.

Whitespace - spaces, tabs
Letters - upper and lower case English alphabet
Digits - 0 to 9
Symbols - any of !£$%^&*()-_=+[]{};:'@#~,.<>/?\|
String - anything enclosed in either ", ', or a backtick
Other - input characters not covered by other types
EOL - an LF (\n); any CRs (\r) are ignored
EOF - automatically added

Redefining the tokens

You can change the characters underlying the different token types:

var lexer = new Lexer(sourceAsString)
{
    Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    Digits = "0123456789",
    Symbols = "!£$%^&*()-_=+[]{};:'@#~,.<>/?\\|",
    Whitespace = " \t",
    Quotes = "'\"`",
};
var tokens = lexer.GetTokens();

The Quotes characters are handled differently from the others. Each one represents a valid start/end character ('terminator'), and the same character must be used to close the string as to open it.

Other quote characters within the string (i.e. between the terminators) are considered plain content within the current string rather than terminators for new strings in their own right.
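The terminator rule can be illustrated with a small standalone sketch. This is not Luthor's implementation: `QuoteSketch` and `ReadString` are hypothetical names, and treating an unterminated string as running to the end of the input is an assumption made only for this example.

```csharp
using System;

static class QuoteSketch
{
    // Same default quote characters as the Quotes property shown above.
    const string Quotes = "'\"`";

    // Hypothetical helper: returns the quoted run starting at 'start',
    // including both terminators, or "" if text[start] is not a quote.
    public static string ReadString(string text, int start)
    {
        if (start < 0 || start >= text.Length) return "";
        char terminator = text[start];
        if (Quotes.IndexOf(terminator) < 0) return "";

        // Only the opening character can close the string; any other quote
        // characters met on the way are treated as ordinary content.
        int end = text.IndexOf(terminator, start + 1);

        // Assumption for this sketch: an unterminated string runs to the end.
        return end < 0 ? text.Substring(start) : text.Substring(start, end - start + 1);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(QuoteSketch.ReadString("\"multi 'word' string\".", 0)); // prints "multi 'word' string"
        Console.WriteLine(QuoteSketch.ReadString("'it is \"quoted\"' rest", 0));  // prints 'it is "quoted"'
    }
}
```

In both calls the embedded quote characters stay inside the returned string because they differ from the character that opened it.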

General comments

Linux/Unix, Mac OS, and Windows all have a \n (LF) in their line endings, so \r (CR) is discarded and won't appear in any tokens.
There will always be a final EOF token, even for an empty input string.
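Both points can be checked with a short program along these lines. It is a sketch that reuses only the calls shown under Usage. The `using Luthor;` namespace is an assumption and the inputs are made up for illustration.

```csharp
using System;
using Luthor; // assumed namespace for Lexer; not confirmed by this page

class Program
{
    static void Main()
    {
        // Empty input: per the note above, the list should hold only an EOF token.
        var empty = new Lexer("").GetTokens();
        empty.ForEach(x => Console.WriteLine($"{x.Location.Offset,3}: {x.TokenType} => {x.Content}"));

        // Windows-style line ending: the \r should not appear in any token.
        var crlf = new Lexer("one\r\ntwo").GetTokens();
        crlf.ForEach(x => Console.WriteLine($"{x.Location.Offset,3}: {x.TokenType} => {x.Content}"));
    }
}
```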

Product	Compatible and additional computed target framework versions.
.NET	net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.
.NET Core	netcoreapp3.0 was computed. netcoreapp3.1 was computed.
.NET Standard	netstandard2.1 is compatible.
MonoAndroid	monoandroid was computed.
MonoMac	monomac was computed.
MonoTouch	monotouch was computed.
Tizen	tizen60 was computed.
Xamarin.iOS	xamarinios was computed.
Xamarin.Mac	xamarinmac was computed.
Xamarin.TVOS	xamarintvos was computed.
Xamarin.WatchOS	xamarinwatchos was computed.

Dependencies

.NETStandard 2.1
- No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last updated
2.3.0	217	8/30/2023
2.2.1	453	6/19/2021
2.2.0	361	6/18/2021
2.1.0	384	6/18/2021
1.0.1	903	8/19/2018