Stringier.Patterns 0.4.0

Provides SNOBOL4- and UNICON-inspired patterns and parsing.

Install-Package Stringier.Patterns -Version 0.4.0
dotnet add package Stringier.Patterns --version 0.4.0
<PackageReference Include="Stringier.Patterns" Version="0.4.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Stringier.Patterns --version 0.4.0

Stringier.Patterns

Patterns, probably introduced with SNOBOL and also seen in SPITBOL and UNICON, are considerably more powerful than regular expressions. So what do you do when you need to parse something more complicated than a Regex can handle? Hacky Regex extensions aren't great, and still lack what some advanced alternatives can offer. Parser combinators? Actually, these are great, and I'm not going to bash them at all. Pattern matching and parser combinators share a huge amount of theory and implementation detail; you could consider them alternative interpretations of the same concept. You'll notice a few small differences, but the advantages apply to both.

Including

using System.Text.Patterns;

Usage

In most situations, there are only three usage patterns you'll need to know.

Declaration

Literal patternName = "Text to match";

or

Literal patternName = ("Text to match", StringComparison.CurrentCultureIgnoreCase);
// Comparisons of this Literal will use the StringComparison value

or

Pattern patternName = literalPattern1 & (literalPattern2 | literalPattern3);

Matching

patternName.Consume("Candidate text");
//Assuming Consume captures "Candidate" this will return true and "Candidate"

Inline (Quick Match)

"Hello".Consume("Hello World!");
//Assuming "Hello" captures "Hello" (which it obviously will) this will return true and "Hello"

Concepts

Multiple return values

Pattern matching is largely based around the idea of goal-direction. The two languages you're most likely using this library from, C# and VB.NET, don't support goal-direction (if you're using F#, then FParsec is going to match that programming style better anyway). Goal-directed semantics require every function call to return both a success state and a result (or just the success state for a void return).

But wait, C# can't return multiple values!

While true, this is remarkably pedantic. Whether you return an array, a struct, a class, or a tuple, you are returning multiple values as one conceptual value. All the parsing methods return Result which contains both the success state (Boolean) and the result of the operation (String). Result implicitly casts to both Boolean and String and can be used as such. This allows some conveniences without adding new methods.

So every return passes two values? Isn't that a lot of extra memory?

One, no, not really: a single Boolean isn't very large. Two, it doesn't actually pass a Boolean at all. An empty string is recognized as a failure. Essentially, Result is a box around String with special comparisons and implicit conversions; in other words, the behavior of Parse and TryParse combined into one method. And, getting technical, we're not actually passing around String either. For performance reasons we're passing around Span<Char>, which references part of the original string and so prevents copying in most situations.
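To make the idea concrete, here is a minimal sketch of a result type whose success state is derived from the captured text. It is not the library's Result: the names SketchResult and ConsumePrefix are hypothetical, and it wraps a plain string rather than a Span<Char> to keep the example short.

```csharp
using System;

// Hypothetical stand-in for the idea behind Result; not the library's implementation.
public readonly struct SketchResult
{
    private readonly string text;

    public SketchResult(string text) => this.text = text ?? "";

    // Success is derived from the captured text: an empty capture means failure.
    public static implicit operator bool(SketchResult result) =>
        !string.IsNullOrEmpty(result.text);

    // The captured text itself; never null, even for default(SketchResult).
    public static implicit operator string(SketchResult result) =>
        result.text ?? "";
}

public static class Program
{
    // Toy matcher (an assumption for this sketch): captures the prefix
    // when the candidate starts with it, otherwise captures nothing.
    private static SketchResult ConsumePrefix(string prefix, string candidate) =>
        candidate.StartsWith(prefix, StringComparison.Ordinal)
            ? new SketchResult(prefix)
            : new SketchResult("");

    public static void Main()
    {
        SketchResult hit = ConsumePrefix("Hello", "Hello World!");
        SketchResult miss = ConsumePrefix("Bye", "Hello World!");

        if (hit) // used as a Boolean, like TryParse
        {
            string captured = hit; // used as a String, like Parse
            Console.WriteLine(captured); // prints "Hello"
        }

        Console.WriteLine(miss ? "matched" : "no match"); // prints "no match"
    }
}
```

The single stored field is what keeps the return value small: the Boolean is computed on demand instead of being carried alongside the text.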

Literal

Literal patternName = "Literal Pattern";

This is an exact 1:1 match pattern, and is equivalent to

"pattern" == "candidate"

Literal is meant mostly as a building block for patterns. Because pattern operators expect to use a Literal, which is not a string, the convenient syntax shown above only applies to Literal. Use inside a pattern operator might require a cast like

(Literal)"Literal Pattern" & "Other Literal Pattern"

Alternator

Pattern patternName = pattern1 | pattern2;

Alternators accept either pattern, and are equivalent to the regex (pattern1|pattern2).

Combinator

Pattern patternName = pattern1 & pattern2;

Combinators require both patterns in sequence and are equivalent to the regex (pattern1)(pattern2), with the unnecessary parentheses added for readability.
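For readers curious how | and & can be made to work on patterns at all, the following is a minimal sketch using C# operator overloading. ToyPattern and everything inside it are hypothetical names for illustration, not the library's types, and the sketch deliberately omits backtracking: once the left side of an alternation matches, the right side is never retried.

```csharp
using System;

// Hypothetical toy pattern type; not the library's Pattern or Literal.
public sealed class ToyPattern
{
    // Returns how many characters were consumed from the start of the input,
    // or -1 when the pattern does not match.
    private readonly Func<string, int> match;

    private ToyPattern(Func<string, int> match) => this.match = match;

    public static ToyPattern Literal(string text) =>
        new ToyPattern(input =>
            input.StartsWith(text, StringComparison.Ordinal) ? text.Length : -1);

    // Alternation: try the left pattern, fall back to the right one.
    public static ToyPattern operator |(ToyPattern left, ToyPattern right) =>
        new ToyPattern(input =>
        {
            int consumed = left.match(input);
            return consumed >= 0 ? consumed : right.match(input);
        });

    // Sequence: the right pattern must match where the left one stopped.
    public static ToyPattern operator &(ToyPattern left, ToyPattern right) =>
        new ToyPattern(input =>
        {
            int first = left.match(input);
            if (first < 0) return -1;
            int second = right.match(input.Substring(first));
            return second < 0 ? -1 : first + second;
        });

    // Returns the captured text, or an empty string when the pattern fails.
    public string Consume(string candidate)
    {
        int consumed = match(candidate);
        return consumed < 0 ? "" : candidate.Substring(0, consumed);
    }
}

public static class Program
{
    public static void Main()
    {
        ToyPattern greeting = ToyPattern.Literal("Hello") | ToyPattern.Literal("Goodbye");
        ToyPattern pattern = greeting & ToyPattern.Literal(" World");

        Console.WriteLine(pattern.Consume("Goodbye World!"));     // prints "Goodbye World"
        Console.WriteLine(pattern.Consume("Hello there").Length); // prints 0
    }
}
```

Note that in C# the & operator binds more tightly than |, which is why a declaration such as literalPattern1 & (literalPattern2 | literalPattern3) needs its parentheses.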

Optor

Pattern patternName = ~pattern;

Optors make the pattern completely optional, so success is always true, and are equivalent to the regex (pattern)?.

Repeater

Pattern patternName = pattern * 3; //repeats the pattern three times

Repeaters require the pattern to repeat the specified number of times, and can be thought of as multiplication for patterns, where combinators are addition. The above example would be equivalent to the regex pattern{3}.

Spanner

Pattern patternName = pattern.Span();

Spanners require the pattern to exist at least once, but will repeat until the pattern can no longer be matched, and are equivalent to the regex pattern+.

OptorSpanners

Pattern patternName = ~pattern.Span();

Technically this is not its own type, but it does represent a Regex symbol that has no single matching pattern type. It is equivalent to the regex pattern*.
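The regex equivalents quoted for each operator can be checked with the standard System.Text.RegularExpressions API. This sketch does not use Stringier.Patterns at all; the MatchAtStart helper is a hypothetical convenience that anchors each regex at the start of the input, and "ab" and "cd" stand in for pattern1 and pattern2.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    // Hypothetical helper: anchors the regex at the start of the input.
    public static Match MatchAtStart(string regex, string input) =>
        Regex.Match(input, @"\A(?:" + regex + ")");

    public static void Main()
    {
        // Alternator: (pattern1|pattern2)
        Console.WriteLine(MatchAtStart("(ab|cd)", "cdab").Value);     // prints "cd"

        // Combinator: (pattern1)(pattern2)
        Console.WriteLine(MatchAtStart("(ab)(cd)", "abcd").Value);    // prints "abcd"

        // Optor: (pattern)? succeeds even when nothing is captured
        Console.WriteLine(MatchAtStart("(ab)?", "xyz").Success);      // prints "True"

        // Repeater: pattern{3}
        Console.WriteLine(MatchAtStart("(ab){3}", "abababab").Value); // prints "ababab"

        // Spanner: pattern+
        Console.WriteLine(MatchAtStart("(ab)+", "ababx").Value);      // prints "abab"

        // Optor applied to a spanner: pattern*
        Console.WriteLine(MatchAtStart("(ab)*", "xyz").Success);      // prints "True"
    }
}
```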

Version History

Version  Downloads  Last updated
0.4.0    57         5/4/2019
0.3.1    59         5/3/2019
0.3.0    59         5/3/2019
0.2.0    59         4/29/2019
0.1.1    98         4/24/2019
0.1.0    95         4/24/2019