DevelApp.StepLexer 1.0.1

.NET 8.0

dotnet add package DevelApp.StepLexer --version 1.0.1

NuGet\Install-Package DevelApp.StepLexer -Version 1.0.1

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="DevelApp.StepLexer" Version="1.0.1" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package:

<PackageVersion Include="DevelApp.StepLexer" Version="1.0.1" />

Then reference the package from the project file without a version:

<PackageReference Include="DevelApp.StepLexer" />

paket add DevelApp.StepLexer --version 1.0.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: DevelApp.StepLexer, 1.0.1"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package DevelApp.StepLexer@1.0.1

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=DevelApp.StepLexer&version=1.0.1

Install as a Cake Tool:

#tool nuget:?package=DevelApp.StepLexer&version=1.0.1

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

ENFAStepLexer-StepParser

A modern, high-performance lexical analysis and parsing system with comprehensive PCRE2 support and CognitiveGraph integration. The system consists of DevelApp.StepLexer for zero-copy tokenization and DevelApp.StepParser for semantic analysis and grammar-based parsing.

Overview

ENFAStepLexer-StepParser works in two phases. StepLexer tokenizes UTF-8 input without copying it and recognizes PCRE2-style patterns. StepParser then applies grammar-based parsing and builds a CognitiveGraph for semantic analysis and code understanding.

Key Features

🚀 DevelApp.StepLexer - Zero-Copy Tokenization

Zero-copy architecture: Memory-efficient string processing with ZeroCopyStringView
UTF-8 native processing: Direct UTF-8 handling without encoding conversions
Forward-only parsing: Predictable performance without backtracking
Comprehensive PCRE2 support: 70+ regex features including Unicode and POSIX classes
Ambiguity resolution: Splittable tokens for handling parsing ambiguities
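
The zero-copy idea in the list above can be illustrated with standard .NET types. The sketch below is an assumption-laden illustration: it uses ReadOnlyMemory<byte> rather than the package's ZeroCopyStringView (whose members are not shown in this README), and SplitOnSpaces is a hypothetical helper. Each slice is a view into the original UTF-8 buffer, so no bytes are copied while splitting.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Assumption: standard .NET types only; this is not the package's
// ZeroCopyStringView, and SplitOnSpaces is a hypothetical helper.
byte[] source = Encoding.UTF8.GetBytes("let total = 42");
ReadOnlyMemory<byte> buffer = source;

// Returns slices that point into the original buffer instead of copying bytes.
static List<ReadOnlyMemory<byte>> SplitOnSpaces(ReadOnlyMemory<byte> input)
{
    var slices = new List<ReadOnlyMemory<byte>>();
    ReadOnlySpan<byte> span = input.Span;
    int start = 0;
    for (int i = 0; i <= span.Length; i++)
    {
        bool atEnd = i == span.Length;
        if (atEnd || span[i] == (byte)' ')
        {
            if (i > start) slices.Add(input.Slice(start, i - start));
            start = i + 1;
        }
    }
    return slices;
}

List<ReadOnlyMemory<byte>> views = SplitOnSpaces(buffer);
foreach (var view in views)
{
    // Decoding to a string happens only here, for display.
    Console.WriteLine(Encoding.UTF8.GetString(view.Span));
}
```

The only string allocations occur in the final loop, which exists purely to print the slices.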

🧠 DevelApp.StepParser - Semantic Analysis

CognitiveGraph integration: Automatic semantic graph construction during parsing
GLR-style parsing: Handles ambiguous grammars efficiently
Context-sensitive grammars: Hierarchical context management for complex languages
Symbol table management: Scope-aware symbol tracking and resolution
Grammar inheritance: Reusable grammar components and DSL composition
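
Scope-aware symbol tracking is often implemented as a stack of dictionaries, one per scope. The following is a minimal sketch of that general technique, not the package's IScopeAwareSymbolTable interface (its members are not documented here); EnterScope, ExitScope, Declare, and Resolve are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumption: a generic stack-of-scopes design for illustration only;
// it does not mirror the package's IScopeAwareSymbolTable.
var scopes = new Stack<Dictionary<string, string>>();

void EnterScope() => scopes.Push(new Dictionary<string, string>());
void ExitScope() => scopes.Pop();
void Declare(string name, string type) => scopes.Peek()[name] = type;

// Resolve searches from the innermost scope outward.
string? Resolve(string name)
{
    foreach (var scope in scopes) // Stack<T> enumerates the top (innermost) first
    {
        if (scope.TryGetValue(name, out var type)) return type;
    }
    return null;
}

EnterScope();                 // global scope
Declare("x", "int");
EnterScope();                 // nested block
Declare("x", "string");       // shadows the outer x
string? inner = Resolve("x");
ExitScope();
string? outer = Resolve("x");
string? missing = Resolve("y");

Console.WriteLine($"inner={inner}, outer={outer}, missing={missing ?? "<none>"}");
```

Leaving the nested scope restores the outer declaration, which is the behavior a scope-aware table needs for shadowed names.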

🔧 Advanced Pattern Support

Basic regex constructs: Literals, character classes, quantifiers, alternation
Extended anchors: \A, \Z, \z, \G for precise boundary matching
Unicode support: \x{FFFF} code points, \p{property} classes, \R newlines
POSIX character classes: [:alpha:], [:digit:], [:space:], etc.
Groups & assertions: Capturing groups, lookahead/lookbehind, named groups
Back references: Numbered (\1) and named (\k<name>) references
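
Several of these constructs share their syntax with .NET's built-in System.Text.RegularExpressions, which makes it a convenient way to see what they mean. The sketch below uses that engine purely as an illustration; it is not StepLexer, and StepLexer's matching behavior may differ. POSIX classes, \R, and \x{FFFF} are omitted because the .NET engine does not accept them.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: System.Text.RegularExpressions is used for illustration only;
// it is a different engine from StepLexer.

// \A and \z anchor to the absolute start and end of the input.
bool wholeInput = Regex.IsMatch("abc", @"\Aabc\z");
bool strictEnd = Regex.IsMatch("abc\n", @"\Aabc\z");   // \z rejects the trailing newline
bool lenientEnd = Regex.IsMatch("abc\n", @"\Aabc\Z");  // \Z allows one final newline

// \p{L} matches any Unicode letter, including the accented one here.
bool unicodeLetters = Regex.IsMatch("h\u00e9llo", @"\A\p{L}+\z");

// A named group plus a named back reference finds a doubled word.
Match doubled = Regex.Match("it is is fine", @"\b(?<word>\w+) \k<word>\b");

// A numbered back reference requires the same quote character on both sides.
Match quoted = Regex.Match("say 'hi' now", @"(['""])(.*?)\1");

Console.WriteLine($"{wholeInput} {strictEnd} {lenientEnd} {unicodeLetters}");
Console.WriteLine($"doubled: {doubled.Value}, quoted: {quoted.Groups[2].Value}");
```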

🏗️ Modern Architecture

Modular design: Clear separation between lexer, parser, and semantic analysis
Type-safe transitions: Enum-based token classification for reliability
Performance optimized: Zero-copy operations and memory-efficient data structures
Extensible framework: Plugin architecture for custom grammar features

📚 Comprehensive Documentation

Complete component documentation for StepLexer and StepParser
PCRE2 feature support matrix with exclusion explanations
Grammar creation guide for DSL development
CognitiveGraph integration examples
Performance optimization guidelines

Quick Start

Building the Project

# Clone the repository
git clone https://github.com/DevelApp-ai/ENFAStepLexer-StepPerser.git
cd ENFAStepLexer-StepPerser

# Restore dependencies
dotnet restore

# Build all projects
dotnet build

# Run tests
dotnet test

# Run the demo
cd src/ENFAStepLexer.Demo
dotnet run

Basic StepLexer Usage

using System;
using System.Text;
using DevelApp.StepLexer;

// Create a pattern parser for regex
var parser = new PatternParser(ParserType.Regex);

// Parse a regex pattern with zero-copy
string pattern = @"\d{2,4}-\w+@[a-z]+\.com";
var utf8Pattern = Encoding.UTF8.GetBytes(pattern);

bool success = parser.ParsePattern(utf8Pattern, "email_pattern");

if (success)
{
    Console.WriteLine("Pattern compiled successfully!");
    var tokens = parser.GetTokens();
    foreach (var token in tokens)
    {
        Console.WriteLine($"{token.Type}: {token.Text}");
    }
}

Basic StepParser Usage

using System;
using DevelApp.StepParser;

// Create parser engine
var engine = new StepParserEngine();

// Load grammar for a simple expression language
var grammar = @"
Grammar: SimpleExpr
TokenSplitter: Space

<NUMBER> ::= /[0-9]+/
<IDENTIFIER> ::= /[a-zA-Z][a-zA-Z0-9]*/
<PLUS> ::= '+'
<MINUS> ::= '-'
<WS> ::= /[ \t\r\n]+/ => { skip }

<expr> ::= <expr> <PLUS> <expr>
        | <expr> <MINUS> <expr>
        | <NUMBER>
        | <IDENTIFIER>
";

engine.LoadGrammarFromContent(grammar);

// Parse source code
var result = engine.Parse("x + 42 - y");

if (result.Success)
{
    Console.WriteLine("Parse successful!");
    var cognitiveGraph = result.CognitiveGraph;
    // Access semantic analysis results
}
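
The <expr> rule in the grammar above is ambiguous: "x + 42 - y" can be grouped as (x + 42) - y or as x + (42 - y). The sketch below counts the distinct parse trees with a small dynamic program. It is a hand-written illustration under that reading of the grammar; it does not call StepParserEngine and makes no claim about how the engine resolves the ambiguity.

```csharp
using System;

// Assumption: a standalone counter for illustration; it does not use
// DevelApp.StepParser.
string[] tokens = "x + 42 - y".Split(' ');

// count[i, j] = number of ways tokens[i..j] (inclusive) derive <expr>.
int n = tokens.Length;
var count = new long[n, n];

for (int len = 1; len <= n; len++)
{
    for (int i = 0; i + len - 1 < n; i++)
    {
        int j = i + len - 1;
        if (len == 1)
        {
            // A single operand (<NUMBER> or <IDENTIFIER>) has one derivation.
            bool isOperator = tokens[i] == "+" || tokens[i] == "-";
            count[i, j] = isOperator ? 0 : 1;
            continue;
        }
        // Try every operator position k as the root of the tree.
        for (int k = i + 1; k < j; k++)
        {
            if (tokens[k] == "+" || tokens[k] == "-")
                count[i, j] += count[i, k - 1] * count[k + 1, j];
        }
    }
}

long parseTrees = count[0, n - 1];
Console.WriteLine($"Distinct parse trees: {parseTrees}");
```

For this input the count is 2, one tree per grouping. A GLR-style parser is designed to carry such alternatives forward instead of rejecting the grammar.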

Architecture

Core Components

DevelApp.StepLexer: Zero-copy lexical analyzer
- PatternParser: High-level pattern processing controller
- StepLexer: Core tokenization engine with PCRE2 support
- ZeroCopyStringView: Memory-efficient string operations
- SplittableToken: Ambiguity-aware token representation

DevelApp.StepParser: Semantic analysis and grammar parsing
- StepParserEngine: Main parsing controller with CognitiveGraph integration
- GrammarDefinition: Complete grammar specification loader
- TokenRule/ProductionRule: Grammar component definitions
- IContextStack: Hierarchical context management
- IScopeAwareSymbolTable: Symbol resolution and scoping

Processing Pipeline

The system uses a two-phase processing approach:

Lexical Analysis Phase (StepLexer):
- UTF-8 input processing with zero-copy efficiency
- PCRE2-compatible pattern recognition
- Ambiguity detection and token splitting
- Forward-only parsing for predictable performance

Semantic Analysis Phase (StepParser):
- Grammar-based syntax tree construction
- CognitiveGraph integration for semantic analysis
- Context-sensitive parsing with scope management
- Symbol table construction and resolution
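
The two phases described above can be sketched with a toy pipeline. Everything in this example is hypothetical (Lex, Evaluate, and the token tuple shape are illustrative names, not part of DevelApp.StepLexer or DevelApp.StepParser). Phase 1 scans UTF-8 bytes forward only and records tokens as offsets into the buffer. Phase 2 consumes the token stream.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Assumption: a hand-rolled pipeline for illustration; none of these names
// come from the packages.
byte[] input = Encoding.UTF8.GetBytes("10 + 32 - 2");

// Phase 1: forward-only scan producing (kind, start, length) records.
// Tokens refer to the input buffer by offset; no substrings are created.
static List<(char Kind, int Start, int Length)> Lex(byte[] bytes)
{
    var tokens = new List<(char Kind, int Start, int Length)>();
    int i = 0;
    while (i < bytes.Length)
    {
        byte b = bytes[i];
        if (b == (byte)' ') { i++; continue; }
        if (b == (byte)'+' || b == (byte)'-')
        {
            tokens.Add(((char)b, i, 1));
            i++;
            continue;
        }
        if (b >= (byte)'0' && b <= (byte)'9')
        {
            int start = i;
            while (i < bytes.Length && bytes[i] >= (byte)'0' && bytes[i] <= (byte)'9') i++;
            tokens.Add(('n', start, i - start));
            continue;
        }
        throw new FormatException($"Unexpected byte at offset {i}");
    }
    return tokens;
}

// Phase 2: consume the token stream left to right and evaluate it.
static int Evaluate(byte[] bytes, List<(char Kind, int Start, int Length)> tokens)
{
    int ReadNumber(int index)
    {
        var (kind, start, length) = tokens[index];
        if (kind != 'n') throw new FormatException("Expected a number");
        int value = 0;
        for (int i = start; i < start + length; i++) value = value * 10 + (bytes[i] - '0');
        return value;
    }

    int result = ReadNumber(0);
    for (int t = 1; t < tokens.Count; t += 2)
    {
        char op = tokens[t].Kind;
        if (op != '+' && op != '-') throw new FormatException("Expected an operator");
        if (t + 1 >= tokens.Count) throw new FormatException("Missing operand");
        int operand = ReadNumber(t + 1);
        result = op == '+' ? result + operand : result - operand;
    }
    return result;
}

var stream = Lex(input);
int total = Evaluate(input, stream);
Console.WriteLine($"{stream.Count} tokens, result {total}");
```

Keeping tokens as offsets means the second phase reads the original buffer only when it needs a token's content.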

Design Philosophy

Zero-Copy Performance: Minimize memory allocations through efficient data structures
Forward-Only Parsing: Avoid backtracking for predictable performance characteristics
Semantic Integration: Automatic semantic graph construction during parsing
Modular Architecture: Clear separation of concerns between lexical and semantic analysis

PCRE2 Feature Support

✅ Fully Supported (70+ features)

All basic regex constructs and quantifiers
Character classes and escape sequences
Groups, assertions, and back references
Extended anchors and boundaries
Unicode code points and properties (basic)
POSIX character classes

⚠️ Partially Supported

Unicode properties (\p{property}): the syntax is parsed, but matching requires a runtime implementation

❌ Not Supported (By Design)

The following features are intentionally excluded due to architectural design decisions:

Atomic Grouping ((?>...))

Conflicts with forward-only parsing architecture
Would require backtracking mechanisms that violate design principles
Compromises zero-copy, single-pass performance advantages
Alternative: Use grammar-based parsing in StepParser for complex constructs
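
To see why atomic groups are defined in terms of backtracking, it helps to run one in an engine that does backtrack. The sketch below uses .NET's System.Text.RegularExpressions, which supports (?>...), purely as an illustration. It says nothing about StepLexer internals beyond what is stated above.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: shown with .NET's backtracking regex engine, which supports
// (?>...); StepLexer intentionally does not.

// Without the atomic group, a+ gives back one 'a' so that "ab" can match.
bool plain = Regex.IsMatch("aaab", @"^a+ab$");

// With the atomic group, a+ keeps every 'a' it consumed, so "ab" cannot match.
bool atomic = Regex.IsMatch("aaab", @"^(?>a+)ab$");

Console.WriteLine($"plain: {plain}, atomic: {atomic}");
```

The only difference between the two patterns is whether the engine may revisit what a+ consumed. In an engine that never revisits consumed input, that distinction has nothing to act on.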

Recursive Pattern Support ((?R), (?&name))

Adds unnecessary complexity to lexer architecture
Better handled by grammar-based StepParser for recursive constructs
Would compromise predictable memory usage and performance
Alternative: Implement balanced parsing through grammar rules rather than regex recursion
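
As a rough illustration of handling nesting outside the regex layer, the sketch below checks balanced parentheses with a depth counter in a single forward pass. IsBalanced is a hypothetical helper standing in for a recursive grammar rule; it is not part of the StepParser API.

```csharp
using System;
using System.Text;

// Assumption: a hand-written depth counter for illustration; a real grammar
// would express this as a recursive production instead.
static bool IsBalanced(ReadOnlySpan<byte> utf8)
{
    int depth = 0;
    foreach (byte b in utf8)
    {
        if (b == (byte)'(') depth++;
        else if (b == (byte)')' && --depth < 0) return false; // closer without an opener
    }
    return depth == 0; // every opener was closed
}

bool balanced = IsBalanced(Encoding.UTF8.GetBytes("f(a, g(b))"));
bool unclosed = IsBalanced(Encoding.UTF8.GetBytes("f(a, g(b)"));
bool reversed = IsBalanced(Encoding.UTF8.GetBytes(")("));

Console.WriteLine($"{balanced} {unclosed} {reversed}");
```

The check needs one integer of state and never rereads input, which matches the forward-only constraint described above.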

Other Advanced Features

Possessive quantifiers (*+, ++)
Conditional patterns ((?(condition)yes|no))
Inline modifiers ((?i), (?m)): not supported in this release; planned for Phase 2 of the roadmap

See docs/PCRE2-Support.md for complete feature matrix and detailed explanations.

Project Structure

ENFAStepLexer-StepPerser/
├── src/
│   ├── DevelApp.StepLexer/           # Zero-copy lexical analyzer
│   │   ├── StepLexer.cs              # Core tokenization engine
│   │   ├── PatternParser.cs          # High-level pattern controller
│   │   ├── ZeroCopyStringView.cs     # Memory-efficient string operations
│   │   ├── SplittableToken.cs        # Ambiguity-aware tokens
│   │   └── ...
│   ├── DevelApp.StepParser/          # Grammar-based semantic parser  
│   │   ├── StepParserEngine.cs       # Main parsing controller
│   │   ├── GrammarDefinition.cs      # Grammar specification
│   │   ├── TokenRule.cs              # Lexical analysis rules
│   │   ├── ProductionRule.cs         # Syntax analysis rules
│   │   └── ...
│   ├── DevelApp.StepLexer.Tests/     # StepLexer unit tests
│   ├── DevelApp.StepParser.Tests/    # StepParser unit tests
│   └── ENFAStepLexer.Demo/           # Demo console application
├── docs/
│   ├── StepLexer.md                  # Complete StepLexer documentation
│   ├── StepParser.md                 # Complete StepParser documentation
│   ├── PCRE2-Support.md              # Feature support matrix
│   └── Grammar_File_Creation_Guide.md # DSL development guide
└── README.md                         # This file

Documentation

Component Documentation

StepLexer Documentation - Comprehensive guide to zero-copy lexical analysis
StepParser Documentation - Complete semantic parsing and CognitiveGraph integration
PCRE2 Support Matrix - Feature compatibility and exclusion explanations
Grammar Creation Guide - DSL development and grammar authoring

Quick Navigation

Getting Started: See Quick Start section above
Architecture Overview: Architecture section
Feature Support: PCRE2 Feature Support section
Performance: Performance section

Contributing

This project welcomes contributions in several areas:

Core Development

Adding new regex features: Extend TokenType enum and implement in StepLexer
Grammar features: Enhance StepParser with new grammar constructs
Performance improvements: Optimize zero-copy operations and memory usage
CognitiveGraph integration: Improve semantic analysis capabilities

Testing and Quality

Comprehensive unit tests: Expand test coverage for edge cases
Performance benchmarks: Add throughput and memory usage benchmarks
Grammar validation: Create test suites for grammar files
Documentation examples: Improve code examples and tutorials

Documentation

API documentation: Enhance inline code documentation
Tutorial content: Create step-by-step guides for common scenarios
Best practices: Document performance optimization techniques
Integration guides: Show integration with other parsing tools

Performance

The StepLexer-StepParser architecture provides:

StepLexer Performance

Zero-copy operations: No string allocations during tokenization
UTF-8 native processing: Direct byte-level operations
Forward-only parsing: Linear time complexity for most patterns
Memory efficient: Predictable memory usage patterns

StepParser Performance

Incremental parsing: Process changes without full re-parsing
CognitiveGraph caching: Semantic analysis result caching
Context-aware optimization: Optimized parsing for specific contexts
Symbol table efficiency: Fast symbol lookup and resolution

Benchmarks

Compilation speed: Direct pattern-to-token conversion
Memory usage: Minimal allocations with zero-copy design
Scalability: Linear performance characteristics for typical patterns
Throughput: High-performance processing for large codebases

Future Roadmap

Phase 1 (Immediate)

Enhanced test coverage for StepLexer and StepParser
Performance benchmarking suite
Nullable reference warning fixes
Advanced Unicode property validation
CognitiveGraph optimization

Phase 2 (Short-term)

Inline modifiers ((?i), (?m), etc.) in StepLexer
Literal text sequences (\Q...\E)
Comment support ((?#...))
Advanced error reporting with detailed diagnostics
Grammar inheritance improvements

Phase 3 (Long-term)

Evaluate atomic grouping support within forward-parsing constraints
Advanced CognitiveGraph analytics
Full Unicode ICU integration
Real-time parsing for IDEs and editors
Performance optimization with machine learning

Research Areas

GPU-accelerated pattern matching
Incremental parsing algorithms
Advanced semantic analysis techniques
Cross-language grammar compilation

License

This project is derived from @DevelApp/enfaparser but excludes the original license as requested. The enhancements and new code are provided for evaluation and development purposes.

Acknowledgments

Modern C# language features and .NET performance optimizations
PCRE2 specification for comprehensive regex feature reference
CognitiveGraph project for semantic analysis integration
Zero-copy design patterns inspired by Cap'n Proto and similar systems
Community feedback and contributions to parsing and lexical analysis techniques

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net8.0
- ICU4N (>= 60.1.0-alpha.438)

NuGet packages (1)

Showing the top 1 NuGet packages that depend on DevelApp.StepLexer:

Package	Downloads
DevelApp.StepParser A modern parser implementation with GLR-style multi-path parsing, context-sensitive grammar support, and CognitiveGraph integration for advanced semantic analysis. Part of the GrammarForge step-parser architecture.	109

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.0.1	217	9/14/2025
1.0.1-ci0072	90	9/14/2025

v1.0.1:
     - NEW: StepLexer with unified regex pattern parsing and source tokenization
     - NEW: Zero-copy UTF-8 processing with ReadOnlyMemory support
     - NEW: Two-phase parsing architecture for regex complexity avoidance
     - NEW: Multi-path tokenization for ambiguity resolution
     - NEW: Pattern splitting and single-pass disambiguation
     - NEW: Advanced Unicode support with ICU integration (Phase 3 PCRE2)
     - NEW: Comprehensive performance benchmarking framework
     - NEW: Enhanced Unicode property validation with 150+ properties
     - NEW: Unicode normalization support (NFC, NFD, NFKC, NFKD)
     - NEW: Script and binary property matching
     - ENHANCED: Location-based code targeting for surgical operations
     - ENHANCED: Production-ready PCRE2 support with comprehensive test coverage
