RecursiveTextSplitter 1.0.2


RecursiveTextSplitter User Guide

Overview

RecursiveTextSplitter is a C# library for splitting text into chunks along semantic boundaries. Rather than cutting at fixed character offsets, it works through a hierarchy of separators, from paragraph breaks down to character-level splitting as a last resort, so that chunks end at meaningful boundaries wherever possible.

Key Features

  • Semantic Awareness: Maintains natural text boundaries (paragraphs, sentences, words)
  • Configurable Overlap: Supports overlapping chunks for better context preservation
  • Flexible Separators: Allows custom separator hierarchies or uses intelligent defaults
  • Detailed Metadata: Provides comprehensive information about each chunk including position data and line/column tracking
  • Word-Safe Overlap: Ensures overlap occurs at natural word boundaries
  • Position Tracking: Tracks both character positions and line/column coordinates in the original text

Installation

Via NuGet Package Manager

Install the RecursiveTextSplitter package from NuGet:

dotnet add package RecursiveTextSplitter

Or via Package Manager Console in Visual Studio:

Install-Package RecursiveTextSplitter

Or search for "RecursiveTextSplitter" in the Visual Studio NuGet Package Manager UI.

NuGet Package: https://www.nuget.org/packages/RecursiveTextSplitter/

Usage

Add the namespace to your C# project:

using RecursiveTextSplitting;

Basic Usage

Simple Text Splitting

The most straightforward way to split text is to use the RecursiveSplit extension method:

string document = "Artificial intelligence is transforming every industry.\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\n\nHowever, challenges like bias, interpretability, and safety remain important areas of research.";

var chunks = document.RecursiveSplit(chunkSize: 80, chunkOverlap: 0);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk: {chunk}");
    Console.WriteLine("---");
}

Advanced Splitting with Metadata

For more detailed information about each chunk, including line and column positions, use the AdvancedRecursiveSplit method:

string document = "Artificial intelligence is transforming every industry.\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\n\nHowever, challenges like bias, interpretability, and safety remain important areas of research.";

var chunks = document.AdvancedRecursiveSplit(chunkSize: 80, chunkOverlap: 0);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.ChunkIndex}: {chunk.Text}");
    Console.WriteLine($"Start Position: {chunk.StartPosition} (Line {chunk.StartLine}, Column {chunk.StartColumn})");
    Console.WriteLine($"End Position: {chunk.EndPosition} (Line {chunk.EndLine}, Column {chunk.EndColumn})");
    Console.WriteLine($"Separator Used: {chunk.SeparatorUsed}");
    Console.WriteLine("---");
}

Working with Overlap

Overlap allows consecutive chunks to share some content, which is particularly useful for maintaining context in applications like search indexing or machine learning.

Basic Overlap Example

string document = "Artificial intelligence is transforming every industry.\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\n\nHowever, challenges like bias, interpretability, and safety remain important areas of research.";

// Split with 25 characters of overlap
var chunks = document.RecursiveSplit(chunkSize: 80, chunkOverlap: 25);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk: {chunk}");
    Console.WriteLine("---");
}

Advanced Overlap with Metadata

string document = "Artificial intelligence is transforming every industry.\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\n\nHowever, challenges like bias, interpretability, and safety remain important areas of research.";

var chunks = document.AdvancedRecursiveSplit(chunkSize: 80, chunkOverlap: 25);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.ChunkIndex}:");
    Console.WriteLine($"  Full Text: {chunk.Text}");
    Console.WriteLine($"  Overlap: '{chunk.OverlapText}'");
    Console.WriteLine($"  Original Content: '{chunk.ChunkText}'");
    Console.WriteLine($"  Position: {chunk.StartPosition}-{chunk.EndPosition}");
    Console.WriteLine($"  Location: Lines {chunk.StartLine}-{chunk.EndLine}");
    Console.WriteLine("---");
}
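
The idea behind word-safe overlap can be illustrated with a small standalone sketch. This is not the library's implementation: OverlapSketch and WordSafeTail are hypothetical names, and the rule used here (take up to chunkOverlap trailing characters of the previous chunk, then move the start forward to the next word boundary) is an assumption about how such a feature might work.

```csharp
using System;

// Hypothetical illustration only; not part of RecursiveTextSplitter.
string previous = "automation is becoming smarter and more adaptive.";
string overlap = OverlapSketch.WordSafeTail(previous, 25);
Console.WriteLine($"'{overlap}'"); // prints 'and more adaptive.'

static class OverlapSketch
{
    // Returns at most maxLength trailing characters of previousChunk,
    // shortened so that the result starts at the beginning of a word.
    public static string WordSafeTail(string previousChunk, int maxLength)
    {
        if (maxLength <= 0 || string.IsNullOrEmpty(previousChunk)) return string.Empty;
        if (previousChunk.Length <= maxLength) return previousChunk;

        int start = previousChunk.Length - maxLength;

        // If the cut lands inside a word, skip forward to the next whitespace.
        if (!char.IsWhiteSpace(previousChunk[start - 1]))
        {
            while (start < previousChunk.Length && !char.IsWhiteSpace(previousChunk[start]))
                start++;
        }

        // Skip the whitespace itself so the overlap begins with a word.
        while (start < previousChunk.Length && char.IsWhiteSpace(previousChunk[start]))
            start++;

        return previousChunk.Substring(start);
    }
}
```

With a 25-character budget the raw cut would fall inside "smarter", so the sketch drops the partial word and returns an 18-character overlap instead. The overlap a real splitter reports in OverlapText may be chosen differently.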

Understanding the TextChunk Class

The TextChunk class provides comprehensive metadata about each split segment:

public class TextChunk
{
    public string Text { get; set; }           // Complete text including overlap
    public string OverlapText { get; set; }    // Only the overlap portion
    public string ChunkText { get; set; }      // Original chunk without overlap
    public int StartPosition { get; set; }     // 1-based start position in original text
    public int EndPosition { get; set; }       // 1-based end position in original text
    public string SeparatorUsed { get; set; }  // Separator that created this chunk
    public int ChunkIndex { get; set; }        // Sequential chunk number (1-based)
    public int StartColumn { get; set; }       // 1-based column where chunk starts
    public int StartLine { get; set; }         // 1-based line where chunk starts
    public int EndColumn { get; set; }         // 1-based column where chunk ends
    public int EndLine { get; set; }           // 1-based line where chunk ends
}

Position Tracking Features

The library now provides detailed position tracking with both character-level and line/column coordinates:

  • Character Positions: StartPosition and EndPosition provide 1-based character indices in the original text
  • Line/Column Tracking: StartLine, StartColumn, EndLine, EndColumn provide 1-based line and column coordinates
  • Comprehensive Coverage: All positions are tracked accurately even when overlap is applied
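
To make the relationship between the two coordinate systems concrete, here is a minimal standalone sketch that converts a 1-based character position into a 1-based line and column. PositionMath and ToLineColumn are hypothetical names, not library members, and the sketch assumes '\n' ends a line; the library's own handling of "\r\n" may differ.

```csharp
using System;

// Hypothetical illustration only; not part of RecursiveTextSplitter.
// Position 5 in "ab\ncd\nef" is the character 'd'.
var (line, column) = PositionMath.ToLineColumn("ab\ncd\nef", 5);
Console.WriteLine($"Line {line}, Column {column}"); // prints Line 2, Column 2

static class PositionMath
{
    // Converts a 1-based character position into 1-based (line, column).
    // '\n' is treated as the line terminator; a '\r' before it counts as
    // an ordinary character on the line it ends.
    public static (int Line, int Column) ToLineColumn(string text, int position)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (position < 1 || position > text.Length)
            throw new ArgumentOutOfRangeException(nameof(position));

        int line = 1, column = 1;

        // Walk every character that precedes the target position.
        for (int i = 0; i < position - 1; i++)
        {
            if (text[i] == '\n') { line++; column = 1; }
            else column++;
        }
        return (line, column);
    }
}
```

A helper like this can be handy for cross-checking the StartLine/StartColumn values a chunk reports against its StartPosition, bearing in mind the line-ending assumption above.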

Custom Separators

You can provide your own separator hierarchy for specialized splitting needs:

string document = "Section 1|Subsection A;Item 1,Item 2|Section 2;Item 3";

// Custom separators prioritizing sections, then subsections, then items
string[] customSeparators = { "|", ";", "," };

var chunks = document.AdvancedRecursiveSplit(
    chunkSize: 20, 
    chunkOverlap: 0, 
    separators: customSeparators
);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk: {chunk.Text}");
    Console.WriteLine($"Split using: {chunk.SeparatorUsed}");
    Console.WriteLine($"At line {chunk.StartLine}, column {chunk.StartColumn}");
    Console.WriteLine("---");
}

Separator Hierarchy

The library uses a hierarchical approach to splitting, trying larger semantic units first:

  1. Paragraph breaks ("\r\n\r\n", "\n\n") - Largest semantic units
  2. Sentence endings followed by CRLF line breaks (".\r\n", "!\r\n", "?\r\n", ":\r\n", ";\r\n")
  3. Single CRLF line breaks ("\r\n")
  4. Sentence endings followed by LF newlines (".\n", "!\n", "?\n", ":\n", ";\n")
  5. Single LF newlines ("\n")
  6. Sentence endings followed by a space (". ", "! ", "? ")
  7. Punctuation followed by a space ("; ", ", ")
  8. Word boundaries (" ") - Single spaces
  9. Character-by-character ("") - Last resort
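
The general technique behind such a hierarchy can be sketched in a few lines. The code below is a simplified, standalone illustration and not the library's actual algorithm: SplitSketch is a hypothetical name, the separator list is shortened, there is no overlap or metadata, and each separator is kept attached to the piece that precedes it so that the pieces concatenate back to the original text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not part of RecursiveTextSplitter.
string[] separators = { "\n\n", "\n", ". ", " ", "" };
string sample = "One two three. Four five six.\n\nSeven eight.";

foreach (string piece in SplitSketch.Split(sample, 16, separators))
    Console.WriteLine($"[{piece.Replace("\n", "\\n")}]");

static class SplitSketch
{
    public static List<string> Split(string text, int chunkSize, string[] separators, int level = 0)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new List<string>();
        if (text.Length <= chunkSize)
        {
            if (text.Length > 0) result.Add(text);
            return result;
        }

        // No separators left (or the "" separator): cut at fixed offsets.
        if (level >= separators.Length || separators[level].Length == 0)
        {
            for (int i = 0; i < text.Length; i += chunkSize)
                result.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
            return result;
        }

        string sep = separators[level];
        string[] parts = text.Split(new[] { sep }, StringSplitOptions.None);

        // Separator not present: try the next, finer-grained one.
        if (parts.Length == 1) return Split(text, chunkSize, separators, level + 1);

        string current = "";
        for (int p = 0; p < parts.Length; p++)
        {
            // Keep the separator attached to the part it follows.
            string part = p < parts.Length - 1 ? parts[p] + sep : parts[p];

            // Greedily pack parts into the current chunk while they fit.
            if (current.Length + part.Length <= chunkSize) { current += part; continue; }

            if (current.Length > 0) { result.Add(current); current = ""; }

            if (part.Length <= chunkSize) current = part;
            else result.AddRange(Split(part, chunkSize, separators, level + 1)); // too big: recurse
        }
        if (current.Length > 0) result.Add(current);
        return result;
    }
}
```

The sketch packs pieces greedily and only descends to a finer separator for pieces that are still too large, which is the core of the "larger semantic units first" approach. A production splitter would also trim whitespace, merge tiny fragments, and apply overlap.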

Contributing

We welcome contributions to make RecursiveTextSplitter even better! Here are some ways you can help:

🌟 Star this repository if you find it useful!

Your star helps others discover this library and motivates continued development.

🔧 Pull Requests Welcome

We're open to pull requests! Whether you want to:

  • Fix bugs or improve existing functionality
  • Add new features or splitting strategies
  • Improve documentation or examples
  • Optimize performance
  • ...

Please feel free to fork the repository and submit a pull request. For larger changes, consider opening an issue first to discuss your approach.

📝 Reporting Issues

Found a bug or have a suggestion? Please open an issue with:

  • A clear description of the problem or enhancement
  • Steps to reproduce (for bugs)
  • Sample code demonstrating the issue
  • Expected vs actual behavior

Target Frameworks

The package includes builds for .NET Standard 2.0, net8.0, and net9.0. NuGet additionally lists the following as computed-compatible: net5.0 through net7.0 and net10.0 (including their platform-specific variants), .NET Core 2.0 through 3.1, .NET Standard 2.1, .NET Framework 4.6.1 through 4.8.1, MonoAndroid, MonoMac, MonoTouch, Tizen 4.0 and 6.0, Xamarin.iOS, Xamarin.Mac, Xamarin.TVOS, and Xamarin.WatchOS.

Dependencies

  • .NETStandard 2.0: No dependencies.
  • net8.0: No dependencies.
  • net9.0: No dependencies.

Version   Downloads   Last Updated
1.0.2     179         6/18/2025
1.0.1     147         6/17/2025