RecursiveTextSplitter 1.0.1
.NET CLI: dotnet add package RecursiveTextSplitter --version 1.0.1
Package Manager Console: NuGet\Install-Package RecursiveTextSplitter -Version 1.0.1
PackageReference (project file): <PackageReference Include="RecursiveTextSplitter" Version="1.0.1" />
Central Package Management (Directory.Packages.props): <PackageVersion Include="RecursiveTextSplitter" Version="1.0.1" />
Central Package Management (project file): <PackageReference Include="RecursiveTextSplitter" />
Paket CLI: paket add RecursiveTextSplitter --version 1.0.1
Script & Interactive: #r "nuget: RecursiveTextSplitter, 1.0.1"
File-based apps: #:package RecursiveTextSplitter@1.0.1
Cake addin: #addin nuget:?package=RecursiveTextSplitter&version=1.0.1
Cake tool: #tool nuget:?package=RecursiveTextSplitter&version=1.0.1
RecursiveTextSplitter User Guide
Overview
RecursiveTextSplitter is a C# library that splits text into chunks of bounded size while trying to keep natural boundaries such as paragraphs, sentences, and words intact. Unlike simple fixed-width character splitting, it works through a hierarchy of separators, starting with paragraph breaks and moving to finer separators only when a segment is still too large, with character-level splitting as a last resort.
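For contrast, here is what plain fixed-width splitting looks like. This is a standalone sketch, not part of RecursiveTextSplitter, and the helper name NaiveSplit is hypothetical. It cuts every `size` characters with no regard for word or sentence boundaries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library): naive fixed-width splitting
// that cuts every `size` characters, ignoring word and sentence boundaries.
static List<string> NaiveSplit(string text, int size)
{
    if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
    var chunks = new List<string>();
    for (int i = 0; i < text.Length; i += size)
    {
        // Math.Min keeps the final, shorter chunk in range.
        chunks.Add(text.Substring(i, Math.Min(size, text.Length - i)));
    }
    return chunks;
}

string sample = "Artificial intelligence is transforming every industry.";
foreach (var piece in NaiveSplit(sample, 20))
{
    Console.WriteLine($"[{piece}]");
}
```

With a width of 20 this prints [Artificial intellige], [nce is transforming ] and [every industry.]: the word "intelligence" is cut in half, which is exactly the kind of break the separator hierarchy is meant to avoid.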
Key Features
- Semantic Awareness: Maintains natural text boundaries (paragraphs, sentences, words)
- Configurable Overlap: Supports overlapping chunks for better context preservation
- Flexible Separators: Allows custom separator hierarchies or uses intelligent defaults
- Detailed Metadata: Provides comprehensive information about each chunk including position data
- Line Ending Preservation: Maintains original line ending formats across different platforms
- Word-Safe Overlap: Ensures overlap occurs at natural word boundaries
Installation
Via NuGet Package Manager
Install the RecursiveTextSplitter package from NuGet:
dotnet add package RecursiveTextSplitter
Or via Package Manager Console in Visual Studio:
Install-Package RecursiveTextSplitter
Or search for "RecursiveTextSplitter" in the Visual Studio NuGet Package Manager UI.
NuGet Package: https://www.nuget.org/packages/RecursiveTextSplitter/
Usage
Add the following using directive to your C# source files:
using RecursiveTextSplitting;
Basic Usage
Simple Text Splitting
The most straightforward way to split text is to call the RecursiveSplit extension method on a string:
string document = "Artificial intelligence is transforming every industry. From healthcare to finance, automation is becoming smarter and more adaptive. However, challenges like bias, interpretability, and safety remain important areas of research.";
var chunks = document.RecursiveSplit(chunkSize: 80, chunkOverlap: 0);
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk: {chunk}");
    Console.WriteLine("---");
}
Advanced Splitting with Metadata
For position and separator metadata about each chunk, use the AdvancedRecursiveSplit method, which returns TextChunk objects (described below):
string document = "Artificial intelligence is transforming every industry. From healthcare to finance, automation is becoming smarter and more adaptive. However, challenges like bias, interpretability, and safety remain important areas of research.";
var chunks = document.AdvancedRecursiveSplit(chunkSize: 80, chunkOverlap: 0);
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.ChunkIndex}: {chunk.Text}");
    Console.WriteLine($"Start Position: {chunk.StartPosition}");
    Console.WriteLine($"End Position: {chunk.EndPosition}");
    Console.WriteLine($"Separator Used: {chunk.SeparatorUsed}");
    Console.WriteLine("---");
}
Working with Overlap
Overlap lets consecutive chunks share some content at their boundaries, so text near a split point also appears in the neighboring chunk together with its surrounding context. This is particularly useful in applications such as search indexing or machine learning.
Basic Overlap Example
string document = "Artificial intelligence is transforming every industry. From healthcare to finance, automation is becoming smarter and more adaptive. However, challenges like bias, interpretability, and safety remain important areas of research.";
// Split with 25 characters of overlap
var chunks = document.RecursiveSplit(chunkSize: 80, chunkOverlap: 25);
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk: {chunk}");
    Console.WriteLine("---");
}
Advanced Overlap with Metadata
string document = "Artificial intelligence is transforming every industry. From healthcare to finance, automation is becoming smarter and more adaptive. However, challenges like bias, interpretability, and safety remain important areas of research.";
var chunks = document.AdvancedRecursiveSplit(chunkSize: 80, chunkOverlap: 25);
foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.ChunkIndex}:");
    Console.WriteLine($"  Full Text: {chunk.Text}");
    Console.WriteLine($"  Overlap: '{chunk.OverlapText}'");
    Console.WriteLine($"  Original Content: '{chunk.ChunkText}'");
    Console.WriteLine($"  Position: {chunk.StartPosition}-{chunk.EndPosition}");
    Console.WriteLine("---");
}
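The Word-Safe Overlap feature keeps overlap at natural word boundaries. One plausible way to achieve that is sketched below; it is an illustration under our own assumptions, not the library's implementation, and WordSafeOverlap is a hypothetical name. It takes up to the requested number of trailing characters from the previous chunk and, if the cut lands inside a word, moves forward to the next word start, so the overlap can come out shorter than requested.

```csharp
using System;

// Hypothetical helper (not the library's API): returns at most `overlap` trailing
// characters of `previousChunk`, moved forward so the overlap never starts mid-word.
static string WordSafeOverlap(string previousChunk, int overlap)
{
    if (overlap <= 0 || previousChunk.Length == 0) return string.Empty;
    if (overlap >= previousChunk.Length) return previousChunk;

    int start = previousChunk.Length - overlap;
    // If the character before the cut is not whitespace, the cut is inside a word:
    // skip ahead to the next whitespace character.
    if (!char.IsWhiteSpace(previousChunk[start - 1]))
    {
        while (start < previousChunk.Length && !char.IsWhiteSpace(previousChunk[start]))
            start++;
    }
    // Drop leading whitespace so the overlap begins on a word.
    return previousChunk.Substring(start).TrimStart();
}

string previous = "Artificial intelligence is transforming every industry.";
Console.WriteLine($"'{WordSafeOverlap(previous, 25)}'");
```

For this 55-character sentence, a raw 25-character cut would start inside "transforming", so the sketch returns 'every industry.' (15 characters) instead of a word fragment.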
Understanding the TextChunk Class
The TextChunk class provides comprehensive metadata about each split segment:
public class TextChunk
{
    public string Text { get; set; }          // Complete text including overlap
    public string OverlapText { get; set; }   // Only the overlap portion
    public string ChunkText { get; set; }     // Original chunk without overlap
    public int StartPosition { get; set; }    // Start position in original text
    public int EndPosition { get; set; }      // End position in original text
    public string SeparatorUsed { get; set; } // Separator that created this chunk
    public int ChunkIndex { get; set; }       // Sequential chunk number
}
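StartPosition and EndPosition are offsets into the original text, but this guide does not say whether EndPosition is inclusive or exclusive. The sketch below illustrates the idea under two assumptions of our own: end-exclusive offsets and non-overlapping chunks that appear in order. LocateChunks is a hypothetical helper, not part of the library; it finds each chunk by searching forward from where the previous one ended.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library): recovers (Start, End) offsets for
// chunks that appear in order in `original`. End is exclusive in this sketch.
static List<(int Start, int End)> LocateChunks(string original, IEnumerable<string> chunks)
{
    var spans = new List<(int Start, int End)>();
    int cursor = 0;
    foreach (var chunk in chunks)
    {
        int start = original.IndexOf(chunk, cursor, StringComparison.Ordinal);
        if (start < 0) throw new ArgumentException($"Chunk not found: '{chunk}'");
        spans.Add((start, start + chunk.Length));
        cursor = start + chunk.Length; // continue searching after this chunk
    }
    return spans;
}

string text = "One. Two. Three.";
foreach (var (s, e) in LocateChunks(text, new[] { "One.", "Two.", "Three." }))
{
    Console.WriteLine($"{s}-{e}: {text.Substring(s, e - s)}");
}
```

This prints 0-4: One., 5-9: Two. and 10-16: Three.; with end-exclusive offsets, text.Substring(start, end - start) always gives back the chunk.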
Separator Hierarchy
The library uses a hierarchical approach to splitting, trying larger semantic units first:
- Paragraph breaks ("\n\n") - largest semantic units
- Sentence endings with newlines (".\n", "!\n", "?\n")
- Other punctuation with newlines (":\n", ";\n")
- Single newlines ("\n") - line breaks
- Sentence endings with spaces (". ", "! ", "? ")
- Punctuation with spaces ("; ", ", ")
- Word boundaries (" ") - single spaces
- Character-by-character - last resort
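To make the hierarchy concrete, here is a simplified recursive splitter that uses the same separator order. It is a sketch of the general technique, not RecursiveTextSplitter's actual code. It ignores overlap and metadata, keeps whitespace verbatim so that concatenating the chunks reproduces the input, and SplitRecursive is a hypothetical name. Pieces are merged greedily up to chunkSize; any piece that is still too large is split again with the next, finer separator, ending in a hard character-level cut.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Separator order taken from the hierarchy above; "" means character-level splitting.
string[] hierarchy = { "\n\n", ".\n", "!\n", "?\n", ":\n", ";\n", "\n", ". ", "! ", "? ", "; ", ", ", " ", "" };

// Hypothetical sketch of recursive splitting; not RecursiveTextSplitter's implementation.
static List<string> SplitRecursive(string text, int chunkSize, string[] separators)
{
    if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    var result = new List<string>();
    if (text.Length <= chunkSize)
    {
        if (text.Length > 0) result.Add(text);
        return result;
    }

    // Use the first (largest) separator that actually occurs in this text.
    int level = Array.FindIndex(separators, s => s.Length == 0 || text.Contains(s));
    if (level < 0 || separators[level].Length == 0)
    {
        // Last resort: hard cut every chunkSize characters.
        for (int i = 0; i < text.Length; i += chunkSize)
            result.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
        return result;
    }

    string sep = separators[level];
    string[] finer = separators.Skip(level + 1).ToArray();
    string[] parts = text.Split(new[] { sep }, StringSplitOptions.None);
    string current = "";
    for (int i = 0; i < parts.Length; i++)
    {
        // Re-attach the separator to every part except the last, so no text is lost.
        string piece = i < parts.Length - 1 ? parts[i] + sep : parts[i];
        if (piece.Length == 0) continue;
        if (current.Length + piece.Length <= chunkSize)
        {
            current += piece; // greedily merge small pieces
            continue;
        }
        if (current.Length > 0)
        {
            result.Add(current);
            current = "";
        }
        if (piece.Length <= chunkSize)
            current = piece;
        else
            result.AddRange(SplitRecursive(piece, chunkSize, finer)); // still too big: go one level finer
    }
    if (current.Length > 0) result.Add(current);
    return result;
}

string doc = "Artificial intelligence is transforming every industry. From healthcare to finance, automation is becoming smarter and more adaptive.";
foreach (var chunk in SplitRecursive(doc, 80, hierarchy))
{
    Console.WriteLine($"[{chunk}] ({chunk.Length} chars)");
}
```

With chunkSize 80, the 133-character sample breaks at the sentence boundary ". " into a 56-character and a 77-character chunk, so word- and character-level splitting are never reached. A real implementation would typically also trim whitespace and track positions, which this sketch leaves out.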
Contributing
We welcome contributions to make RecursiveTextSplitter even better! Here are some ways you can help:
🌟 Star this repository if you find it useful!
Your star helps others discover this library and motivates continued development.
🔧 Pull Requests Welcome
We're open to pull requests! Whether you want to:
- Fix bugs or improve existing functionality
- Add new features or splitting strategies
- Improve documentation or examples
- Optimize performance
- ...
Please feel free to fork the repository and submit a pull request. For larger changes, consider opening an issue first to discuss your approach.
📝 Reporting Issues
Found a bug or have a suggestion? Please open an issue with:
- A clear description of the problem or enhancement
- Steps to reproduce (for bugs)
- Sample code demonstrating the issue
- Expected vs actual behavior
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETStandard 2.0: No dependencies.
- net8.0: No dependencies.
- net9.0: No dependencies.