EasyReasy.KnowledgeBase
Version 1.1.2
Install with the .NET CLI:
dotnet add package EasyReasy.KnowledgeBase --version 1.1.2
Or with the NuGet Package Manager:
NuGet\Install-Package EasyReasy.KnowledgeBase -Version 1.1.2
Or as a PackageReference in your project file:
<PackageReference Include="EasyReasy.KnowledgeBase" Version="1.1.2" />
EasyReasy.KnowledgeBase
A powerful .NET library for processing and intelligently chunking knowledge documents using embedding-based similarity analysis. Designed for RAG (Retrieval-Augmented Generation) systems that need to handle large documents and books efficiently.
Key Features
- 🧠 Smart AI Semantic Sectioning: Uses embedding similarity and statistical analysis to intelligently group related chunks into sections
- 📊 Adaptive Thresholds: Automatically determines section boundaries using standard-deviation analysis of embedding similarity instead of fixed similarity values
- 💾 Memory Efficient: A three-tier streaming architecture processes documents of any size without loading everything into memory, keeping memory usage low even for very large files
- 🎯 Progressive Strictness: Becomes more selective about section breaks as sections approach maximum size
- 📝 Document Format Aware: Respects document structure (markdown headers, code blocks) for natural boundaries, with support for custom document types
- 🔍 Intelligent Search: Vector-based similarity search with comprehensive confidence ratings and relevance metrics
- 🤖 AI Generation: Built-in services for summarization, question generation, and contextualization using one-shot LLM generation
No Forced Dependencies
This package is built on the philosophy that no dependencies should be forced on you. The core library provides all the interfaces and the logic for how they interact, but how you implement them is up to you. If you prefer not to implement them yourself, ready-made implementations are available in separate packages, so you can choose exactly which pieces to use off the shelf and which to build your own.
Available Packages for Implementations
- EasyReasy.KnowledgeBase: Core library with interfaces and models
- EasyReasy.KnowledgeBase.Storage.Sqlite: SQLite-based storage implementation
- EasyReasy.KnowledgeBase.Storage.IntegratedVectorStore: Vector storage integration
- EasyReasy.VectorStorage: High-performance vector storage and similarity search
- EasyReasy.KnowledgeBase.BertTokenization: BERT-based tokenization
- EasyReasy.KnowledgeBase.OllamaGeneration: Ollama integration for embeddings and generation
Quick Start
Creating a Searchable Knowledge Base
[!NOTE] The example below uses implementations from several companion packages. The main package (EasyReasy.KnowledgeBase) mostly contains interfaces; implementations live in separate packages so that no dependencies are forced on you. You can use the provided implementations (from the additional packages) or create your own.
// Set up your services using provided implementations
// Note: These implementations are in separate packages - see Dependencies section below
BertTokenizer tokenizer = await BertTokenizer.CreateAsync();
EasyReasyOllamaEmbeddingService embeddingService = await EasyReasyOllamaEmbeddingService.CreateAsync(
baseUrl: "https://your-ollama-server.com",
apiKey: "your-api-key",
modelName: "nomic-embed-text");
// Create storage components using provided implementations
// Note: These implementations are in separate packages - you can also implement your own
IKnowledgeStore knowledgeStore = await SqliteKnowledgeStore.CreateAsync("knowledge.db");
CosineVectorStore cosineVectorStore = new CosineVectorStore(embeddingService.Dimensions);
IKnowledgeVectorStore chunksVectorStore = new EasyReasyVectorStore(cosineVectorStore);
// Create the searchable knowledge store
ISearchableKnowledgeStore searchableKnowledgeStore = new SearchableKnowledgeStore(knowledgeStore, chunksVectorStore);
// Create the searchable knowledge base
ISearchableKnowledgeBase knowledgeBase = new SearchableKnowledgeBase(
searchableKnowledgeStore,
embeddingService,
tokenizer);
// Index documents using a file source provider
IFileSourceProvider fileSourceProvider = new YourFileSourceProvider();
IIndexer indexer = knowledgeBase.CreateIndexer();
foreach (IFileSource fileSource in await fileSourceProvider.GetAllFilesAsync())
{
bool wasIndexed = await indexer.ConsumeAsync(fileSource);
if (wasIndexed)
{
Console.WriteLine($"Indexed: {fileSource.FileName}");
}
else
{
Console.WriteLine($"Skipped (already up to date): {fileSource.FileName}");
}
}
// Search for relevant content
IKnowledgeBaseSearchResult result = await knowledgeBase.SearchAsync("your query", maxSearchResultsCount: 10);
if (result.WasSuccess)
{
// Cast to concrete type to access detailed results
if (result is KnowledgeBaseSearchResult searchResult)
{
foreach (RelevanceRatedEntry<KnowledgeFileSection> section in searchResult.RelevantSections)
{
Console.WriteLine($"Relevance: {section.Relevance.RelevanceScore}");
Console.WriteLine($"Content: {section.Item.ToString()}");
}
}
// Or use the context string for LLM input
string contextString = result.GetAsContextString();
Console.WriteLine(contextString);
}
Simple Section Reading
// Set up services using provided implementations
BertTokenizer tokenizer = await BertTokenizer.CreateAsync();
EasyReasyOllamaEmbeddingService embeddingService = await EasyReasyOllamaEmbeddingService.CreateAsync(
baseUrl: "https://your-ollama-server.com",
apiKey: "your-api-key",
modelName: "nomic-embed-text");
// Using the factory for easy setup
SectionReaderFactory factory = new SectionReaderFactory(embeddingService, tokenizer);
using Stream stream = File.OpenRead("document.md");
Guid fileId = Guid.NewGuid(); // The ID of the knowledge file being processed
// Create a section reader with sensible defaults
SectionReader sectionReader = factory.CreateForMarkdown(stream, fileId, maxTokensPerChunk: 100, maxTokensPerSection: 1000);
// Read sections
int sectionIndex = 0;
await foreach (List<KnowledgeFileChunk> chunks in sectionReader.ReadSectionsAsync())
{
KnowledgeFileSection section = KnowledgeFileSection.CreateFromChunks(chunks, fileId, sectionIndex);
Console.WriteLine($"Section: {section.ToString()}");
sectionIndex++;
}
// Clean up resources
embeddingService.Dispose();
Manual Configuration
// Set up services using provided implementations
BertTokenizer tokenizer = await BertTokenizer.CreateAsync();
EasyReasyOllamaEmbeddingService embeddingService = await EasyReasyOllamaEmbeddingService.CreateAsync(
baseUrl: "https://your-ollama-server.com",
apiKey: "your-api-key",
modelName: "nomic-embed-text");
// For more control over the chunking process
Guid fileId = Guid.NewGuid(); // The ID of the knowledge file being processed
ChunkingConfiguration chunkingConfig = new ChunkingConfiguration(tokenizer, maxTokensPerChunk: 100, ChunkStopSignals.Markdown);
SectioningConfiguration sectioningConfig = new SectioningConfiguration(
maxTokensPerSection: 1000,
lookaheadBufferSize: 200,
standardDeviationMultiplier: 1.0,
minimumTokensPerSection: 50,
chunkStopSignals: ChunkStopSignals.Markdown);
using Stream stream = File.OpenRead("document.md");
using StreamReader reader = new StreamReader(stream);
TextSegmentReader textSegmentReader = TextSegmentReader.CreateForMarkdown(reader);
SegmentBasedChunkReader chunkReader = new SegmentBasedChunkReader(textSegmentReader, chunkingConfig);
SectionReader sectionReader = new SectionReader(chunkReader, embeddingService, sectioningConfig, tokenizer, fileId);
await foreach (List<KnowledgeFileChunk> chunks in sectionReader.ReadSectionsAsync())
{
// Process sections...
}
// Clean up resources
embeddingService.Dispose();
Architecture Overview
The system uses a three-tier streaming architecture for efficient document processing:
1. Text Segment Reader
- Reads the smallest meaningful text units (sentences, lines, paragraphs)
- Handles different text formats (markdown, plain text)
- Memory efficient streaming
2. Chunk Reader
- Combines segments into chunks based on token limits
- Respects stop signals (markdown headers, code blocks) for natural boundaries
- Configurable chunk sizes and stop conditions
3. Section Reader
- Groups chunks into sections using embedding similarity analysis
- Uses statistical thresholds (standard deviation) instead of fixed similarity values
- Progressive strictness: Becomes more selective as sections approach size limits
- Minimum constraints: Prevents tiny sections from being created
How Smart Sectioning Works
- Lookahead Analysis: Maintains a buffer of upcoming chunks (~200 by default)
- Statistical Thresholding: Calculates the similarity distribution and uses mean - (multiplier × std_deviation) as the split threshold
- Progressive Strictness: After 75% of section capacity, the likelihood of a split increases quadratically
- Minimum Constraints: Ensures sections have meaningful content (minimum chunks and tokens)
- Stop Signal Awareness: Considers markdown structure when making splitting decisions
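The threshold step above can be expressed in a few lines of C#. This is an illustrative standalone sketch of the mean - (multiplier × std_deviation) rule, not the library's internal implementation (the library exposes comparable helpers via ConfidenceMath):

```csharp
using System;
using System.Linq;

public static class ThresholdSketch
{
    // Computes the adaptive split threshold described above:
    // mean - (multiplier * standard deviation) over a lookahead buffer
    // of chunk-to-chunk similarity scores. Illustrative only.
    public static double ComputeSplitThreshold(double[] similarities, double multiplier = 1.0)
    {
        double mean = similarities.Average();
        double variance = similarities.Sum(s => (s - mean) * (s - mean)) / similarities.Length;
        double stdDev = Math.Sqrt(variance);
        return mean - (multiplier * stdDev);
    }
}
```

For similarities { 0.90, 0.80, 0.70, 0.60 } this yields a threshold of about 0.64; a chunk pair whose similarity falls below that value becomes a split candidate.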
API Reference
Core Configuration Classes
ChunkingConfiguration
new ChunkingConfiguration(
ITokenizer tokenizer,
int maxTokensPerChunk = 300,
string[]? chunkStopSignals = null)
- MaxTokensPerChunk: Maximum tokens per chunk
- ChunkStopSignals: Signals that force chunk boundaries (e.g., ChunkStopSignals.Markdown)
SectioningConfiguration
new SectioningConfiguration(
int maxTokensPerSection = 4000,
int lookaheadBufferSize = 100,
double standardDeviationMultiplier = 1.0,
double minimumSimilarityThreshold = 0.65,
double tokenStrictnessThreshold = 0.75,
int minimumChunksPerSection = 2,
int minimumTokensPerSection = 50,
string[]? chunkStopSignals = null)
Readers
TextSegmentReader
- CreateForMarkdown(StreamReader reader): Creates a reader optimized for markdown
- ReadNextTextSegmentAsync(): Returns the next text segment, or null
ITextSegmentReader
- ReadNextTextSegmentAsync(CancellationToken cancellationToken): Returns the next text segment, or null
- Generic interface for text segmentation capabilities
SegmentBasedChunkReader
- Constructor: (TextSegmentReader segmentReader, ChunkingConfiguration config)
- ReadNextChunkContentAsync(): Returns the next chunk as a string, or null
IKnowledgeChunkReader
- ReadNextChunkContentAsync(CancellationToken cancellationToken): Returns the next chunk as a string, or null
- Interface for reading chunks of content from streams
SectionReader
- Constructor: (SegmentBasedChunkReader chunkReader, IEmbeddingService embeddings, SectioningConfiguration config, ITokenizer tokenizer, Guid fileId)
- ReadSectionsAsync(): Returns IAsyncEnumerable<List<KnowledgeFileChunk>>
IKnowledgeSectionReader
- ReadSectionsAsync(CancellationToken cancellationToken): Returns IAsyncEnumerable<List<KnowledgeFileChunk>>
- Interface for reading sections by grouping chunks based on content similarity
- Implements IDisposable
SectionReaderFactory
- Constructor: (IEmbeddingService embeddingService, ITokenizer tokenizer)
- CreateForMarkdown(Stream stream, Guid fileId, int maxTokensPerChunk, int maxTokensPerSection): Quick setup for markdown documents
ITokenReader
- ReadNextTokens(int tokenCount): Reads the next specified number of tokens
- PeekNextTokens(int tokenCount): Peeks at the next tokens without consuming them
- SeekBackward(int tokenCount): Seeks backward in the token buffer
- CurrentPosition: Gets the current position in the token stream
- TotalTokensRead: Gets the total number of tokens read so far
- HasMoreTokens: Checks whether more tokens are available
- Provides streaming tokenization with forward and backward buffer support
Text Segmentation
TextSegmentSplitters
- TextSegmentSplitters.Markdown: Predefined break strings optimized for Markdown content
- Includes heading markers, paragraph breaks, list items, code blocks, line breaks, and sentence endings
- Ordered by preference, with more specific patterns first
Stop Signals
ChunkStopSignals
- ChunkStopSignals.Markdown: Pre-configured signals for markdown (headers, code blocks, bold text)
- Custom arrays can be provided for other document types
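For example, a custom stop-signal array for a non-markdown format might look like the following. The signal strings here are hypothetical illustrations (only ChunkStopSignals.Markdown ships with the library), and tokenizer is assumed to come from your earlier setup:

```csharp
// Hypothetical stop signals for a reStructuredText-style document.
// These strings are illustrative examples of what a custom array
// might contain; they are not values shipped with the library.
string[] customStopSignals = new[]
{
    "\n====",   // section over/underline
    "\n----",   // subsection underline
    "\n.. ",    // directive start
};

// tokenizer: an ITokenizer implementation created earlier (e.g., BertTokenizer)
ChunkingConfiguration config = new ChunkingConfiguration(
    tokenizer,
    maxTokensPerChunk: 150,
    chunkStopSignals: customStopSignals);
```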
Models
KnowledgeFile
- Id: Guid - Unique identifier for the knowledge file
- Name: string - Name of the knowledge file
- Hash: byte[] - Content hash for integrity verification
- Constructor: (Guid id, string name, byte[] hash) - Creates a knowledge file with metadata
KnowledgeFileChunk
- Id: Guid
- SectionId: Guid
- ChunkIndex: int
- Content: string
- Embedding: float[]
- Vector(): Returns the embedding vector
- ContainsVector(): Returns true if an embedding is available
KnowledgeFileSection
- Id: Guid
- FileId: Guid
- SectionIndex: int
- Summary: string?
- AdditionalContext: string?
- Chunks: List<KnowledgeFileChunk>
- Embedding: float[]
- CreateFromChunks(List<KnowledgeFileChunk> chunks, Guid fileId, int sectionIndex): Creates a section from chunks
- Vector(): Returns the embedding vector
- ContainsVector(): Returns true if an embedding is available
- ToString(): Returns the combined content
- ToString(string separator): Returns the content with a custom separator
Generation Services
IEmbeddingService
- ModelName: string - Name of the embedding model
- Dimensions: int - Number of dimensions in the embedding vectors
- EmbedAsync(string text, CancellationToken cancellationToken): Generates embeddings
ISummarizationService
- SummarizeAsync(string text, CancellationToken cancellationToken): Generates summaries of text content
IQuestionGenerationService
- GenerateQuestionsAsync(string text, CancellationToken cancellationToken): Generates synthetic questions from text content
IContextualizationService
- ContextualizeAsync(string textSnippet, string surroundingContent, CancellationToken cancellationToken): Provides contextual information for text snippets
IOneShotService
- ProcessAsync(string systemPrompt, string userInput, CancellationToken cancellationToken): Performs one-shot text processing tasks
Generation Service Implementations
OneShotServiceBase (Abstract Base Class)
- OneShotService: Protected property exposing the underlying one-shot service
- ProcessAsync(string systemPrompt, string userInput, CancellationToken cancellationToken): Protected method for processing
- Provides a consistent foundation for services built on top of IOneShotService
QuestionGenerationService : OneShotServiceBase, IQuestionGenerationService
- Constructor: (IOneShotService oneShotService, string? systemPrompt = null)
- GenerateQuestionsAsync(string text, CancellationToken cancellationToken): Generates 3-5 diverse questions
- Uses a default system prompt for factual, conceptual, and application-based questions
- Includes retry logic with ListParser for robust question extraction
SummarizationService : OneShotServiceBase, ISummarizationService
- Constructor: (IOneShotService oneShotService, string? systemPrompt = null)
- SummarizeAsync(string text, CancellationToken cancellationToken): Generates concise 2-3 sentence summaries
- Uses a default system prompt focused on document retrieval and search
ContextualizationService : OneShotServiceBase, IContextualizationService
- Constructor: (IOneShotService oneShotService, string? systemPrompt = null)
- ContextualizeAsync(string textSnippet, string surroundingContent, CancellationToken cancellationToken): Provides contextual information
- Uses a default system prompt for explaining a snippet's role and significance
Generation Utilities
ListParser (Static Class)
- ParseList(string text): Parses lists from text content
- Handles numbered lists (1., 2., etc.), bullet points (-, *), and plain text lists
- Removes list markers and returns a clean list of strings
- Returns null for invalid or empty lists
- Used by QuestionGenerationService for robust question extraction
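The parsing behavior described above can be approximated with a small standalone sketch. This mirrors the documented behavior (strip numbered and bullet markers, return null for empty input); it is not the library's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListParsingSketch
{
    // Simplified re-implementation of what ListParser.ParseList is described
    // to do: strip numbered ("1.", "2)") and bullet ("-", "*") markers and
    // return the clean items, or null when nothing parses.
    public static List<string>? ParseList(string text)
    {
        List<string> items = new List<string>();
        foreach (string rawLine in text.Split('\n'))
        {
            string line = rawLine.Trim();
            if (line.Length == 0) continue;
            // Remove a leading list marker, if present.
            string cleaned = Regex.Replace(line, @"^(\d+[\.\)]\s*|[-*]\s*)", "");
            if (cleaned.Length > 0) items.Add(cleaned);
        }
        return items.Count > 0 ? items : null;
    }
}
```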
Search Interfaces
ISearchableKnowledgeBase
- CreateIndexer(IEmbeddingService? customEmbeddingService): Creates an indexer for adding documents
- SearchAsync(string query, int? maxSearchResultsCount, CancellationToken cancellationToken): Searches for relevant content
IKnowledgeBaseSearchResult
- WasSuccess: bool - Whether the search was successful
- CanBeRetried: bool - Whether the search can be retried
- ShouldBeRetried: bool - Whether the search should be retried
- ErrorMessage: string? - Error message if the search failed
- GetAsContextString(): string - Returns a formatted context string for LLM input
Search Implementations
SearchableKnowledgeStore : ISearchableKnowledgeStore
- Constructor: (IFileStore fileStore, ISectionStore sectionStore, IChunkStore chunkStore, IKnowledgeVectorStore chunksVectorStore)
- Constructor: (IKnowledgeStore knowledgeStore, IKnowledgeVectorStore chunksVectorStore)
- Files: IFileStore - Access to file storage
- Sections: ISectionStore - Access to section storage
- Chunks: IChunkStore - Access to chunk storage
- GetChunksVectorStore(): IKnowledgeVectorStore - Vector store for chunk searches
- Wraps a basic knowledge store with vector search capabilities
KnowledgeBaseSearchResult : IKnowledgeBaseSearchResult
- Constructor: (IReadOnlyList<RelevanceRatedEntry<KnowledgeFileSection>> relevantSections, string query, bool wasSuccess = true, bool canBeRetried = false, bool shouldBeRetried = false, string? errorMessage = null)
- WasSuccess: bool - Whether the search was successful
- CanBeRetried: bool - Whether the search can be retried
- ShouldBeRetried: bool - Whether the search should be retried
- ErrorMessage: string? - Error message if the search failed
- RelevantSections: IReadOnlyList<RelevanceRatedEntry<KnowledgeFileSection>> - Relevant sections with metrics
- Query: string - Original search query
- CreateError(string query, string errorMessage, bool canBeRetried = false, bool shouldBeRetried = false): Static method to create error results
- GetAsContextString(): string - Formats results as a context string for LLM input
KnowledgeVector : IKnowledgeVector
- Constructor: (Guid id, float[] vector)
- Id: Guid - Unique identifier for the knowledge vector
- Vector(): float[] - Returns the vector representation
- ContainsVector(): bool - Returns true if a vector is available
- Concrete implementation of IKnowledgeVector with vector data
Search Factories
IKnowledgeBaseFactory<T>
- CreateKnowledgebaseAsync(): Task<T> - Creates and returns a knowledge base of type T
- Generic factory interface for creating different types of knowledge bases
Indexing Interfaces
IIndexer
- ConsumeAsync(IFileSource fileSource): Indexes documents from a file source. Returns true if content was indexed, false if the file was already up to date.
IFileSource
- FileId: Guid - Unique identifier for the file
- FileName: string - Name of the file
- CreateReadStreamAsync(): Task<Stream> - Creates a new read stream for the file content
IFileSourceProvider
- GetFileSourcesAsync(CancellationToken cancellationToken): Gets the available file sources
Indexing Implementations
KnowledgeBaseIndexer : IIndexer
- Constructor: (ISearchableKnowledgeStore searchableKnowledgeStore, IEmbeddingService embeddingService, ITokenizer tokenizer, int maxTokensPerChunk = 100, int maxTokensPerSection = 1000)
- ConsumeAsync(IFileSource fileSource): Indexes file content into the knowledge base. Returns true if content was indexed, false if the file was already up to date.
- Features:
  - Content hash verification for duplicate detection
  - Automatic cleanup of old content when a file changes
  - Chunk and section creation with embeddings
  - Vector store integration for similarity search
  - Markdown-optimized processing using SectionReaderFactory
Confidence Rating Utilities
ConfidenceMath (Static Class)
- Vector operations:
  - DotProduct(float[] a, float[] b): Calculates the dot product of two vectors
  - VectorNorm(float[] v): Calculates the L2 norm (magnitude) of a vector
  - NormalizeVector(float[] v): Normalizes a vector to unit length (L2 normalization)
  - NormalizeVectorInPlace(float[] v): Normalizes a vector in place
  - CosineSimilarity(float[] a, float[] b): Calculates the cosine similarity between vectors (-1 to 1)
  - CosineSimilarityPreNormalized(float[] a, float[] b): Fast cosine similarity for pre-normalized vectors
  - UpdateCentroidInPlace(float[] centroid, float[] nextVector, int countBefore): Updates a running-average centroid
- Statistics:
  - CalculateMean(double[] values): Calculates the arithmetic mean
  - CalculateStandardDeviation(double[] values, bool sample = false): Calculates the standard deviation
  - MinMaxNormalization(double[] values, double min, double max): Normalizes values to a 0-100 range
- Utilities:
  - RoundToInt(double value): Rounds to the nearest integer
  - Clamp(double value, double min, double max): Clamps a value to a range
  - Clamp(float value, float min, float max): Clamps a float value to a range
WithSimilarity<T> (where T : IVectorObject)
- Item: The wrapped item of type T
- Similarity: The similarity score (double)
- CreateBetween(T theItem, float[] vectorA, float[] vectorB): Creates an instance from two vectors
- CreateBetween(IVectorObject obj, float[] vector): Creates an instance from a vector object and a vector
- CreateBetween(IVectorObject objA, IVectorObject objB): Creates an instance from two vector objects
- CreateList(IEnumerable<T> items, float[] vector, bool onlyIncludeItemsWithValidVectors = true): Creates a list for a collection
- Wraps any IVectorObject with a similarity score for ranking and filtering
RelevanceRatedEntry<T>
- Item: The item being rated for relevance (type T)
- Relevance: KnowledgebaseRelevanceMetrics for the item
- Represents a search result entry with comprehensive relevance scoring
KnowledgebaseRelevanceMetrics
- CosineSimilarity: Raw cosine similarity value (e.g., 0.82)
- RelevanceScore: Relevance score as an integer (e.g., 82 for 0.82 similarity)
- NormalizedScore: Normalized score on a 0-100 scale based on the result set
- StandardDeviation: Standard deviation of the top-k similarity scores
- Provides comprehensive metrics for search result relevance analysis
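The relationship between these metrics can be illustrated with the small mapping below. It mirrors the documented examples (0.82 similarity → RelevanceScore 82; NormalizedScore rescaled against the result set's min/max); it is not the library's code:

```csharp
using System;

public static class RelevanceSketch
{
    // Maps a raw cosine similarity to the integer relevance score
    // described above (0.82 -> 82).
    public static int ToRelevanceScore(double cosineSimilarity)
        => (int)Math.Round(cosineSimilarity * 100);

    // Min-max rescales a similarity against the best and worst
    // similarities in the result set, onto a 0-100 scale.
    public static double ToNormalizedScore(double value, double min, double max)
        => max == min ? 100 : (value - min) / (max - min) * 100;
}
```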
Utility Classes
StreamHashHelper (Static Class)
- GenerateSha256Hash(Stream stream): Generates a SHA256 hash from stream content for file integrity verification
- Used for creating content hashes when storing knowledge files
- Note: The stream position will be at the end after hashing
Interfaces
ITokenizer
- CountTokens(string text): Counts the tokens in text
IVectorObject
- Vector(): Returns the vector representation
- ContainsVector(): Returns true if a vector is available
Configuration Tips
For Technical Documentation
SectioningConfiguration config = new SectioningConfiguration(
maxTokensPerSection: 800,
standardDeviationMultiplier: 0.8, // More aggressive splitting
minimumTokensPerSection: 100,
chunkStopSignals: ChunkStopSignals.Markdown);
For Narrative Content
SectioningConfiguration config = new SectioningConfiguration(
maxTokensPerSection: 1200,
standardDeviationMultiplier: 1.2, // More lenient splitting
minimumTokensPerSection: 75);
For Large Books
SectioningConfiguration config = new SectioningConfiguration(
maxTokensPerSection: 1500,
lookaheadBufferSize: 300, // Larger lookahead for better statistics
tokenStrictnessThreshold: 0.65); // Earlier progressive strictness
Storage System
The EasyReasy.KnowledgeBase library provides a comprehensive storage abstraction for managing knowledge files, chunks, sections, and vector embeddings. The storage system is designed with a clean separation of concerns, allowing you to implement different storage backends while maintaining a consistent API.
Storage Architecture
The storage system follows a layered architecture with clear interfaces:
IKnowledgeStore (Main Interface)
├── IFileStore (File Management)
├── IChunkStore (Chunk Storage)
├── ISectionStore (Section Storage)
└── IKnowledgeVectorStore (Vector Embeddings)
Quick Start with Storage
The library provides several ready-to-use storage implementations:
Using SQLite Storage (Recommended for most use cases)
// Set up services using provided implementations
BertTokenizer tokenizer = await BertTokenizer.CreateAsync();
EasyReasyOllamaEmbeddingService embeddingService = await EasyReasyOllamaEmbeddingService.CreateAsync(
baseUrl: "https://your-ollama-server.com",
apiKey: "your-api-key",
modelName: "nomic-embed-text");
// Create SQLite-based storage (requires EasyReasy.KnowledgeBase.Storage.Sqlite package)
SqliteKnowledgeStore knowledgeStore = await SqliteKnowledgeStore.CreateAsync("knowledge.db");
// Create vector storage using EasyReasy.VectorStorage
CosineVectorStore cosineVectorStore = new CosineVectorStore(embeddingService.Dimensions);
EasyReasyVectorStore vectorStore = new EasyReasyVectorStore(cosineVectorStore);
// Store a knowledge file
KnowledgeFile file = new KnowledgeFile(Guid.NewGuid(), "document.md", contentHash);
Guid fileId = await knowledgeStore.Files.AddAsync(file);
// Store chunks with embeddings
foreach (KnowledgeFileChunk chunk in chunks)
{
await knowledgeStore.Chunks.AddAsync(chunk);
if (chunk.Embedding != null)
{
await vectorStore.AddAsync(chunk.Id, chunk.Embedding);
}
}
// Store sections
foreach (KnowledgeFileSection section in sections)
{
await knowledgeStore.Sections.AddAsync(section);
if (section.Embedding != null)
{
await vectorStore.AddAsync(section.Id, section.Embedding);
}
}
Using Custom Storage Implementations
// You can also create your own implementations of the storage interfaces
IFileStore fileStore = new YourCustomFileStore();
IChunkStore chunkStore = new YourCustomChunkStore();
ISectionStore sectionStore = new YourCustomSectionStore();
IKnowledgeVectorStore vectorStore = new YourCustomVectorStore();
// Create the main knowledge store
KnowledgeStore knowledgeStore = new KnowledgeStore(fileStore, chunkStore, sectionStore);
Storage Interfaces
IKnowledgeStore
The main interface that provides access to all storage components:
public interface IKnowledgeStore
{
IFileStore Files { get; }
IChunkStore Chunks { get; }
ISectionStore Sections { get; }
}
IFileStore
Manages knowledge file metadata:
// Add a new knowledge file
Guid fileId = await fileStore.AddAsync(new KnowledgeFile(id, name, hash));
// Retrieve a file
KnowledgeFile? file = await fileStore.GetAsync(fileId);
// Check if file exists
bool exists = await fileStore.ExistsAsync(fileId);
// Get all files
IEnumerable<KnowledgeFile> allFiles = await fileStore.GetAllAsync();
// Update file metadata
await fileStore.UpdateAsync(updatedFile);
// Delete a file
bool deleted = await fileStore.DeleteAsync(fileId);
IChunkStore
Manages individual content chunks:
// Add a chunk
await chunkStore.AddAsync(new KnowledgeFileChunk(id, sectionId, index, content, embedding));
// Get chunk by ID
KnowledgeFileChunk? chunk = await chunkStore.GetAsync(chunkId);
// Get multiple chunks by IDs
IEnumerable<KnowledgeFileChunk> chunks = await chunkStore.GetAsync(chunkIds);
// Get chunk by index within section
KnowledgeFileChunk? chunk = await chunkStore.GetByIndexAsync(sectionId, chunkIndex);
// Get all chunks for a section
IEnumerable<KnowledgeFileChunk> sectionChunks = await chunkStore.GetBySectionAsync(sectionId);
// Delete all chunks for a file
bool deleted = await chunkStore.DeleteByFileAsync(fileId);
ISectionStore
Manages sections containing multiple chunks:
// Add a section
await sectionStore.AddAsync(new KnowledgeFileSection(id, fileId, index, chunks, summary, embedding));
// Get section by ID
KnowledgeFileSection? section = await sectionStore.GetAsync(sectionId);
// Get section by index within file
KnowledgeFileSection? section = await sectionStore.GetByIndexAsync(fileId, sectionIndex);
// Delete all sections for a file
bool deleted = await sectionStore.DeleteByFileAsync(fileId);
IKnowledgeVectorStore
Manages vector embeddings for similarity search:
// Add a vector
await vectorStore.AddAsync(entityId, embedding);
// Remove a vector
await vectorStore.RemoveAsync(entityId);
// Search for similar vectors
IEnumerable<IKnowledgeVector> similarVectors = await vectorStore.SearchAsync(queryVector, maxResults);
IExplicitPersistence
Defines explicit persistence operations for storage components that need manual control over data loading and saving:
// Load data from persistent storage during startup
await storageComponent.LoadAsync(cancellationToken);
// Save data to persistent storage during shutdown
await storageComponent.SaveAsync(cancellationToken);
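In an application, these calls are typically tied to startup and shutdown. A minimal sketch, assuming knowledgeStore is a component that may implement IExplicitPersistence (such as SqliteKnowledgeStore):

```csharp
// Load persisted data at startup if the component supports
// explicit persistence; components that don't are skipped.
if (knowledgeStore is IExplicitPersistence persistence)
{
    await persistence.LoadAsync(CancellationToken.None);
}

// ... use the store during the application's lifetime ...

// Flush to persistent storage at shutdown.
if (knowledgeStore is IExplicitPersistence persistenceToSave)
{
    await persistenceToSave.SaveAsync(CancellationToken.None);
}
```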
Storage Implementations
KnowledgeStore : IKnowledgeStore
- Constructor: (IFileStore files, IChunkStore chunks, ISectionStore sections)
- Files: IFileStore - Access to file storage
- Chunks: IChunkStore - Access to chunk storage
- Sections: ISectionStore - Access to section storage
- Provides a unified interface for managing knowledge files, chunks, and sections
- Sealed class for consistent storage access patterns
SqliteKnowledgeStore : IKnowledgeStore, IExplicitPersistence
- Constructor: (string connectionString) - Creates the store from a SQLite connection string
- CreateAsync(string path, CancellationToken cancellationToken): Static factory method for easy setup
- LoadAsync(CancellationToken cancellationToken): Loads and initializes the database schema
- SaveAsync(CancellationToken cancellationToken): Saves data (a no-op for SQLite, since it is transactional)
- Complete SQLite-based implementation with automatic schema creation
- Available in the EasyReasy.KnowledgeBase.Storage.Sqlite package
EasyReasyVectorStore : IKnowledgeVectorStore
- Constructor: (IVectorStore vectorStore) - Wraps any IVectorStore implementation
- Integrates with EasyReasy.VectorStorage for high-performance vector operations
- Available in the EasyReasy.KnowledgeBase.Storage.IntegratedVectorStore package
Storage Models
KnowledgeFile
Represents a knowledge file with metadata:
public class KnowledgeFile
{
public Guid Id { get; set; }
public string Name { get; set; }
public byte[] Hash { get; set; } // Content hash for integrity
}
KnowledgeFileChunk
Represents a chunk of content with optional embedding:
public class KnowledgeFileChunk : IVectorObject
{
public Guid Id { get; set; }
public Guid SectionId { get; set; }
public int ChunkIndex { get; set; }
public string Content { get; set; }
public float[]? Embedding { get; set; }
public float[] Vector() => Embedding ?? Array.Empty<float>();
public bool ContainsVector() => Embedding != null;
}
KnowledgeFileSection
Represents a section containing multiple chunks:
public class KnowledgeFileSection : IVectorObject
{
public Guid Id { get; set; }
public Guid FileId { get; set; }
public int SectionIndex { get; set; }
public string? Summary { get; set; }
public string? AdditionalContext { get; set; }
public List<KnowledgeFileChunk> Chunks { get; set; }
public float[]? Embedding { get; set; }
public static KnowledgeFileSection CreateFromChunks(List<KnowledgeFileChunk> chunks, Guid fileId, int sectionIndex);
public float[] Vector() => Embedding ?? Array.Empty<float>();
public bool ContainsVector() => Embedding != null;
public override string ToString(); // Returns the combined content of all chunks
}
Storage Implementation Patterns
File-Based Storage
public class FileBasedFileStore : IFileStore
{
private readonly string _basePath;
public async Task<Guid> AddAsync(KnowledgeFile file)
{
string filePath = Path.Combine(_basePath, $"{file.Id}.json");
await File.WriteAllTextAsync(filePath, JsonSerializer.Serialize(file));
return file.Id;
}
// Implement other methods...
}
Database Storage
public class DatabaseChunkStore : IChunkStore
{
private readonly IDbConnection _connection;
public async Task AddAsync(KnowledgeFileChunk chunk)
{
const string sql = "INSERT INTO chunks (id, section_id, chunk_index, content, embedding) VALUES (@Id, @SectionId, @ChunkIndex, @Content, @Embedding)";
await _connection.ExecuteAsync(sql, chunk);
}
// Implement other methods...
}
Vector Store Implementation
For vector store implementations, see EasyReasy.KnowledgeBase.Storage.IntegratedVectorStore which provides a complete vector storage solution.
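For reference, a brute-force in-memory store mirroring the IKnowledgeVectorStore shape (add, remove, and search by cosine similarity) can be sketched as follows. The member signatures are assumptions based on the calls shown earlier in this document; the class below does not implement the actual interface and is not suitable for large collections:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Naive in-memory vector store: brute-force cosine similarity over all
// stored vectors. Illustrative only; prefer EasyReasy.VectorStorage in
// production.
public sealed class InMemoryVectorStoreSketch
{
    private readonly ConcurrentDictionary<Guid, float[]> _vectors = new();

    public void Add(Guid id, float[] vector) => _vectors[id] = vector;

    public void Remove(Guid id) => _vectors.TryRemove(id, out _);

    // Returns the maxResults most similar entries, best first.
    public IEnumerable<(Guid Id, double Similarity)> Search(float[] query, int maxResults)
        => _vectors
            .Select(kv => (Id: kv.Key, Similarity: CosineSimilarity(query, kv.Value)))
            .OrderByDescending(t => t.Similarity)
            .Take(maxResults);

    private static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB) + 1e-12);
    }
}
```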
Dependencies
- .NET 8.0+: Modern async/await patterns and performance features
The package targets net8.0 (also usable from net9.0/net10.0 and platform-specific target frameworks) and has no package dependencies.