Mythosia.VectorDb.Postgres 10.5.0

There is a newer version of this package available.
See the version list below for details.
dotnet add package Mythosia.VectorDb.Postgres --version 10.5.0
                    
NuGet\Install-Package Mythosia.VectorDb.Postgres -Version 10.5.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Mythosia.VectorDb.Postgres" Version="10.5.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="Mythosia.VectorDb.Postgres" Version="10.5.0" />
                    
Directory.Packages.props
<PackageReference Include="Mythosia.VectorDb.Postgres" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add Mythosia.VectorDb.Postgres --version 10.5.0
                    
#r "nuget: Mythosia.VectorDb.Postgres, 10.5.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package Mythosia.VectorDb.Postgres@10.5.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=Mythosia.VectorDb.Postgres&version=10.5.0
                    
Install as a Cake Addin
#tool nuget:?package=Mythosia.VectorDb.Postgres&version=10.5.0
                    
Install as a Cake Tool

Mythosia.VectorDb.Postgres

PostgreSQL (pgvector) implementation of IVectorStore.
Single-table design with namespace column for logical isolation.

Migration

Run the following migration before deploying if your Postgres table was created by any earlier version of Mythosia.VectorDb.Postgres.

This normalizes the table to the current schema and is safe to run on already-updated tables.

If you configured custom names via PostgresOptions.SchemaName or PostgresOptions.TableName, replace every occurrence of "public"."vectors" in the SQL below to match your actual schema and table name.

-- Legacy rename step for older schemas that still use `collection`
DO $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name = 'vectors'
          AND column_name = 'collection'
    ) AND EXISTS (
        SELECT 1
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name = 'vectors'
          AND column_name = 'namespace'
    ) THEN
        EXECUTE 'ALTER TABLE "public"."vectors" RENAME COLUMN namespace TO scope';
        EXECUTE 'ALTER TABLE "public"."vectors" RENAME COLUMN collection TO namespace';
    END IF;
END $$;

-- Recreate index/primary key if the legacy schema was renamed above
DROP INDEX IF EXISTS idx_vectors_collection_ns;
CREATE INDEX IF NOT EXISTS idx_vectors_ns_scope ON "public"."vectors" (namespace, scope);

DO $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM pg_constraint c
        JOIN pg_class t ON t.oid = c.conrelid
        JOIN pg_namespace n ON n.oid = t.relnamespace
        WHERE n.nspname = 'public'
          AND t.relname = 'vectors'
          AND c.conname = 'vectors_pkey'
    ) THEN
        EXECUTE 'ALTER TABLE "public"."vectors" DROP CONSTRAINT vectors_pkey';
    END IF;
EXCEPTION
    WHEN undefined_table THEN NULL;
END $$;

ALTER TABLE "public"."vectors"
    ADD PRIMARY KEY (namespace, id);

-- Current v10.2.0 hybrid-search schema
ALTER TABLE "public"."vectors"
    ADD COLUMN IF NOT EXISTS content_tsv tsvector;

UPDATE "public"."vectors"
SET content_tsv = to_tsvector('simple',
    regexp_replace(
        regexp_replace(coalesce(content, ''),
            '([a-zA-Z0-9])([^\u0001-\u007F\s])', E'\\1 \\2', 'g'),
        '([^\u0001-\u007F\s])([a-zA-Z0-9])', E'\\1 \\2', 'g'));

ALTER TABLE "public"."vectors"
    ALTER COLUMN content_tsv SET NOT NULL;

COMMENT ON COLUMN "public"."vectors".content IS
    'Nullable to support customer policies that prohibit storing original text while still allowing hybrid search via content_tsv.';

CREATE INDEX IF NOT EXISTS idx_vectors_content_tsv
    ON "public"."vectors" USING gin (content_tsv);

Notes:

  • The collectionnamespace rename block only affects old legacy schemas.
  • The content_tsv migration is the current breaking schema change because hybrid search now depends on a persisted content_tsv column instead of recalculating to_tsvector(content) at query time.
  • If you are creating a new deployment from scratch, you do not need this migration; the current schema below is sufficient.
  • The sample SQL assumes the default SchemaName = "public" and TableName = "vectors".

Prerequisites

  • PostgreSQL 12+
  • pgvector extension installed:
CREATE EXTENSION IF NOT EXISTS vector;
  • (Optional, for TextSearchMode.Trigram) pg_trgm extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

pg_trgm is a standard PostgreSQL contrib module, available on most managed services (Azure, Supabase, RDS, etc.). When EnsureSchema = true and TextSearchMode = Trigram, the extension is created automatically.

Quick Start

using Mythosia.VectorDb;
using Mythosia.VectorDb.Postgres;

var store = new PostgresStore(new PostgresOptions
{
    ConnectionString = "Host=localhost;Database=mydb;Username=postgres;Password=secret",
    Dimension = 1536,
    EnsureSchema = true,  // auto-creates table + indexes
    Index = new HnswIndexOptions { M = 16, EfConstruction = 64, EfSearch = 40 }
});

// Fluent API (recommended)
var ns = store.InNamespace("my-namespace");
await ns.UpsertAsync(record);
var results = await ns.SearchAsync(queryVector, topK: 5);

// With scope
var scoped = ns.InScope("tenant-1");
await scoped.UpsertAsync(record);   // record.Scope set automatically
var scopedResults = await scoped.SearchAsync(queryVector);

ERD

erDiagram
    vectors {
        text namespace PK "NOT NULL — logical namespace"
        text id PK "NOT NULL — unique record ID within namespace"
        text scope "NULL — optional sub-namespace isolation"
        text content "NULL — original text content, optional when only content_tsv may be stored"
        tsvector content_tsv "NOT NULL — persisted full-text search vector for hybrid search"
        jsonb metadata "NOT NULL DEFAULT '{}' — arbitrary key-value pairs"
        vector embedding "NOT NULL — vector(dimension) for similarity search"
        timestamptz created_at "NOT NULL DEFAULT now()"
        timestamptz updated_at "NOT NULL DEFAULT now()"
    }

Single-table design: All namespaces share one table. The composite primary key (namespace, id) ensures uniqueness per namespace.

Indexes

Index Type Target Purpose
PK btree (namespace, id) Primary key / upsert conflict
idx_*_embedding hnsw / ivfflat embedding vector_*_ops ANN similarity search (distance strategy dependent)
idx_*_metadata gin metadata jsonb containment filter (@>)
idx_*_content_tsv gin content_tsv Full-text search index for hybrid lexical retrieval (TsVector mode)
idx_*_content_trgm gin content gin_trgm_ops Trigram similarity index for hybrid search (Trigram mode, auto-created when configured)
idx_*_ns_scope btree (namespace, scope) Scope-scoped queries

Schema

When EnsureSchema = true, the following is created automatically:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS "public"."vectors" (
    namespace   text        NOT NULL,
    id          text        NOT NULL,
    scope       text        NULL,
    content     text        NULL,
    content_tsv tsvector    NOT NULL,
    metadata    jsonb       NOT NULL DEFAULT '{}'::jsonb,
    embedding   vector(1536) NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    updated_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (namespace, id)
);

ALTER TABLE "public"."vectors"
    ADD COLUMN IF NOT EXISTS content_tsv tsvector;

UPDATE "public"."vectors"
SET content_tsv = to_tsvector('simple',
    regexp_replace(
        regexp_replace(coalesce(content, ''),
            '([a-zA-Z0-9])([^\u0001-\u007F\s])', E'\\1 \\2', 'g'),
        '([^\u0001-\u007F\s])([a-zA-Z0-9])', E'\\1 \\2', 'g'));

ALTER TABLE "public"."vectors"
    ALTER COLUMN content_tsv SET NOT NULL;

COMMENT ON COLUMN "public"."vectors".content IS
    'Nullable to support customer policies that prohibit storing original text while still allowing hybrid search via content_tsv.';

-- Indexes
CREATE INDEX IF NOT EXISTS idx_vectors_metadata
    ON "public"."vectors" USING gin (metadata);

CREATE INDEX IF NOT EXISTS idx_vectors_content_tsv
    ON "public"."vectors" USING gin (content_tsv);

CREATE INDEX IF NOT EXISTS idx_vectors_ns_scope
    ON "public"."vectors" (namespace, scope);

-- vector index (default: HNSW)
CREATE INDEX IF NOT EXISTS idx_vectors_embedding
    ON "public"."vectors" USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

Notes:

  • The vector index SQL changes by Index type (HnswIndexOptions / IvfFlatIndexOptions / NoIndexOptions).
  • The operator class changes by DistanceStrategy:
    • Cosinevector_cosine_ops
    • Euclideanvector_l2_ops
    • InnerProductvector_ip_ops

When EnsureSchema = false (recommended for production), the table must already exist.
An InvalidOperationException is thrown with a clear message if the table is missing.

Manual Schema Setup (Production)

For production deployments, create the schema manually before starting the application:

-- 1. Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Create table (adjust dimension as needed)
CREATE TABLE public.vectors (
    namespace   text        NOT NULL,
    id          text        NOT NULL,
    scope       text        NULL,
    content     text        NULL,
    content_tsv tsvector    NOT NULL,
    metadata    jsonb       NOT NULL DEFAULT '{}'::jsonb,
    embedding   vector(1536) NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    updated_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (namespace, id)
);

COMMENT ON COLUMN public.vectors.content IS
    'Nullable to support customer policies that prohibit storing original text while still allowing hybrid search via content_tsv.';

-- 3. Indexes
CREATE INDEX idx_vectors_metadata
    ON public.vectors USING gin (metadata);

CREATE INDEX idx_vectors_content_tsv
    ON public.vectors USING gin (content_tsv);

CREATE INDEX idx_vectors_ns_scope
    ON public.vectors (namespace, scope);

-- 4-A. Option A (recommended default): HNSW
CREATE INDEX idx_vectors_embedding
    ON public.vectors USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

-- 4-B. Option B: IVFFlat (create after loading data)
--      ivfflat requires rows to exist for training
-- CREATE INDEX idx_vectors_embedding
--     ON public.vectors USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- 5. Analyze for query planner (recommended)
ANALYZE public.vectors;

-- 6. (Optional, for TextSearchMode.Trigram) Trigram index
-- CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- CREATE INDEX idx_vectors_content_trgm
--     ON public.vectors USING gin (content gin_trgm_ops);

Options

Option Default Description
ConnectionString (required) PostgreSQL connection string
Dimension (required) Embedding vector dimension (e.g., 1536 for OpenAI)
SchemaName "public" Database schema
TableName "vectors" Table name
EnsureSchema false Auto-create extension/table/indexes
DistanceStrategy Cosine Similarity metric (Cosine, Euclidean, InnerProduct)
Index new HnswIndexOptions() Vector index settings object (HnswIndexOptions, IvfFlatIndexOptions, NoIndexOptions)
HnswIndexOptions.M 16 HNSW build param (m), typical range 8-64
HnswIndexOptions.EfConstruction 64 HNSW build param (ef_construction), typical range 32-400
HnswIndexOptions.EfSearch 40 HNSW runtime ef_search default
IvfFlatIndexOptions.Lists 100 Number of IVF lists for the ivfflat index
IvfFlatIndexOptions.Probes 10 IVFFlat runtime probes default
FailFastOnIndexCreationFailure true Throw when vector index creation fails (recommended for production)
TextSearchMode TsVector Text search strategy for hybrid search (TsVector or Trigram)
TextSearchConfig "simple" PostgreSQL text search configuration for to_tsvector / to_tsquery (only used in TsVector mode)

Runtime Tuning Guide (DX)

  • IvfFlatSearchRuntimeOptions.Probes: increase for better recall, decrease for lower latency.
  • HnswSearchRuntimeOptions.EfSearch: increase for better recall, decrease for lower latency.
  • IvfFlatIndexOptions.Lists: start around sqrt(total_rows) and tune from there.

Use runtime options matching your index settings:

  • Index = new HnswIndexOptions(...)HnswSearchRuntimeOptions
  • Index = new IvfFlatIndexOptions(...)IvfFlatSearchRuntimeOptions

Recommended starting points:

Goal IvfFlatSearchRuntimeOptions.Probes HnswSearchRuntimeOptions.EfSearch
Fast 4 16
Balanced 10 40
HighRecall 32 120

These are practical ranges, not strict hard limits. Final values should be chosen from production latency/recall measurements.

Namespace & Filter Behavior

  • Namespaces are stored as a namespace column in a single shared table (not separate tables).
  • There is no explicit namespace-create API. Namespaces are implicitly created on first upsert.
  • Delete all rows in a namespace via store.InNamespace("your-ns").DeleteAllAsync().
  • Scope filter: WHERE scope = @scope
  • Metadata filter: WHERE metadata @> @jsonb (jsonb containment, AND logic)
  • MinScore filter (distance-strategy dependent):
    • Cosine: 1 - (embedding <=> @q::vector) >= @minScore
    • Euclidean: 1 / (1 + (embedding <-> @q::vector)) >= @minScore
    • InnerProduct: -(embedding <#> @q::vector) >= @minScore

PostgresStore supports native IVectorStore.HybridSearchAsync for hybrid search. When called via UseHybridSearch(), it runs parallel queries — text search and pgvector similarity search — then merges results via Reciprocal Rank Fusion (RRF).

TextSearchMode.TsVector (default)

Uses PostgreSQL tsvector / tsquery full-text search with a persisted content_tsv column plus GIN index. Works well for European languages with good built-in text search configurations.

var store = new PostgresStore(new PostgresOptions
{
    ConnectionString = connString,
    Dimension = 1536,
    EnsureSchema = true,
    // TextSearchMode = TextSearchMode.TsVector  (default)
});

TextSearchMode.Trigram

Uses pg_trgm extension with word_similarity matching. Better for CJK languages (Korean, Japanese, Chinese) and agglutinative languages where PostgreSQL lacks built-in morphological analysis.

var store = new PostgresStore(new PostgresOptions
{
    ConnectionString = connString,
    Dimension = 1536,
    EnsureSchema = true,
    TextSearchMode = TextSearchMode.Trigram  // pg_trgm word_similarity
});

Why Trigram for Korean/CJK?
PostgreSQL’s simple text search config tokenizes by whitespace only. Korean particles (조사/어미) attach to words, so "opm에""opm은" — no match. Trigram splits text into 3-character grams and uses substring similarity, bypassing morphological analysis entirely.

content may remain nullable for deployments where original text storage is prohibited, but content_tsv is required for lexical retrieval in TsVector mode.

Batch Get & Count

// Fetch multiple records by ID — single query using WHERE id = ANY(@ids)
var records = await store.InNamespace("docs").GetBatchAsync(new[] { "id-1", "id-2", "id-3" });

// Count all records in a namespace
long count = await store.InNamespace("docs").CountAsync();

// Count with additional metadata filter (jsonb @> containment)
long filtered = await store.InNamespace("docs").CountAsync(
    VectorFilter.ByMetadata("storage_id", storageId));

// Count across all namespaces
long total = await store.CountAsync();

GetBatchAsync uses a single WHERE id = ANY(@ids) query with Npgsql array binding. Applies the full filter (namespace, scope, metadata) in the same WHERE clause. CountAsync uses SELECT COUNT(*) with optional clauses for namespace, scope, and jsonb containment.

Atomic Vector Replacement

ReplaceByFilterAsync wraps DELETE + INSERT in a single PostgreSQL transaction, eliminating the query gap that occurs during re-embedding:

// Delete all vectors matching the filter, then insert new ones — atomically
var filter = VectorFilter.ByMetadata("full_path", "/docs/policy.md");
filter.Namespace = "default";

await store.ReplaceByFilterAsync(filter, newRecords);

How it works:

BEGIN TRANSACTION
  DELETE FROM vectors WHERE metadata->>'full_path' = '/docs/policy.md'
  INSERT INTO vectors (...) VALUES (...), (...), ...
COMMIT

If the INSERT fails, the DELETE is rolled back and existing data remains intact. Queries always see either the old data or the new data, never an empty result.

RAG Integration

var store = await RagStore.BuildAsync(config => config
    .AddText("Your document text here", id: "doc-1")
    .UseLocalEmbedding(512)
    .UseVectorStore(new PostgresStore(new PostgresOptions
    {
        ConnectionString = Environment.GetEnvironmentVariable("MYTHOSIA_PG_CONN")!,
        Dimension = 512,
        EnsureSchema = true,
        Index = new HnswIndexOptions()
    }))
    .WithTopK(5)
);

Connection Verification

Call VerifyConnectionAsync to test TCP connectivity and authentication before running queries:

var store = new PostgresStore(new PostgresOptions
{
    ConnectionString = connString,
    Dimension = 1536
});

try
{
    await store.VerifyConnectionAsync();
    Console.WriteLine("Connected!");
}
catch (Exception ex)
{
    Console.WriteLine($"Connection failed: {ex.Message}");
}

Performance Tips

  • ivfflat lists: Rule of thumb — lists = sqrt(total_rows). Default 100 is good for up to ~10K rows.
  • Run ANALYZE vectors; after bulk inserts for optimal query plans.
  • For large datasets (1M+ rows), consider HNSW index (CREATE INDEX ... USING hnsw) instead of ivfflat.
  • Use connection pooling (e.g., Npgsql connection string Pooling=true;Maximum Pool Size=20).

EnsureSchema Guidance

  • EnsureSchema = true: Development, testing, local Docker — auto-provisions everything.
  • EnsureSchema = false: Production — schema managed by DBA/migration tools; fails fast with clear error if missing.
  • For ivfflat, index creation can fail on empty tables (PostgreSQL/pgvector behavior). In that case, use Hnsw or create ivfflat after loading data.
  • FailFastOnIndexCreationFailure = true (default): throws immediately if vector index creation fails.
  • FailFastOnIndexCreationFailure = false: startup continues even if vector index creation fails.
Product Compatible and additional computed target framework versions.
.NET net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
10.6.1 73 4/1/2026
10.6.0 87 3/30/2026
10.5.0 90 3/29/2026
10.4.0 88 3/28/2026
10.3.0 82 3/22/2026
10.2.1 123 3/15/2026
10.2.0 99 3/11/2026 10.2.0 is deprecated because it has critical bugs.
10.1.0 85 3/6/2026
10.0.0 90 3/5/2026

v10.5.0: Add GetBatchAsync (WHERE id = ANY(@ids) batch query) and CountAsync (SELECT COUNT(*) with namespace/scope/metadata filters). GetAsync and DeleteAsync now apply full filter conditions (scope, metadata) via BuildFilterWhere. Dependency update: Npgsql 10.0.2, System.Text.Json 10.0.5.