
Embedding Stores

An Embedding Store is a specialized vector database that persists and queries document embeddings, the numerical vectors generated by the Embedding Model. It enables similarity search: given a user's query, it finds the documents whose embeddings are semantically closest to the query's embedding.
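
The ranking behind similarity search can be sketched in a few lines of Python. This is a brute-force, in-memory illustration only. The toy three-dimensional vectors and document IDs are hypothetical, and the real backends use approximate indexes (such as HNSW) rather than scoring every stored vector. The sketch scores each stored embedding against the query embedding with cosine similarity and keeps the best matches.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force ranking: score every stored embedding, return the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real models produce hundreds or thousands of dimensions.
store = {
    "doc-pricing": [0.9, 0.1, 0.0],
    "doc-billing": [0.8, 0.3, 0.1],
    "doc-hiring": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store))
```

With this query, doc-pricing and doc-billing rank ahead of doc-hiring because their vectors point in nearly the same direction as the query vector.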

Turing ES supports three backends via Spring AI. The active backend is set globally in Administration → Settings and can be overridden per Semantic Navigation Site in its Generative AI tab.


Supported Backends

ChromaDB

A lightweight, open-source vector database ideal for development and small to medium deployments.

  • Self-hosted, connects via its HTTP API
  • Little infrastructure overhead, especially for teams already running Python tooling
  • No special schema setup required — Turing ES manages the collections automatically
  • Multi-tenant and multi-database support

Docker Compose quickstart:

```yaml
services:
  chroma:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
```

| Configuration | Default | Description |
|---|---|---|
| Base URL | http://localhost:8000 | Chroma HTTP API endpoint |
| Collection Name | turing | Target collection name |
| Tenant Name | default_tenant | Chroma tenant identifier |
| Database Name | default_database | Chroma database name |
| Key Token | | Bearer token for authentication |
| Basic Username / Password | | Basic auth credentials |

Authentication: ChromaDB supports two methods — Bearer token and Basic auth. Configure either via the credential field or the provider options. When both are present, the credential field takes precedence.
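
The Python function below sketches the precedence rule above by working out which Authorization header a ChromaDB request would carry. It makes three assumptions. The credential field is treated as a Bearer token. The provider options use the hypothetical keys keyToken, username, and password, and the real option names in Turing ES may differ. Basic auth is encoded as base64 of username:password, which is what HTTP Basic authentication requires.

```python
import base64
from typing import Dict, Optional


def chroma_auth_headers(credential: Optional[str], options: Dict[str, str]) -> Dict[str, str]:
    """Resolve the Authorization header for a ChromaDB request.

    Assumption: a non-empty credential field wins over provider options.
    The option keys "keyToken", "username" and "password" are hypothetical.
    """
    if credential:
        return {"Authorization": f"Bearer {credential}"}
    token = options.get("keyToken")
    if token:
        return {"Authorization": f"Bearer {token}"}
    username, password = options.get("username"), options.get("password")
    if username and password:
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    return {}  # no authentication configured


print(chroma_auth_headers(None, {"username": "admin", "password": "secret"}))
```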

PgVector

PostgreSQL with the pgvector extension — the best choice for deployments that already use PostgreSQL as their primary database.

  • Avoids an additional infrastructure dependency
  • Embeddings live in the same database as your application data
  • Supports standard PostgreSQL backup, replication, and access control
  • Connection pooling via HikariCP (max 5 connections per store)

Enable the required extensions in your PostgreSQL instance:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```

| Configuration | Default | Description |
|---|---|---|
| JDBC URL | | PostgreSQL connection string (e.g., jdbc:postgresql://localhost:5432/turing) |
| Username | | Database user |
| Password | | Database password |
| Table Name | vector_store | Table where embeddings are stored |
| Schema Name | public | PostgreSQL schema |
| Dimensions | | Vector dimensionality; must match the embedding model |
| Distance Type | | COSINE_DISTANCE, EUCLIDEAN_DISTANCE, or INNER_PRODUCT |
| Index Type | | HNSW or IVFFLAT |
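
The Distance Type and Index Type settings correspond to pgvector operator classes (vector_cosine_ops, vector_l2_ops, vector_ip_ops) and index methods (hnsw, ivfflat). The sketch below builds a CREATE INDEX statement from those two settings. It shows how such DDL could be generated and is not necessarily the statement Turing ES issues. The index naming scheme and the IVFFlat lists value of 100 are placeholders.

```python
# pgvector operator class for each Distance Type value.
OPS_CLASS = {
    "COSINE_DISTANCE": "vector_cosine_ops",
    "EUCLIDEAN_DISTANCE": "vector_l2_ops",
    "INNER_PRODUCT": "vector_ip_ops",
}


def index_ddl(distance_type: str, index_type: str, table: str = "vector_store",
              schema: str = "public", ivfflat_lists: int = 100) -> str:
    """Build a CREATE INDEX statement for the embedding column."""
    if distance_type not in OPS_CLASS:
        raise ValueError(f"unsupported distance type: {distance_type}")
    method = index_type.lower()
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index type: {index_type}")
    name = f"{table}_embedding_{method}_idx"  # hypothetical naming scheme
    ddl = (f'CREATE INDEX IF NOT EXISTS {name} ON "{schema}"."{table}" '
           f"USING {method} (embedding {OPS_CLASS[distance_type]})")
    if method == "ivfflat":
        # IVFFlat partitions vectors into lists; 100 is only a placeholder value.
        ddl += f" WITH (lists = {ivfflat_lists})"
    return ddl + ";"


print(index_ddl("COSINE_DISTANCE", "HNSW"))
```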

Table schema created by Turing ES:

```sql
CREATE TABLE IF NOT EXISTS "vector_store" (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
    content TEXT,
    metadata JSON,
    embedding VECTOR
);
```
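
Nearest-neighbour queries against this table order rows by a pgvector distance operator: <=> for cosine distance, <-> for Euclidean distance, and <#> for negative inner product. The Python sketch below builds such a top-k query with psycopg-style %s placeholders. It is illustrative only and is not the query Spring AI or Turing ES generates. The function names are hypothetical.

```python
from typing import Sequence, Tuple

# pgvector distance operator for each Distance Type value.
DISTANCE_OPERATOR = {
    "COSINE_DISTANCE": "<=>",
    "EUCLIDEAN_DISTANCE": "<->",
    "INNER_PRODUCT": "<#>",  # pgvector returns the negative inner product, so ascending order still works
}


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a Python sequence as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def similarity_query(embedding: Sequence[float], k: int = 5,
                     distance_type: str = "COSINE_DISTANCE",
                     table: str = "vector_store", schema: str = "public") -> Tuple[str, tuple]:
    """Return (sql, params) for a top-k nearest-neighbour query."""
    op = DISTANCE_OPERATOR[distance_type]
    literal = to_vector_literal(embedding)
    sql = (
        f"SELECT id, content, metadata, embedding {op} %s::vector AS distance "
        f'FROM "{schema}"."{table}" '
        f"ORDER BY embedding {op} %s::vector "
        "LIMIT %s"
    )
    return sql, (literal, literal, k)


sql, params = similarity_query([0.1, 0.2], k=3)
print(sql)
print(params)
```

The ORDER BY repeats the distance expression itself so the planner can match it against an HNSW or IVFFlat index built with the corresponding operator class.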

Milvus

A purpose-built, cloud-native vector database designed for high-scale similarity search.

  • Recommended for large corpora or high-throughput deployments
  • Supports distributed operation, horizontal scaling, and advanced index management
  • Managed cloud offering available (Zilliz Cloud)

| Configuration | Default | Description |
|---|---|---|
| Base URL | http://localhost:19530 | Milvus service URI |
| Collection Name | turing | Target collection |
| Database Name | | Optional database name |
| Token | | Authentication token (username:password format) |
| Embedding Dimension | | Vector dimensionality; must match the embedding model |
| Metric Type | | COSINE, L2, or IP (inner product) |
| Index Type | | HNSW, IVF_FLAT, IVF_SQ8, or DISKANN |
| Index Parameters | | JSON string with index-specific parameters (e.g., {"M":16,"efConstruction":200}) |
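
Index Parameters is free-form JSON, so a quick validation step can catch typos before a collection is built. In this sketch, the parameters treated as required for each index type are an assumption: M and efConstruction for HNSW, nlist for the IVF variants, and none for DISKANN. Recent Milvus versions may supply defaults for some of these, so treat the check as a local policy rather than Milvus's own rules. The function name is hypothetical.

```python
import json
from typing import Any, Dict

# Parameters this sketch insists on per Index Type (a local policy, not Milvus's rules).
REQUIRED_PARAMS = {
    "HNSW": {"M", "efConstruction"},
    "IVF_FLAT": {"nlist"},
    "IVF_SQ8": {"nlist"},
    "DISKANN": set(),
}
METRIC_TYPES = {"COSINE", "L2", "IP"}


def parse_index_params(index_type: str, metric_type: str, raw: str) -> Dict[str, Any]:
    """Validate the Index Parameters JSON string before saving the store."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    if index_type not in REQUIRED_PARAMS:
        raise ValueError(f"unsupported index type: {index_type}")
    params = json.loads(raw) if raw and raw.strip() else {}
    if not isinstance(params, dict):
        raise ValueError("index parameters must be a JSON object")
    missing = REQUIRED_PARAMS[index_type] - set(params)
    if missing:
        raise ValueError(f"{index_type} requires: {', '.join(sorted(missing))}")
    return params


print(parse_index_params("HNSW", "COSINE", '{"M":16,"efConstruction":200}'))
```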

Store Comparison

| Feature | ChromaDB | PgVector | Milvus |
|---|---|---|---|
| Best for | Development / small to medium | PostgreSQL shops | Large-scale production |
| Infrastructure | Standalone container | PostgreSQL extension | Dedicated cluster |
| Scaling | Single node | PostgreSQL replication | Horizontal / distributed |
| Index types | Automatic | HNSW, IVFFLAT | HNSW, IVF_FLAT, IVF_SQ8, DISKANN |
| Distance metrics | Cosine, L2 | Cosine, L2, Inner Product | Cosine, L2, Inner Product |
| Multi-tenant | Yes (tenant + database) | Via schema | Yes (database + collection) |
| Authentication | Token or Basic Auth | JDBC credentials | Token |
| Managed cloud | | Any managed PostgreSQL | Zilliz Cloud |

Create / Edit Form

Navigate to Generative AI → Embedding Stores to manage store instances.

General Information

| Field | Required | Description |
|---|---|---|
| Title | Yes | Display name for this store instance |
| Description | Yes | Brief description of its purpose |
| Vendor | Yes | Select the backend: ChromaDB, PgVector, or Milvus. Selecting a vendor applies default values for Endpoint URL and Collection Name. |

Connection

| Field | Required | Description |
|---|---|---|
| Endpoint URL | Yes | Base URL for the store backend |
| Collection Name | No | Name of the collection or table (a vendor-specific default is applied) |
| Credential | No | Authentication token or password, stored encrypted. Leave blank when editing to keep the existing value. |

Provider Options

Each vendor exposes additional configuration fields in the Provider Options section. These fields appear dynamically when a vendor is selected. A raw JSON editor is also available for advanced configurations.

Status

| Field | Description |
|---|---|
| Enabled | Toggle to activate or deactivate this store. Disabled stores are not available for selection. |

Vendor defaults applied on selection:

| Vendor | Default URL | Default Collection |
|---|---|---|
| ChromaDB | http://localhost:8000 | turing |
| PgVector | jdbc:postgresql://localhost:5432/turing | vector_store |
| Milvus | http://localhost:19530 | turing |
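
Below is a minimal sketch of the vendor-default behavior. It assumes that selecting a vendor overwrites any Endpoint URL and Collection Name left over from a previously selected vendor; the actual form may instead fill only empty fields. The form keys endpointUrl and collectionName and the function name are hypothetical.

```python
from typing import Dict

# Defaults taken from the table above; the key names are hypothetical.
VENDOR_DEFAULTS: Dict[str, Dict[str, str]] = {
    "ChromaDB": {"endpointUrl": "http://localhost:8000", "collectionName": "turing"},
    "PgVector": {"endpointUrl": "jdbc:postgresql://localhost:5432/turing", "collectionName": "vector_store"},
    "Milvus": {"endpointUrl": "http://localhost:19530", "collectionName": "turing"},
}


def select_vendor(form: Dict[str, str], vendor: str) -> Dict[str, str]:
    """Return a copy of the form with the selected vendor's defaults applied."""
    if vendor not in VENDOR_DEFAULTS:
        raise ValueError(f"unknown vendor: {vendor}")
    updated = dict(form)  # leave the caller's form untouched
    updated["vendor"] = vendor
    # Assumption: defaults overwrite values that belonged to the previous vendor.
    updated.update(VENDOR_DEFAULTS[vendor])
    return updated


print(select_vendor({"title": "Docs"}, "PgVector"))
```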

Collection Management

Each store instance provides a Collections page where you can view and manage vector collections.

The collections table shows:

| Column | Description |
|---|---|
| Collection Name | Name of the collection or table |
| ID | Internal identifier |
| Document Count | Number of distinct source documents embedded |
| Chunk Count | Total number of embedding chunks stored |
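
Document Count and Chunk Count differ because each source document is usually split into several chunks, and each chunk gets its own embedding. The sketch below shows how the two numbers relate. It assumes each chunk's metadata carries a hypothetical source_id key that identifies its parent document.

```python
from typing import Dict, Iterable, Tuple


def collection_counts(chunks: Iterable[Dict[str, object]],
                      source_key: str = "source_id") -> Tuple[int, int]:
    """Return (document_count, chunk_count) for an iterable of chunk metadata dicts."""
    sources = set()
    chunk_count = 0
    for metadata in chunks:
        chunk_count += 1  # every chunk is one stored embedding
        source = metadata.get(source_key)
        if source is not None:
            sources.add(source)  # distinct parents make up the Document Count
    return len(sources), chunk_count


# Hypothetical metadata: two source documents split into three chunks.
chunks = [
    {"source_id": "manual.pdf", "chunk": 0},
    {"source_id": "manual.pdf", "chunk": 1},
    {"source_id": "faq.html", "chunk": 0},
]
print(collection_counts(chunks))  # (2, 3)
```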

Available actions per collection:

| Action | Description |
|---|---|
| Create | Create a new empty collection in the store |
| Clear | Remove all embeddings from the collection while keeping the collection structure |
| Delete | Remove the entire collection and all its data |

Clearing or deleting collections

Clearing or deleting a collection permanently removes all stored embeddings. A full re-indexing of all content is required to rebuild the vectors. This action cannot be undone.


System Information

The System Info endpoint returns backend-specific metadata for monitoring and diagnostics:

| Backend | Information Returned |
|---|---|
| ChromaDB | ChromaDB version |
| PgVector | PostgreSQL version, pgvector extension version |
| Milvus | Milvus server version |

REST API

Store instances are managed via the REST API at /api/store.

Store Instance Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/store | List all store instances (ordered by title) |
| GET | /api/store/structure | Get the structure template for a new instance |
| GET | /api/store/{id} | Get a specific store instance |
| POST | /api/store | Create a new store instance |
| PUT | /api/store/{id} | Update an existing store instance |
| DELETE | /api/store/{id} | Delete a store instance |
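
Here is a hedged Python sketch of creating a store instance through POST /api/store using only the standard library. The base URL (port 2700) and every payload field name are assumptions. Compare them with the template returned by GET /api/store/structure, and add whatever authentication your deployment requires, which is not shown here. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:2700"  # assumption: adjust to your Turing ES address


def create_store_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/store request with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/api/store",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field names; check them against GET /api/store/structure.
payload = {
    "title": "Docs vectors",
    "description": "Embeddings for the documentation site",
    "vendor": "ChromaDB",
    "url": "http://localhost:8000",
    "collectionName": "turing",
    "credential": "my-token",  # write-only: never echoed back in responses
    "enabled": True,
}
request = create_store_request(payload)
print(request.get_method(), request.full_url)
# Against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```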

Collection Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/store/{id}/collections | List all collections in a store |
| POST | /api/store/{id}/collections/{name} | Create a new collection |
| DELETE | /api/store/{id}/collections/{name} | Delete a collection |
| DELETE | /api/store/{id}/collections/{name}/clear | Clear all embeddings from a collection |
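
The collection endpoints differ only in HTTP method and path suffix, so one small builder covers all of them. This sketch uses only the standard library and builds requests without sending them. It percent-encodes the store ID and collection name. The helper name and the action labels are hypothetical.

```python
import urllib.parse
import urllib.request

# (HTTP method, path suffix) for each per-collection action in the table above.
COLLECTION_ACTIONS = {
    "create": ("POST", ""),
    "delete": ("DELETE", ""),
    "clear": ("DELETE", "/clear"),
}


def collection_request(base_url: str, store_id: str, action: str,
                       name: str = "") -> urllib.request.Request:
    """Build (but do not send) a request for one of the collection endpoints."""
    root = f"{base_url}/api/store/{urllib.parse.quote(store_id, safe='')}/collections"
    if action == "list":
        return urllib.request.Request(root, method="GET")
    if action not in COLLECTION_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if not name:
        raise ValueError(f"{action} needs a collection name")
    method, suffix = COLLECTION_ACTIONS[action]
    url = f"{root}/{urllib.parse.quote(name, safe='')}{suffix}"
    return urllib.request.Request(url, method=method)


r = collection_request("http://localhost:2700", "42", "clear", "turing")
print(r.get_method(), r.full_url)
```

Clear and delete are both DELETE requests. Only the /clear suffix keeps the collection itself in place.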

System Info Endpoint

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/store/{id}/system-info | Get store backend version and status |

Store Vendor Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/store/vendor | List all available vendors |
| GET | /api/store/vendor/{id} | Get a specific vendor |
Global Configuration

Set the default embedding store in Administration → Settings:

| Setting | Description |
|---|---|
| Default Embedding Store | Which vector database backend to use (ChromaDB, PgVector, or Milvus) |

Individual Semantic Navigation Sites can override this setting in their Generative AI tab. The Knowledge Base always uses the global default.
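
The override rule comes down to a small decision, sketched below with hypothetical function and parameter names. A site's own setting wins when one is present. The Knowledge Base always falls back to the global default.

```python
from typing import Optional


def effective_embedding_store(global_default: str,
                              site_override: Optional[str] = None,
                              knowledge_base: bool = False) -> str:
    """Pick the embedding store a request should use."""
    if knowledge_base:
        return global_default  # the Knowledge Base ignores per-site overrides
    return site_override or global_default  # no override: use the global setting


print(effective_embedding_store("PgVector", "Milvus"))  # Milvus
```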


Security

Credentials are handled with care at every layer:

  • Stored encrypted — credentials are encrypted via TurSecretCryptoService before being persisted in the credentialEncrypted column
  • Never returned — the credential field is transient and write-only. It flows in on save but never comes back in API responses
  • Edit safely — leaving the credential field blank when editing preserves the existing encrypted value
  • Per-vendor auth — ChromaDB supports Bearer token or Basic Auth; Milvus uses token-based auth; PgVector uses JDBC credentials with connection pooling
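
The edit-safe credential behavior can be sketched as a merge step plus a response filter. The encrypt callback stands in for TurSecretCryptoService, whose real API is not shown here. fake_encrypt is a placeholder that does not actually encrypt anything. Omitting credentialEncrypted from responses, as well as the function names, are assumptions of this sketch.

```python
from typing import Callable, Dict, Optional


def merge_credential(existing_encrypted: Optional[str], submitted: Optional[str],
                     encrypt: Callable[[str], str]) -> Optional[str]:
    """Decide what to persist in credentialEncrypted when a store is saved."""
    if not submitted:
        return existing_encrypted  # blank on edit: keep what is already stored
    return encrypt(submitted)  # new value: encrypt before persisting


def to_response(entity: Dict[str, object]) -> Dict[str, object]:
    """Serialize an instance for the API without any credential material."""
    hidden = {"credential", "credentialEncrypted"}
    return {key: value for key, value in entity.items() if key not in hidden}


def fake_encrypt(plain: str) -> str:
    """Placeholder for illustration only; this is NOT encryption."""
    return "enc:" + plain[::-1]


print(merge_credential("enc:old", "", fake_encrypt))  # enc:old
```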

Caching

Store instance and vendor data is cached at the repository layer to avoid repeated database reads:

  • turStoreInstancefindAll — caches the full list of instances
  • turStoreInstancefindById — caches individual instance lookups
  • turStoreVendorfindAll / turStoreVendorfindById — caches vendor metadata

Cache entries are invalidated automatically on create, update, or delete.
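
The cache-and-invalidate pattern can be illustrated with a read-through cache keyed by the cache names above, similar in spirit to Spring's @Cacheable and @CacheEvict(allEntries = true). The repository class and its in-memory db dictionary are hypothetical stand-ins. Reads are served from the cache until any write evicts both instance caches.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple


class NamedCaches:
    """Read-through caches keyed by (cache name, key)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Hashable], Any] = {}

    def get(self, cache: str, key: Hashable, loader: Callable[[], Any]) -> Any:
        if (cache, key) not in self._entries:
            self._entries[(cache, key)] = loader()  # miss: load from the database once
        return self._entries[(cache, key)]

    def evict_all(self, *caches: str) -> None:
        for entry in [e for e in self._entries if e[0] in caches]:
            del self._entries[entry]


class StoreInstanceRepository:
    """Hypothetical repository; `db` stands in for the real persistence layer."""

    CACHES = ("turStoreInstancefindAll", "turStoreInstancefindById")

    def __init__(self, db: Dict[str, dict]) -> None:
        self.db = db
        self.caches = NamedCaches()
        self.db_reads = 0  # counts how often the "database" is actually hit

    def _read_all(self) -> List[dict]:
        self.db_reads += 1
        return sorted(self.db.values(), key=lambda s: s["title"])

    def find_all(self) -> List[dict]:
        return self.caches.get("turStoreInstancefindAll", "all", self._read_all)

    def find_by_id(self, store_id: str) -> dict:
        def load() -> dict:
            self.db_reads += 1
            return self.db[store_id]
        return self.caches.get("turStoreInstancefindById", store_id, load)

    def save(self, store: dict) -> None:
        self.db[store["id"]] = store
        self.caches.evict_all(*self.CACHES)  # create/update invalidates both caches

    def delete(self, store_id: str) -> None:
        del self.db[store_id]
        self.caches.evict_all(*self.CACHES)  # delete invalidates both caches


repo = StoreInstanceRepository({"a": {"id": "a", "title": "Alpha"}})
repo.find_all()
repo.find_all()
print(repo.db_reads)  # 1
```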


Related Pages

| Page | Description |
|---|---|
| Embedding Models | Configure the models that generate vectors stored here |
| What is RAG? | How embedding stores fit into the RAG pipeline |
| GenAI & LLM Configuration | RAG architecture, RAG sources, and system overview |
| LLM Instances | Configure the LLM providers that supply embedding APIs |
| Assets | Knowledge Base files that are indexed into the Embedding Store |
| Semantic Navigation | Generative AI tab: per-site embedding overrides |