
Embedding Models

An Embedding Model converts text into a high-dimensional numerical vector that captures its semantic meaning. These vectors enable similarity search — finding documents conceptually related to a query even when they share no common keywords.
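
Similarity between vectors is typically measured with cosine similarity. A minimal sketch using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the values below are illustrative, not real model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"
query = [0.1, 0.9, 0.2]
doc_related = [0.2, 0.8, 0.1]
doc_unrelated = [0.9, 0.1, 0.0]

cosine_similarity(query, doc_related)    # close to 1.0: similar meaning
cosine_similarity(query, doc_unrelated)  # much lower: unrelated content
```

The dimension check is why indexing and querying must use the same model: vectors from different models have different lengths (and different geometry) and cannot be compared.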

Turing ES uses embedding models in two phases:

  • Indexing — document chunks are vectorized and stored in the Embedding Store
  • Querying — the user's question is vectorized with the same model and compared against stored vectors

Note: Same model for indexing and querying

The embedding model must remain consistent across both phases. Changing the model after documents have been indexed causes dimension mismatches and incorrect similarity results. A full re-indexing of all content is required whenever the embedding model changes.


Supported Providers

Embedding model support depends on the LLM vendor configured in the LLM Instance. Not all vendors provide an embedding API.

| Provider | Embedding Support | Example Models | Default Model |
|---|---|---|---|
| OpenAI | Yes | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 | text-embedding-3-small |
| Azure OpenAI | Yes | Deployment name of an embedding model in your Azure resource | text-embedding-ada-002 |
| Ollama | Yes | nomic-embed-text, mxbai-embed-large, all-minilm, stella-v5 | (configurable) |
| Anthropic | No | | |
| Gemini | No | | |
| Gemini (OpenAI-compatible) | No | | |

OpenAI

Connects to the OpenAI API (default: https://api.openai.com) using your API key. OpenAI offers three embedding models:

| Model | Dimensions | Notes |
|---|---|---|
| text-embedding-3-small | 1,536 | Best cost-performance balance for most deployments |
| text-embedding-3-large | 3,072 | Higher quality, larger storage footprint |
| text-embedding-ada-002 | 1,536 | Legacy model; use 3-small for new deployments |
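
For reference, a sketch of the underlying OpenAI embeddings call that the LLM Instance makes on your behalf (endpoint and body shape follow the OpenAI API; the key is a placeholder):

```python
import json
import urllib.request

def build_embedding_request(texts, model="text-embedding-3-small",
                            api_key="YOUR_API_KEY"):
    """Build an HTTP request for the OpenAI embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embedding_request(["What is semantic search?"])
# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```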

Azure OpenAI

Uses the same OpenAI embedding models, hosted on your Azure tenant. Configuration requires:

  • Endpoint — your Azure OpenAI resource URL (e.g., https://my-resource.openai.azure.com)
  • Embedding Deployment Name — the deployment name created in the Azure portal
  • API Key — Azure API key (stored encrypted)

Ollama (Local)

Runs embedding models locally via Ollama. No API key required for local deployments — ideal for air-gapped environments or development.

| Model | Dimensions | Notes |
|---|---|---|
| nomic-embed-text | 768 | Good general-purpose model, lightweight |
| mxbai-embed-large | 1,024 | Higher quality, more resource-intensive |
| all-minilm | 384 | Very lightweight, fast inference |

Pull a model before using it:

```shell
ollama pull nomic-embed-text
```
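
Once pulled, the model can be queried over Ollama's local HTTP API; a sketch (the endpoint and `embedding` response field follow Ollama's embeddings API, and the host is Ollama's default local port):

```python
import json
import urllib.request

def build_ollama_request(text, model="nomic-embed-text",
                         host="http://localhost:11434"):
    """Build a request for Ollama's local embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embeddings", data=payload,
        headers={"Content-Type": "application/json"}, method="POST")

def ollama_embed(text, **kwargs):
    """Return the embedding vector (a list of floats) for one text."""
    with urllib.request.urlopen(build_ollama_request(text, **kwargs)) as resp:
        return json.load(resp)["embedding"]  # 768 floats for nomic-embed-text
```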

Local Transformers (ONNX)

Turing ES also supports running embedding models locally via ONNX Runtime, without an external LLM provider. This is useful for deploying custom or fine-tuned models.

| Setting | Description |
|---|---|
| Model Path | Absolute path to the .onnx model file |
| Tokenizer Path | Absolute path to tokenizer.json |
| Enable GPU | Toggle GPU acceleration via ONNX Runtime |
| Batch Size | Number of texts to embed per batch |
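
How these settings fit together can be sketched with ONNX Runtime and the `tokenizers` library. This is not Turing ES's internal implementation; the tensor names (`input_ids`, `attention_mask`) and mean pooling are assumptions that depend on how your model was exported:

```python
def make_batches(texts, batch_size):
    """Split texts into batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

def embed_locally(texts, model_path, tokenizer_path,
                  batch_size=8, use_gpu=False):
    """Embed texts with a local ONNX model.
    Requires: pip install onnxruntime tokenizers numpy"""
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    providers = (["CUDAExecutionProvider", "CPUExecutionProvider"]
                 if use_gpu else ["CPUExecutionProvider"])
    session = ort.InferenceSession(model_path, providers=providers)
    tokenizer = Tokenizer.from_file(tokenizer_path)
    tokenizer.enable_padding()  # pad each batch to equal length

    vectors = []
    for batch in make_batches(texts, batch_size):
        enc = tokenizer.encode_batch(batch)
        ids = np.array([e.ids for e in enc], dtype=np.int64)
        mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
        hidden = session.run(None, {"input_ids": ids,
                                    "attention_mask": mask})[0]
        # Mean-pool token vectors into one embedding per text,
        # ignoring padding positions via the attention mask
        pooled = ((hidden * mask[..., None]).sum(axis=1)
                  / mask.sum(axis=1, keepdims=True))
        vectors.extend(pooled.tolist())
    return vectors
```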

Create / Edit Form

Navigate to Generative AI → Embedding Models to manage embedding model configurations.

General Information

| Field | Required | Description |
|---|---|---|
| Model Name | Yes | Display name for this embedding model |
| Description | No | Free-text notes about the model's purpose |

Provider

| Field | Required | Description |
|---|---|---|
| LLM Instance | Yes* | Select the LLM Instance that provides the embedding API. Not required for Local Transformers. |
| Model Reference | Yes | Technical model identifier (e.g., text-embedding-3-large, nomic-embed-text) |

Local Transformers Options

These fields appear only when Transformers (Local) is selected as the provider type:

| Field | Description |
|---|---|
| Model Path | Path to the ONNX model file (.onnx extension) |
| Tokenizer Path | Path to the tokenizer file (tokenizer.json) |
| Batch Size | Number of texts processed per inference batch |
| Enable GPU | Toggle hardware acceleration |

Status

| Field | Description |
|---|---|
| Enabled | Toggle to activate or deactivate this model. Disabled models are not available for selection. |

Choosing a Model

The embedding model determines two things:

  • Dimensionality — the number of dimensions in the vector (e.g., 384, 768, 1,536, 3,072). Higher dimensions capture more nuance but require more storage.
  • Semantic quality — how well the model captures meaning. Larger models generally produce better similarity results at the cost of slower indexing.
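
Dimensionality translates directly into storage cost: each dimension is typically a 4-byte float. A quick estimate (the corpus size is illustrative):

```python
def index_size_bytes(num_chunks, dimensions, bytes_per_float=4):
    """Raw vector storage only; excludes store indexes and metadata."""
    return num_chunks * dimensions * bytes_per_float

# 100,000 chunks at the two OpenAI dimensionalities:
small = index_size_bytes(100_000, 1_536)  # text-embedding-3-small
large = index_size_bytes(100_000, 3_072)  # text-embedding-3-large
print(small / 1e6, "MB vs", large / 1e6, "MB")  # 614.4 MB vs 1228.8 MB
```

Doubling the dimensions doubles raw vector storage, so the quality gain of a larger model should be weighed against index size and similarity-search latency.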

Recommendations

| Scenario | Recommended Model | Provider |
|---|---|---|
| General production | text-embedding-3-small | OpenAI |
| Maximum quality | text-embedding-3-large | OpenAI |
| Local / air-gapped | nomic-embed-text | Ollama |
| Resource-constrained | all-minilm | Ollama |
| Azure enterprise | text-embedding-ada-002 deployment | Azure OpenAI |
| Custom fine-tuned | Your .onnx model | Local Transformers |

For most deployments, a mid-sized model such as text-embedding-3-small (OpenAI) or nomic-embed-text (Ollama) provides a good balance between quality and performance.


Global Configuration

Set the default embedding model in Administration → Settings:

| Setting | Description |
|---|---|
| Default Embedding Model | The embedding model used to generate vectors at indexing and query time |

Individual Semantic Navigation Sites can override this setting in their Generative AI tab.

The Knowledge Base always uses the global default.


REST API

Embedding models are managed via the REST API at /api/embedding-model.

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/embedding-model | List all embedding models |
| GET | /api/embedding-model/structure | Get the structure template for a new model |
| GET | /api/embedding-model/{id} | Get a specific embedding model |
| POST | /api/embedding-model | Create a new embedding model |
| PUT | /api/embedding-model/{id} | Update an existing embedding model |
| DELETE | /api/embedding-model/{id} | Delete an embedding model |
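
A sketch of driving this API from a script; the base URL is a placeholder, and any authentication your deployment requires is not shown:

```python
import json
import urllib.request

BASE = "http://localhost:2700"  # placeholder: your Turing ES URL

def endpoint(model_id=None):
    """Build the embedding-model endpoint path, optionally for one id."""
    path = "/api/embedding-model"
    return f"{path}/{model_id}" if model_id is not None else path

def list_embedding_models():
    """GET /api/embedding-model -- list all configured models."""
    with urllib.request.urlopen(BASE + endpoint()) as resp:
        return json.load(resp)

def delete_embedding_model(model_id):
    """DELETE /api/embedding-model/{id} -- remove one model."""
    req = urllib.request.Request(BASE + endpoint(model_id), method="DELETE")
    urllib.request.urlopen(req)
```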

Per-Site Override

Each Semantic Navigation Site can override the global embedding model in its Generative AI tab. This allows different sites to use different models — for example, a multilingual site might use a model optimized for cross-language embeddings while a technical site uses a domain-specific model.

The site-level configuration includes:

| Setting | Description |
|---|---|
| Embedding Model | Overrides the global default for this site |
| Embedding Store | Overrides the global store backend for this site |
| LLM Instance | The chat/reasoning model for this site's GenAI features |

Caching

Embedding model data is cached at the repository layer to avoid repeated database reads during high-throughput indexing:

  • `turEmbeddingModelfindAll` — caches the full list of models
  • `turEmbeddingModelfindById` — caches individual model lookups

Cache entries are invalidated automatically on create, update, or delete.
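
Turing ES implements this in its repository layer; the read-through-cache-with-write-invalidation pattern it describes can be sketched as follows (class and method names are illustrative, not the actual implementation):

```python
class CachedRepository:
    """Minimal sketch: cache reads, invalidate the cache on any write."""

    def __init__(self, db: dict):
        self._db = db      # stand-in for the database table
        self._cache = {}   # holds the "findAll" list and per-id entries

    def find_all(self):
        if "findAll" not in self._cache:
            self._cache["findAll"] = list(self._db.values())  # one DB read
        return self._cache["findAll"]

    def find_by_id(self, model_id):
        if model_id not in self._cache:
            self._cache[model_id] = self._db.get(model_id)
        return self._cache[model_id]

    def save(self, model_id, model):
        self._db[model_id] = model
        self._cache.clear()  # invalidate on create/update

    def delete(self, model_id):
        self._db.pop(model_id, None)
        self._cache.clear()  # invalidate on delete
```

Clearing the whole cache on every write is coarse but safe: stale `findAll` results can never be served after a model changes.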


Related Pages

| Page | Description |
|---|---|
| Embedding Stores | Vector database backends (ChromaDB, PgVector, Milvus) |
| What is RAG? | How embedding models fit into the RAG pipeline |
| LLM Instances | Configure the LLM providers that supply embedding APIs |
| Assets | Knowledge Base files indexed using the embedding model |
| Semantic Navigation | Per-site GenAI and embedding overrides |
| GenAI & LLM Configuration | Global settings and architecture overview |