
Embedding Models

An Embedding Model converts text into a high-dimensional numerical vector that captures its semantic meaning. These vectors enable similarity search — finding documents conceptually related to a query even when they share no common keywords.
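
Similarity between vectors is typically measured with cosine similarity. A minimal sketch using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the values below are illustrative, not real model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"
query = [0.1, 0.9, 0.2]
doc_related = [0.2, 0.8, 0.1]
doc_unrelated = [0.9, 0.1, 0.0]

cosine_similarity(query, doc_related)    # close to 1.0: similar meaning
cosine_similarity(query, doc_unrelated)  # much lower: unrelated content
```

The dimension check is why indexing and querying must use the same model: vectors from different models have different lengths (and different geometry) and cannot be compared.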

Turing ES uses embedding models in two phases:

  • Indexing — document chunks are vectorized and stored in the Embedding Store
  • Querying — the user's question is vectorized with the same model and compared against stored vectors

Note: Same model for indexing and querying

The embedding model must remain consistent across both phases. Changing the model after documents have been indexed causes dimension mismatches and incorrect similarity results. A full re-indexing of all content is required whenever the embedding model changes.


Supported Providers

Embedding model support depends on the LLM vendor configured in the LLM Instance. Not all vendors provide an embedding API.

| Provider | Embedding Support | Example Models | Default Model |
|---|---|---|---|
| OpenAI | Yes | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 | text-embedding-3-small |
| Azure OpenAI | Yes | Deployment name of an embedding model in your Azure resource | text-embedding-ada-002 |
| Ollama | Yes | nomic-embed-text, mxbai-embed-large, all-minilm, stella-v5 | (configurable) |
| Anthropic | No | | |
| Gemini | No | | |
| Gemini (OpenAI-compatible) | No | | |

OpenAI

Connects to the OpenAI API (default: https://api.openai.com) using your API key. OpenAI offers three embedding models:

| Model | Dimensions | Notes |
|---|---|---|
| text-embedding-3-small | 1,536 | Best cost-performance balance for most deployments |
| text-embedding-3-large | 3,072 | Higher quality, larger storage footprint |
| text-embedding-ada-002 | 1,536 | Legacy model; use 3-small for new deployments |
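
For reference, a sketch of the underlying OpenAI embeddings call that the LLM Instance makes on your behalf (endpoint and body shape follow the OpenAI API; the key is a placeholder):

```python
import json
import urllib.request

def build_embedding_request(texts, model="text-embedding-3-small",
                            api_key="YOUR_API_KEY"):
    """Build an HTTP request for the OpenAI embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embedding_request(["What is semantic search?"])
# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```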

Azure OpenAI

Uses the same OpenAI embedding models, hosted on your Azure tenant. Configuration requires:

  • Endpoint — your Azure OpenAI resource URL (e.g., https://my-resource.openai.azure.com)
  • Embedding Deployment Name — the deployment name created in the Azure portal
  • API Key — Azure API key (stored encrypted)

Ollama (Local)

Runs embedding models locally via Ollama. No API key required for local deployments — ideal for air-gapped environments or development.

| Model | Dimensions | Notes |
|---|---|---|
| nomic-embed-text | 768 | Good general-purpose model, lightweight |
| mxbai-embed-large | 1,024 | Higher quality, more resource-intensive |
| all-minilm | 384 | Very lightweight, fast inference |

Pull a model before using it:

```shell
ollama pull nomic-embed-text
```
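
Once pulled, the model can be queried over Ollama's local HTTP API; a sketch (the endpoint and `embedding` response field follow Ollama's embeddings API, and the host is Ollama's default local port):

```python
import json
import urllib.request

def build_ollama_request(text, model="nomic-embed-text",
                         host="http://localhost:11434"):
    """Build a request for Ollama's local embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embeddings", data=payload,
        headers={"Content-Type": "application/json"}, method="POST")

def ollama_embed(text, **kwargs):
    """Return the embedding vector (a list of floats) for one text."""
    with urllib.request.urlopen(build_ollama_request(text, **kwargs)) as resp:
        return json.load(resp)["embedding"]  # 768 floats for nomic-embed-text
```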

Local Transformers (ONNX)

Turing ES also supports running embedding models locally via ONNX Runtime, without an external LLM provider. This is useful for deploying custom or fine-tuned models.

| Setting | Description |
|---|---|
| Model Path | Absolute path to the .onnx model file |
| Tokenizer Path | Absolute path to tokenizer.json |
| Enable GPU | Toggle GPU acceleration via ONNX Runtime |
| Batch Size | Number of texts to embed per batch |
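
How these settings fit together can be sketched with ONNX Runtime and the `tokenizers` library. This is not Turing ES's internal implementation; the tensor names (`input_ids`, `attention_mask`) and mean pooling are assumptions that depend on how your model was exported:

```python
def make_batches(texts, batch_size):
    """Split texts into batches of at most batch_size items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

def embed_locally(texts, model_path, tokenizer_path,
                  batch_size=8, use_gpu=False):
    """Embed texts with a local ONNX model.
    Requires: pip install onnxruntime tokenizers numpy"""
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    providers = (["CUDAExecutionProvider", "CPUExecutionProvider"]
                 if use_gpu else ["CPUExecutionProvider"])
    session = ort.InferenceSession(model_path, providers=providers)
    tokenizer = Tokenizer.from_file(tokenizer_path)
    tokenizer.enable_padding()  # pad each batch to equal length

    vectors = []
    for batch in make_batches(texts, batch_size):
        enc = tokenizer.encode_batch(batch)
        ids = np.array([e.ids for e in enc], dtype=np.int64)
        mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
        hidden = session.run(None, {"input_ids": ids,
                                    "attention_mask": mask})[0]
        # Mean-pool token vectors into one embedding per text,
        # ignoring padding positions via the attention mask
        pooled = ((hidden * mask[..., None]).sum(axis=1)
                  / mask.sum(axis=1, keepdims=True))
        vectors.extend(pooled.tolist())
    return vectors
```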

Create / Edit Form

Navigate to Generative AI → Embedding Models to manage embedding model configurations.

General Information

| Field | Required | Description |
|---|---|---|
| Model Name | Yes | Display name for this embedding model |
| Description | No | Free-text notes about the model's purpose |

Provider

| Field | Required | Description |
|---|---|---|
| LLM Instance | Yes* | Select the LLM Instance that provides the embedding API. Not required for Local Transformers. |
| Model Reference | Yes | Technical model identifier (e.g., text-embedding-3-large, nomic-embed-text) |

Local Transformers Options

These fields appear only when Transformers (Local) is selected as the provider type:

| Field | Description |
|---|---|
| Model Path | Path to the ONNX model file (.onnx extension) |
| Tokenizer Path | Path to the tokenizer file (tokenizer.json) |
| Batch Size | Number of texts processed per inference batch |
| Enable GPU | Toggle hardware acceleration |

Status

| Field | Description |
|---|---|
| Enabled | Toggle to activate or deactivate this model. Disabled models are not available for selection. |

Choosing a Model

The embedding model determines two things:

  • Dimensionality — the number of dimensions in the vector (e.g., 384, 768, 1,536, 3,072). Higher dimensions capture more nuance but require more storage.
  • Semantic quality — how well the model captures meaning. Larger models generally produce better similarity results at the cost of slower indexing.
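
Dimensionality translates directly into storage cost: each dimension is typically a 4-byte float. A quick estimate (the corpus size is illustrative):

```python
def index_size_bytes(num_chunks, dimensions, bytes_per_float=4):
    """Raw vector storage only; excludes store indexes and metadata."""
    return num_chunks * dimensions * bytes_per_float

# 100,000 chunks at the two OpenAI dimensionalities:
small = index_size_bytes(100_000, 1_536)  # text-embedding-3-small
large = index_size_bytes(100_000, 3_072)  # text-embedding-3-large
print(small / 1e6, "MB vs", large / 1e6, "MB")  # 614.4 MB vs 1228.8 MB
```

Doubling the dimensions doubles raw vector storage, so the quality gain of a larger model should be weighed against index size and similarity-search latency.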

Recommendations

| Scenario | Recommended Model | Provider |
|---|---|---|
| General production | text-embedding-3-small | OpenAI |
| Maximum quality | text-embedding-3-large | OpenAI |
| Local / air-gapped | nomic-embed-text | Ollama |
| Resource-constrained | all-minilm | Ollama |
| Azure enterprise | text-embedding-ada-002 deployment | Azure OpenAI |
| Custom fine-tuned | Your .onnx model | Local Transformers |

For most deployments, a mid-sized model such as text-embedding-3-small (OpenAI) or nomic-embed-text (Ollama) provides a good balance between quality and performance.


Global Configuration

Set the default embedding model in Administration → Settings:

| Setting | Description |
|---|---|
| Default Embedding Model | The embedding model used to generate vectors at indexing and query time |

Individual Semantic Navigation Sites can override this setting in their Generative AI tab.

The Knowledge Base always uses the global default.


REST API

Embedding models are managed via the REST API at /api/embedding-model.

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/embedding-model | List all embedding models |
| GET | /api/embedding-model/structure | Get the structure template for a new model |
| GET | /api/embedding-model/{id} | Get a specific embedding model |
| POST | /api/embedding-model | Create a new embedding model |
| PUT | /api/embedding-model/{id} | Update an existing embedding model |
| DELETE | /api/embedding-model/{id} | Delete an embedding model |
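
A sketch of driving this API from a script; the base URL is a placeholder, and any authentication your deployment requires is not shown:

```python
import json
import urllib.request

BASE = "http://localhost:2700"  # placeholder: your Turing ES URL

def endpoint(model_id=None):
    """Build the embedding-model endpoint path, optionally for one id."""
    path = "/api/embedding-model"
    return f"{path}/{model_id}" if model_id is not None else path

def list_embedding_models():
    """GET /api/embedding-model -- list all configured models."""
    with urllib.request.urlopen(BASE + endpoint()) as resp:
        return json.load(resp)

def delete_embedding_model(model_id):
    """DELETE /api/embedding-model/{id} -- remove one model."""
    req = urllib.request.Request(BASE + endpoint(model_id), method="DELETE")
    urllib.request.urlopen(req)
```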

Per-Site Override

Each Semantic Navigation Site can override the global embedding model in its Generative AI tab. This allows different sites to use different models — for example, a multilingual site might use a model optimized for cross-language embeddings while a technical site uses a domain-specific model.

The site-level configuration includes:

| Setting | Description |
|---|---|
| Embedding Model | Overrides the global default for this site |
| Embedding Store | Overrides the global store backend for this site |
| LLM Instance | The chat/reasoning model for this site's GenAI features |

Caching

Embedding model data is cached at the repository layer to avoid repeated database reads during high-throughput indexing:

  • `turEmbeddingModelfindAll` — caches the full list of models
  • `turEmbeddingModelfindById` — caches individual model lookups

Cache entries are invalidated automatically on create, update, or delete.
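
Turing ES implements this in its repository layer; the read-through-cache-with-write-invalidation pattern it describes can be sketched as follows (class and method names are illustrative, not the actual implementation):

```python
class CachedRepository:
    """Minimal sketch: cache reads, invalidate the cache on any write."""

    def __init__(self, db: dict):
        self._db = db      # stand-in for the database table
        self._cache = {}   # holds the "findAll" list and per-id entries

    def find_all(self):
        if "findAll" not in self._cache:
            self._cache["findAll"] = list(self._db.values())  # one DB read
        return self._cache["findAll"]

    def find_by_id(self, model_id):
        if model_id not in self._cache:
            self._cache[model_id] = self._db.get(model_id)
        return self._cache[model_id]

    def save(self, model_id, model):
        self._db[model_id] = model
        self._cache.clear()  # invalidate on create/update

    def delete(self, model_id):
        self._db.pop(model_id, None)
        self._cache.clear()  # invalidate on delete
```

Clearing the whole cache on every write is coarse but safe: stale `findAll` results can never be served after a model changes.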


Related Pages

| Page | Description |
|---|---|
| Embedding Stores | Vector database backends (ChromaDB, PgVector, Milvus) |
| What is RAG? | How embedding models fit into the RAG pipeline |
| LLM Instances | Configure the LLM providers that supply embedding APIs |
| Assets | Knowledge Base files indexed using the embedding model |
| Semantic Navigation | Per-site GenAI and embedding overrides |
| GenAI & LLM Configuration | Global settings and architecture overview |