
Turing ES — Architecture Overview

Introduction

Viglet Turing ES is an open-source enterprise search platform that combines semantic navigation, generative AI, tool calling, and AI Agents. It allows organizations to index content from multiple sources, apply Retrieval-Augmented Generation (RAG) over that content, and expose rich search experiences through REST and GraphQL APIs.

Content ingestion is handled by Viglet Dumont DEP, a separate project that runs connectors independently and delivers documents to Turing ES through its REST API; Turing ES then queues each indexing job internally on Apache Artemis for asynchronous processing.

This document describes the system's components, internal modules, and the two core data flows: indexing and search.


High-Level Component Diagram

Turing ES — High-Level Architecture

The architecture is organized into four layers: Clients & External Services at the top, an API gateway in the middle, Core Modules for business logic, and Backends & Storage at the bottom. Each layer is detailed below.


Clients & External Services

| Component | Description |
|---|---|
| JS SDK / Java SDK | Typed clients for search and indexing — available on npm and Maven Central |
| Dumont DEP | Separate application that runs connectors and sends documents to Turing ES via REST API |
| Keycloak | Optional OAuth2 / OIDC provider for production SSO |
| LLM Providers | Six supported vendors: Anthropic Claude, OpenAI, Azure OpenAI, Google Gemini, Gemini (OpenAI-compatible), Ollama |

API Layer

| Component | Package | Description |
|---|---|---|
| REST API | api | Controllers for search, indexing, GenAI chat, agents, token usage, and administration |
| GraphQL API | api | Search query resolvers as an alternative to REST |
| Admin Console | turing-react | SN Site configuration, Assets file manager (MinIO), Chat interface, Token Usage dashboard |
| Security | spring/security | Session-based auth (console) + API Key header (REST) + optional Keycloak OAuth2/OIDC |

Core Modules

| Module | Package | Responsibility |
|---|---|---|
| Semantic Navigation | sn | Core search orchestration: query processing, facets, spotlights, targeting rules, autocomplete |
| Search Engine Plugins | se / plugins/se | Abstraction layer over Solr (recommended), Elasticsearch, and Lucene |
| GenAI / RAG | genai | RAG over SN content and MinIO assets; LLM context building and invocation |
| Tool Calling | genai/tool | 27 native tools: SN search (15), RAG/KB (4), web crawler (2), finance (2), weather (1), image search (1), datetime (1), code interpreter (1); plus MCP servers |
| LLM Providers | genai/provider/llm | Pluggable provider factory: Anthropic, OpenAI, Azure OpenAI, Gemini, Gemini-OpenAI, Ollama |
| AI Agents | agent | Composition of LLM Instance + Tools + MCP Servers into deployable assistants |
| Indexing Pipeline | indexer | Receives documents via Artemis, applies Merge Providers, writes to Solr and embedding stores |
| Message Queue | artemis | Apache Artemis — async queue between the REST API entry point and the indexing pipeline |
| OCR | ocr | Text extraction from PDFs, Word documents, and images via Apache Tika |
| Persistence | persistence | JPA entities, repositories, DTOs, and MapStruct mappers for all domain objects |

Backends & Storage

| Backend | Purpose | Notes |
|---|---|---|
| Apache Solr | Primary search index | SolrCloud mode with Zookeeper in production |
| Embedding Stores | Vector storage for RAG | One active per deployment |
| Database | Configuration, metadata, spotlights | H2 for dev; MariaDB/MySQL for production |
| MinIO | Asset/file object storage | Managed via admin console Assets file manager |
| MongoDB | Application log persistence | Optional — custom Logback appender, browsable in admin console |

Indexing Flow

Content ingestion is handled externally by Viglet Dumont DEP. Each connector runs as an independent process and sends documents to Turing ES via its REST API. The API receives the request, validates it against the target Semantic Navigation Site configuration, and creates an indexing job that is queued internally via Apache Artemis for asynchronous processing.

The Semantic Navigation Site is the central configuration artifact that drives the entire indexing behavior: it defines which Solr instance to use, which fields the documents carry, how those fields are mapped and used (title, text, URL, date, image, facets, etc.), how search will behave, and which spotlights are configured. The indexing pipeline reads this configuration to know exactly what to do with each incoming document.
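To make this contract concrete, the sketch below applies a hypothetical SN Site configuration to an incoming document, mapping raw connector fields onto canonical roles and collecting facet values. All type, field, and function names are invented for illustration and do not reflect Turing ES's actual schema or API.

```typescript
// Hypothetical sketch: an SN Site configuration maps raw connector fields
// onto canonical roles (title, text, url, date) and flags facet fields.
type FieldRole = "title" | "text" | "url" | "date" | "facet";

interface SnSiteConfig {
  solrCollection: string;
  fieldRoles: Record<string, FieldRole>; // source field -> role
}

function applySiteMapping(
  site: SnSiteConfig,
  doc: Record<string, string>
): { collection: string; mapped: Record<string, string>; facets: string[] } {
  const mapped: Record<string, string> = {};
  const facets: string[] = [];
  for (const [field, value] of Object.entries(doc)) {
    const role = site.fieldRoles[field];
    if (role === undefined) continue; // fields unknown to the site are dropped
    if (role === "facet") facets.push(`${field}:${value}`);
    else mapped[role] = value;
  }
  return { collection: site.solrCollection, mapped, facets };
}

// Usage: a news site whose connector emits "headline", "body", and "section".
const site: SnSiteConfig = {
  solrCollection: "news",
  fieldRoles: { headline: "title", body: "text", link: "url", section: "facet" },
};
const result = applySiteMapping(site, {
  headline: "Release 2025.2",
  body: "Turing ES adds AI Agents.",
  link: "https://example.com/release",
  section: "Product",
});
```

The point of the sketch is that the pipeline itself carries no per-source logic: everything it needs is read from the site configuration at job execution time.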

Turing ES — Indexing Flow

Key indexing concepts

REST API as entry point: Dumont DEP connectors send documents to Turing ES via REST API, not by writing directly to Solr or the queue. The API is the single integration point — it validates the request, loads the target SN Site configuration, and enqueues an indexing job via Apache Artemis for asynchronous processing.

SN Site as the indexing blueprint: Every indexing job is bound to a Semantic Navigation Site. The site configuration defines the complete indexing contract: which Solr collection to write to, which document fields exist and how they are typed, which fields become facets, how highlighting and autocomplete will work, and which spotlights are active. The pipeline reads this configuration at job execution time to determine field mappings, facet assignments, and spotlight handling.

Spotlight persistence: When an incoming document matches a configured Spotlight — for example, its URL or ID matches a spotlight term — the pipeline indexes the document in Solr normally and also persists the spotlight content (title, description, URL, position) in the relational database. This ensures the spotlight data remains available for injection even if the document is later removed from the Solr index.

Viglet Dumont DEP: The connector system that feeds Turing ES. It runs as a separate application and manages its own connector lifecycle (schedules, credentials, field mappings). Connectors currently available in Dumont DEP include WebCrawler (Nutch-based), Database, FileSystem, AEM, and WordPress. Refer to the Dumont DEP documentation for connector configuration.

Merge Providers: When two Dumont DEP connectors independently index different representations of the same real-world document — for example, AEM indexing structured metadata from model.json and WebCrawler indexing the rendered HTML of the same page — the Merge Provider identifies them as the same document using a configured join key and merges their fields before writing to Solr. See Semantic Navigation for a detailed explanation.
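A minimal sketch of that merge step, assuming "url" as the configured join key (the key choice and the rule that later documents win on field conflicts are assumptions for illustration, not Turing ES's actual merge semantics):

```typescript
type Doc = Record<string, string>;

// Collapse documents sharing the same join-key value into one record before
// the Solr write. Later documents add missing fields and win on conflicts.
function mergeByJoinKey(docs: Doc[], joinKey: string): Doc[] {
  const byKey = new Map<string, Doc>();
  for (const doc of docs) {
    const key = doc[joinKey];
    if (key === undefined) continue; // documents without the key pass through untouched here
    const existing = byKey.get(key);
    byKey.set(key, existing ? { ...existing, ...doc } : { ...doc });
  }
  return [...byKey.values()];
}

// Usage: AEM contributes structured metadata, WebCrawler the rendered text.
const merged = mergeByJoinKey(
  [
    { url: "https://site/page", title: "Page", author: "alice" }, // from model.json
    { url: "https://site/page", text: "Rendered body text" },     // from crawled HTML
  ],
  "url"
);
```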

Embedding stores: If Generative AI is enabled for an SN Site, a vector embedding is generated for each indexed document and written to the configured embedding store. Turing ES supports three embedding backends via Spring AI: ChromaDB, PgVector (PostgreSQL extension), and Milvus. Only one is active per deployment. The default embedding store and embedding model are defined globally in Administration → Settings.

MinIO asset indexing: Turing ES includes an Assets file manager in the admin console, backed by MinIO as the object storage layer. Files are uploaded via drag-and-drop, organized into folders, and automatically indexed as vector embeddings on upload (and unindexed on deletion). A batch "Train AI with Assets" operation processes all files using Apache Tika for text extraction, chunking at 1,024 characters, and storing embeddings in the active vector store.
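The fixed-size chunking step can be sketched naively as follows. Whether Turing ES overlaps chunks or respects word boundaries is not stated here; this version simply slices at the 1,024-character limit.

```typescript
// Split extracted text into fixed-size chunks prior to embedding.
function chunkText(text: string, size = 1024): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size)); // last chunk may be shorter
  }
  return chunks;
}

// A 2,500-character document yields two full chunks and one partial chunk.
const chunks = chunkText("a".repeat(2500));
```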

Application logs in MongoDB: When MongoDB is configured, Turing ES ships with a custom Logback (ch.qos.logback) appender. Every log entry generated by the application — including indexing events, search requests, errors, and system events — is persisted to MongoDB in addition to standard output. These logs are exposed in the admin console, giving administrators full visibility into application behavior without requiring access to the server file system or a separate log management tool.


Search Flow

The search flow is synchronous and request-driven. Every request goes through a structured pipeline inside TurSNSearchProcess before a response is returned to the client.

Turing ES — Search Flow

Key search concepts

Site-scoped execution: Every search request is scoped to a named Semantic Navigation Site. The site configuration defines which search engine backend to use, how many results per page, facet definitions, field mappings, whether Spotlights and Targeting Rules are active, and whether GenAI is enabled.

Plugin abstraction: The orchestrator does not query the search backend directly. An intermediate plugin layer translates the abstract search context into backend-specific queries, supporting Apache Solr, Elasticsearch, and Lucene as backends. Solr is the recommended production backend as it provides the most complete feature set, including full support for facets, spotlights, targeting rules, highlighting, and autocomplete. Elasticsearch and Lucene are available as alternatives with a reduced feature set.
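The plugin idea can be sketched as a small contract the orchestrator depends on, with each backend supplying its own translation. The interface and method names below are illustrative, not Turing ES's actual se API; an in-memory backend stands in for Solr to keep the example self-contained.

```typescript
// Abstract search contract the orchestrator sees.
interface SearchContext {
  query: string;
  rows: number;
  filters: string[];
}
interface SearchResultDoc {
  id: string;
  title: string;
}

interface SearchEnginePlugin {
  name: string;
  search(ctx: SearchContext): SearchResultDoc[];
}

// Toy backend: substring matching over an in-memory list. A real plugin would
// translate the context into a Solr, Elasticsearch, or Lucene query instead.
class InMemoryPlugin implements SearchEnginePlugin {
  name = "in-memory";
  constructor(private docs: SearchResultDoc[]) {}
  search(ctx: SearchContext): SearchResultDoc[] {
    return this.docs
      .filter((d) => d.title.toLowerCase().includes(ctx.query.toLowerCase()))
      .slice(0, ctx.rows);
  }
}

// The orchestrator never sees backend specifics, only the contract.
function runSearch(plugin: SearchEnginePlugin, ctx: SearchContext) {
  return plugin.search(ctx);
}

const plugin = new InMemoryPlugin([
  { id: "1", title: "Enterprise Search" },
  { id: "2", title: "AI Agents" },
]);
const hits = runSearch(plugin, { query: "search", rows: 10, filters: [] });
```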

Spotlight injection: After retrieving organic results, the system checks whether any configured Spotlights match the current query. Matching Spotlights insert curated documents at specific positions in the result list. See Semantic Navigation for details.
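A sketch of the injection step: curated documents are spliced into the organic list at their configured positions. The substring matching rule here is purely illustrative; how Turing ES actually matches spotlight terms is described in Semantic Navigation.

```typescript
interface ResultDoc {
  id: string;
  curated?: boolean; // marks injected spotlight documents
}
interface Spotlight {
  term: string;
  position: number; // zero-based slot in the result list
  doc: ResultDoc;
}

function injectSpotlights(
  query: string,
  organic: ResultDoc[],
  spotlights: Spotlight[]
): ResultDoc[] {
  const results = [...organic];
  for (const s of spotlights) {
    if (query.toLowerCase().includes(s.term.toLowerCase())) {
      const pos = Math.min(s.position, results.length); // clamp to list end
      results.splice(pos, 0, { ...s.doc, curated: true });
    }
  }
  return results;
}

// Usage: a spotlight on "install" pins a curated document to the top slot.
const injected = injectSpotlights(
  "turing install guide",
  [{ id: "o1" }, { id: "o2" }],
  [{ term: "install", position: 0, doc: { id: "s1" } }]
);
```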

Targeting Rules: Results are filtered based on the requesting user's profile attributes, passed in the request context. Targeting Rules translate those attributes into additional Solr filter queries. See Semantic Navigation for details.
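The translation step can be sketched as a map from profile attributes to filter-query strings. The attribute names and the Solr-style `field:"value"` syntax are assumptions for illustration; the actual rule format is covered in Semantic Navigation.

```typescript
// Turn user-profile attributes from the request context into additional
// filter queries appended to the backend query.
function targetingFilters(attributes: Record<string, string>): string[] {
  return Object.entries(attributes).map(
    ([field, value]) => `${field}:"${value}"`
  );
}

// Usage: a sales user in EMEA only sees documents tagged for that audience.
const fq = targetingFilters({ department: "sales", region: "emea" });
```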

Field mapping and highlighting: Before returning results, each document's fields are remapped to a canonical set of display fields configured per site (title, description, text, date, image, URL). Highlighting wraps matched terms with configurable HTML tags (default: <mark>).
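The highlighting behavior can be sketched as a simple term wrapper with a configurable tag. This is a naive single-term version for illustration; the real implementation delegates highlighting to the search backend.

```typescript
// Wrap case-insensitive matches of `term` in a configurable tag (default <mark>).
function highlight(text: string, term: string, tag = "mark"): string {
  if (term.length === 0) return text;
  // Escape regex metacharacters so the term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return text.replace(
    new RegExp(escaped, "gi"),
    (m) => `<${tag}>${m}</${tag}>`
  );
}
```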

Metrics: After assembling the response, query metrics (search term, result count, site, timestamp) are logged asynchronously to avoid adding latency to the response.
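One way to model that fire-and-forget pattern is a queue the request thread appends to and a background worker drains. The queue-and-flush shape below is illustrative only; Turing ES's actual logging mechanism is not specified here, and the explicit flush stands in for the background worker.

```typescript
interface QueryMetric {
  term: string;
  resultCount: number;
  site: string;
  timestamp: number;
}

const pending: QueryMetric[] = []; // filled on the request path, cheaply
const stored: QueryMetric[] = [];  // persisted later, off the request path

// Called while assembling the response: O(1), never blocks on storage.
function recordMetric(m: QueryMetric): void {
  pending.push(m);
}

// Called by a background worker: drains the queue and returns the count.
function flushMetrics(): number {
  const drained = pending.splice(0);
  stored.push(...drained);
  return drained.length;
}

// Usage: the request enqueues, a later flush persists.
recordMetric({ term: "solr", resultCount: 12, site: "docs", timestamp: Date.now() });
const flushed = flushMetrics();
```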


Technology Stack

| Layer | Technology | Notes |
|---|---|---|
| Runtime | Java 21 | Minimum supported version |
| Framework | Spring Boot + Spring AI | Application container; Spring AI powers LLM and embedding integrations |
| Search Engine | Apache Solr (recommended), Elasticsearch, Lucene | Solr is the primary production backend with the most complete feature set; Elasticsearch and Lucene are supported as alternatives |
| Solr Coordination | Apache Zookeeper | Required for Solr in production (SolrCloud mode) |
| Message Broker | Apache Artemis | Asynchronous indexing queue (embedded in Turing ES) |
| Database | H2 / MariaDB / MySQL / PostgreSQL | H2 for development; MariaDB or MySQL recommended for production |
| Embedding Stores | ChromaDB / PgVector / Milvus | Via Spring AI; one backend active per deployment |
| Asset Store | MinIO | External service; configured in application.yaml (host, user, password); object storage for files managed via the admin console folder UI |
| Log Store | MongoDB | Optional; a custom Logback (ch.qos.logback) appender persists all application logs to MongoDB, accessible via the admin console |
| LLM Providers | Anthropic Claude, OpenAI, Azure OpenAI, Google Gemini, Gemini (OpenAI-compatible), Ollama | Configured per site or globally; one active at a time |
| Tool Calling | 27 native tools across 8 categories + MCP (external servers) | Semantic Nav (15), RAG/KB (4), Web Crawler (2), Finance (2), Weather (1), Image Search (1), DateTime (1), Code Interpreter (1); MCP via HTTP or stdio |
| Identity Management | Keycloak | OAuth2 / OpenID Connect; optional, required only for SSO deployments |
| Load Balancer | Apache HTTP Server | Optional; required for high-availability cluster deployments |
| Connector System | Viglet Dumont DEP | Separate application; sends documents to Turing ES via REST API, which queues them internally to Artemis |
| Build System | Apache Maven | Multi-module project |
| Frontend | React + TypeScript + shadcn/ui + Vite | Admin console (turing-react) — includes SN configuration, Assets manager, Chat interface, Token Usage dashboard |
| Containerization | Docker / Docker Compose | Available in containers/ directory |
| Orchestration | Kubernetes | Manifests available in k8s/ directory |
| API Protocols | REST + GraphQL | Swagger UI available in development mode |
| Java SDK | turing-java-sdk | Available on Maven Central; typed client for search and indexing |
| JavaScript SDK | @viglet/turing-sdk | Available on npm; TypeScript-ready client for web and Node.js |

Deployment Topologies

Turing ES supports several deployment configurations. Each builds on the previous one — start with what you need and add components as requirements grow.

Development

Minimal setup for local development and evaluation.

Turing ES (H2 embedded) + Apache Solr

Turing ES starts with an embedded H2 database. No external database is needed. Not suitable for production.


Simple Production

Recommended baseline for production environments.

Turing ES + Apache Solr + Zookeeper + MariaDB / MySQL

Solr runs in SolrCloud mode coordinated by Zookeeper, enabling index replication and fault tolerance at the Solr layer. MariaDB or MySQL provides durable persistence for Turing ES configuration and metadata.
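For illustration, pointing Turing ES at an external MariaDB in this topology might look like the fragment below in application.yaml. These are standard Spring Boot datasource keys, and the host and credentials are placeholders; verify the exact properties Turing ES expects against the Configuration Reference.

```yaml
spring:
  datasource:
    url: jdbc:mariadb://db.example.com:3306/turing   # hypothetical host
    username: turing
    password: changeme
    driver-class-name: org.mariadb.jdbc.Driver
```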


Production with Security (SSO)

For environments that require integration with an identity provider or corporate SSO.

Turing ES + Apache Solr + Zookeeper + MariaDB / MySQL + Keycloak

Keycloak handles authentication via OAuth2 / OpenID Connect. Users log in through Keycloak and receive tokens that are validated by Turing ES. See Security & Keycloak for configuration details.


Production with Log UI

For environments where administrators need visibility into application behavior directly from the Turing ES admin console — without access to server logs or external log tools.

Turing ES + Apache Solr + Zookeeper + MariaDB / MySQL + MongoDB

When MongoDB is configured, a custom Logback appender persists every log entry generated by the application to MongoDB. The admin console exposes these logs in a searchable interface, showing errors, warnings, indexing events, and system activity in real time.


High Availability

For environments requiring horizontal scaling and zero-downtime deployments.

Apache HTTP Server (load balancer)
├── Turing ES node 1
├── Turing ES node 2
└── Turing ES node N
Apache Solr + Zookeeper (cluster)
MariaDB / MySQL (primary + replica)
Keycloak (optional)
MongoDB (optional)

Multiple Turing ES instances run behind Apache HTTP Server configured as a reverse proxy and load balancer. Solr runs as a multi-node SolrCloud cluster. The database should be configured with at least one replica for redundancy.


| Page | Description |
|---|---|
| Installation Guide | Setup with Docker, JAR, or build from source |
| Configuration Reference | All application.yaml properties |
| Integration | Manage content connectors in the admin console |
| Dumont DEP — Architecture | Connector-side architecture — pipeline engine, message queue, and indexing plugins |
| Dumont DEP — Connectors | Available connectors and deployment types |