Connectors Overview

Connectors are the components that extract content from external sources and feed it into the Dumont DEP processing pipeline. Each connector specializes in a specific type of content source and knows how to navigate, extract, and map content into Job Items that the pipeline can process.

Available Connectors

Connector	Source Type	Deployment	Artifact
Web Crawler	Websites	Java Plugin — loaded via `-Dloader.path`	`web-crawler-plugin.jar`
AEM	Adobe Experience Manager	Java Plugin — loaded via `-Dloader.path`	`aem-plugin.jar`
Database	JDBC databases	Standalone Java CLI — runs independently	`dumont-db-indexer.jar`
FileSystem	Local/network directories	Standalone Java CLI — runs independently	`dumont-filesystem-indexer.jar`
WordPress	WordPress sites	PHP Plugin — installed inside WordPress	`viglet-turing-for-wordpress/`

How Connectors Work

Every connector follows the same lifecycle:

Connect — Establish a connection to the content source (HTTP, JDBC, file handle, JCR session)
Discover — Find content to process (follow links, execute query, list files, traverse nodes)
Extract — Pull field values from each content item (title, text, URL, date, custom fields)
Create — Build a Job Item with the extracted fields, an action (INDEX/DELETE), and metadata
Submit — Pass the Job Item into the processing pipeline (strategies → batch → queue)
Finish — Flush any remaining items in the batch processor and signal completion
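The six steps above can be sketched in Java as follows. This is a minimal, self-contained illustration, not Dumont DEP's implementation. `LifecycleSketch`, `Action`, `JobItem`, and `BatchProcessor` are hypothetical names, and an in-memory map stands in for the HTTP, JDBC, file, or JCR source. The strategies stage is omitted. The point to notice is the ordering, and in particular that Finish has to flush a partially filled batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LifecycleSketch {

    // Hypothetical action type; the real pipeline's enum may differ.
    enum Action { INDEX, DELETE }

    // Hypothetical Job Item: extracted fields plus an action and a locale as metadata.
    record JobItem(String id, Action action, Map<String, Object> fields, String locale) {}

    // Hypothetical batch stage: buffers items and moves full batches to a queue.
    static class BatchProcessor {
        private final int batchSize;
        private final List<JobItem> buffer = new ArrayList<>();
        final List<List<JobItem>> queue = new ArrayList<>(); // stands in for the real queue

        BatchProcessor(int batchSize) { this.batchSize = batchSize; }

        void submit(JobItem item) {
            buffer.add(item);
            if (buffer.size() >= batchSize) flush();
        }

        void flush() {
            if (!buffer.isEmpty()) {
                queue.add(List.copyOf(buffer));
                buffer.clear();
            }
        }
    }

    // Connect -> Discover -> Extract -> Create -> Submit -> Finish, using an
    // in-memory map (content id -> raw text) in place of a real content source.
    static List<List<JobItem>> run(Map<String, String> source, int batchSize) {
        // Connect: the map itself plays the role of the open connection.
        BatchProcessor batch = new BatchProcessor(batchSize);
        // Discover: enumerate the content ids.
        for (Map.Entry<String, String> entry : source.entrySet()) {
            // Extract: first line becomes the title, the whole text is the body.
            String text = entry.getValue();
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("title", text.lines().findFirst().orElse(""));
            fields.put("text", text);
            // Create: wrap the fields in a Job Item with an INDEX action.
            JobItem item = new JobItem(entry.getKey(), Action.INDEX, fields, "en_US");
            // Submit: hand the item to the batching stage.
            batch.submit(item);
        }
        // Finish: flush whatever is left in the last, partially filled batch.
        batch.flush();
        return batch.queue;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("a", "Alpha\nfirst doc");
        source.put("b", "Beta\nsecond doc");
        source.put("c", "Gamma\nthird doc");
        List<List<JobItem>> batches = run(source, 2);
        System.out.println(batches.size()); // prints 2: [a, b] then [c]
    }
}
```

Without the final `flush()`, item `c` in the example would never leave the buffer. This is why the Finish step exists as a separate stage.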

Connector Interface

The Java connectors implement the DumConnectorPlugin interface (the WordPress connector is a PHP plugin that runs inside WordPress instead):

Method	Description
`crawl()`	Full extraction — discover and process all content from the source
`indexAll(source)`	Re-index all content from a specific source
`indexById(source, contentIds)`	Index specific documents by their IDs
`getProviderName()`	Returns the connector's identifier (e.g., `web-crawler`, `database`)
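The following is a hedged sketch of a plugin that satisfies this contract. The four method names come from the table above. The parameter and return types (`String` source names, `List<String>` content IDs, `void` returns) are assumptions, so the interface is redeclared locally to keep the example self-contained. Check the real `DumConnectorPlugin` signatures before relying on them. `InMemoryConnector` and its `"in-memory"` provider name are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConnectorInterfaceSketch {

    // Redeclared for illustration only; the real DumConnectorPlugin
    // may use different parameter and return types.
    interface DumConnectorPlugin {
        void crawl();                                           // full extraction
        void indexAll(String source);                           // re-index one source
        void indexById(String source, List<String> contentIds); // index specific documents
        String getProviderName();                               // e.g. "web-crawler"
    }

    // Hypothetical connector over in-memory sources (source name -> content ids).
    static class InMemoryConnector implements DumConnectorPlugin {
        private final Map<String, List<String>> sources;
        final List<String> indexed = new ArrayList<>(); // "source/id" of every item handed to the pipeline

        InMemoryConnector(Map<String, List<String>> sources) { this.sources = sources; }

        @Override
        public void crawl() {
            // In this sketch a full crawl simply re-indexes every known source.
            sources.keySet().forEach(this::indexAll);
        }

        @Override
        public void indexAll(String source) {
            indexById(source, sources.getOrDefault(source, List.of()));
        }

        @Override
        public void indexById(String source, List<String> contentIds) {
            for (String id : contentIds) {
                // A real plugin would extract fields and submit a Job Item here.
                indexed.add(source + "/" + id);
            }
        }

        @Override
        public String getProviderName() { return "in-memory"; }
    }

    public static void main(String[] args) {
        Map<String, List<String>> sources = new LinkedHashMap<>();
        sources.put("docs", List.of("1", "2"));
        sources.put("blog", List.of("9"));
        InMemoryConnector connector = new InMemoryConnector(sources);
        connector.crawl();
        System.out.println(connector.indexed); // [docs/1, docs/2, blog/9]
    }
}
```

Routing `crawl()` through `indexAll()` and `indexAll()` through `indexById()` keeps a single code path for Job Item creation. This is a design choice of the sketch, not documented connector behavior.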

Connector Plugins vs. Standalone Tools

Dumont DEP connectors are distributed in two forms:

Connector Plugins (AEM, Web Crawler)

The AEM and Web Crawler connectors are plugin JARs that run inside the dumont-connector.jar pipeline. They must be placed in a libs/ directory and loaded via Spring Boot's -Dloader.path:

# Directory layout (one connector plugin per instance)
dumont-connector.jar
libs/
  └── aem-plugin.jar      (or web-crawler-plugin.jar, but not both)

# Launch with the plugin on the classpath
java -Dloader.path=libs -jar dumont-connector.jar

dumont-connector.jar alone does not crawl

The connector JAR provides only the pipeline infrastructure (queue, strategies, indexing). Without a plugin JAR on the classpath, there is no data source to extract content from. You must add exactly one connector plugin via -Dloader.path.

One plugin per JVM instance

Only one connector plugin can be loaded per JVM instance. To run multiple connectors (e.g., AEM and Web Crawler), start separate dumont-connector.jar instances — each with its own plugin and port.

Standalone CLI Tools (Database, FileSystem)

The Database and FileSystem connectors are standalone command-line tools. They are separate JARs that run independently and connect to a running Dumont DEP instance through its REST API:

# Database import (standalone JAR)
java -cp dumont-db-indexer.jar com.viglet.dumont.connector.db.DumDbImportTool \
  --server http://localhost:30130 \
  --api-key <API_KEY> \
  --driver org.mariadb.jdbc.Driver \
  --connect "jdbc:mariadb://localhost:3306/products" \
  --query "SELECT id, name, description, price FROM products" \
  --site ProductCatalog \
  --locale en_US

# FileSystem import (standalone JAR)
java -cp dumont-filesystem-indexer.jar com.viglet.dumont.filesystem.DumFSImportTool \
  --source-dir /mnt/shared/documents \
  --server http://localhost:30130 \
  --api-key <API_KEY> \
  --site InternalDocs

These tools can be scheduled via cron jobs or CI/CD pipelines.

Managing Connectors via the Turing ES Console

The AEM and Web Crawler connector plugins can be managed through the Turing ES Admin Console. To connect a running dumont-connector.jar instance to the Turing ES UI:

Open the Turing ES Admin Console
Navigate to Enterprise Search → Integration
Click New to create a new integration instance
Set the Integration Type (AEM or Web Crawler)
Set the Endpoint to the URL of your Dumont DEP connector instance (e.g., http://localhost:30130)
Enable the integration

Once connected, the Turing ES console provides a graphical interface for:

Configuring sources, content types, and field mappings
Triggering full indexing and re-indexing operations
Monitoring indexing progress in real time
Viewing indexing statistics and status
Running double-check consistency validation

For full details on the Integration UI — including monitoring, indexing stats, and double-check — see the Turing ES Integration documentation.

For AEM-specific configuration (sources, content types, author/publish, delta tracking, locales, indexing rules), see the Turing ES AEM Connector documentation.

Common Configuration Pattern

Every connector needs at least these pieces of information:

Setting	Description
Source	Where to read content (URL, connection string, directory path, AEM endpoint)
Credentials	Authentication (username/password, API key, or none)
Target SN Site	The Turing ES Semantic Navigation Site that will receive the content
Locale	The language/country code for the content (e.g., `en_US`)
Field Mapping	How source fields map to search index fields
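One way to picture these five settings is as a single configuration object that is validated before a run starts. The `ConnectorSettings` record below is hypothetical; none of the connectors is known to expose such a type. The locale check only enforces the `language_COUNTRY` shape of the `en_US` example and is an assumption. The field map goes from source field name to search index field name. The sample values in `main` reuse the Database example from this page.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

class ConnectorSettingsSketch {

    // Assumed locale shape: two lowercase letters, underscore, two uppercase letters (en_US, pt_BR).
    private static final Pattern LOCALE = Pattern.compile("[a-z]{2}_[A-Z]{2}");

    // Hypothetical holder for the settings every connector needs.
    record ConnectorSettings(
            String source,                    // URL, JDBC connection string, directory path, or AEM endpoint
            Optional<String> credentials,     // empty when the source needs no authentication
            String snSite,                    // target Turing ES Semantic Navigation Site
            String locale,                    // e.g. en_US
            Map<String, String> fieldMapping  // source field -> search index field
    ) {
        ConnectorSettings {
            if (source == null || source.isBlank())
                throw new IllegalArgumentException("source is required");
            if (snSite == null || snSite.isBlank())
                throw new IllegalArgumentException("target SN site is required");
            if (locale == null || !LOCALE.matcher(locale).matches())
                throw new IllegalArgumentException("locale must look like en_US: " + locale);
            credentials = credentials == null ? Optional.empty() : credentials;
            // Map.copyOf rejects a null map (NullPointerException) and makes the mapping immutable.
            fieldMapping = Map.copyOf(fieldMapping);
        }
    }

    public static void main(String[] args) {
        ConnectorSettings settings = new ConnectorSettings(
                "jdbc:mariadb://localhost:3306/products",
                Optional.empty(),
                "ProductCatalog",
                "en_US",
                Map.of("name", "title", "description", "text"));
        System.out.println(settings.fieldMapping().get("name")); // title
    }
}
```

Validating in the record's compact constructor means an invalid combination (for example, a missing SN Site) fails before any content is read. This avoids failing halfway through a crawl.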

How Each Connector Is Configured

Connector	Configuration Method	Where
Web Crawler	Turing ES Admin Console	Turing ES → Integration
AEM	JSON file + Turing ES Admin Console	`export/` directory + Turing ES → Integration
Database	CLI parameters	Command-line — see Database Connector
FileSystem	CLI parameters	Command-line — see FileSystem Connector
WordPress	WordPress Admin UI	WordPress → Settings → Viglet Dumont
