Version: 0.3.5

Viglet Turing ES: Connectors

Several connectors are available for indexing content in Viglet Turing ES.

Apache Nutch

Apache Nutch is a highly extensible and scalable open source web crawler software project.

Installation

Nutch 1.18 and 1.20

  1. Go to https://viglet.org/turing/download/ and click the "Integration > Apache Nutch 1.18 Plugin" or "Integration > Apache Nutch 1.20 Plugin" link to download the plugin for your Nutch version.

  2. Extract the plugin to <APACHE_NUTCH>/plugins/indexer-viglet-turing

Configuration

nutch-site.xml

Add the following properties to <APACHE_NUTCH>/conf/nutch-site.xml:

| Parameter | Description |
| --- | --- |
| turing.url | URL of Turing ES Server (e.g., http://localhost:2700) |
| turing.apiKey | API Key for authentication |
| turing.snSite | Semantic Navigation Site name |
| turing.locale | Locale for indexing (e.g., en_US) |
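
As a rough sketch, the four properties above might be declared as follows. This assumes the standard Hadoop-style `<property>` layout that Nutch configuration files use. The values are placeholders borrowed from the examples elsewhere on this page, not defaults shipped with the plugin.

```xml
<configuration>
  <!-- Placeholder values: replace with your own server, key, site and locale -->
  <property>
    <name>turing.url</name>
    <value>http://localhost:2700</value>
  </property>
  <property>
    <name>turing.apiKey</name>
    <value>your-api-key</value>
  </property>
  <property>
    <name>turing.snSite</name>
    <value>Sample</value>
  </property>
  <property>
    <name>turing.locale</name>
    <value>en_US</value>
  </property>
</configuration>
```

If nutch-site.xml already contains a `<configuration>` element, add only the `<property>` entries inside it rather than a second root element.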

turing-mapping.xml

Create or edit <APACHE_NUTCH>/conf/turing-mapping.xml to configure field mappings:

<mapping>
<fields>
<field source="title" dest="title"/>
<field source="content" dest="text"/>
<field source="url" dest="url"/>
<field source="tstamp" dest="modification_date"/>
</fields>
<siteUrl>
<value url="https://example.com" snSite="Sample" locale="en_US"/>
</siteUrl>
<uniqueKey field="url"/>
</mapping>

Indexing a Website

cd <APACHE_NUTCH>
bin/crawl -i -D solr.server.url=http://localhost:8983/solr/turing -s urls crawl 5

Database

A JDBC connector that follows the same concept as Apache Sqoop: you write a SQL query, however complex, and the columns of its result are mapped to the attributes to index.

Installation

Go to https://viglet.org/turing/download/ and click the "Integration > Database Connector" link to download turing-jdbc.jar.

Usage

java -jar /appl/viglet/turing/jdbc/turing-jdbc.jar <PARAMETERS>

Parameters

| Parameter | Description |
| --- | --- |
| --connect | JDBC connection string |
| --driver | JDBC driver class name |
| --query | SQL query to execute |
| --site | Semantic Navigation Site name |
| --locale | Locale for indexing |
| --chunk | Number of rows per chunk |
| --server | Turing ES server URL |
| --api-key | API Key for authentication |
| --file-path-field | Field containing file paths |
| --file-content-field | Field for file content |
| --file-extension-field | Field containing file extensions |
| --file-size-field | Field containing file sizes |
| --multi-valued-separator | Separator for multi-valued fields |
| --remove-html-tags-field | Fields from which to remove HTML tags |

Example

java -jar /appl/viglet/turing/jdbc/turing-jdbc.jar \
--connect "jdbc:mariadb://localhost:3306/mydb" \
--driver "org.mariadb.jdbc.Driver" \
--query "SELECT id, title, content, url FROM articles" \
--site "Sample" \
--locale "en_US" \
--chunk 100 \
--server "http://localhost:2700" \
--api-key "your-api-key"

File System

FileSystem connector that indexes files, extracting text from Word, Excel, and PDF documents and applying OCR to images.

Installation

Go to https://viglet.org/turing/download/ and click the "Integration > FileSystem Connector" link to download turing-filesystem.jar.

Usage

java -jar /appl/viglet/turing/fs/turing-filesystem.jar <PARAMETERS>

Example

java -jar /appl/viglet/turing/fs/turing-filesystem.jar \
--server http://localhost:2700 \
--nlp <NLP_UUID> \
--source-dir /path/to/files \
--output-dir /path/to/output

WordPress

WordPress plugin that allows you to index posts.

Installation

  1. Upload the plugin folder to the /wp-content/plugins/ directory.
  2. Activate the plugin through the 'Plugins' menu in WordPress.
  3. Configure the hostname, port and URI of Turing ES.
  4. Click the settings button to load posts.

OpenText WEM Listener

A listener for OpenText WEM that publishes content to Viglet Turing ES.

Installation

Download

Go to https://viglet.com/turing/download/ and click the "Integration > WEM Listener" link to download it.

Extract the turing-wem.zip file to /appl/viglet/turing/wem:

mkdir -p /appl/viglet/turing/wem
unzip turing-wem.zip -d /appl/viglet/turing/wem

Classpath

  1. Copy turing-wem-all.jar to the WEM and CDS library directory:
cp /appl/viglet/turing/wem/turing-wem-all.jar /appl/ot/WEM/Content/<VERSION>/lib/
  2. Edit the cda.classpath file of the Management and Delivery stages and add:
CLASSPATH.7=#INSTALL_DIR#/lib/turing-wem-all.jar

Command Line

Copy /appl/viglet/turing/wem/command-line/<WEM_VERSION>/turing-wem to <WEM_DIR>/bin.

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| --all / -a | No | false | Index all instances of all content types |
| --content-type / -c | No | - | XML name of the content type to index |
| --guids / -g | No | - | Path to file containing GUIDs |
| --host / -h | Yes | - | Content Management server host |
| --page-size / -z | No | 500 | Page size for processing |
| --password / -p | No | - | User password |
| --siteName / -s | Yes | Sample | WEM site name |
| --username / -u | Yes | - | Username for login |
| --working-dir / -w | Yes | - | Working directory with vgncfg.properties |