Connecting to Remote Datasources
The Siren Federate plugin provides the capability to query data in remote datasources through the Elasticsearch API by mapping tables to virtual indices.
The plugin stores its configuration in two Elasticsearch indices:
-
.siren-federate-datasources
: used to store the JDBC configuration parameters of remote datasources. -
.siren-federate-indices
: used to store the configuration parameters of virtual indices.
Other indices are also used for different features:
- .siren-federate-ingestions
: used to store the ingestions configurations.
- .siren-federate-joblogs
: used to store logs of ingestion jobs.
Datasources and virtual indices can be managed using the REST API or the user interface available in Siren Investigate.
These indices are created automatically when required.
Connecting to JDBC datasources
Siren Federate provides the capability to query data from a remote JDBC databases.
Settings
In order to send queries to virtual indices the Elasticsearch cluster must contain at least one node enabled to issue queries over JDBC; it is advised to use a coordinating only node for this role, although this is not a requirement for testing purposes.
JDBC node settings
In order to enable JDBC on a node where the Siren Federate plugin is installed,
add the following setting to elasticsearch.yml
:
node.attr.connector.jdbc: true
Then, create a directory named jdbc-drivers
inside the configuration directory
of the node (e.g. elasticsearch/config
or /etc/elasticsearch
).
Finally, copy the JDBC driver for your remote datasource and its dependencies
to the jdbc-drivers
directory created above and restart the node; see the
JDBC driver installation and compatibility section for a list of compatible drivers and
dependencies.
Common configuration settings
Encryption
JDBC passwords are encrypted by default using a predefined 128 bit AES key;
before creating datasources, it is advised to generate a custom key by running
the keygen.sh
script included in the siren-federate
plugin directory as
follows:
bash plugins/siren-federate/tools/keygen.sh -s 128
The command will output a random base64 key; it is also possible to generate keys longer than 128 bit if your JVM supports it.
To use the custom key, the following parameters must be set in
elasticsearch.yml
on master nodes and on all the JDBC nodes:
-
siren.connector.encryption.enabled
:true
by default, can be set tofalse
to disable JDBC password encryption. -
siren.connector.encryption.secret_key
: a base64 encoded AES key used to encrypt JDBC passwords.
Example elasticsearch.yml
settings for a master node with a custom encryption
key:
siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="
Example elasticsearch.yml
settings for a JDBC node with a custom encryption
key:
siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="
node.attr.connector.jdbc: true
Restart the nodes after changing the configuration to apply the settings.
JDBC driver installation and compatibility
The JDBC driver for your remote datasource and its dependencies must be copied
to the jdbc-drivers
subdirectory inside the configuration directory of JDBC
nodes (e.g. elasticsearch/config/jdbc-drivers
).
It is not required nor recommended to copy these drivers to nodes which are not enabled to execute queries.
You may create a sub-directory within jdbc-drivers
to store a driver and to provide a custom security policy file for this driver. It is recommended to use this approach for drivers that come in multiple jars. A custom security policy file enables the definition of driver-specific permissions. The custom security policy file must be named security.policy
and must be located inside the driver sub-directory. The following variables can be used within the policy file:
-
codebase.federate.common
: Path to the directory storing thesecurity.policy
(defaults tojdbc-drivers
if the defaultdrivers-security.policy
file is used) -
codebase.federate.${jar_name}
: Path to a driver JAR file. Here,${jar_name}
refers to the name of the JAR file that is stored in the directory where thesecurity.policy
file is located. If the defaultdrivers-security.policy
file is used, then the system uses JAR files that are stored in thejdbc-drivers
directory. For example:
grant codeBase "${codebase.federate.postgresql-42.2.5.jar}" {
// Permissions for postgresql-42.2.5.jar
}
If a security.policy
is placed in the main jdbc-drivers
directory, then it overrides the default drivers-security.policy
Restart the JDBC node after copying the drivers.
Name | JDBC class | Notes |
---|---|---|
PostgreSQL |
org.postgresql.Driver |
Download the latest JDBC 4.2 driver from
https://jdbc.postgresql.org/download.html and copy the
|
MySQL |
com.mysql.jdbc.Driver |
Download the latest GA release from
https://dev.mysql.com/downloads/connector/j/, extract it, then copy
When writing the JDBC connection string, set the |
Oracle 12c+ |
oracle.jdbc.OracleDriver |
Download the latest |
Spark SQL 2.2+ |
com.simba.spark.jdbc41.Driver |
The Magnitude JDBC driver for Spark can be purchased at
https://www.simba.com/product/spark-drivers-with-sql-connector/; once
downloaded, extract the bundle, then extract the JDBC 4.1 archive and copy the
following jars to the In addition, copy your license file to the |
Neo4j |
org.neo4j.jdbc.http.HttpDriver |
Download the driver from https://mvnrepository.com/artifact/org.neo4j/neo4j-jdbc-driver/3.4.0 and copy the jar to the |
The following JDBC drivers are used by our customers but not officially supported. They are used as is
.
Name | JDBC class | Notes |
---|---|---|
Microsoft SQL Server 2014 or greater |
com.microsoft.sqlserver.jdbc.SQLServerDriver |
Download |
Sybase ASE 15.7+ |
com.sybase.jdbc4.jdbc.SybDriver OR net.sourceforge.jtds.jdbc.Driver |
To use the FreeTDS driver, download the latest version from
https://sourceforge.net/projects/jtds/files/, extract it, then copy
To use the jConnect driver, copy |
Presto |
com.facebook.presto.jdbc.PrestoDriver |
Download the latest JDBC driver from https://prestodb.io/docs/current/installation/jdbc.html
and copy it to the |
Dremio |
com.dremio.jdbc.Driver |
Download the jar at https://download.siren.io/dremio-jdbc-driver-1.4.4-201801230630490666-6d69d32.jar
and copy it to the |
Impala |
com.cloudera.impala.jdbc41.Driver |
Download the latest JDBC bundle from
https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-42.html, extract
the bundle, then extract the JDBC 4.1 archive and copy the following jars to the
|
Operations on virtual indices
The plugin supports the following operations on virtual indices:
-
get mapping
-
get field capabilities
-
search
-
msearch
-
get
-
mget
Search requests involving a mixture of virtual and normal Elasticsearch indices (e.g. when using a wildcard) are not supported and will be rejected; it is however possible to issue msearch requests containing requests on normal Elasticsearch indices and virtual indices.
When creating a virtual index, the plugin will create an empty Elasticsearch index for interoperability with Search Guard and Elastic Stack Security; if an Elasticsearch index with the same name as the virtual index already exists and it is not empty, the virtual index creation will fail.
When deleting a virtual index, the corresponding Elasticsearch index will not be removed.
Example of a search:
GET /siren/twitter/_search
{
"query": {
"match": {
"user": "siren"
}
}
}
The API returns the following response:
{
"took": 150,
"timed_out": false,
"hits": {
"total" : {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "twitter",
"_id": "0",
"_score": 2,
"_source": {
"user": "siren",
"date": "2019-11-16T12:12:12",
"message": "trying out Siren",
"likes": 0
}
}
]
}
}
Type conversion
The plugin converts JDBC types to their closest Elasticsearch equivalent:
-
String types are handled as
keyword
fields. -
Boolean types are handled as
boolean
fields. -
Date and timestamp are handled as
date
fields. -
Integer types are handled as
long
fields. -
Floating point types are handled as
double
fields.
Complex JDBC types which are not recognized by the plugin are skipped during query processing and resultset fetching.
Supported search queries
The plugin supports the following queries:
-
match_all
-
term
-
terms
-
range
-
exists
-
prefix
-
wildcard
-
ids
-
bool
At this time the plugin provides no support for datasource specific full text
search functions, so all these queries will work as if they were issued against
keyword
fields.
Troubleshooting
Cannot reconnect to datasource by hostname after DNS update
When the Java security manager is enabled, the JVM will cache name resolutions indefinitely; if the system you’re connecting to uses round-robin DNS or the IP address of the system changes frequently, you will need to modify the following Java Security Policy properties:
-
networkaddress.cache.ttl
: the number of seconds to cache a successful DNS lookup. Defaults to-1
(forever). -
networkaddress.cache.negative.ttl
: the number of seconds to cache an unsuccessful DNS lookup. Defaults to10
, set to0
to avoid caching.
Connecting to Remote Elasticsearch Clusters
Siren Federate provides the capability to query data from an Elasticsearch remote cluster through the Remote Clusters Module and the Federate Connector APIs.
This connector, unlike the JDBC connector, supports wildcard index patterns. |
Configuring the Remote Cluster
To send queries from a cluster (let’s call it the coordinator) to remote Elasticsearch clusters, the remote clusters must be configured as described in Configuring remote clusters.
The Siren Federate plugin has to be installed on the remote clusters.
This example shows how to set up the remote Elasticsearch clusters:
curl -X PUT http://localhost:9200/_cluster/settings -H 'Content-type: application/json' -d ' { "persistent": { "cluster": { "remote": { "remotefederate": { "seeds": [ "127.0.0.1:9330" ] } } } } } '
Configuring the Datasource
A datasource must first be defined as an alias to the remote cluster. Datasources are created in the coordinator cluster using the Federate Connector APIs.
curl -X PUT http://localhost:9200/_siren/connector/datasource/remotefederateds -H 'Content-type: application/json' -d ' { "elastic": { "alias": "remotefederate" } } '
Configuring the Virtual Index
Let’s assume our remote cluster remotefederate
has indices called logs-2019.01
, logs-2019.02
, …, logs-2019.12
.
Using a Wildcard Index Pattern
Let’s define a virtual index on the coordinator cluster that matches the wildcard index pattern logs-*
using the
Federate Virtual Index API:
curl -X PUT http://localhost:9200/_siren/connector/index/logsvi -H 'Content-type: application/json' -d ' { "datasource": "remotefederateds", "resource": "logs-*", "key": "_id" } '
Assuming the coordinator cluster has an index called machines
which contains information on IP addresses on
machines of interest, and that we would like to find out about the logs associated to these machines, you can execute the following Federate JOIN query to do so:
curl -X GET http://localhost:9200/siren/logsvi/_search -H 'Content-Type: application/json' -d ' { "query": { "join": { "indices": [ "machines" ], "on": [ "logs_ip_hash", "machines_ip_hash" ], "request": { "query": { "match_all": { } } } } } } '
logs_ip_hash
is the IP field in the index logsvi
and machines_ip_hash
is the IP field in the index machines
.
The API returns the following response:
{
"took": 150,
"timed_out": false,
"hits": {
"total" : {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "logs-2019-11-12",
"_id": "0",
"_score": 2,
"_source": {
"date": "2019-11-12T12:12:12",
"message": "trying out Siren"
}
}
]
}
}
Known limitations
In order to take advantage of Federate with a remote cluster, at the moment a coordinator Federate cluster must run 7.6.2-20.0 up and the remote cluster must run Federate version from 7.6.2-20.0 up.
Search Guard Compatibility
The connector is compatible with Search Guard. One can define Search Guard users with roles to secure the remote clusters and the coordinator cluster.
Each cluster must have the same user that has permissions to access the cluster datasources, indices and virtual indices in order to properly execute Federate search requests on remote clusters.
Using curl and a Search Guard user called admin
, the command would start like this:
curl -k -uadmin:password -X PUT https://localhost:9200/<some API request> ...
More information is available on the Search Guard website.
Known Limitations
Limitations for all the connectors
-
Cross backend join currently supports only integer keys.
-
Cross backend support has very different scalability according to the direction of the Join, a join which involves sending IDs to a remote system will be possibly hundreds of times less scalable (e.g. thousands vs millions) to one where the keys are fetched from a remote system.
-
Currently cross cluster searches on virtual indices are not supported.
Limitations for the JDBC Connector
-
Wildcards on virtual index names are not supported by any API; a wildcard search will silently ignore virtual indices.
-
Comma-separated lists of index patterns which target virtual indices are not supported.
-
Document-level security and field-level security are currently not supported.
-
Only terms aggregations can be nested inside a parent bucket aggregation.
-
The
missing
parameter in bucket aggregations is not supported. -
Scripted fields are not supported.
-
When issuing queries containing string comparisons, the plugin does not force a specific collation, if a table behind a virtual indices uses a case insensitive collation, string comparisons will be case insensitive.
-
Complex types are supported when their property types are scalar (text, numbers, boolean) or collections (list, map).
-
Arrays of complex type are supported if the complex type meets the previous requirement.