Getting Started
This short guide shows how to install the Siren Federate plugin in Elasticsearch, load two sets of documents connected by a common attribute, and execute a relational query across the two sets within Elasticsearch.
Prerequisites
This guide requires the Elasticsearch 7.9.3 distribution to be downloaded and installed on your computer. If you do not have an Elasticsearch distribution, run the following commands to download and unpack one:
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.3.zip
$ unzip elasticsearch-7.9.3.zip
$ cd elasticsearch-7.9.3
Installing the Siren Federate Plugin
Before you start Elasticsearch, you must install the Siren Federate plugin.
- Extract the Siren Federate distribution ZIP file into a local directory.
- In the extracted directory, locate the plugin ZIP file named siren-federate-7.9.3-21.6-proguard-plugin.zip. The directory that contains this file is represented by PATH-TO-SIREN-FEDERATE-PLUGIN in the command that follows.
- From the Elasticsearch installation directory, run the following command:
$ ./bin/elasticsearch-plugin install file:///PATH-TO-SIREN-FEDERATE-PLUGIN/siren-federate-7.9.3-21.6-proguard-plugin.zip
-> Downloading file:///PATH-TO-SIREN-FEDERATE-PLUGIN/siren-federate-7.9.3-21.6-proguard-plugin.zip
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.io.FilePermission cloudera.properties read
* java.io.FilePermission simba.properties read
* java.lang.RuntimePermission accessClassInPackage.sun.misc
* java.lang.RuntimePermission accessClassInPackage.sun.misc.*
* java.lang.RuntimePermission accessClassInPackage.sun.security.provider
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission createClassLoader
* java.lang.RuntimePermission getClassLoader
...
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
-> Installed siren-federate
To remove the plugin, run the following command:
$ bin/elasticsearch-plugin remove siren-federate
-> Removing siren-federate...
Removed siren-federate
Starting Elasticsearch
To launch Elasticsearch, run the following command:
$ ./bin/elasticsearch
In the output, you should see a line like the following, which indicates that the Siren Federate plugin has been installed and loaded:
[2017-04-11T10:42:02,209][INFO ][o.e.p.PluginsService ] [etZuTTn] loaded plugin [siren-federate]
Loading Some Relational Data
We will use a simple synthetic dataset for this demo. The dataset consists of two sets of documents: Article and Company. An article is connected to a company through its mentions attribute, which holds the id values of the companies it mentions. Articles are loaded into the article index and companies into the company index. To load the dataset, run the following commands:
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/article'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/article/_mapping' -d '
{
"properties": {
"mentions": {
"type": "keyword"
}
}
}
'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/company'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/company/_mapping' -d '
{
"properties": {
"id": {
"type": "keyword"
}
}
}
'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_bulk?pretty&refresh=true' -d '
{ "index" : { "_index" : "article", "_id" : "1" } }
{ "title" : "The NoSQL database glut", "mentions" : ["1", "2"] }
{ "index" : { "_index" : "article", "_id" : "2" } }
{ "title" : "Graph Databases Seen Connecting the Dots", "mentions" : [] }
{ "index" : { "_index" : "article", "_id" : "3" } }
{ "title" : "How to determine which NoSQL DBMS best fits your needs", "mentions" : ["2", "4"] }
{ "index" : { "_index" : "article", "_id" : "4" } }
{ "title" : "MapR ships Apache Drill", "mentions" : ["4"] }
{ "index" : { "_index" : "company", "_id" : "1" } }
{ "id": "1", "name" : "Elastic" }
{ "index" : { "_index" : "company", "_id" : "2" } }
{ "id": "2", "name" : "Orient Technologies" }
{ "index" : { "_index" : "company", "_id" : "3" } }
{ "id": "3", "name" : "Cloudera" }
{ "index" : { "_index" : "company", "_id" : "4" } }
{ "id": "4", "name" : "MapR" }
'
{
"took" : 8,
"errors" : false,
"items" : [ {
"index" : {
"_index" : "article",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
...
}
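The bulk request above alternates an action line with a document line and ends with a newline. As a minimal sketch of how such a body could be assembled programmatically, the following Python snippet builds the company part of the payload. The helper name build_bulk_body is hypothetical and not part of Elasticsearch or Siren Federate; the snippet only constructs the text and does not contact a cluster.

```python
import json

def build_bulk_body(index, docs):
    """Hypothetical helper: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells the bulk API where to index the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the last line to end with a newline as well.
    return "\n".join(lines) + "\n"

companies = [
    ("1", {"id": "1", "name": "Elastic"}),
    ("2", {"id": "2", "name": "Orient Technologies"}),
    ("3", {"id": "3", "name": "Cloudera"}),
    ("4", {"id": "4", "name": "MapR"}),
]

body = build_bulk_body("company", companies)
print(body, end="")
```

The resulting string can be saved to a file and sent with curl, or posted with any HTTP client.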
Relational Querying of the Data
We will now show you how to execute a relational query across the two indices. For example, we would like to retrieve all the articles that mention companies whose name matches orient. This relational query can be decomposed into two search queries: the first finds all the companies whose name matches orient, and the second filters out all articles that do not mention a company from the first result set. The Siren Federate plugin introduces a new Elasticsearch filter, named join, that lets you define such a query plan, and a new search API, siren/<index>/_search, that lets you execute it.
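To make the two-step decomposition concrete, here is a minimal in-memory sketch in Python that applies it to the demo dataset. It illustrates the logic of the query plan only and is not how Siren Federate executes a join. The function names are hypothetical, and the tokens helper is a rough stand-in for Elasticsearch text analysis (lowercased alphanumeric words).

```python
import re

# The demo dataset from this guide, held in memory.
ARTICLES = [
    {"_id": "1", "title": "The NoSQL database glut", "mentions": ["1", "2"]},
    {"_id": "2", "title": "Graph Databases Seen Connecting the Dots", "mentions": []},
    {"_id": "3", "title": "How to determine which NoSQL DBMS best fits your needs", "mentions": ["2", "4"]},
    {"_id": "4", "title": "MapR ships Apache Drill", "mentions": ["4"]},
]
COMPANIES = [
    {"id": "1", "name": "Elastic"},
    {"id": "2", "name": "Orient Technologies"},
    {"id": "3", "name": "Cloudera"},
    {"id": "4", "name": "MapR"},
]

def tokens(text):
    """Approximate text analysis: the set of lowercased alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def articles_mentioning(companies, articles, term):
    # Query 1: find the companies whose name matches the term, keep their ids.
    company_ids = {c["id"] for c in companies if term in tokens(c["name"])}
    # Query 2: keep only the articles that mention at least one of those ids.
    return [a for a in articles if company_ids.intersection(a["mentions"])]

hits = articles_mentioning(COMPANIES, ARTICLES, "orient")
print([a["_id"] for a in hits])  # prints ['1', '3']
```

Only the company with id 2 matches orient, so the articles that survive the second step are the ones whose mentions list contains 2.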
Below is the command to run the relational query:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/article/_search?pretty' -d '{ (1)
"query" : {
"join" : { (2)
"indices" : ["company"], (3)
"on" : ["mentions", "id"], (4)
"request" : { (5)
"query" : {
"term" : {
"name" : "orient"
}
}
}
}
}
}'
(1) The target index (i.e., article)
(2) The join query clause
(3) The source indices (i.e., company)
(4) The clause specifying the paths of the join keys in the target and source indices (here, mentions in article and id in company)
(5) The search request used to filter the company index (the source set)
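The request body annotated above has a regular shape, so it can be generated rather than written by hand. The following Python sketch builds the same JSON structure. The helper build_join_query and its parameter names are hypothetical, chosen for this illustration; only the indices, on, and request keys come from the example above.

```python
import json

def build_join_query(source_index, target_field, source_field, source_query):
    """Hypothetical helper: build a join request body like the one in this guide."""
    return {
        "query": {
            "join": {
                # Source indices whose documents are filtered first.
                "indices": [source_index],
                # Join keys: the field in the target index, then the field in the source index.
                "on": [target_field, source_field],
                # Search request applied to the source indices.
                "request": {"query": source_query},
            }
        }
    }

body = build_join_query("company", "mentions", "id", {"term": {"name": "orient"}})
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d against the siren/article/_search endpoint, as in the command above.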
The command should return the following response with two search hits:
{
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "article",
"_id" : "1",
"_score" : 1.0,
"_source":{ "title" : "The NoSQL database glut", "mentions" : ["1", "2"] }
}, {
"_index" : "article",
"_id" : "3",
"_score" : 1.0,
"_source":{ "title" : "How to determine which NoSQL DBMS best fits your needs", "mentions" : ["2", "4"] }
} ]
}
}
You can also reverse the direction of the join and query for all the companies that are mentioned in articles whose title matches nosql:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/company/_search?pretty' -d '{
"query" : {
"join" : {
"indices" : ["article"],
"on": ["id", "mentions"],
"request" : {
"query" : {
"term" : {
"title" : "nosql"
}
}
}
}
}
}'
The command should return the following response with three search hits:
{
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "company",
"_id" : "4",
"_score" : 1.0,
"_source":{ "id": "4", "name" : "MapR" }
}, {
"_index" : "company",
"_id" : "1",
"_score" : 1.0,
"_source":{ "id": "1", "name" : "Elastic" }
}, {
"_index" : "company",
"_id" : "2",
"_score" : 1.0,
"_source":{ "id": "2", "name" : "Orient Technologies" }
} ]
}
}