Glossary

Many of the terms that are used in the Siren Federate documentation are also used in Elasticsearch. For more information, see the Elasticsearch glossary.

action

The type of request that can be executed on a cluster or an index. Actions are controlled and limited by user role permissions. For more information, see Configuring security for Siren Federate.

API

The acronym for Application Programming Interface: a software intermediary that allows two applications to communicate with each other.

broadcast join

A distributed join execution strategy that sends a full copy of the child set to every node of the cluster. On each node, a hash table is then used to find matching tuples between the parent set and the child set.
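
The following Python sketch simulates the broadcast strategy in memory to make the copy, build, and probe steps concrete. It is not Siren Federate's implementation. The data model is an assumption: each node holds parent tuples of the form (doc_id, join_key), the child set is a list of tuples of the same shape, and node names such as node-1 are hypothetical.

```python
from collections import defaultdict

def build_hash_table(tuples):
    """Map each join key to the ids of the documents that carry it."""
    table = defaultdict(list)
    for doc_id, key in tuples:
        table[key].append(doc_id)
    return table

def broadcast_join(parent_by_node, child_tuples):
    """Return (node, parent_id, child_id) for every matching pair."""
    matches = []
    for node in sorted(parent_by_node):
        # Broadcast step: every node receives its own full copy of the child set.
        local_child = list(child_tuples)
        # Build step: hash the local copy of the child set on its join key.
        table = build_hash_table(local_child)
        # Probe step: look up each parent tuple stored on this node.
        for parent_id, key in parent_by_node[node]:
            for child_id in table.get(key, []):
                matches.append((node, parent_id, child_id))
    return matches

# Hypothetical data: parent tuples per node and the child set, as (doc_id, join_key).
parent_by_node = {
    "node-1": [("a1", "x"), ("a2", "y")],
    "node-2": [("a3", "x"), ("a4", "z")],
}
child_tuples = [("b1", "x"), ("b2", "z"), ("b3", "w")]
print(broadcast_join(parent_by_node, child_tuples))
```

Because every node receives the whole child set, this strategy suits a child set that is small relative to the parent set.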

child set

During a join of indices A and B, a search is performed against index A, which is filtered by its relation to index B. In this example, the child set is index B (the filtering set) and the parent set is index A (the filtered set). Note: A set of documents can come from multiple indices.

cluster

One or more nodes that share the same cluster name.

document

A JSON document that is stored in Elasticsearch. A document is like a row in a table in a relational database. Each document is stored in an index and has a unique identifier associated with it.

Federate cluster

An Elasticsearch cluster that has the Siren Federate plugin installed.

federation

The process of mapping different external database systems into a unified API so that their data can be used for business intelligence (BI) or other analysis.

hash join

A distributed join execution strategy in which both data sets are partitioned across the nodes of the cluster by applying a hash function to the join key. On each node, a hash table is then used to find matching tuples between the two inputs.
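
As an illustration, the Python sketch below partitions both inputs by a hash of the join key and then runs a local hash join per simulated node. It is a sketch under stated assumptions, not Siren Federate's code: NUM_NODES is a hypothetical cluster size, inputs are lists of (doc_id, join_key) tuples, and zlib.crc32 is only a stand-in for whatever hash function a real system uses.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size

def node_for(key, num_nodes=NUM_NODES):
    # Stand-in hash (crc32); equal keys always land on the same node.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def partition(tuples, num_nodes=NUM_NODES):
    """Split (doc_id, join_key) tuples into per-node partitions."""
    parts = defaultdict(list)
    for doc_id, key in tuples:
        parts[node_for(key, num_nodes)].append((doc_id, key))
    return parts

def hash_join(left, right, num_nodes=NUM_NODES):
    left_parts = partition(left, num_nodes)
    right_parts = partition(right, num_nodes)
    matches = []
    for node in range(num_nodes):
        # Local phase: build a hash table on this node's right-side partition.
        table = defaultdict(list)
        for doc_id, key in right_parts.get(node, []):
            table[key].append(doc_id)
        # Probe it with this node's left-side partition.
        for doc_id, key in left_parts.get(node, []):
            for other_id in table.get(key, []):
                matches.append((doc_id, other_id))
    return sorted(matches)

left = [("a1", "x"), ("a2", "y"), ("a3", "x")]
right = [("b1", "x"), ("b2", "y"), ("b3", "z")]
print(hash_join(left, right))
```

Partitioning on the join key is what makes the local joins sufficient: two tuples with the same key are always sent to the same node, so no match is missed.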

index

An optimized collection of JSON documents. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards.

index join

A distributed join execution strategy that sends a full copy of the child set to every node of the cluster. On each node, index lookups are then used to find matching tuples between the parent set and the child set.
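
One way to read the difference from the broadcast join is sketched below in Python: instead of building a hash table over the child set, each node looks up every child join key in an index over its local parent tuples. This interpretation, the (doc_id, join_key) data model, and the dictionary-based inverted index that stands in for the node's real index are all assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(parent_tuples):
    # Stand-in for the index a node already maintains over its parent shard.
    index = defaultdict(list)
    for doc_id, key in parent_tuples:
        index[key].append(doc_id)
    return index

def index_join(parent_by_node, child_tuples):
    """Return (node, parent_id, child_id) for every matching pair."""
    matches = []
    for node in sorted(parent_by_node):
        index = build_inverted_index(parent_by_node[node])
        # Every node iterates over its full copy of the child set and
        # performs one index lookup per child join key.
        for child_id, key in child_tuples:
            for parent_id in index.get(key, []):
                matches.append((node, parent_id, child_id))
    return matches

# Hypothetical data, in the form (doc_id, join_key).
parent_by_node = {
    "node-1": [("a1", "x"), ("a2", "y")],
    "node-2": [("a3", "x"), ("a4", "z")],
}
child_tuples = [("b1", "x"), ("b2", "z"), ("b3", "w")]
print(index_join(parent_by_node, child_tuples))
```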

index lookup

The action of retrieving documents from an index, for either a particular value or a range of values.
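
The two kinds of lookup can be shown with a minimal Python sketch that keeps (value, doc_id) pairs sorted and searches them with binary search. The SortedFieldIndex class is hypothetical and is not how Elasticsearch stores its indices; it only illustrates value and range lookups.

```python
import bisect

class SortedFieldIndex:
    """Hypothetical single-field index: (value, doc_id) pairs in sorted order."""

    def __init__(self, docs):
        # docs maps doc_id -> field value
        self.entries = sorted((value, doc_id) for doc_id, value in docs.items())
        self.values = [value for value, _ in self.entries]

    def range_lookup(self, low, high):
        """Return the ids of documents whose value lies in [low, high]."""
        start = bisect.bisect_left(self.values, low)
        end = bisect.bisect_right(self.values, high)
        return [doc_id for _, doc_id in self.entries[start:end]]

    def lookup(self, value):
        """Return the ids of documents whose value equals `value`."""
        return self.range_lookup(value, value)

idx = SortedFieldIndex({"d1": 10, "d2": 25, "d3": 10, "d4": 40})
print(idx.lookup(10), idx.range_lookup(20, 40))
```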

inner join

Enables the projection of arbitrary fields (including script fields and document scores) from the child set, B, and combines them with the parent set, A. The projected fields and their values from a document in set B are mapped to all of the documents in set A that satisfy the join condition with it. The result of the join is the parent set, A, augmented by the projected fields from the child set, B. See also parent set and child set.
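
The Python sketch below illustrates the idea of augmenting parent documents with fields projected from matching child documents. It is not Siren Federate's API. Documents are plain dictionaries, the field names are hypothetical, and two behaviours are assumptions: a parent that matches several children yields one output document per match (as in a relational inner join), and a projected field overwrites a parent field with the same name.

```python
from collections import defaultdict

def inner_join(parent_docs, child_docs, parent_key, child_key, projected_fields):
    # Group the child set (B) by its join key.
    by_key = defaultdict(list)
    for child in child_docs:
        by_key[child[child_key]].append(child)
    result = []
    for parent in parent_docs:
        # Parents without a matching child are dropped.
        for child in by_key.get(parent[parent_key], []):
            # Augment a copy of the parent document with the projected fields.
            joined = dict(parent)
            for field in projected_fields:
                joined[field] = child[field]
            result.append(joined)
    return result

# Hypothetical data: articles (parent set A) joined with companies (child set B).
articles = [
    {"title": "t1", "company_id": 1},
    {"title": "t2", "company_id": 2},
    {"title": "t3", "company_id": 9},
]
companies = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
print(inner_join(articles, companies, "company_id", "id", ["name"]))
```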

I/O

Input/output. Disk I/O occurs when the database engine reads blocks containing records from disk into memory, or writes them from memory back to disk. With caching, the next time the engine needs a block, it can access it from memory rather than reading it from the disk again.
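
The read-caching behaviour can be sketched in Python as a small block cache. The least-recently-used (LRU) eviction policy, the capacity, and the read_block_from_disk callback are assumptions chosen for illustration; the sketch covers reads only and says nothing about how a particular database engine manages its cache.

```python
from collections import OrderedDict

class BlockCache:
    """Minimal LRU cache of disk blocks, keyed by block number."""

    def __init__(self, read_block_from_disk, capacity):
        self.read_block_from_disk = read_block_from_disk
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.disk_reads = 0

    def get(self, block_no):
        if block_no in self.blocks:
            # Cache hit: serve from memory and mark as most recently used.
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        # Cache miss: read the block from disk and keep it in memory.
        data = self.read_block_from_disk(block_no)
        self.disk_reads += 1
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

cache = BlockCache(read_block_from_disk=lambda n: f"block-{n}", capacity=2)
for block_no in [1, 1, 2, 3, 1]:
    cache.get(block_no)
# The second read of block 1 is served from memory; block 1 is then evicted
# by block 3 and has to be read from disk again.
print(cache.disk_reads)
```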

join

A binary operator that is used to combine data from two sets of documents. The result of a join is the set of all combinations of documents in the two sets of documents that are equal on their common attribute names. For information about the different join strategies that are available, see Configuring joins by type.
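
The definition above describes a natural join: every combination of documents that agree on all of their common field names. A naive nested-loop sketch in Python, using plain dictionaries as documents, shows the semantics only; it does not reflect any of the distributed strategies used in practice.

```python
def natural_join(left, right):
    """Return merged documents for every pair that agrees on all common fields."""
    result = []
    for l_doc in left:
        for r_doc in right:
            common = l_doc.keys() & r_doc.keys()
            if all(l_doc[field] == r_doc[field] for field in common):
                result.append({**l_doc, **r_doc})
    return result

# Hypothetical documents sharing the field name "company_id".
articles = [{"company_id": 1, "title": "t1"}, {"company_id": 2, "title": "t2"}]
companies = [{"company_id": 1, "name": "Acme"}]
print(natural_join(articles, companies))
```

If the two sets have no field names in common, every pair qualifies and the result is the full cross product.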

join query

The type of query syntax to use when you want to perform a join. See also query. For more information, see Query DSL.

left-side set

See parent set. Also known as the 'left index'.

node

An instance of Elasticsearch that belongs to a cluster. A node can combine different roles, such as master-eligible, data, ingest, transform, or machine learning (ML).

parallelization

A method of processing in which many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially. Parallelization improves system performance by processing operations such as loading data, building indexes, and evaluating queries concurrently.

parent set

During a join of indices A and B, a search is performed against index A, which is filtered by its relation to index B. In this example, the parent set is index A (the filtered set) and the child set is index B (the filtering set). Note: A set of documents can come from multiple indices.

partitioning

The process of breaking down the data in a database into partitions. Each piece of data resides in exactly one partition. Partitioning ensures scalability, because the entire data set might not fit on a single node. Different partitions can reside on different nodes, and each node can serve queries against its own partitions. See also shard.
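
Hash partitioning is one common way to guarantee that each piece of data resides in exactly one partition. The Python sketch below assumes keyed records and uses zlib.crc32 as a stand-in hash; Python's built-in hash() is avoided because it is randomized per process for strings, which would make the assignment unstable across runs.

```python
import zlib

def partition_of(key, num_partitions):
    # Deterministic hash, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_data(records, num_partitions):
    """Place each (key, value) record in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions

records = [("alice", 1), ("bob", 2), ("carol", 3), ("alice", 4)]
parts = partition_data(records, 3)
print(parts)
```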

primary shard

Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard.

query

A request for information from Elasticsearch. A query represents a question, which is written in a way that Elasticsearch understands. A search consists of one or more queries combined.

replica shard

Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:

  • Increased failover: A replica shard can be promoted to a primary shard if the existing primary shard fails.

  • Improved performance: Get and search requests can be handled by either primary or replica shards.

right-side set

See child set. Also known as the 'right index'.

routing join

A distributed join execution strategy that sends the child set's tuples only to specific nodes of the cluster: the nodes hosting the parent set's shards that, given Elasticsearch document routing, may contain a join match for those tuples.
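
The Python sketch below illustrates the routing idea under several assumptions. Parent documents are assumed to have been indexed with their join key as the routing value. The shard count and the SHARD_TO_NODE allocation are hypothetical. The shard formula, a hash of the routing value modulo the number of primary shards, is a simplification of Elasticsearch document routing, with zlib.crc32 standing in for the hash Elasticsearch actually uses.

```python
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 4
# Hypothetical shard allocation: primary shard number -> node that hosts it.
SHARD_TO_NODE = {0: "node-1", 1: "node-2", 2: "node-1", 3: "node-3"}

def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    # Simplified routing: hash(routing value) modulo the number of primary shards.
    return zlib.crc32(str(routing_value).encode("utf-8")) % num_shards

def index_parent_tuples(parent_tuples):
    # Assumption: parents were indexed with the join key as their routing value.
    shards = defaultdict(list)
    for parent_id, key in parent_tuples:
        shards[shard_for(key)].append((parent_id, key))
    return shards

def route_child_tuples(child_tuples):
    # Send each child tuple only to the node hosting the shard that may match it.
    uploads = defaultdict(list)
    for child_id, key in child_tuples:
        uploads[SHARD_TO_NODE[shard_for(key)]].append((child_id, key))
    return dict(uploads)

def routing_join(parent_tuples, child_tuples):
    shards = index_parent_tuples(parent_tuples)
    matches = []
    for tuples in route_child_tuples(child_tuples).values():
        # On the receiving node, only the targeted shard is searched.
        for child_id, key in tuples:
            for parent_id, parent_key in shards.get(shard_for(key), []):
                if parent_key == key:
                    matches.append((parent_id, child_id))
    return sorted(matches)

parents = [("a1", "x"), ("a2", "y"), ("a3", "x")]
children = [("b1", "x"), ("b2", "y"), ("b3", "z")]
print(routing_join(parents, children))
```

Compared with broadcasting, each child tuple travels to a single node instead of to all of them.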

semi-join

Filters the parent set (A) based on the child set (B). A semi-join returns the documents of A that satisfy the join condition with at least one document of B. This is equivalent to the EXISTS operator in SQL.
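
A minimal Python sketch of the semantics, using plain dictionaries and hypothetical field names: unlike an inner join, each parent document appears at most once and no fields from the child set are added. In SQL terms it corresponds to a query of the form SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.id = A.company_id).

```python
def semi_join(parent_docs, child_docs, parent_key, child_key):
    # Collect the distinct join keys present in the child set (B).
    child_keys = {doc[child_key] for doc in child_docs}
    # Keep each parent document (A) whose key exists in B, unchanged.
    return [doc for doc in parent_docs if doc[parent_key] in child_keys]

articles = [
    {"title": "t1", "company_id": 1},
    {"title": "t2", "company_id": 2},
    {"title": "t3", "company_id": 9},
]
companies = [{"id": 1}, {"id": 1}, {"id": 2}]
print(semi_join(articles, companies, "company_id", "id"))
```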

shard

A partition of an index in Elasticsearch. The shards of an index are distributed across the nodes of the cluster to spread the load. See also partitioning and primary shard.

tuple

A single row that is composed of one or more columns, where each column is mapped to one field of a document. For example, a tuple can be a row that is composed of two elements, such as the document identifier and the key value of the join condition. If a document has a multi-valued field, it generates as many tuples as there are values.
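
The Python sketch below shows how two-element (document identifier, join key) tuples could be generated, including the multi-valued case. The function name is hypothetical, and treating a document that lacks the join field as producing no tuples is an assumption.

```python
def to_tuples(doc_id, doc, join_field):
    """Emit one (doc_id, value) tuple per value of the join field."""
    value = doc.get(join_field)
    if value is None:
        return []  # assumption: a document without the field yields no tuples
    values = value if isinstance(value, list) else [value]
    return [(doc_id, v) for v in values]

print(to_tuples("d1", {"tags": ["a", "b", "c"]}, "tags"))
print(to_tuples("d2", {"tags": "a"}, "tags"))
```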