Query domain-specific language (DSL)

Join query

The join filter enables the filtering of one set of documents, considered as the parent set, with another, considered as the child set, based on shared field values. It accepts the following parameters:

type: The type of join algorithm to use. Valid values are BROADCAST_JOIN, HASH_JOIN, INDEX_JOIN, or ROUTING_JOIN. If this parameter is not specified, the query planner will automatically select the optimal one. For more information, see Configuring joins by type.
indices: The index names for the child set. Multiple indices can be specified using the Elasticsearch syntax. Defaults to all indices.
on: An array of two elements that is used as an equality join condition between two field paths - one for the parent and the other for the child set, respectively.
request: The search request that is used to compute the set of documents of the child set before performing the join. A match_all query is used if no request is given.

Example

In this example, we join all the documents from parent_index with the documents of child_index using the HASH_JOIN algorithm. The query first filters documents from child_index and of type type with the query { "terms" : { "tag" : [ "aaa" ] } }. It then retrieves the ids of the documents from the field id specified by the parameter on. The list of ids is then used as filter and applied on the field foreign_key of the documents from parent_index.

curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/siren/parent_index/_search' -d '
{
  "query" : {
    "join" : {
      "type": "HASH_JOIN",
      "indices" : ["child_index"],
      "on" : ["foreign_key", "id"], (1)
      "request" : { (2)
        "query" : {
          "terms" : {
            "tag" : [ "aaa" ]
          }
        }
      }
    }
  }
}
'

1	The equality join condition: the value of the `foreign_key` field must be equal to the value of the `id` field.
2	The search request that will be used to filter out the child set (i.e. `child_index`)

Extended join condition syntax

The traditional join syntax is a powerful tool for combining relational data from multiple tables, but sometimes the join condition might require more complex logic than just comparing keys for equality. You can use the extended join condition syntax, which is a more flexible and customizable way to define join conditions. By using the extended syntax, you can create joins with complex conditions including those that combine multiple conditions and various join condition operators. To establish an extended join condition, use the following syntax:

"on": {
  "<boolean expression operator>": {
    "<binary operator>": {
      "fields": ["<left_field>","<right_field>"]
    }

on: An object to specify the join condition configuration.
<boolean expression operator>: The Boolean expression operator specifies how the binary operator is interpreted. Currently, you can only use must.
<binary operator>: The operator used to compare the left_field and the right_field from the parent and the child set.
fields: An array of two elements that specifies the field paths for the join keys in the parent and the child set, respectively.

The following is the list of binary operators and their descriptions.

Operator Example Syntax sugar Description

Operator	Example	Syntax sugar	Description
eq	`"on": { "must":{ "eq":{ "fields": [ "left_field", "right_field" ] } } }`	`"on" : [ "left_field", "right_field" ]`	`left_field` is equal to `right_field`
lt	`"on": { "must":{ "lt":{ "fields":[ "left_field", "right_field" ] } } }`		`left_field` is less than `right_field`
lte	`"on": { "must":{ "lte":{ "fields":[ "left_field", "right_field" ] } } }`		`left_field` is less than or equal to `right_field`
gt	`"on": { "must":{ "gt":{ "fields":[ "left_field", "right_field" ] } } }`		`left_field` is greater than `right_field`
gte	`"on": { "must":{ "gte":{ "fields":[ "left_field", "right_field" ] } } }`		`left_field` is greater than or equal to `right_field`
contains	`"on": { "must":{ "contains":{ "fields":[ "left_range_field", "right_field" ] } } }`		`left_range_field` contains `right_field` The left field must be a range field type. The right field must be a simple field type.
within	`"on": { "must":{ "within":{ "fields":[ "left_field", "right_range_field" ] } } }`		`left_field` is within `right_range_field` The left field must be a simple type. The right field must be a range field type.
intersects	`"on": { "must":{ "intersects":{ "fields":[ "left_range_field", "right_range_field" ] } } }`		`left_range_field` intersects `right_range_field` Both fields must be range field types.

"on": {
  "must":{
    "eq":{
      "fields": [
        "left_field",
        "right_field"
      ]
    }
  }
}

"on" : [
  "left_field",
  "right_field"
]

left_field is equal to right_field

"on": {
  "must":{
    "lt":{
      "fields":[
        "left_field",
        "right_field"
      ]
    }
  }
}

left_field is less than right_field

lte

"on": {
  "must":{
    "lte":{
      "fields":[
        "left_field",
        "right_field"
      ]
    }
  }
}

left_field is less than or equal to right_field

"on": {
  "must":{
    "gt":{
      "fields":[
        "left_field",
        "right_field"
      ]
    }
  }
}

left_field is greater than right_field

gte

"on": {
  "must":{
    "gte":{
      "fields":[
        "left_field",
        "right_field"
      ]
    }
  }
}

left_field is greater than or equal to right_field

contains

"on": {
  "must":{
    "contains":{
      "fields":[
        "left_range_field",
        "right_field"
      ]
    }
  }
}

left_range_field contains right_field

The left field must be a range field type. The right field must be a simple field type.

within

"on": {
  "must":{
    "within":{
      "fields":[
        "left_field",
        "right_range_field"
      ]
    }
  }
}

left_field is within right_range_field

The left field must be a simple type. The right field must be a range field type.

intersects

"on": {
  "must":{
    "intersects":{
      "fields":[
        "left_range_field",
        "right_range_field"
      ]
    }
  }
}

left_range_field intersects right_range_field

Both fields must be range field types.

To see the accepted data types for each operator, see the supported field types .

`allow_missing_bounds`

Operators lt, lte, gt, and gte also support allow_missing_bound to specify how to handle documents that are missing a value for a field (if the field is set to null or not set at all). The allow_missing_bound parameter can be set to "left" for the field in the parent index, "right" for the field in the child index, "both" to apply to both fields, and "none" which is the default behavior. For an example of usage scenarios, see Spatial join with missing fields.

Multi-conditional join

A multi-conditional join can be implemented by expressing the join condition as a composition of binary relations. The must field, which was previously described, can be used as an array of condition statements instead of an object, allowing for more than one condition in a join. It is important to note that the must field represents an array of conditions that must be met in a document for the join to occur. Essentially, the join is a conjunction of all the conditions provided.

must: An array of join conditions. The join is the conjunction of all the conditions provided, i.e., all conditions must be met in a document for the join to occur.

Example

curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/siren/parent_index/_search' -d '
{
  "query" : {
    "join" : {
      "indices" : ["child_index"],
      "on" :
        {
          "must": [ (1)
            {"eq": {"fields": ["foreign_key", "id"] }}, (2)
            {"gt": {"fields": ["foreign_key_2", "field2"] }} (3)
          ],
        }
        "request" : {
          "query" : {
            "match_all" : {}
          }
        }
    }
  }
}
'

1	The join condition is a conjunction of 2 separate conditions.
2	Match documents where the `foreign_key` field of `parent_index` is equal to the `id` field of `child_index`.
3	In addition, this matches documents where the `foreign_key_2` field of `parent_index` is greater than `field2` field of `child_index`.

Each additional condition can potentially affect performance. A multi-conditional join query is not processed if the planner estimates that the join will take too long to perform. See the planner module for more details.

If all conditions are using equality operators, consider using runtime fields instead and create a composite key.

Project

When joining a child set with a parent set, the fields from the child set may be projected to the parent set. The projected fields and associated values are mapped to the matching documents of the parent set.

A projection is defined in the request body search of the join clause using the parameter project. The project parameter accepts an array of field’s objects, each one defining a field to project.

The projected fields from a child set are accessible in the scope of the parent’s request. One can refer to a projected field in a project context or in a script context such as in a script field, a script-based sort, and so on.

For information about accepted projected fields in a join, see Requirements for a join .

Field

A standard field object specifies the projection of a field from a set. It is composed of the following parameters:

name: The name of a field from a child set to project.
alias: An alias name to give to the projected field. It is not possible to have multiple fields with identical names in the same set scope as this leads to ambiguity. It is therefore important to carefully select alias names to avoid such ambiguity.

Following is an example of a projection:

{
"join" : {
  "indices" : ["people"],
  "on" : ["id", "employed_by"],
  "request" : {
    "project" : [ (1)
      {
        "field" : {
          "name" : "age", (2)
          "alias" : "employee_age" (3)
        }
      }
    ],
    "query" : {"match_all" : {} }
  }
}

1	An array of fields to project
2	The name of the field to project
3	An alias for the field name

Federate Object Runtime Field

This is an experimental feature. The syntax or API might undergo changes in the future. These changes might not be backwards compatible.

The runtime field mapping defined in the search request enables the creation of a field that exists only as part of the query. You can use the federate_object runtime field to construct custom objects (map-like structures). For example, you can use it to project multiple values from a child index while preserving their relationships and structure.

To use this feature, first include the fields parameter in the search request. The fields parameter accepts an array of field names, where some might be referring to a federate_object runtime field defined in the join query. The response will then have the custom JSON object built through the various joins in the search request.

The federate_object runtime field allows for the inclusion of any Elasticsearch field types within its map-like structure. This means you can define fields of types such as keyword, date, and number, within a single federate_object. You can nest`federate_object` runtime fields within each other and so, you can create an object within an object, allowing for hierarchical data representations .

The federate_object runtime field feature relies on the fact that projected fields from a child set are accessible in the scope of the parent’s request. Like in Retrieving a federate_object field , you can define runtime field mapping in the search request in order to create a field that exists only as part of the query.

The following is a simple usage example of the federate_object runtime field:

curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/siren/parent_index/_search' -d '
{
  "fields": ["MyCustomObj"],(1)
  "query" : {
    "join" : {
      "indices" : ["child_index"],
      "on" : ["foreign_key", "id"],
      "request" : {
        "project": [ (2)
          { "field" : { "name": "MyCustomObj" } }
        ],
        "runtime_mappings": {
          "myid": {  (3)
              "type" : "keyword",
              "script" : {
                  "source":"emit(params._fields._id.value)",
                  "lang":"painless"
              }
          },
          "name": { (4)
              "type" : "keyword",
              "script" : {
                  "source":"emit(doc['first_name.keyword'].value)",
                  "lang":"painless"
              }
          },
          "MyCustomObj" : { (5)
            "type" : "federate_object",
            "fields" : [
                "Id",
                "myid",  (3)
                "name"  (4)
            ]
          }
        },
        "query" : { "match_all" : {} }
      }
    }
  }
}
'

1	The `fields` parameter contains the `federate_object` runtime fields to include in the response, in this example it’s `MyCustomObj`
2	Projects the `federate_object` runtime field named `MyCustomObj` from the index `child_index` used in the parent’s `fields` parameter
3	Example of how we can include `_id` within `federate_object` fields
4	Example of how we can rename fields within `federate_object` fields
5	Defines a `federate_object` runtime field: the type must be `federate_object` and `fields` is required and contains the list of fields to include in the generated object

Sorting and joining on a federate_object runtime field is not supported.

The federate_object runtime field is supported only at query time, it cannot be used in the runtime section of the index mappings.

Performance considerations

Using the federate_object runtime field to project a single field that has only one value for all documents, increases the network load compared to simply projecting that field. This is because the federate_object runtime field needs to serialize the internal binary representation of the object, which uses extra bytes.

The federate_object runtime field is most useful when it projects several fields that are multivalued. The usage and the amount of data transferred across the network is reduced because the Cartisen product is not computed. It has the same benefits when handling one-to-many or many-to-many relationships.

The federate_object runtime field can help improve performance when a slow join query projecting fields is due to the Cartesian product (duplicated values are a tell-tale sign).