Paginating a search request
You can paginate a search request in Federate by using the search_after parameter. The process starts by opening a Point-In-Time (PIT) on the parent indices at the root of the request. This operation creates an identifier, which is then passed to the search request to be paginated. This effectively caches the results of the request and ensures that the hits stay consistent across pages. Subsequent pages are retrieved by re-executing the request with an updated search_after parameter. Finally, you must close the PIT in order to free memory.
Open and close Point-In-Times
Federate provides two REST endpoints for opening and closing Point-In-Times on indices. For the duration of the PIT, the state of the indices in the PIT remains unchanged, even if the indices are updated in that time. This allows search requests to be executed against a consistent view of the indices over a long period of time, unaffected by any changes made to them.
The default duration for a PIT is 5 minutes. You can adjust the duration using the optional keep_alive parameter.
POST /siren/<index>/_pit (1)
POST /siren/<index>/_pit?keep_alive=10m (2)
DELETE /siren/_pit (3)
{
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
(1) Open a PIT with the default duration.
(2) Open a PIT with a custom duration.
(3) Close an existing PIT.
The POST method opens a PIT on the given index pattern and returns an identifier. The DELETE method closes the PIT referenced by the identifier in its body.
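The two endpoints above can be wrapped in small request builders. The following is a minimal sketch, not part of Federate: the function names open_pit_request and close_pit_request are hypothetical, and the functions only build the method, path, and body shown in the examples above, leaving the actual HTTP call to whichever client you use.

```python
from urllib.parse import quote, urlencode


def open_pit_request(index_pattern, keep_alive=None):
    """Build the (method, path) pair that opens a PIT on an index pattern.

    Hypothetical helper: it only mirrors the endpoint shape shown above.
    """
    # Keep wildcard and list characters readable in the path.
    path = f"/siren/{quote(index_pattern, safe='*,-_.')}/_pit"
    if keep_alive is not None:
        # Optional custom duration, for example "10m".
        path += "?" + urlencode({"keep_alive": keep_alive})
    return "POST", path


def close_pit_request(pit_id):
    """Build the (method, path, body) triple that closes the given PIT."""
    return "DELETE", "/siren/_pit", {"id": pit_id}
```

For example, open_pit_request("machine-*", "10m") yields a POST to /siren/machine-*/_pit?keep_alive=10m, and close_pit_request takes the identifier returned by that call.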
Pagination
Paginating a search request requires the PIT identifier returned by the REST API and a tiebreaker sort parameter. The sort parameter is needed to paginate hits: it adds a sort field to each hit in the search response, and that value is then passed to search_after. To get the next page, take the sort value of the last returned hit and set it as the value of search_after.
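The step from one page to the next can be sketched as a small pure function. This is an illustration under assumptions: the helper name next_page_body is hypothetical, and the response is assumed to follow the usual Elasticsearch layout, with hits under hits.hits and each hit carrying a sort array.

```python
import copy


def next_page_body(body, response):
    """Return a copy of `body` that resumes after the last hit of `response`.

    Returns None when the response has no hits, meaning there is nothing
    left to paginate. Assumes hits live under response["hits"]["hits"].
    """
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    new_body = copy.deepcopy(body)  # leave the caller's request untouched
    new_body["search_after"] = hits[-1]["sort"]
    return new_body
```

Copying the body rather than mutating it keeps the first-page request reusable, which is convenient when retrying a page.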
Below is a search request that contains a join, where the parent set is machine-* and the child set is beat-*.
GET /siren/machine-*/_search
{
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"match_all": {}
}
}
}
}
}
A PIT is created over the parent set at the root, that is, over the index pattern machine-*:
POST /siren/machine-*/_pit
{
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
To retrieve the first page, issue the search request with the PIT identifier and a sort parameter. The index pattern that is normally passed as part of the _search endpoint is omitted: the indices resolved during the PIT creation are retrieved from the given PIT identifier.
GET /siren/_search
{
"sort": { (1)
"_shard_doc": "asc"
},
"pit": { (2)
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"match_all": {}
}
}
}
},
"size": 2 (3)
}
(1) A sort explicitly set with the tiebreaker field _shard_doc.
(2) The PIT identifier returned by the call to the _pit REST API.
(3) The number of hits returned in a page.
To retrieve the next pages, add the search_after parameter, using the sort value from the last returned hit.
Keep in mind that the PIT identifier can change: always use the id from the latest response in the new request.
GET /siren/_search
{
"sort": {
"_shard_doc": "asc"
},
"pit": { (1)
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"match_all": {}
}
}
}
},
"size": 2,
"search_after": [ (2)
1
]
}
(1) The PIT id is given the value of the last returned PIT id.
(2) The search_after is given the value of the last returned hit’s sort field.
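Putting the steps together, the whole cycle (open a PIT, fetch pages, close the PIT) might look like the sketch below. Several things here are assumptions rather than documented behavior: send is a hypothetical transport callable that takes (method, path, body) and returns the parsed JSON response, hits are assumed to sit under hits.hits, and the refreshed PIT identifier is assumed to be returned under a pit_id key, as in Elasticsearch search responses.

```python
def paginate(send, index_pattern, body, keep_alive="5m"):
    """Yield every hit of a PIT-backed search, one page at a time.

    `send(method, path, body)` is a hypothetical transport that returns
    the parsed JSON response; plug in your own HTTP client.
    """
    opened = send("POST", f"/siren/{index_pattern}/_pit?keep_alive={keep_alive}", None)
    pit_id = opened["id"]
    request = dict(body, pit={"id": pit_id})
    # A tiebreaker sort is required for search_after to work.
    request.setdefault("sort", {"_shard_doc": "asc"})
    try:
        while True:
            # The index pattern is omitted: it is resolved from the PIT.
            response = send("GET", "/siren/_search", request)
            # Assumption: the latest identifier comes back as "pit_id".
            pit_id = response.get("pit_id", pit_id)
            hits = response["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # Resume after the last hit, using the latest PIT identifier.
            request = dict(request, pit={"id": pit_id},
                           search_after=hits[-1]["sort"])
    finally:
        # Always close the PIT to free memory.
        send("DELETE", "/siren/_pit", {"id": pit_id})
```

Because the function is a generator, the PIT is closed when iteration finishes or when the generator is closed early, so callers that stop reading partway through should close the generator explicitly.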
Examples with projection
Paginating a search request with a project clause in a nested join.
GET /siren/_search
{
"sort": {
"_shard_doc": "asc"
},
"pit": {
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"id"
],
"request": {
"project": [
{
"field": {
"name": "date"
}
}
],
"query": {
"match_all": {}
}
}
}
}
}
}
},
"size": 2
}
Paginating a search request with a project clause in the root join.
GET /siren/_search
{
"sort": {
"_shard_doc": "asc"
},
"pit": {
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"project": [
{
"field": {
"name": "date"
}
}
],
"query": {
"match_all": {}
}
}
}
},
"size": 2
}
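The join clauses in the two examples above differ only in where the project clause sits, so they can be generated by one builder. The following is a minimal sketch; join_query is a hypothetical helper that reproduces the JSON structure of the examples and nothing more.

```python
def join_query(indices, on, request_query, project_fields=None):
    """Build a join clause shaped like the examples above.

    `project_fields` is an optional list of field names to project from
    the child set. Hypothetical helper, not part of Federate.
    """
    request = {}
    if project_fields:
        request["project"] = [{"field": {"name": name}} for name in project_fields]
    request["query"] = request_query
    return {"join": {"indices": list(indices), "on": list(on), "request": request}}


# Projection in the root join.
root_projection = join_query(["beat-*"], ["id", "machine"], {"match_all": {}}, ["date"])

# Projection in a nested join: the inner join becomes the outer join's query.
inner = join_query(["beat-*"], ["id", "id"], {"match_all": {}}, ["date"])
nested_projection = join_query(["beat-*"], ["id", "machine"], inner)
```

Either result can be used as the query value of a paginated search body, alongside the sort, pit, and size entries.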