Sample transform pipelines
During the import process, you can specify an additional transform pipeline.
A pipeline defines a series of processors that run in the order in which they are declared.
A pipeline consists of two main fields: a description and a list of processors. The pipeline is structured as follows:
{
"description": "...",
"processors": []
}
description
: Contains a helpful description of what the pipeline does.
processors
: Specifies a list of processors to be executed in order.
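To make the "executed in order" behavior concrete, here is a minimal Python sketch that walks a pipeline definition and applies each processor in turn. It is only an illustration, not how Elasticsearch implements pipelines: `run_pipeline`, `set_field`, `uppercase`, and the `status` field are hypothetical names, and the sketch assumes one processor type per object.

```python
import json

def run_pipeline(pipeline, doc, handlers):
    """Apply each processor to doc in the order it is declared."""
    for processor in pipeline["processors"]:
        # This sketch assumes one processor type per object.
        (name, config), = processor.items()
        handlers[name](doc, config)
    return doc

def set_field(doc, config):
    # Emulates a "set" processor: write a fixed value to a field.
    doc[config["field"]] = config["value"]

def uppercase(doc, config):
    # Emulates an "uppercase" processor: upper-case an existing string field.
    doc[config["field"]] = doc[config["field"]].upper()

pipeline = json.loads("""
{
  "description": "Set a field, then uppercase it",
  "processors": [
    {"set": {"field": "status", "value": "new"}},
    {"uppercase": {"field": "status"}}
  ]
}
""")

handlers = {"set": set_field, "uppercase": uppercase}
result = run_pipeline(pipeline, {}, handlers)
print(result)  # {'status': 'NEW'}
```

Reversing the two processors would fail in this sketch, because `uppercase` would run before `status` exists. That is why declaration order matters.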
The following sections contain sample transform pipelines to help you get started.
Split fields
Split a string separated by the delimiter | into a list of substrings. If no initial string exists, fill the target field with an empty string.
{
"description": "_description",
"processors": [
{
"split": {
"on_failure": [
{
"set": {
"field": "parents",
"value": ""
}
}
],
"field": "parents",
"separator": "\\|"
}
}
]
}
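As a rough illustration of the intended result, the following Python sketch emulates this pipeline on a plain dictionary. It is an approximation under stated assumptions, not Elasticsearch code: `split_parents` is a hypothetical helper, and a missing or non-string `parents` field is treated as the failure case that triggers the `on_failure` handler.

```python
import re

def split_parents(doc):
    """Emulate the split processor and its on_failure handler."""
    value = doc.get("parents")
    if isinstance(value, str):
        # "\\|" in the JSON is the regex \| , which matches a literal pipe.
        doc["parents"] = re.split(r"\|", value)
    else:
        # Assumed failure case: no initial string, so the "set" fallback runs.
        doc["parents"] = ""
    return doc

print(split_parents({"parents": "10|20|30"}))  # {'parents': ['10', '20', '30']}
print(split_parents({"title": "no parents"}))  # {'title': 'no parents', 'parents': ''}
```

The separator is escaped because the split processor treats it as a regular expression, and an unescaped | means alternation.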
Split fields to a "long"
Split the string as in the previous example, but convert each substring to a long. If the initial field has no value, the on_failure handler sets the target field to -1.
{
"description": "_description",
"processors": [
{
"split": {
"on_failure": [
{
"set": {
"field": "parents",
"value": -1
}
}
],
"field": "parents",
"separator": "\\|"
}
},
{
"convert": {
"field": "parents",
"type": "long"
}
}
]
}
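The expected effect on a document can be sketched in Python as follows. This is an illustration only: `split_and_convert` is a hypothetical helper, and it assumes the conversion is applied to every member when the field holds a list.

```python
def split_and_convert(doc):
    """Emulate split (with a -1 fallback) followed by convert to long."""
    value = doc.get("parents")
    if isinstance(value, str):
        parts = value.split("|")
    else:
        # Assumed failure case: the on_failure "set" writes -1.
        parts = -1
    if isinstance(parts, list):
        # Convert each substring to an integer.
        doc["parents"] = [int(p) for p in parts]
    else:
        doc["parents"] = int(parts)
    return doc

print(split_and_convert({"parents": "10|20|30"}))  # {'parents': [10, 20, 30]}
print(split_and_convert({}))                       # {'parents': -1}
```

Using -1 rather than an empty string as the fallback keeps the field numeric, so the value still fits a long mapping.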
Extract text and create a new field (using regex)
Extract the text between the first set of parentheses in the Title field and store it in a new field called Patent_ID.
You must first enable regex in the elasticsearch.yml file by setting the script.painless.regex.enabled parameter to true.
{
"description": "extract the text between the first set of parentheses",
"processors": [
{
"script": {
"source": "def f = ctx['Title']; if (f != null) { def m = /\\((.*?)\\)/.matcher(f); if (m.find()) { ctx.Patent_ID = m.group(1); } }"
}
}
]
}
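To check the regular expression outside of Elasticsearch, the same pattern can be tried in Python. This is a sketch, not the Painless script itself: `extract_patent_id` is a hypothetical helper and the sample title and patent number are made up. It leaves the document unchanged when the title is missing or contains no parentheses.

```python
import re

# Non-greedy match of the text inside the first pair of parentheses.
FIRST_PARENS = re.compile(r"\((.*?)\)")

def extract_patent_id(doc):
    """Copy the first parenthesized text of Title into Patent_ID."""
    title = doc.get("Title")
    if title is not None:
        match = FIRST_PARENS.search(title)
        if match:
            doc["Patent_ID"] = match.group(1)
    return doc

doc = extract_patent_id({"Title": "Widget assembly (US1234567) and method (rev. 2)"})
print(doc["Patent_ID"])  # US1234567
```

The non-greedy `.*?` stops at the first closing parenthesis, so only the first parenthesized group is captured even when the title contains several.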
Merge two fields to create a geo_point
Merge two fields that contain latitude and longitude values to create a single Elasticsearch geo_point field. Documents in which either value is missing are dropped:
{
"description": "Create geo point field",
"processors": [
{
"drop": {
"if": "ctx.latitude_field == null || ctx.longitude_field == null"
}
},
{
"set": {
"field": "geo_location",
"value": {
"lat": "{{latitude_field}}",
"lon": "{{longitude_field}}"
}
}
}
]
}
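The effect of the drop and set processors can be sketched in Python as shown here. This is an illustration under assumptions: `to_geo_point` is a hypothetical helper, the coordinates are sample values, returning `None` stands in for a dropped document, and the values are rendered as strings because the `{{...}}` templates produce text.

```python
def to_geo_point(doc):
    """Emulate the drop-if-missing check followed by the set processor."""
    if doc.get("latitude_field") is None or doc.get("longitude_field") is None:
        # Stands in for the drop processor: the document is not indexed.
        return None
    doc["geo_location"] = {
        "lat": str(doc["latitude_field"]),
        "lon": str(doc["longitude_field"]),
    }
    return doc

place = to_geo_point({"latitude_field": 40.7128, "longitude_field": -74.006})
print(place["geo_location"])  # {'lat': '40.7128', 'lon': '-74.006'}
print(to_geo_point({"latitude_field": 40.7128}))  # None
```

For the merged value to be indexed as a geo point, the target field (geo_location here) needs a geo_point mapping in the index.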