Entity resolution in Siren Investigate
Siren entity resolution (ER) identifies different records that refer to the same real-world entity, even when variations exist in the data. For example, it can recognize that two records, one listing "John Smith" and the other "Jon Smyth", refer to the same individual despite the difference in spelling. By linking these seemingly distinct records, Siren uncovers hidden relationships and connections within the data, so that users can conduct more thorough investigations, improve data quality, and gain deeper insight into complex datasets.
You can then use entity resolution in Siren Search to remove clutter from your search results. See Entity resolution in Siren Search results.
Getting started
Siren supports data produced by entity resolution software such as Senzing. You can import this data to an index in Elasticsearch. Using this index, you can create an entity table within Siren Investigate. The documents within this index contain references to other documents in your data model.
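As a rough illustration of the import step, the following Python sketch builds a request body in the newline-delimited JSON shape that the Elasticsearch `_bulk` API accepts. The index name `entity2record`, the helper `build_bulk_body`, and the use of `EntityID` as the document `_id` are assumptions made for this example. How the entity resolution output is converted into documents of this shape, and how the body is sent to Elasticsearch, are not covered here.

```python
import json


def build_bulk_body(docs, index_name):
    """Return an NDJSON string shaped for the Elasticsearch _bulk API.

    Hypothetical helper: each document becomes an action line followed
    by a source line, and the body ends with a trailing newline.
    """
    lines = []
    for doc in docs:
        # Action line. Using EntityID as the _id is an assumption.
        action = {"index": {"_index": index_name, "_id": str(doc["EntityID"])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Sample ER document in the shape used later on this page.
er_docs = [
    {
        "EntityID": 1582550,
        "company": ["company/smith-consultancy", "company/smith-accountancy"],
        "investor": ["investor/peter-smith"],
        "organization": ["org/smith-services"],
    }
]

bulk_body = build_bulk_body(er_docs, "entity2record")
print(bulk_body)
```

The resulting string could be posted to the `_bulk` endpoint with a content type of `application/x-ndjson`, using whichever HTTP or Elasticsearch client you prefer.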
To set up and use entity resolution in Siren Investigate, do the following:
- Enable ER on an entity table.
- Ensure you understand the data.
- Create ER types in Advanced Settings.
- Create ER enabled relations in Data model.
- Use entity resolution on equivalent records in Siren Search results.
These steps are explained in the following sections using the example below:
Enabling ER on an entity table
The entity2record table in the example contains data from the entity resolution software, but you must tell Investigate to use it as an entity resolution (ER) table. ER tables use ER types and relations to support the entity resolution process.
To enable an ER table, go to Data model > Info tab, and select Entity resolution. For more information, see Editing entity tables, Info tab.
For information about creating entity resolved data, see Siren ER User Guide.
Understanding the data within the ER table
Before you create relations and ER types, ensure you understand how the data is structured within your ER table and what it represents. The following is a sample document found in the entity2record example table:
{
  "EntityID": 1582550,
  "company": ["company/smith-consultancy", "company/smith-accountancy"],
  "investor": ["investor/peter-smith"],
  "organization": ["org/smith-services"]
}
Each of the fields company, investor, and organization contains references to values found in documents in the corresponding entity tables: Companies, Investors, and Organizations. These fields are used to create relations and attach ER types so that you can perform entity resolution.
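To make the structure concrete, here is a minimal Python sketch that groups the references in the sample document by the entity table they point to. The mapping `FIELD_TO_TABLE` and the function `references_by_table` are illustrative names, not part of Siren Investigate. The field-to-table pairing follows the example described above.

```python
# Pairing of ER-table fields to entity tables, as described in the example.
FIELD_TO_TABLE = {
    "company": "Companies",
    "investor": "Investors",
    "organization": "Organizations",
}

er_doc = {
    "EntityID": 1582550,
    "company": ["company/smith-consultancy", "company/smith-accountancy"],
    "investor": ["investor/peter-smith"],
    "organization": ["org/smith-services"],
}


def references_by_table(doc):
    """Group the record references in an ER document by target table."""
    refs = {}
    for field, table in FIELD_TO_TABLE.items():
        # A field may be absent if the entity has no record in that table.
        refs[table] = list(doc.get(field, []))
    return refs


refs = references_by_table(er_doc)
for table, ids in refs.items():
    print(f"{table}: {ids}")
```

Read this way, entity 1582550 ties together two records in Companies, one in Investors, and one in Organizations.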
Creating ER types
In Advanced Settings, specify the ER types to define the concept of an entity. In the example there is an ER type called 'Company'. This means only documents discovered from relations marked with 'Company' are considered conceptually the same entity.
For more information about creating entity resolution types, see Entity resolution types.
Creating ER enabled relations
Relations can be marked with ER types, which determine which relations are used to discover records representing the same entity. In the example's data model, three entity tables are connected to the ER table: Organizations, Companies, and Investors. There is a relation from each entity table to the ER table, and each relation is marked with an ER type.
The example Companies table contains the following document:
{
  "id": "company/smith-consultancy",
  "city": "San Francisco",
  "number_of_employees": 150
}
You could create a relation between Companies and entity2record using the following fields: id → company. You can then mark this relation with the Company ER type. You can do the same for the Organizations and Investors tables using their respective fields.
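The effect of marking a relation with an ER type can be modeled with a short Python sketch. This is a conceptual illustration only, not how Siren Investigate implements entity resolution. The `RELATIONS` structure, the function `equivalent_records`, the `id` join field for Investors and Organizations, and the ER type names `Investor` and `Organization` are all assumptions made for this example; only the `id → company` relation and the `Company` ER type come from the text above.

```python
# Hypothetical model of the relations between entity tables and the ER table.
RELATIONS = [
    {"table": "Companies", "table_field": "id",
     "er_field": "company", "er_type": "Company"},
    # The two entries below use assumed field and ER type names.
    {"table": "Investors", "table_field": "id",
     "er_field": "investor", "er_type": "Investor"},
    {"table": "Organizations", "table_field": "id",
     "er_field": "organization", "er_type": "Organization"},
]

ER_DOCS = [
    {
        "EntityID": 1582550,
        "company": ["company/smith-consultancy", "company/smith-accountancy"],
        "investor": ["investor/peter-smith"],
        "organization": ["org/smith-services"],
    }
]


def equivalent_records(record_id, er_type, er_docs, relations=RELATIONS):
    """Return other record IDs that share an entity with record_id,
    following only the relations marked with the given ER type."""
    er_fields = [r["er_field"] for r in relations if r["er_type"] == er_type]
    matches = set()
    for doc in er_docs:
        # Collect references reachable through relations of this ER type.
        refs = [ref for field in er_fields for ref in doc.get(field, [])]
        if record_id in refs:
            matches.update(refs)
    matches.discard(record_id)
    return sorted(matches)


equivalents = equivalent_records("company/smith-consultancy", "Company", ER_DOCS)
print(equivalents)  # ['company/smith-accountancy']
```

In this model, the investor and organization records belong to the same ER document but are not returned, because their relations are not marked with the Company ER type.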
To set ER types on relations, go to Data model > Relations tab > Advanced Settings. For more information about setting ER types on relations, see Configuring advanced settings for relations.
Result
You can now run searches in Siren Search and discover documents that represent the same entity. See Entity resolution in Siren Search results.