Performance Considerations

Join Types

Siren Federate offers three join strategies: the hash join, the broadcast join and the index join.

All strategies have advantages and disadvantages, but choosing the right one can help to optimize system performance. For more information, see Configuring joins by type.

Numeric vs String Attributes

Joining numeric attributes is more efficient than joining string attributes. If you are planning to join attributes of type string, we recommend to generate a murmur hash of the string value at indexing time into a new attribute, and use this new attribute for the join. Such index-time data transformation can be easily done using Logstash’s fingerprint plugin.

Tuple Collector Settings

Tuple Collectors are sending batches of tuples of fixed size. The size of a batch has an impact on the performance. Smaller batches will take less memory but will increase cpu times on the receiver side since it will have to reconstruct a tuple collection from many small batches (especially for sorted tuple collection). By default, the size of a batch of tuple is set to 1,048,576 tuples (which represents 8mb for a column of long datatype). The size can be configured using the setting key siren.io.tuple.collector.batch_size with a integer value representing the maximum number of tuples in a batch.