Fuse - Elasticsearch

yaml

serverless: preview
stack: preview 9.2.0

The FUSE processing command merges rows from multiple result sets and assigns new relevance scores.

Used together with the FORK command, FUSE enables hybrid search by combining and scoring the results of multiple queries.

FUSE works by:

Merging rows with matching <key_columns> values
Assigning new relevance scores using the specified <fuse_method> algorithm and the values from the <group_column> and <score_column>
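The merge step above can be pictured with a small Python sketch. It only illustrates the idea under stated assumptions and is not how Elasticsearch implements FUSE: the `merge_rows` helper and the sample rows are hypothetical, and rows are modeled as plain dictionaries that use the default column names `_id`, `_index`, `_fork`, and `_score`.

```python
from collections import defaultdict


def merge_rows(rows, key_columns=("_id", "_index"),
               group_column="_fork", score_column="_score"):
    """Collect, for each distinct key, the score contributed by each result set.

    Hypothetical helper for illustration only; returns
    {key_tuple: {group_value: score}}.
    """
    merged = defaultdict(dict)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        merged[key][row[group_column]] = row[score_column]
    return dict(merged)


# Made-up sample rows: document "1" is returned by both branches.
rows = [
    {"_id": "1", "_index": "books", "_fork": "fork1", "_score": 2.5},
    {"_id": "2", "_index": "books", "_fork": "fork1", "_score": 1.0},
    {"_id": "1", "_index": "books", "_fork": "fork2", "_score": 0.8},
]
merged = merge_rows(rows)
```

Each merged key then receives a single new score, computed by the chosen fuse method from its per-group values.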

:::::{tip}
FUSE is for search use cases: it merges ranked result sets and computes relevance. Learn more about how search works in ES|QL.
:::::

A LIMIT is required before FUSE, because FUSE can only work with a finite set of rows.

::::{applies-switch}

:::{applies-item} { serverless: ga , stack: ga 9.4+ }
FORK branches do not have an implicit LIMIT 1000. When using FUSE after FORK, a LIMIT must be added to each FORK branch.
:::

:::{applies-item} stack: preview 9.1-9.3
An implicit LIMIT 1000 is added to each FORK branch, so FUSE does not require an explicit LIMIT in each FORK branch when used after FORK. However, as a best practice and to avoid issues when upgrading to newer versions, we still recommend adding an explicit LIMIT before FUSE.
:::

::::

Syntax

Use default parameters:

esql

FUSE

Specify custom parameters:

esql

FUSE <fuse_method> SCORE BY <score_column> GROUP BY <group_column> KEY BY <key_columns> WITH <options>

Parameters

fuse_method : Defaults to RRF. Can be one of RRF (for Reciprocal Rank Fusion) or LINEAR (for linear combination of scores). Designates which method to use to assign new relevance scores.

options : Options for the fuse_method.

::::{tab-set}
:::{tab-item} RRF
When fuse_method is RRF, options supports the following parameters:

rank_constant : Defaults to 60. Represents the rank_constant used in the RRF formula.

weights : Defaults to {}. Allows you to set different weights for RRF scores based on group_column values. Refer to the Set custom weights example.
:::

:::{tab-item} LINEAR
When fuse_method is LINEAR, options supports the following parameters:

normalizer : Defaults to none. Can be one of none or minmax. Specifies which score normalization method to apply.

weights : Defaults to {}. Allows you to set different weights for scores based on group_column values. Refer to the Set custom weights example.
:::
::::

score_column : Defaults to _score. Designates the column from which the relevance scores of the input rows are read and to which the new relevance scores of the merged rows are written.

group_column : Defaults to _fork. Designates which column represents the result set.

key_columns : Defaults to _id, _index. Rows with matching key_columns values are merged.

Examples

The following examples use FORK to run parallel queries and FUSE to merge the results.

Use RRF

Run a lexical and a semantic query in parallel with FORK, then merge with FUSE (applies RRF by default):

esql

FROM books METADATA _id, _index, _score  // Include document ID, index name, and relevance score
| FORK (WHERE title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 1: Lexical search on title field, sorted by relevance score
       (WHERE semantic_title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 2: Semantic search on semantic_title field, sorted by relevance score
| FUSE  // Merge results using the RRF algorithm (the default)
| SORT _score DESC  // Sort results by the new scores, because FUSE does not sort its output
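To make the scoring step more concrete, here is a minimal Python sketch of weighted Reciprocal Rank Fusion. It assumes the commonly cited RRF formula, in which each result set contributes `weight / (rank_constant + rank)` with ranks starting at 1, and it assumes that a group with no entry in `weights` counts with weight 1.0. The `rrf_fuse` function and the sample rows are hypothetical; this is not the Elasticsearch implementation, and tie-breaking is left unspecified.

```python
from collections import defaultdict


def rrf_fuse(rows, rank_constant=60, weights=None):
    """Return [(key, fused_score), ...] sorted by fused score, highest first."""
    weights = weights or {}

    # Split the rows into one list per result set (the _fork value).
    by_group = defaultdict(list)
    for row in rows:
        by_group[row["_fork"]].append(row)

    fused = defaultdict(float)
    for group, group_rows in by_group.items():
        # Rank rows inside each result set by their original score.
        ranked = sorted(group_rows, key=lambda r: r["_score"], reverse=True)
        for rank, row in enumerate(ranked, start=1):
            key = (row["_id"], row["_index"])
            # Assumption: groups missing from `weights` use weight 1.0.
            fused[key] += weights.get(group, 1.0) / (rank_constant + rank)

    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows: "b" appears in both result sets.
rows = [
    {"_id": "a", "_index": "books", "_fork": "fork1", "_score": 3.0},
    {"_id": "b", "_index": "books", "_fork": "fork1", "_score": 2.0},
    {"_id": "b", "_index": "books", "_fork": "fork2", "_score": 0.9},
    {"_id": "c", "_index": "books", "_fork": "fork2", "_score": 0.5},
]
fused = rrf_fuse(rows)
```

In this sample, "b" ranks first after fusion because it receives a contribution from both result sets. Only ranks enter the formula, so the raw lexical and semantic scores do not need to be on the same scale.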

Use linear combination

FUSE can also use linear score combination:

esql

FROM books METADATA _id, _index, _score
| FORK (WHERE title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 1: Lexical search on title
       (WHERE semantic_title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 2: Semantic search on semantic_title
| FUSE LINEAR  // Merge results using linear combination of scores (equal weights by default)
| SORT _score DESC  // Sort results by the new scores, because FUSE does not sort its output
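As a rough illustration, an unweighted linear combination without normalization can be sketched in Python as the sum of each row's scores across result sets. The `linear_fuse` function and the sample scores are hypothetical, rows are plain dictionaries with the default column names, and this is not the Elasticsearch implementation.

```python
from collections import defaultdict


def linear_fuse(rows):
    """Sum the raw scores of rows that share the same (_id, _index) key."""
    fused = defaultdict(float)
    for row in rows:
        fused[(row["_id"], row["_index"])] += row["_score"]
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows: fork1 scores are on a much larger scale than fork2.
rows = [
    {"_id": "a", "_index": "books", "_fork": "fork1", "_score": 12.0},
    {"_id": "b", "_index": "books", "_fork": "fork1", "_score": 4.0},
    {"_id": "b", "_index": "books", "_fork": "fork2", "_score": 0.75},
    {"_id": "c", "_index": "books", "_fork": "fork2", "_score": 0.5},
]
fused = linear_fuse(rows)
```

In this made-up data the scores from `fork1` are roughly ten times larger than those from `fork2`, so they dominate the sum even though "b" was the top hit in `fork2`.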

Normalize scores

When combining results from semantic and lexical queries through linear combination, we recommend first normalizing the scores from each result set.

The following example uses minmax score normalization, which rescales the scores in each result set to values between 0 and 1 before the rows are combined:

esql

FROM books METADATA _id, _index, _score
| FORK (WHERE title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 1: Lexical search
       (WHERE semantic_title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 2: Semantic search
| FUSE LINEAR WITH { "normalizer": "minmax" }  // Linear combination with min-max normalization (scales scores to 0-1 range)
| SORT _score DESC  // Sort results by the new scores, because FUSE does not sort its output
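The effect of min-max normalization on one result set can be sketched in Python. This assumes the standard formula `(score - min) / (max - min)`, applied separately to each result set; the `minmax_normalize` helper and the sample scores are hypothetical, and the handling of identical scores is an assumption of the sketch rather than documented FUSE behavior.

```python
def minmax_normalize(scores):
    """Rescale a list of scores from one result set to the range [0, 1]."""
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        # Assumption: the behavior for identical scores is not described
        # in this document; this sketch maps them all to 0.0.
        return [0.0 for _ in scores]
    return [(score - low) / (high - low) for score in scores]


# Made-up scores on very different scales.
lexical = minmax_normalize([12.0, 8.0, 4.0])
semantic = minmax_normalize([0.75, 0.5, 0.25])
```

After normalization both lists span the same range, so neither result set dominates a linear combination purely because of its original scale.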

Set custom weights

FUSE lets you assign different weights to scores based on the _fork column values, so you can control the relative importance of each query branch in the final results.

esql

FROM books METADATA _id, _index, _score
| FORK (WHERE title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 1: Lexical search
       (WHERE semantic_title:"Shakespeare" | SORT _score DESC | LIMIT 100)  // Fork 2: Semantic search
| FUSE LINEAR WITH { "weights": { "fork1": 0.7, "fork2": 0.3 }, "normalizer": "minmax" }  // Weighted linear combination: 70% lexical, 30% semantic, with min-max normalization
| SORT _score DESC  // Sort results by the new scores, because FUSE does not sort its output
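The arithmetic behind a weighted linear combination can be worked through in a few lines of Python. The sketch assumes the scores have already been normalized to the 0-1 range and that a group with no entry in `weights` counts with weight 1.0; the `weighted_score` helper and the sample values are hypothetical.

```python
def weighted_score(scores_by_group, weights):
    """Combine one row's normalized scores using per-group weights."""
    # Assumption: a group without an entry in `weights` counts with weight 1.0.
    return sum(weights.get(group, 1.0) * score
               for group, score in scores_by_group.items())


weights = {"fork1": 0.7, "fork2": 0.3}

# Row matched by both branches: 0.7 * 1.0 + 0.3 * 0.5
both = weighted_score({"fork1": 1.0, "fork2": 0.5}, weights)
# Row matched only by the lexical branch: 0.7 * 0.8
lexical_only = weighted_score({"fork1": 0.8}, weights)
# Row matched only by the semantic branch: 0.3 * 1.0
semantic_only = weighted_score({"fork2": 1.0}, weights)
```

With these weights, even the best semantic-only match ends up below a reasonably strong lexical-only match, which is the intended effect of favoring the lexical branch.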

Limitations

These limitations can arise in either of the following cases:

FUSE is not combined with FORK
FUSE doesn't use the default metadata columns _id, _index, _score and _fork

1. FUSE assumes that key_columns are single-valued. When key_columns are multivalued, FUSE can produce unreliable relevance scores.
2. FUSE automatically assigns a score value of NULL if the <score_column> or <group_column> is multivalued.
3. FUSE assumes that the combination of key_columns and group_column is unique. If not, FUSE can produce unreliable relevance scores.
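When the rows reaching FUSE do not come straight from FORK, it may help to check these assumptions on a sample of the data first. The following Python sketch is one hypothetical way to do that outside of ES|QL: the `find_fuse_input_issues` helper is not part of any Elastic API, rows are plain dictionaries, and multivalued fields are modeled as Python lists.

```python
from collections import Counter


def find_fuse_input_issues(rows, key_columns=("_id", "_index"),
                           group_column="_fork", score_column="_score"):
    """Return (multivalued_rows, duplicate_combinations) for a list of row dicts."""
    checked = tuple(key_columns) + (group_column, score_column)

    multivalued, single_valued = [], []
    for row in rows:
        # A list or tuple stands in for a multivalued field in this sketch.
        if any(isinstance(row[column], (list, tuple)) for column in checked):
            multivalued.append(row)
        else:
            single_valued.append(row)

    # Count how often each (key_columns..., group_column) combination occurs.
    combos = Counter(
        tuple(row[column] for column in key_columns) + (row[group_column],)
        for row in single_valued
    )
    duplicates = [combo for combo, count in combos.items() if count > 1]
    return multivalued, duplicates


# Made-up sample rows: one duplicate combination and one multivalued score.
rows = [
    {"_id": "1", "_index": "books", "_fork": "fork1", "_score": 2.0},
    {"_id": "1", "_index": "books", "_fork": "fork1", "_score": 1.5},
    {"_id": "2", "_index": "books", "_fork": "fork2", "_score": [0.4, 0.9]},
]
multivalued, duplicates = find_fuse_input_issues(rows)
```

An empty result for both lists suggests the sampled rows satisfy the assumptions listed above; it does not guarantee anything about rows outside the sample.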