
Chunk

docs/reference/query-languages/esql/_snippets/functions/functionNamedParams/chunk.md

9.4.0 · 1.8 KB

% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.

Supported function named parameters

strategy : (keyword) The chunking strategy to use. Default value is sentence.

max_chunk_size : (integer) The maximum size of a chunk in words. This value cannot be lower than 20 (for the sentence strategy) or 10 (for the word or recursive strategies). This value should not exceed the window size of any associated model that uses the output of this function.

overlap : (integer) The number of overlapping words for chunks. It is applicable only to the word chunking strategy. This value cannot be higher than half the max_chunk_size value.

sentence_overlap : (integer) The number of overlapping sentences for chunks. It is applicable only to the sentence chunking strategy. It can be either 1 or 0.

separator_group : (keyword) Sets a predefined list of separators based on the selected text type. Values may be markdown or plaintext. Only applicable to the recursive chunking strategy. When using the recursive chunking strategy, one of separators or separator_group must be specified.

separators : (keyword) A list of strings used as possible split points when chunking text. Each string can be a plain string or a regular expression (regex) pattern. The system tries the separators in list order, starting from the first item, to split the text. After splitting, it attempts to recombine smaller pieces into larger chunks that stay within the max_chunk_size limit, to reduce the total number of chunks generated. Only applicable to the recursive chunking strategy. When using the recursive chunking strategy, one of separators or separator_group must be specified.
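
The constraints listed above can be summarized as a small client-side check. The following Python sketch is illustrative only and is not part of Elasticsearch: `validate_chunk_settings` is a hypothetical helper, it assumes that only the three strategies named above (sentence, word, recursive) are accepted, and it reports parameters used with a non-matching strategy as problems, although the server may handle that case differently.

```python
def validate_chunk_settings(settings):
    """Return a list of problems found in a dict of CHUNK named parameters.

    Hypothetical helper: it mirrors the documented constraints only and
    does not reproduce Elasticsearch's own validation or error messages.
    """
    problems = []

    # The documented default strategy is "sentence".
    strategy = settings.get("strategy", "sentence")
    if strategy not in ("sentence", "word", "recursive"):
        # Assumption: only the strategies named in the text are valid.
        return [f"unknown strategy: {strategy!r}"]

    # Minimum chunk size: 20 words for sentence, 10 for word or recursive.
    size = settings.get("max_chunk_size")
    minimum = 20 if strategy == "sentence" else 10
    if size is not None and size < minimum:
        problems.append(f"max_chunk_size must be at least {minimum} for {strategy}")

    # overlap: word strategy only, at most half of max_chunk_size.
    overlap = settings.get("overlap")
    if overlap is not None:
        if strategy != "word":
            problems.append("overlap applies only to the word strategy")
        elif size is not None and overlap * 2 > size:
            problems.append("overlap cannot be higher than half of max_chunk_size")

    # sentence_overlap: sentence strategy only, either 0 or 1.
    sentence_overlap = settings.get("sentence_overlap")
    if sentence_overlap is not None:
        if strategy != "sentence":
            problems.append("sentence_overlap applies only to the sentence strategy")
        elif sentence_overlap not in (0, 1):
            problems.append("sentence_overlap must be 0 or 1")

    # separators / separator_group: recursive strategy only, one is required.
    has_separators = "separators" in settings
    has_group = "separator_group" in settings
    if strategy == "recursive":
        if not (has_separators or has_group):
            problems.append("recursive requires separators or separator_group")
        if has_group and settings["separator_group"] not in ("markdown", "plaintext"):
            problems.append("separator_group must be markdown or plaintext")
    elif has_separators or has_group:
        problems.append("separators and separator_group apply only to the recursive strategy")

    return problems
```

For example, `validate_chunk_settings({"strategy": "word", "max_chunk_size": 20, "overlap": 10})` returns an empty list, while raising `overlap` to 11 yields one problem because the value exceeds half of `max_chunk_size`. When `max_chunk_size` is omitted, the sketch skips the size-dependent checks, because the default value is not stated in this section.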