# `semantic_text` fields [set-up-configuration-semantic-text]

This page provides instructions for ingesting data into `semantic_text` fields. Learn how to index pre-chunked content, use `copy_to` and multi-fields to collect values from multiple fields, and perform updates and partial updates to optimize ingestion costs.
## Index pre-chunked content

```{applies_to}
stack: ga 9.1
```
To index pre-chunked content, provide your text as an array of strings. Each element in the array represents a single chunk that will be sent directly to the {{infer}} service without further chunking.
:::::{stepper}
::::{step} Disable automatic chunking
Disable automatic chunking in your index mapping by setting `chunking_settings.strategy` to `none`:
```console
PUT test-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none" <1>
        }
      }
    }
  }
}
```

1. Disables automatic chunking on `my_semantic_field`.

::::
::::{step} Index documents
Index documents with pre-chunked text as an array:
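```console
PUT test-index/_doc/1
{
  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
  ...
}
```

1. Each element of the array is one pre-chunked text segment and is sent to the {{infer}} service as a single chunk.
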
::::
:::::
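
Pre-chunked arrays can also be ingested in bulk. The following is a minimal sketch that assumes the `test-index` mapping from the first step; the document IDs and chunk text are placeholders chosen for illustration. As with the single-document request, each array element should reach the {{infer}} service as one chunk.

```console
POST _bulk
{ "index": { "_index": "test-index", "_id": "2" } }
{ "my_semantic_field": ["first chunk of document 2", "second chunk of document 2"] } <1>
{ "index": { "_index": "test-index", "_id": "3" } }
{ "my_semantic_field": ["the only chunk of document 3"] }
```
% TEST[skip:Requires {{infer}} endpoint]

1. Placeholder IDs and text. Document `2` is split into two chunks; a one-element array, as for document `3`, is also a valid input.
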
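:::{important}
When providing pre-chunked input:

- Set the chunking strategy to `none` to avoid additional processing.
- If a chunk exceeds the model's token limit, some services (such as `elastic` and `elasticsearch`) will automatically truncate the input.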
:::

## `copy_to` and multi-fields with `semantic_text` [use-copy-to-with-semantic-text]

You can use a single `semantic_text` field to collect values from multiple fields for semantic search. The `semantic_text` field type can serve as the target of `copy_to` fields, be part of a multi-field structure, or contain multi-fields internally.
Use `copy_to` to copy values from source fields to a `semantic_text` field:

```console
PUT test-index
{
  "mappings": {
    "properties": {
      "source_field": {
        "type": "text",
        "copy_to": "infer_field"
      },
      "infer_field": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]
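
Because several source fields can share the same `copy_to` target, one `semantic_text` field can aggregate all of them. The following is a minimal sketch that assumes a hypothetical index `my-copy-to-index` with hypothetical `title` and `body` source fields. Both copy into `combined_semantic`, which is then searched with a `semantic` query.

```console
PUT my-copy-to-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "combined_semantic" <1>
      },
      "body": {
        "type": "text",
        "copy_to": "combined_semantic"
      },
      "combined_semantic": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}

PUT my-copy-to-index/_doc/1
{
  "title": "Pre-chunked input",
  "body": "Provide text as an array of strings to skip automatic chunking."
}

GET my-copy-to-index/_search
{
  "query": {
    "semantic": {
      "field": "combined_semantic", <2>
      "query": "How do I skip chunking?"
    }
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]

1. Hypothetical field names. Both `title` and `body` copy their values into `combined_semantic`.
2. The document never sets `combined_semantic` directly. Its content comes entirely from `copy_to`.
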
Declare `semantic_text` as a multi-field:

```console
PUT test-index
{
  "mappings": {
    "properties": {
      "source_field": {
        "type": "text",
        "fields": {
          "infer_field": {
            "type": "semantic_text",
            "inference_id": ".elser-2-elasticsearch"
          }
        }
      }
    }
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]
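
With the multi-field mapping above, the parent `source_field` keeps its lexical `text` behavior, and the `source_field.infer_field` subfield holds the embeddings. The following is a minimal sketch that assumes that mapping. The document text and query strings are placeholders, and the `bool` query is only one illustrative way to combine the two fields.

```console
PUT test-index/_doc/1
{
  "source_field": "Semantic search matches on meaning rather than exact terms."
}

GET test-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "source_field": "semantic search" } }, <1>
        { "semantic": { "field": "source_field.infer_field", "query": "search by meaning" } } <2>
      ]
    }
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]

1. Lexical match on the parent `text` field.
2. Semantic query on the `semantic_text` multi-field, addressed by its full path.
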
## Updates and partial updates for `semantic_text` fields

When updating documents that contain `semantic_text` fields, it's important to understand when {{infer}} runs:
Full document updates
: Full document updates re-run {{infer}} on all `semantic_text` fields, even if their values did not change. This keeps embeddings consistent with the current document state but can increase ingestion costs.

Partial updates using the Bulk API
: Partial updates submitted through the Bulk API reuse the existing embeddings of any `semantic_text` fields you omit. The {{infer}} step is skipped for omitted fields, which can significantly reduce processing time and cost.

Partial updates using the Update API
: Partial updates submitted through the Update API re-run {{infer}} on all `semantic_text` fields, even when you omit them from the `doc` object. Embeddings are regenerated regardless of whether field values changed.
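
The following is a minimal sketch of the three update styles. It assumes a hypothetical index `my-index` whose mapping has a `title` field of type `text` and a `content` field of type `semantic_text`. In every request, only the value of `title` changes.

```console
PUT my-index/_doc/1 <1>
{
  "title": "Updated title",
  "content": "Unchanged body text"
}

POST _bulk
{ "update": { "_index": "my-index", "_id": "1" } }
{ "doc": { "title": "Updated title" } } <2>

POST my-index/_update/1
{
  "doc": {
    "title": "Updated title" <3>
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]

1. Full document update: {{infer}} runs again on `content` even though its value is unchanged.
2. Bulk API partial update: `content` is omitted, so its existing embeddings are reused.
3. Update API partial update: {{infer}} still runs on `content`, even though it is omitted from `doc`.
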
To preserve existing embeddings and avoid unnecessary {{infer}} costs:

- Use partial updates with the Bulk API.
- Omit any `semantic_text` fields that did not change from the `doc` object in your request.

For indices containing `semantic_text` fields, updates that use scripts have the following behavior:
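- **Supported:** scripted updates through the Update API.
- **Not supported:** scripted updates through the Bulk API. These fail even if the script targets only non-`semantic_text` fields.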
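
The following is a minimal sketch of a scripted update sent through the Update API. It assumes a hypothetical index `my-index` that contains `semantic_text` fields and also maps a numeric `views` field. The script, field name, and parameter name are illustrative only.

```console
POST my-index/_update/1
{
  "script": {
    "source": "ctx._source.views += params.increment", <1>
    "lang": "painless",
    "params": {
      "increment": 1
    }
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]

1. Hypothetical Painless script that increments the numeric `views` field. It does not modify any `semantic_text` field.

For an index with `semantic_text` fields, sending the same script as an `update` action in a `_bulk` request is expected to fail, even though the script only touches `views`.
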