Semantic text field type [semantic-text]

:::::{warning}
The semantic_text field mapping can be added regardless of license state. However, it typically calls the {{infer-cap}} API, which requires an appropriate license. In these cases, using semantic_text in a cluster without the appropriate license causes operations such as indexing and reindexing to fail.
:::::

The semantic_text field type simplifies semantic search by providing sensible defaults that automate most of the manual work typically required for vector search. Using semantic_text, you don't have to manually configure mappings, set up ingestion pipelines, or handle chunking. The field type automatically:

  • Configures index mappings: Chooses the correct field type (sparse_vector or dense_vector), dimensions, similarity functions, and storage optimizations based on the {{infer}} endpoint.
  • Generates embeddings during indexing: Automatically generates embeddings when you index documents, without requiring ingestion pipelines or {{infer}} processors.
  • Handles chunking: Automatically chunks long text documents during indexing.

Basic semantic_text mapping example

The following example creates an index mapping with a semantic_text field, using default values:

```console
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text"
      }
    }
  }
}
```

:::{important}
If you don't specify an inference_id, like in the example above, and upgrade to a later version, newly created indices might use a different embedding model than existing ones. Queries that target these indices together can produce unexpected ranking results. For details, refer to potential issues when mixing embedding models across indices.
:::
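One way to sidestep this issue is to pin the endpoint explicitly at mapping time. A minimal sketch, assuming a hypothetical endpoint ID of `my-elser-endpoint`:

```console
PUT semantic-embeddings-pinned
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"
      }
    }
  }
}
```

Because the index is tied to a named endpoint rather than the default, newly created indices keep using the same embedding model across upgrades, and cross-index queries rank consistently.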

Extended semantic_text mapping example

The following example creates an index mapping with a semantic_text field that uses dense vectors:

```console
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-inference-endpoint", <1>
        "search_inference_id": "my-search-inference-endpoint", <2>
        "index_options": { <3>
          "dense_vector": {
            "type": "bbq_disk"
          }
        },
        "chunking_settings": { <4>
          "strategy": "word",
          "max_chunk_size": 120,
          "overlap": 40
        }
      }
    }
  }
}
```
  1. (Optional) Specifies the {{infer}} endpoint used to generate embeddings at index time. If you don’t specify an inference_id, the semantic_text field uses a default {{infer}} endpoint.
  2. (Optional) The {{infer}} endpoint used to generate embeddings at query time. If not specified, the endpoint defined by inference_id is used at both index and query time.
  3. (Optional) Configures how the underlying vector representation is indexed. In this example, bbq_disk is selected for dense vectors. You can configure different index options depending on whether the field uses dense or sparse vectors. Learn how to set index_options for sparse_vector fields and how to set index_options for dense_vector fields.
  4. (Optional) Overrides the chunking settings from the {{infer}} endpoint. In this example, the word strategy splits text on individual words with a maximum of 120 words per chunk and an overlap of 40 words between chunks. The default chunking strategy is sentence.
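Once documents are indexed, a semantic_text field can be searched with the semantic query type. A minimal sketch, assuming the semantic-embeddings index from the examples above (the query string is illustrative):

```console
GET semantic-embeddings/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "How do I configure chunking?"
    }
  }
}
```

The query text is embedded with the field's {{infer}} endpoint (or the search_inference_id endpoint, if one is configured), so no manual vector generation is needed at query time.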

:::{tip} For a complete example, refer to the Semantic search with semantic_text tutorial. :::

Overview

The semantic_text field type documentation is organized into reference content and how-to guides.

Reference

The Reference section provides technical reference content.

How-to guides

The How-to guides section organizes procedure descriptions and examples into the following guides:

  • Set up and configure semantic_text fields: Learn how to configure {{infer}} endpoints, including default and preconfigured options, ELSER on EIS, custom endpoints, and dedicated endpoints for ingestion and search operations.

  • Ingest data with semantic_text fields: Learn how to index pre-chunked content, use copy_to and multi-fields to collect values from multiple fields, and perform updates and partial updates to optimize ingestion costs.

  • Search and retrieve semantic_text fields: Learn how to query semantic_text fields, retrieve indexed chunks, return field embeddings, and highlight the most relevant fragments from search results.
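As an illustration of the last guide, the semantic highlighter can surface the most relevant chunks from a matching document. A sketch, assuming the semantic-embeddings index from earlier; the query string and number_of_fragments value are illustrative:

```console
GET semantic-embeddings/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "vector search defaults"
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "type": "semantic",
        "number_of_fragments": 2
      }
    }
  }
}
```

Each returned fragment corresponds to an indexed chunk, ranked by relevance to the query, which is useful for showing users why a long document matched.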