docs-site/content/30.0/api/stemming.md
Stemming is a technique that helps handle variations of words during search. When stemming is enabled, a search for one form of a word also matches other grammatical forms of that word. For example, with English stemming a search for "running" can also match "run" and "runs".
Typesense provides two approaches to handle word variations: basic (algorithmic) stemming and stemming dictionaries.
Basic stemming uses the Snowball stemmer algorithm to automatically detect and handle word variations. Being rules-based, it works well for common word patterns in the configured language, but may produce unintended side effects with brand names, proper nouns, and locations. Since these rules are designed primarily for common nouns, applying them to specialized content like company names or locations can sometimes degrade search relevance.
To enable basic stemming for a field, set "stem": true in your collection schema:
curl "http://localhost:8108/collections" -X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
"name": "companies",
"fields": [
{"name": "description", "type": "string", "stem": true}
]
}'
The language used for stemming is automatically determined from the locale parameter of the field. For example, setting "locale": "fr" will use French-specific stemming rules.
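As an illustration of combining locale with stem, here is a minimal sketch of a schema for French content. The collection name articles_fr and the helper function create_collection_fr are hypothetical names chosen for this example, and a local node at localhost:8108 is assumed, as in the examples above.

```shell
# Hypothetical helper: creates a collection whose "description" field
# is stemmed with French rules ("locale": "fr" plus "stem": true).
# Assumes a local Typesense node and TYPESENSE_API_KEY in the environment.
create_collection_fr() {
  curl "http://localhost:8108/collections" -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
      "name": "articles_fr",
      "fields": [
        {"name": "description", "type": "string", "locale": "fr", "stem": true}
      ]
    }'
}
```

Calling create_collection_fr sends the request. Wrapping the call in a function simply keeps the schema reusable in a setup script.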
For cases where you need more precise control over word variations, or when dealing with irregular forms that algorithmic stemming can't handle well, you can use stemming dictionaries. These allow you to define exact mappings between words and their root forms.
Typesense provides a pre-made English plurals dictionary that handles common singular/plural variations. You can download it here.
This dictionary is particularly useful when you need reliable handling of English plural forms without the potential side effects of algorithmic stemming.
First, create a JSONL file with your word mappings:
{"word": "people", "root": "person"}
{"word": "children", "root": "child"}
{"word": "geese", "root": "goose"}
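One way to create this file from the shell is a heredoc. This is a sketch that writes the three mappings above to dictionary.jsonl, the file name used in the upload command, and then counts the lines that contain both keys as a quick sanity check.

```shell
# Write one JSON object per line. The quoted 'EOF' stops the shell
# from expanding anything inside the heredoc.
cat > dictionary.jsonl <<'EOF'
{"word": "people", "root": "person"}
{"word": "children", "root": "child"}
{"word": "geese", "root": "goose"}
EOF

# Sanity check: count lines that contain both a "word" and a "root" key.
grep -c '"word": .*"root": ' dictionary.jsonl
```

The final command should print 3, one for each mapping.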
Then upload it using the stemming dictionary API:
curl "http://localhost:8108/stemming/dictionaries/import?id=irregular-plurals" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
--data-binary @dictionary.jsonl
{
"id": "irregular-plurals",
"words": [
{"root": "person", "word": "people"},
{"root": "child", "word": "children"},
{"root": "goose", "word": "geese"}
]
}
To use a stemming dictionary, specify it in your collection schema using the stem_dictionary parameter:
curl "http://localhost:8108/collections" -X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
"name": "companies",
"fields": [
{"name": "title", "type": "string", "stem_dictionary": "irregular-plurals"}
]
}'
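To check the effect of the dictionary, you could run a search against the title field. The sketch below assumes the companies collection above already contains documents. The function name search_titles is hypothetical, and q and query_by are the standard search parameters.

```shell
# Hypothetical helper: search the "title" field of the "companies" collection.
# $1 is the query text. -G sends the data as URL query parameters, and
# --data-urlencode keeps spaces and punctuation in the query safe.
search_titles() {
  curl -sG "http://localhost:8108/collections/companies/documents/search" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    --data-urlencode "q=$1" \
    --data-urlencode "query_by=title"
}
```

For example, search_titles "person" would be expected to also return documents whose title contains "people", provided the irregular-plurals dictionary was imported before the documents were indexed.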
:::tip Understanding Stemming Options
When configuring a field for stemming:

- "stem": true alone applies the default algorithmic (Snowball) stemmer.
- "stem_dictionary": "dictionary_name" automatically enables stemming ("stem": true is implied).

When you specify only stem_dictionary in your configuration, "stem": true appears automatically in your schema, because the system enables basic stemming by default when dictionary stemming is configured.
:::
To retrieve a stemming dictionary by its ID:
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/stemming/dictionaries/irregular-plurals"
To list all stemming dictionaries:
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/stemming/dictionaries"
{
"dictionaries": ["irregular-plurals", "company-terms"]
}
Start with Basic Stemming: For most use cases, basic stemming with the appropriate locale setting will handle common word variations well.
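Use Dictionaries for Exceptions: Add stemming dictionaries when you need to handle irregular word forms that algorithmic stemming cannot handle well, or when you need exact control over word-to-root mappings.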
Language-Specific Considerations: Remember that basic stemming behavior changes based on the locale parameter. Set this appropriately for your content's language.