manual/english/Searching/Autocomplete.md
Autocomplete, or word completion, predicts and suggests the end of a word or phrase as you type. It's commonly used in:
Manticore offers an advanced autocomplete feature that gives suggestions while you type, similar to those in well-known search engines. This helps speed up searches and lets users find what they need faster.
In addition to basic autocomplete functionality, Manticore includes advanced features to enhance the user experience:
Manticore's autocomplete can be tailored to match different needs and settings, making it a flexible tool for many applications.
<!-- example call_autocomplete -->NOTE:
CALL AUTOCOMPLETEand/autocompleterequire Manticore Buddy. If it doesn't work, make sure Buddy is installed.
To use autocomplete in Manticore, use the CALL AUTOCOMPLETE SQL statement or its JSON equivalent /autocomplete. This feature provides word completion suggestions based on your indexed data.
Before you proceed, ensure that the table you intend to use for autocomplete has infixes enabled.
Note: There's an automatic check for min_infix_len in the table settings, which uses a 30-second cache to improve the performance of CALL AUTOCOMPLETE. After making changes to your table, there may be a brief delay the first time you use CALL AUTOCOMPLETE (though this is usually not noticeable). Only successful results are cached, so if you remove the table or disable min_infix_len, CALL AUTOCOMPLETE may temporarily return incorrect results until it eventually starts showing an error related to min_infix_len.
CALL AUTOCOMPLETE('query_beginning', 'table', [...options]);
POST /autocomplete
{
"table":"table_name",
"query":"query_beginning"
[,"options": {<autocomplete options>}]
}
layouts: A comma-separated string of keyboard layout codes for detecting typing errors caused by keyboard layout mismatches (e.g., typing "ghbdtn" instead of "привет" when using wrong layout). Manticore compares character positions across different layouts to suggest corrections. Requires at least 2 layouts to effectively detect mismatches. Available options: us, ru, ua, se, pt, no, it, gr, uk, fr, es, dk, de, ch, br, bg, be (more details here). Default: nonefuzziness: 0, 1, or 2 (default: 2). Maximum Levenshtein distance for finding typos. Set to 0 to disable fuzzy matchingpreserve: 0 or 1 (default: 0). When set to 1, keeps words that don't have fuzzy matches in the search results (e.g., "hello wrld" returns both "hello wrld" and "hello world"). When set to 0, only returns words with successful fuzzy matches (e.g., "hello wrld" returns only "hello world"). Particularly useful for preserving short words or proper nouns that may not exist in Manticore Searchprepend: Boolean (0/1 in SQL). If true(1), adds an asterisk before the last word for prefix expansion (e.g., *word)append: Boolean (0/1 in SQL). If true(1), adds an asterisk after the last word for suffix expansion (e.g., word*)expansion_len: Number of characters to expand in the last word. Default: 10force_bigrams: Boolean (0/1 in SQL). Forces the use of bigrams (2-character n-grams) instead of trigrams for all word lengths, which can improve matching for words with transposition errors. Default: 0 (use trigrams for words ≥6 characters)mysql> CALL AUTOCOMPLETE('hello', 'comment');
+------------+
| query |
+------------+
| hello |
| helio |
| hell |
| shell |
| nushell |
| powershell |
| well |
| help |
+------------+
mysql> CALL AUTOCOMPLETE('hello', 'comment', 0 as fuzziness);
+-------+
| query |
+-------+
| hello |
+-------+
POST /autocomplete
{
"table":"comment",
"query":"hello"
}
[
{
"total": 8,
"error": "",
"warning": "",
"columns": [
{
"query": {
"type": "string"
}
}
],
"data": [
{
"query": "hello"
},
{
"query": "helio"
},
{
"query": "hell"
},
{
"query": "shell"
},
{
"query": "nushell"
},
{
"query": "powershell"
},
{
"query": "well"
},
{
"query": "help"
}
]
}
]
mysql> CALL AUTOCOMPLETE('hello wrld', 'comment', 1 as preserve);
+------------+
| query |
+------------+
| hello wrld |
| hello world|
+------------+
POST /autocomplete
{
"table":"comment",
"query":"hello wrld",
"options": {
"preserve": 1
}
}
[
{
"total": 2,
"error": "",
"warning": "",
"columns": [
{
"query": {
"type": "string"
}
}
],
"data": [
{
"query": "hello wrld"
},
{
"query": "hello world"
}
]
}
]
The force_bigrams option can help with words that have transposition errors, such as "ipohne" vs "iphone". By using bigrams instead of trigrams, the algorithm can better handle character transpositions.
mysql> CALL AUTOCOMPLETE('ipohne', 'products', 1 as force_bigrams);
+--------+
| query |
+--------+
| iphone |
+--------+
POST /autocomplete
{
"table":"products",
"query":"ipohne",
"options": {
"force_bigrams": 1
}
}
[
{
"total": 1,
"error": "",
"warning": "",
"columns": [
{
"query": {
"type": "string"
}
}
],
"data": [
{
"query": "iphone"
}
]
}
]
While CALL AUTOCOMPLETE is the recommended method for most use cases, Manticore also supports other controllable and customizable approaches to implement autocomplete functionality:
To autocomplete a sentence, you can use infixed search. You can find the end of a document field by providing its beginning and:
* to match any characters^ to start from the beginning of the field"" for phrase matchingThere is an article about it in our blog and an interactive course. A quick example is:
My cat loves my dog. The cat (Felis catus) is a domestic species of small carnivorous mammal.^, "", and * so as the user is typing, you make queries like: ^"m*", ^"my *", ^"my c*", ^"my ca*" and so on<b>My cat</b> loves my dog. The cat ( ...In some cases, all you need is to autocomplete a single word or a couple of words. In this case, you can use CALL KEYWORDS.
CALL KEYWORDS is available through the SQL interface and offers a way to examine how keywords are tokenized or to obtain the tokenized forms of specific keywords. If the table enables infixes, it allows you to quickly find possible endings for given keywords, making it suitable for autocomplete functionality.
This is a great alternative to general infixed search, as it provides higher performance since it only needs the table's dictionary, not the documents themselves.
CALL KEYWORDS(text, table [, options])
The CALL KEYWORDS statement divides text into keywords. It returns the tokenized and normalized forms of the keywords, and if desired, keyword statistics. Additionally, it provides the position of each keyword in the query and all forms of tokenized keywords when the table enables lemmatizers.
| Parameter | Description |
|---|---|
| text | Text to break down to keywords |
| table | Name of the table from which to take the text processing settings |
| 0/1 as stats | Show statistics of keywords, default is 0 |
| 0/1 as fold_wildcards | Fold wildcards, default is 0 |
| 0/1 as fold_lemmas | Fold morphological lemmas, default is 0 |
| 0/1 as fold_blended | Fold blended words, default is 0 |
| N as expansion_limit | Override expansion_limit defined in the server configuration, default is 0 (use value from the configuration) |
| docs/hits as sort_mode | Sorts output results by either 'docs' or 'hits'. No sorting is applied by default. |
| jieba_mode | Jieba segmentation mode for the query. See jieba_mode for more details |
The examples show how it works if assuming the user is trying to get an autocomplete for "my cat ...". So on the application side all you need to do is to suggest the user the endings from the column "normalized" for each new word. It often makes sense to sort by hits or docs using 'hits' as sort_mode or 'docs' as sort_mode.
MySQL [(none)]> CALL KEYWORDS('m*', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | m* | my | 1 | 2 |
| 1 | m* | mammal | 1 | 1 |
+------+-----------+------------+------+------+
MySQL [(none)]> CALL KEYWORDS('my*', 't', 1 as stats);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | my* | my | 1 | 2 |
+------+-----------+------------+------+------+
MySQL [(none)]> CALL KEYWORDS('c*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+-------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+-------------+------+------+
| 1 | c* | cat | 1 | 2 |
| 1 | c* | carnivorous | 1 | 1 |
| 1 | c* | catus | 1 | 1 |
+------+-----------+-------------+------+------+
MySQL [(none)]> CALL KEYWORDS('ca*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+-------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+-------------+------+------+
| 1 | ca* | cat | 1 | 2 |
| 1 | ca* | carnivorous | 1 | 1 |
| 1 | ca* | catus | 1 | 1 |
+------+-----------+-------------+------+------+
MySQL [(none)]> CALL KEYWORDS('cat*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | cat* | cat | 1 | 2 |
| 1 | cat* | catus | 1 | 1 |
+------+-----------+------------+------+------+
There is a nice trick how you can improve the above algorithm - use bigram_index. When you have it enabled for the table what you get in it is not just a single word, but each pair of words standing one after another indexed as a separate token.
This allows to predict not just the current word's ending, but the next word too which is especially beneficial for the purpose of autocomplete.
<!-- intro -->MySQL [(none)]> CALL KEYWORDS('m*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | m* | my | 1 | 2 |
| 1 | m* | mammal | 1 | 1 |
| 1 | m* | my cat | 1 | 1 |
| 1 | m* | my dog | 1 | 1 |
+------+-----------+------------+------+------+
MySQL [(none)]> CALL KEYWORDS('my*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | my* | my | 1 | 2 |
| 1 | my* | my cat | 1 | 1 |
| 1 | my* | my dog | 1 | 1 |
+------+-----------+------------+------+------+
MySQL [(none)]> CALL KEYWORDS('c*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+--------------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+--------------------+------+------+
| 1 | c* | cat | 1 | 2 |
| 1 | c* | carnivorous | 1 | 1 |
| 1 | c* | carnivorous mammal | 1 | 1 |
| 1 | c* | cat felis | 1 | 1 |
| 1 | c* | cat loves | 1 | 1 |
| 1 | c* | catus | 1 | 1 |
| 1 | c* | catus is | 1 | 1 |
+------+-----------+--------------------+------+------+
MySQL [(none)]> CALL KEYWORDS('ca*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+--------------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+--------------------+------+------+
| 1 | ca* | cat | 1 | 2 |
| 1 | ca* | carnivorous | 1 | 1 |
| 1 | ca* | carnivorous mammal | 1 | 1 |
| 1 | ca* | cat felis | 1 | 1 |
| 1 | ca* | cat loves | 1 | 1 |
| 1 | ca* | catus | 1 | 1 |
| 1 | ca* | catus is | 1 | 1 |
+------+-----------+--------------------+------+------+
MySQL [(none)]> CALL KEYWORDS('cat*', 't', 1 as stats, 'hits' as sort_mode);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1 | cat* | cat | 1 | 2 |
| 1 | cat* | cat felis | 1 | 1 |
| 1 | cat* | cat loves | 1 | 1 |
| 1 | cat* | catus | 1 | 1 |
| 1 | cat* | catus is | 1 | 1 |
+------+-----------+------------+------+------+
CALL KEYWORDS supports distributed tables so no matter how big your data set you can benefit from using it.