[ AliSQL | Vector Index | 向量索引 ]
AliSQL natively supports storing and computing on vector data of up to 16,383 dimensions. It integrates mainstream vector functions such as cosine similarity (COSINE) and Euclidean distance (EUCLIDEAN), and provides efficient nearest-neighbor search based on a deeply optimized HNSW (Hierarchical Navigable Small World) algorithm, with index support for vector columns across the full dimension range.
AliSQL's vector capabilities provide an out-of-the-box solution for large-scale semantic retrieval, intelligent recommendation, multimodal analysis, and similar scenarios. Through standard SQL interfaces, users can combine high-precision vector matching with complex business logic.
Vector columns are implemented by a dedicated Field_vector type, which inherits from Field_varstring and stores the floating-point array using the binary character set.
CREATE TABLE table_name (
id INT PRIMARY KEY,
vector_col VECTOR(128) -- 128-dimensional vector
);
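Writing rows into a vector column can be sketched as follows. This is a minimal illustration, not a definitive recipe: the table `items`, its column names, and the values are hypothetical, and it assumes that `VEC_FROMTEXT` (one of the conversion functions listed in this document) accepts a bracketed, comma-separated literal, as in the query examples.

```sql
-- Hypothetical table and values, for illustration only.
CREATE TABLE items (
  id INT PRIMARY KEY,
  embedding VECTOR(3)  -- 3-dimensional vector
);

-- Assumption: the text literal must contain exactly as many
-- components as the declared dimension of the column.
INSERT INTO items (id, embedding) VALUES
  (1, VEC_FROMTEXT('[0.1,0.2,0.3]')),
  (2, VEC_FROMTEXT('[0.9,0.8,0.7]'));
```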
Vector indexes can be created using the following syntax:
CREATE VECTOR INDEX vidx_name ON table_name (vector_col); -- Using default parameters
Or specify directly in table definition:
CREATE TABLE table_name (
id INT PRIMARY KEY,
vector_col VECTOR(128),
VECTOR INDEX vidx_name (vector_col) M=6 DISTANCE=COSINE -- Specifying parameters
);
| Function Name | Meaning |
|---|---|
| VEC_FROMTEXT, TO_VECTOR, STRING_TO_VECTOR | String to vector |
| VEC_TOTEXT, FROM_VECTOR, VECTOR_TO_STRING | Vector to string |
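A round trip through the conversion functions might look like the sketch below. It assumes that the names listed in the same table row are interchangeable aliases, which the table suggests but does not state outright; the exact text format of the returned string is not specified here.

```sql
-- Text -> vector -> text round trip.
SELECT VEC_TOTEXT(VEC_FROMTEXT('[1,2,3]')) AS round_trip;

-- Assumption: names in the same table row behave as aliases.
SELECT VECTOR_TO_STRING(STRING_TO_VECTOR('[1,2,3]')) AS round_trip_alias;
```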
| Function Name | Meaning |
|---|---|
| VECTOR_DIM | Vector dimension |
| VEC_DISTANCE, VEC_DISTANCE_EUCLIDEAN, VEC_DISTANCE_COSINE | Calculate the distance between two vectors. If one of the arguments is a column covered by a vector index, the distance type does not need to be specified; the distance type of the vector index is recognized automatically |
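The calculation functions can also be applied to literal vectors, as in the sketch below. Because neither argument is an indexed column, the example uses the functions that name their distance type explicitly; that these functions accept two converted literals is an assumption, and the input values are arbitrary.

```sql
-- Number of components in a vector.
SELECT VECTOR_DIM(VEC_FROMTEXT('[1,2,3]')) AS dims;

-- Explicit distance types: no vector index is involved,
-- so the distance type is chosen by the function name.
SELECT
  VEC_DISTANCE_EUCLIDEAN(VEC_FROMTEXT('[0,0]'), VEC_FROMTEXT('[3,4]')) AS euclidean_dist,
  VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1,0]'), VEC_FROMTEXT('[0,1]')) AS cosine_dist;
```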
Usage examples:
-- Sort using vector distance
SELECT * FROM table_name ORDER BY VEC_DISTANCE(vector_col, VEC_FROMTEXT("[1,2,3,4,5]")) LIMIT 10;
-- Display distance value in results
SELECT id, VEC_DISTANCE(vector_col, VEC_FROMTEXT("[1,2,3,4,5]")) AS distance
FROM table_name ORDER BY distance LIMIT 10;
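Vector ranking can be combined with ordinary predicates in the same statement. The sketch below is illustrative only: the table `docs` and its `category` column are hypothetical, and this document does not state whether the vector index is still used once a `WHERE` filter is present.

```sql
-- Hypothetical schema: `category` is an ordinary column added for illustration.
CREATE TABLE docs (
  id INT PRIMARY KEY,
  category VARCHAR(32),
  embedding VECTOR(3),
  VECTOR INDEX vidx_docs (embedding) DISTANCE=COSINE
);

-- Restrict by an ordinary predicate, then rank by vector distance.
-- The distance type is taken from the index on `embedding`.
SELECT id, VEC_DISTANCE(embedding, VEC_FROMTEXT('[0.1,0.2,0.3]')) AS distance
FROM docs
WHERE category = 'faq'
ORDER BY distance
LIMIT 5;
```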
| Variable Name | Description | Type | Default Value | Range |
|---|---|---|---|---|
| vidx_disabled | Disable creation of vector columns and vector indexes | global | ON | ON, OFF |
| vidx_default_distance | Default vector distance type | session | EUCLIDEAN | EUCLIDEAN, COSINE |
| vidx_hnsw_default_m | HNSW algorithm default m | session | 6 | [3, 200] |
| vidx_hnsw_ef_search | HNSW algorithm default ef_search | session | 20 | [1, 10000] |
| vidx_hnsw_cache_size | HNSW algorithm default memory usage limit | global | 1024 * 1024 | [1048576,18446744073709551615] |
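The variables above could be adjusted as sketched below. This assumes they can be changed at runtime with `SET`; if a variable is not dynamic in your build, it would have to be set in the server configuration instead. The values chosen are arbitrary examples within the documented ranges.

```sql
-- The documented default is ON, meaning vector columns and
-- vector indexes cannot be created until this is turned OFF.
SET GLOBAL vidx_disabled = OFF;

-- Session-scoped defaults used when an index omits M or DISTANCE.
SET SESSION vidx_default_distance = COSINE;
SET SESSION vidx_hnsw_default_m = 16;

-- Session-scoped search parameter (documented range: 1 to 10000).
SET SESSION vidx_hnsw_ef_search = 100;

-- Inspect the current settings.
SHOW VARIABLES LIKE 'vidx%';
```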
Index parameters:
- M: Controls the number of connections for each node in the graph. The default value is 6; the valid range is 3 to 200.
- DISTANCE: Distance type used to build the index. The default value is EUCLIDEAN.

Error codes:
- ER_NOT_SUPPORTED_YET: Unsupported transaction isolation level
- ER_WRONG_ARGUMENTS: Function argument error
- ER_VECTOR_INDEX_USAGE: Vector index usage error
- ER_VECTOR_INDEX_FAILED: Vector index operation failure

As one of the most popular ANN algorithms, HNSW has gained widespread recognition and validation in institutional evaluations and engineering implementations. Currently, AliSQL prioritizes support for vector indexes based on the HNSW algorithm. The overall architecture of vector search is shown in the following diagram.
[Figure: overall architecture of vector search]

HNSW (Hierarchical Navigable Small World) is an efficient approximate nearest neighbor (ANN) search algorithm based on a multi-layer graph structure. Its design can be summarized as:
AliSQL introduces a public cache (MHNSW Share) and a transaction cache (MHNSW Trx) for vector data. Together they accelerate vector queries and keep vector updates transaction-safe, balancing resource isolation against performance. The two caches are accessed by different operations and have different design goals:
You are welcome to try the vector capabilities of Alibaba Cloud RDS MySQL, which build on the open-source ecosystem and are ready to use out of the box.