# Clickzetta Vector Database
This module provides integration with Clickzetta Lakehouse as a vector database for Dify.
## Configuration

The following seven parameters are required:

```bash
# Authentication
CLICKZETTA_USERNAME=your_username
CLICKZETTA_PASSWORD=your_password

# Instance configuration
CLICKZETTA_INSTANCE=your_instance_id
CLICKZETTA_SERVICE=api.clickzetta.com
CLICKZETTA_WORKSPACE=your_workspace
CLICKZETTA_VCLUSTER=your_vcluster
CLICKZETTA_SCHEMA=your_schema
```

Additional parameters tune batching, full-text search, and vector search:

```bash
# Batch processing
CLICKZETTA_BATCH_SIZE=100

# Full-text search configuration
CLICKZETTA_ENABLE_INVERTED_INDEX=true
CLICKZETTA_ANALYZER_TYPE=chinese  # Options: keyword, english, chinese, unicode
CLICKZETTA_ANALYZER_MODE=smart    # Options: max_word, smart

# Vector search configuration
CLICKZETTA_VECTOR_DISTANCE_FUNCTION=cosine_distance  # Options: l2_distance, cosine_distance
```
In your Dify configuration, set:

```bash
VECTOR_STORE=clickzetta
```
## Table Structure

Clickzetta will automatically create tables with the following structure:

```sql
CREATE TABLE <collection_name> (
    id STRING NOT NULL,
    content STRING NOT NULL,
    metadata JSON,
    vector VECTOR(FLOAT, <dimension>) NOT NULL,
    PRIMARY KEY (id)
);

-- Vector index for similarity search
CREATE VECTOR INDEX idx_<collection_name>_vec
ON TABLE <schema>.<collection_name>(vector)
PROPERTIES (
    "distance.function" = "cosine_distance",
    "scalar.type" = "f32"
);

-- Inverted index for full-text search (if enabled)
CREATE INVERTED INDEX idx_<collection_name>_text
ON <schema>.<collection_name>(content)
PROPERTIES (
    "analyzer" = "chinese",
    "mode" = "smart"
);
```
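For example, a similarity search against such a table might look like the following sketch. `<query_vector>` stands in for a query embedding literal; the exact vector-literal syntax depends on the Clickzetta version and client, so treat this as illustrative only:

```sql
-- Hypothetical similarity search: top 5 documents closest to a query embedding
SELECT id, content, metadata,
       cosine_distance(vector, <query_vector>) AS distance
FROM <schema>.<collection_name>
ORDER BY distance ASC
LIMIT 5;
```

Ordering by the computed distance expression (rather than by the vector column itself) lets the vector index accelerate the scan.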
## Full-Text Search

Clickzetta supports advanced full-text search with multiple analyzers:

- `keyword`: no tokenization; treats the entire string as a single token
- `english`: designed for English text
- `chinese`: Chinese text tokenizer
- `unicode`: multi-language tokenizer based on Unicode
The following match functions are available:

- `MATCH_ALL(column, query)`: all terms must be present
- `MATCH_ANY(column, query)`: at least one term must be present
- `MATCH_PHRASE(column, query)`: exact phrase matching
- `MATCH_PHRASE_PREFIX(column, query)`: phrase prefix matching
- `MATCH_REGEXP(column, pattern)`: regular expression matching

Adjust the exploration factor for the accuracy vs. speed trade-off:

```sql
SET cz.vector.index.search.ef=64;
```
## Performance Tips

- Use the appropriate distance function:
  - `cosine_distance`: best for normalized embeddings (e.g., from language models)
  - `l2_distance`: best for raw feature vectors
- Choose the right analyzer; use `keyword` for exact matching.
- Combine full-text filtering with vector search to narrow candidates before ranking.
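A hybrid query combining full-text filtering with vector ranking might look like the sketch below. The schema, collection name, and `<query_vector>` placeholder are assumptions for illustration; `MATCH_ALL` and `cosine_distance` are the functions documented above:

```sql
-- Hypothetical hybrid search: filter by full text first, then rank by vector distance
SELECT id, content
FROM <schema>.<collection_name>
WHERE MATCH_ALL(content, 'vector database')
ORDER BY cosine_distance(vector, <query_vector>) ASC
LIMIT 10;
```

Filtering with the inverted index first shrinks the candidate set, so the vector distance is evaluated over far fewer rows.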
## Troubleshooting

Verify that the vector index exists:

```sql
SHOW INDEX FROM <schema>.<table_name>;
```

Check whether the vector index is being used:

```sql
EXPLAIN SELECT ... WHERE l2_distance(...) < threshold;
```

Look for `vector_index_search_type` in the execution plan.

Use the `TOKENIZE()` function to test tokenization:

```sql
SELECT TOKENIZE('your text', map('analyzer', 'chinese', 'mode', 'smart'));
```

Avoid using `ORDER BY` or `GROUP BY` directly on vector columns; order by a distance expression instead.