The GenAI (Generative AI) Module in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.
The GenAI module uses a non-blocking async architecture based on socketpair IPC and epoll event notification:
```
┌─────────────────┐          socketpair          ┌─────────────────┐
│  MySQL_Session  │◄────────────────────────────►│  GenAI Module   │
│ (MySQL Thread)  │ fds[0]                fds[1] │  Listener Loop  │
└────────┬────────┘                              └────────┬────────┘
         │                                                │
         │ epoll                                          │ queue
         │                                                │
         └── epoll_wait() ────────────────────────────────┘
                  (GenAI Response Ready)
```
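On the module side, `lib/GenAI_Thread.cpp` implements a listener loop and worker loops, and `genai-threads` sets the number of workers. The hand-off from listener to workers through the queue in the diagram can be sketched as a classic fixed-size thread pool.

This is an illustration under stated assumptions, not the module's code:

- `WorkerPool`, `submit`, and `shutdown` are hypothetical names.
- Each job stands in for one blocking HTTP call to the embedding or rerank service.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool: one shared queue, N worker threads
// (N plays the role of genai-threads).
class WorkerPool {
public:
    explicit WorkerPool(size_t n_threads) {
        assert(n_threads >= 1);
        for (size_t i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    ~WorkerPool() { shutdown(); }

    // Listener side: enqueue a job and wake one idle worker.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Let workers drain the queue, then join them. Safe to call twice.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (stopping_) return;
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // stand-in for the blocking request to the GenAI service
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

Because the blocking work runs on these workers rather than on MySQL threads, a slow embedding call occupies one pool slot instead of stalling client sessions.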
Request format:

```cpp
#include <cstdint>

struct GenAI_RequestHeader {
    uint64_t request_id;   // Client's correlation ID
    uint32_t operation;    // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
    uint32_t query_len;    // Length of JSON query that follows
    uint32_t flags;        // Reserved (must be 0)
    uint32_t top_n;        // For rerank: max results (0 = all)
};
// Followed by: JSON query (query_len bytes)
```
Response format:

```cpp
#include <cstdint>

struct GenAI_ResponseHeader {
    uint64_t request_id;          // Echo of client's request ID
    uint32_t status_code;         // 0 = success, >0 = error
    uint32_t result_len;          // Length of JSON result that follows
    uint32_t processing_time_ms;  // Time taken by GenAI worker
    uint64_t result_ptr;          // Reserved (must be 0)
    uint32_t result_count;        // Number of results
    uint32_t reserved;            // Reserved (must be 0)
};
// Followed by: JSON result (result_len bytes)
```
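Each message is a fixed-size header followed by a variable-length JSON body. A reader therefore consumes it in two steps: first `sizeof(header)` bytes, then `query_len` (or `result_len`) bytes.

The sketch below shows that framing for the request side, under these assumptions:

- It uses a `SOCK_STREAM` socketpair.
- Both ends run in one process with the same struct layout, so the header can be copied as raw bytes.
- The header is redeclared locally so the block compiles on its own; the authoritative definition is in `include/GenAI_Thread.h`.
- The helper names are hypothetical.
- The operation code is a placeholder, because the numeric values of `GENAI_OP_*` are not listed here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>
#include <sys/socket.h>
#include <unistd.h>

// Local copy of the request header so this sketch compiles on its own.
struct GenAI_RequestHeader {
    uint64_t request_id;
    uint32_t operation;
    uint32_t query_len;
    uint32_t flags;
    uint32_t top_n;
};
static_assert(sizeof(GenAI_RequestHeader) == 24, "no padding expected");

// Build one frame: raw header bytes followed by the JSON query.
// Copying raw bytes assumes both ends share the same struct layout.
std::vector<char> encode_request(uint64_t id, uint32_t op,
                                 const std::string& json, uint32_t top_n) {
    assert(json.size() <= UINT32_MAX);
    GenAI_RequestHeader h{};
    h.request_id = id;
    h.operation = op;  // placeholder: real GENAI_OP_* values not shown here
    h.query_len = static_cast<uint32_t>(json.size());
    h.flags = 0;       // reserved, must be 0
    h.top_n = top_n;   // rerank only; 0 = all
    std::vector<char> frame(sizeof(h) + json.size());
    std::memcpy(frame.data(), &h, sizeof(h));
    std::memcpy(frame.data() + sizeof(h), json.data(), json.size());
    return frame;
}

// Stream sockets may transfer fewer bytes than requested: loop until done.
bool write_all(int fd, const char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0) return false;
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

bool read_all(int fd, char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n <= 0) return false;  // error or peer closed
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Read one frame: fixed-size header first, then exactly query_len bytes.
bool decode_request(int fd, GenAI_RequestHeader& h, std::string& json) {
    if (!read_all(fd, reinterpret_cast<char*>(&h), sizeof(h))) return false;
    json.resize(h.query_len);
    return h.query_len == 0 || read_all(fd, &json[0], h.query_len);
}
```

The response side follows the same pattern, with `GenAI_ResponseHeader` and `result_len`.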
| Variable | Type | Default | Description |
|---|---|---|---|
| `genai-threads` | int | 4 | Number of GenAI worker threads (1-256) |
| `genai-embedding_uri` | string | `http://127.0.0.1:8013/embedding` | Embedding service endpoint |
| `genai-rerank_uri` | string | `http://127.0.0.1:8012/rerank` | Reranking service endpoint |
| `genai-embedding_timeout_ms` | int | 30000 | Embedding request timeout (100-300000 ms) |
| `genai-rerank_timeout_ms` | int | 30000 | Reranking request timeout (100-300000 ms) |
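The integer ranges in the table can be enforced with a simple lookup before a value is accepted. This is a hypothetical sketch, not ProxySQL's actual validation code:

- `validate_genai_int` and the range map are illustrative names.
- The inclusive bounds are copied from the table above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Documented inclusive ranges for the integer GenAI variables.
struct Range { long min; long max; };

static const std::map<std::string, Range> kGenAIIntRanges = {
    {"genai-threads",              {1, 256}},
    {"genai-embedding_timeout_ms", {100, 300000}},
    {"genai-rerank_timeout_ms",    {100, 300000}},
};

// True if `value` is a whole integer inside the documented range for `name`.
// Unknown variable names are rejected.
bool validate_genai_int(const std::string& name, const std::string& value) {
    auto it = kGenAIIntRanges.find(name);
    if (it == kGenAIIntRanges.end()) return false;
    try {
        size_t pos = 0;
        long v = std::stol(value, &pos);
        if (pos != value.size()) return false;  // trailing junk such as "8x"
        return v >= it->second.min && v <= it->second.max;
    } catch (const std::exception&) {  // std::invalid_argument, std::out_of_range
        return false;
    }
}
```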
```sql
-- Load/Save GenAI variables
LOAD GENAI VARIABLES TO RUNTIME;
SAVE GENAI VARIABLES FROM RUNTIME;
LOAD GENAI VARIABLES FROM DISK;
SAVE GENAI VARIABLES TO DISK;

-- Set variables (changes take effect after LOAD GENAI VARIABLES TO RUNTIME)
SET genai-threads = 8;
SET genai-embedding_uri = 'http://localhost:8080/embed';
SET genai-rerank_uri = 'http://localhost:8081/rerank';

-- View variables
SELECT @@genai-threads;
SHOW VARIABLES LIKE 'genai-%';

-- Checksum
CHECKSUM GENAI VARIABLES;
```
GenAI queries use the special `GENAI:` prefix followed by JSON:

```sql
GENAI: {"type": "embed", "documents": ["text1", "text2"]}
GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
```
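A query is handed to the GenAI module only when it carries the `GENAI:` prefix. Below is a minimal sketch of that check, under these assumptions:

- The prefix is an exact, case-sensitive match at the very start of the query text.
- The real detection lives in `MySQL_Session` and may differ.
- `extract_genai_json` is a hypothetical name.

```cpp
#include <cctype>
#include <optional>
#include <string>

// If `query` starts with "GENAI:", return the JSON text after it with
// surrounding whitespace trimmed; otherwise return std::nullopt.
std::optional<std::string> extract_genai_json(const std::string& query) {
    static const std::string kPrefix = "GENAI:";
    if (query.compare(0, kPrefix.size(), kPrefix) != 0) return std::nullopt;
    size_t begin = kPrefix.size();
    size_t end = query.size();
    while (begin < end && std::isspace(static_cast<unsigned char>(query[begin]))) ++begin;
    while (end > begin && std::isspace(static_cast<unsigned char>(query[end - 1]))) --end;
    return query.substr(begin, end - begin);
}
```

Ordinary SQL does not match the prefix, so it continues down the normal query path.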
Generate vector embeddings for documents:

```sql
GENAI: {
  "type": "embed",
  "documents": [
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks."
  ]
}
```
Response:

```
+------------------------------------------+
| embedding                                |
+------------------------------------------+
| 0.123, -0.456, 0.789, ...                |
| 0.234, -0.567, 0.890, ...                |
+------------------------------------------+
```
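For semantic search, embeddings are usually compared with cosine similarity. This sketch makes the following assumptions:

- Each `embedding` cell arrives as a plain comma-separated list of floats, as in the sample output above. The `...` there marks truncation and would not be parsed.
- `parse_embedding` and `cosine_similarity` are hypothetical helper names.

```cpp
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse a cell such as "0.123, -0.456, 0.789" into floats.
std::vector<float> parse_embedding(const std::string& cell) {
    std::vector<float> out;
    std::stringstream ss(cell);
    std::string item;
    while (std::getline(ss, item, ',')) {
        out.push_back(std::stof(item));  // std::stof skips leading whitespace
    }
    return out;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.empty() || a.size() != b.size())
        throw std::invalid_argument("vectors must be non-empty and equal length");
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += double(a[i]) * b[i];
        na += double(a[i]) * a[i];
        nb += double(b[i]) * b[i];
    }
    if (na == 0 || nb == 0) throw std::invalid_argument("zero-length vector");
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```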
Rerank documents by relevance to a query:

```sql
GENAI: {
  "type": "rerank",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence.",
    "The capital of France is Paris.",
    "Deep learning uses neural networks."
  ],
  "top_n": 2,
  "columns": 3
}
```
Parameters:

- `query` (required): Search query text
- `documents` (required): Array of documents to rerank
- `top_n` (optional): Maximum results to return (0 = all, default: all)
- `columns` (optional): 2 = `{index, score}`, 3 = `{index, score, document}` (default: 3)

Response:

```
+-------+-------+----------------------------------------------+
| index | score | document                                     |
+-------+-------+----------------------------------------------+
| 0     | 0.95  | Machine learning is a subset of AI...        |
| 2     | 0.82  | Deep learning uses neural networks...        |
+-------+-------+----------------------------------------------+
```
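The `top_n` and `columns` parameters shape the reranked result, as the example above shows:

- Rows come back in descending score order.
- The list is cut to `top_n` rows, where 0 means all.
- `document` is included only when `columns` is 3.

The sketch below reproduces those documented semantics over scores already returned by the reranking service. The types and function name are hypothetical. This document does not say whether this step runs inside ProxySQL or in the upstream service.

```cpp
#include <algorithm>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

struct RerankRow {
    size_t index;                         // position in the input "documents" array
    double score;                         // relevance score from the reranker
    std::optional<std::string> document;  // present only when columns == 3
};

// Order by descending score, keep top_n rows (0 = all), project columns.
std::vector<RerankRow> shape_rerank(const std::vector<std::string>& documents,
                                    const std::vector<double>& scores,
                                    size_t top_n, int columns = 3) {
    if (documents.size() != scores.size())
        throw std::invalid_argument("one score per document is required");
    if (columns != 2 && columns != 3)
        throw std::invalid_argument("columns must be 2 or 3");
    std::vector<size_t> order(documents.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = i;
    // stable_sort keeps input order for equal scores
    std::stable_sort(order.begin(), order.end(),
                     [&](size_t a, size_t b) { return scores[a] > scores[b]; });
    if (top_n != 0 && top_n < order.size()) order.resize(top_n);
    std::vector<RerankRow> rows;
    for (size_t i : order) {
        RerankRow r{i, scores[i], std::nullopt};
        if (columns == 3) r.document = documents[i];
        rows.push_back(r);
    }
    return rows;
}
```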
All GenAI queries return results in MySQL resultset format with:

- `columns`: Array of column names
- `rows`: Array of row data

Success:

```json
{
  "columns": ["index", "score", "document"],
  "rows": [
    [0, 0.95, "Most relevant document"],
    [2, 0.82, "Second most relevant"]
  ]
}
```

Error:

```json
{
  "error": "Error message describing what went wrong"
}
```
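Success and error payloads share one envelope: either `columns` plus `rows`, or a single `error` string. Below is a minimal sketch that produces that JSON without an external library. It has these limits:

- Cells are assumed to be integers, floating-point numbers, or strings.
- String escaping covers only quotes, backslashes, and control characters.
- `Cell`, `to_json`, and `error_json` are hypothetical names.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

using Cell = std::variant<long long, double, std::string>;

// Quote and escape a string as a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out = "\"";
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // other control characters as \u00XX
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out + "\"";
}

std::string cell_json(const Cell& c) {
    if (auto p = std::get_if<long long>(&c)) return std::to_string(*p);
    if (auto p = std::get_if<double>(&c)) {
        std::ostringstream os;
        os << *p;  // default 6 significant digits, e.g. 0.95
        return os.str();
    }
    return json_escape(std::get<std::string>(c));
}

// {"columns": [...], "rows": [[...], ...]}
std::string to_json(const std::vector<std::string>& columns,
                    const std::vector<std::vector<Cell>>& rows) {
    std::string out = "{\"columns\": [";
    for (size_t i = 0; i < columns.size(); ++i)
        out += (i ? ", " : "") + json_escape(columns[i]);
    out += "], \"rows\": [";
    for (size_t r = 0; r < rows.size(); ++r) {
        out += (r ? ", [" : "[");
        for (size_t i = 0; i < rows[r].size(); ++i)
            out += (i ? ", " : "") + cell_json(rows[r][i]);
        out += "]";
    }
    return out + "]}";
}

// {"error": "..."}
std::string error_json(const std::string& message) {
    return "{\"error\": " + json_escape(message) + "}";
}
```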
```sql
-- Generate embedding for a single document
GENAI: {"type": "embed", "documents": ["Hello, world!"]};

-- Batch embedding for multiple documents
GENAI: {
  "type": "embed",
  "documents": ["doc1", "doc2", "doc3"]
};

-- Find most relevant documents
GENAI: {
  "type": "rerank",
  "query": "database optimization techniques",
  "documents": [
    "How to bake a cake",
    "Indexing strategies for MySQL",
    "Python programming basics",
    "Query optimization in ProxySQL"
  ]
};

-- Get only top 3 most relevant documents
GENAI: {
  "type": "rerank",
  "query": "best practices for SQL",
  "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
  "top_n": 3
};

-- Get only relevance scores (no document text)
GENAI: {
  "type": "rerank",
  "query": "test query",
  "documents": ["doc1", "doc2"],
  "columns": 2
};
```
1. `genai_epoll_fd_` for monitoring GenAI responses
2. `GENAI:` query detected in `handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()`
3. `check_genai_events()` called on each iteration
4. `handle_genai_response()` processes response

The GenAI event checking is integrated into the main MySQL handler loop:
```cpp
handler_again:
    switch (status) {
        case WAITING_CLIENT_DATA:
            handler___status_WAITING_CLIENT_DATA();
#ifdef EPOLL_CLOEXEC  // macro from <sys/epoll.h>; epoll_create1() is a function, so #ifdef epoll_create1 is never true
            // Check for GenAI responses before processing new client data
            if (check_genai_events()) {
                goto handler_again;  // Process more responses
            }
#endif
            break;
        // ... other states ...
    }
```
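`check_genai_events()` must look for responses without blocking the MySQL thread. Below is a minimal, Linux-only sketch of that idea, under these assumptions:

- The session registers its end of the socketpair with an epoll instance (the role of `genai_epoll_fd_`).
- That instance is polled with a zero timeout.
- `make_genai_epoll` and `check_genai_events_sketch` are hypothetical names.
- The real function also reads the response and passes it to `handle_genai_response()`.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>  // socketpair(), used when wiring this up
#include <unistd.h>

// Create an epoll instance that watches `fd` for readability.
int make_genai_epoll(int fd) {
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0) return -1;
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

// Non-blocking check: a timeout of 0 makes epoll_wait return immediately.
// Returns the fd that has a pending response, or -1 if nothing is ready.
int check_genai_events_sketch(int epfd) {
    epoll_event ev{};
    int n = epoll_wait(epfd, &ev, 1, 0);
    return n > 0 ? ev.data.fd : -1;
}
```

A ready descriptor corresponds to the `true` branch in the excerpt above, which jumps back to `handler_again`. An idle one lets the loop fall through at no cost.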
The GenAI module is designed to work with llama-server, a high-performance C++ inference server for LLaMA models.
```bash
# Start embedding server
./llama-server \
  --model /path/to/nomic-embed-text-v1.5.gguf \
  --port 8013 \
  --embedding \
  --ctx-size 512

# Start reranking server (needs a reranker model; --reranking enables /rerank)
./llama-server \
  --model /path/to/reranker-model.gguf \
  --port 8012 \
  --reranking \
  --ctx-size 512
```
The GenAI module expects:

- `POST /embedding` with JSON request
- `POST /rerank` with JSON request

Compatible with:
Comprehensive TAP tests are available in `test/tap/tests/genai_async-t.cpp`:

```bash
cd test/tap/tests
make genai_async-t
./genai_async-t
```
Test Coverage:

- `top_n` and `columns` parameters

```sql
-- Test embedding
mysql> GENAI: {"type": "embed", "documents": ["test document"]};

-- Test reranking
mysql> GENAI: {
    ->   "type": "rerank",
    ->   "query": "test query",
    ->   "documents": ["doc1", "doc2", "doc3"]
    -> };
```
- `genai-threads` (default: 4)
- `genai-threads` and GenAI service capacity
- `genai-threads` to match expected concurrency

| Error | Cause | Solution |
|---|---|---|
| `Failed to create GenAI communication channel` | Socketpair creation failed | Check system limits (`ulimit -n`) |
| `Failed to register with GenAI module` | GenAI module not initialized | Run `LOAD GENAI VARIABLES TO RUNTIME` |
| `Failed to send request to GenAI module` | Write error on socketpair | Check connection stability |
| `GenAI module not initialized` | GenAI threads not started | Set `genai-threads` > 0 and reload |
Requests exceeding genai-embedding_timeout_ms or genai-rerank_timeout_ms will fail with:
```sql
-- Check GenAI module status (not yet implemented, planned)
SHOW STATUS LIKE 'genai-%';
```
Planned status variables:

- `genai_threads_initialized`: Number of worker threads running
- `genai_active_requests`: Currently processing requests
- `genai_completed_requests`: Total successful requests
- `genai_failed_requests`: Total failed requests

GenAI operations log at debug level:
```sql
-- Enable GenAI debug logging
SET mysql-debug = 1;
```

```bash
# Check logs
tail -f proxysql.log | grep GenAI
```
- `include/GenAI_Thread.h` - GenAI module interface and structures
- `lib/GenAI_Thread.cpp` - Implementation of listener and worker loops
- `include/MySQL_Session.h` - Session integration (GenAI async state)
- `lib/MySQL_Session.cpp` - Async handlers and main loop integration
- `include/Base_Session.h` - Base session GenAI members
- `test/tap/tests/genai_module-t.cpp` - Admin commands and variables
- `test/tap/tests/genai_embedding_rerank-t.cpp` - Basic embedding/reranking
- `test/tap/tests/genai_async-t.cpp` - Async architecture tests

Same as ProxySQL - See LICENSE file for details.
For contributions and issues:
`v3.1-vec_genAI_module`

Last Updated: 2025-01-10

Module Version: 0.1.0