GenAI Module Documentation

Overview

The GenAI (Generative AI) Module in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.

Version

Module Version: 0.1.0
Last Updated: 2025-01-10
Branch: v3.1-vec_genAI_module

Architecture

Async Design

The GenAI module uses a non-blocking async architecture based on socketpair IPC and epoll event notification:

┌─────────────────┐         socketpair         ┌─────────────────┐
│  MySQL_Session  │◄──────────────────────────►│  GenAI Module   │
│  (MySQL Thread) │ fds[0]              fds[1] │  Listener Loop  │
└────────┬────────┘                            └────────┬────────┘
         │                                              │
         │ epoll                                        │ queue
         │                                              │
         └── epoll_wait() ──────────────────────────────┘
                      (GenAI Response Ready)

Key Components

MySQL_Session - Client-facing interface that receives GENAI: queries
GenAI Listener Thread - Monitors socketpair fds via epoll for incoming requests
GenAI Worker Threads - Thread pool that processes requests (blocking HTTP calls)
Socketpair Communication - Bidirectional IPC between MySQL and GenAI modules
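The listener/worker split above follows the usual producer/consumer pattern. The sketch below is a minimal, hypothetical model of it. `Job`, `WorkQueue`, and `run_pool` are illustrative names, not taken from `lib/GenAI_Thread.cpp`. The listener pushes requests onto a mutex-protected queue. A fixed pool of workers, sized like `genai-threads`, pops and processes them, so each blocking HTTP call ties up only one worker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical job: a real one would carry the request header, the JSON
// payload, and the fd to write the response to.
struct Job {
    uint64_t request_id;
};

// Blocking queue shared by the listener (producer) and workers (consumers).
class WorkQueue {
public:
    void push(Job job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push(job);
        }
        cv_.notify_one();
    }

    // Tells workers to exit once the queue has been drained.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a job is available; returns std::nullopt only after
    // shutdown() when no jobs remain.
    std::optional<Job> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
        if (jobs_.empty()) return std::nullopt;
        Job job = jobs_.front();
        jobs_.pop();
        return job;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Job> jobs_;
    bool stopping_ = false;
};

// Starts `num_threads` workers (cf. genai-threads), feeds them `num_jobs`
// jobs, waits for completion, and returns how many jobs were processed.
int run_pool(int num_threads, int num_jobs,
             const std::function<void(const Job&)>& process) {
    WorkQueue queue;
    std::mutex count_mu;
    int processed = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&] {
            while (auto job = queue.pop()) {
                process(*job);  // stands in for the blocking HTTP call
                std::lock_guard<std::mutex> lock(count_mu);
                ++processed;
            }
        });
    }
    for (int i = 0; i < num_jobs; ++i) queue.push(Job{static_cast<uint64_t>(i)});
    queue.shutdown();
    for (auto& t : workers) t.join();
    return processed;
}
```

Build with `-std=c++17 -pthread`. `shutdown()` lets the workers drain the queue before they exit, so pending work is not dropped. For example, `run_pool(4, 100, ...)` returns 100 once every job has been handled.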

Communication Protocol

Request Format (MySQL → GenAI)

struct GenAI_RequestHeader {
    uint64_t request_id;      // Client's correlation ID
    uint32_t operation;       // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
    uint32_t query_len;       // Length of JSON query that follows
    uint32_t flags;           // Reserved (must be 0)
    uint32_t top_n;           // For rerank: max results (0 = all)
};
// Followed by: JSON query (query_len bytes)
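On the socketpair, each request is the fixed-size header followed immediately by `query_len` bytes of JSON. Below is a minimal framing sketch. It assumes both ends live in the same process, so they share native byte order and struct layout. `frame_request` is a hypothetical helper, and the numeric `GENAI_OP_*` values are placeholders, not the ones defined in `include/GenAI_Thread.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Placeholder operation codes (hypothetical values, for illustration only).
enum : uint32_t { GENAI_OP_EMBEDDING = 0, GENAI_OP_RERANK = 1, GENAI_OP_JSON = 2 };

struct GenAI_RequestHeader {
    uint64_t request_id;
    uint32_t operation;
    uint32_t query_len;
    uint32_t flags;
    uint32_t top_n;
};
// 8 + 4 * 4 bytes, no padding on common ABIs.
static_assert(sizeof(GenAI_RequestHeader) == 24, "unexpected header layout");

// Builds one request message: raw header bytes followed by the JSON query.
std::vector<char> frame_request(uint64_t request_id, uint32_t op,
                                const std::string& json, uint32_t top_n = 0) {
    GenAI_RequestHeader hdr{};
    hdr.request_id = request_id;
    hdr.operation = op;
    hdr.query_len = static_cast<uint32_t>(json.size());
    hdr.flags = 0;  // reserved, must be 0
    hdr.top_n = top_n;

    std::vector<char> buf(sizeof(hdr) + json.size());
    std::memcpy(buf.data(), &hdr, sizeof(hdr));
    std::memcpy(buf.data() + sizeof(hdr), json.data(), json.size());
    return buf;
}
```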

Response Format (GenAI → MySQL)

struct GenAI_ResponseHeader {
    uint64_t request_id;        // Echo of client's request ID
    uint32_t status_code;       // 0 = success, >0 = error
    uint32_t result_len;        // Length of JSON result that follows
    uint32_t processing_time_ms;// Time taken by GenAI worker
    uint64_t result_ptr;        // Reserved (must be 0)
    uint32_t result_count;      // Number of results
    uint32_t reserved;          // Reserved (must be 0)
};
// Followed by: JSON result (result_len bytes)
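The reader does the inverse. It needs the whole header before it can trust `result_len`, and then `result_len` more bytes before the JSON is complete. On common 64-bit ABIs (x86-64, AArch64) this struct is 40 bytes: the field sizes add up to 36, but the compiler inserts 4 bytes of padding before the 8-byte-aligned `result_ptr`. That padding is harmless while both ends are compiled into the same binary. Below is a minimal sketch with a hypothetical `parse_response` helper. It treats a short buffer as "not complete yet" rather than as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

struct GenAI_ResponseHeader {
    uint64_t request_id;
    uint32_t status_code;
    uint32_t result_len;
    uint32_t processing_time_ms;
    uint64_t result_ptr;   // on 64-bit ABIs, preceded by 4 bytes of padding
    uint32_t result_count;
    uint32_t reserved;
};

struct ParsedResponse {
    GenAI_ResponseHeader header;
    std::string json;  // a {"columns", "rows"} result or an {"error": ...} object
};

// Parses one complete response message. Returns std::nullopt if the buffer
// does not yet hold the full header plus result_len bytes of JSON.
std::optional<ParsedResponse> parse_response(const std::vector<char>& buf) {
    GenAI_ResponseHeader hdr;
    if (buf.size() < sizeof(hdr)) return std::nullopt;
    std::memcpy(&hdr, buf.data(), sizeof(hdr));
    if (buf.size() - sizeof(hdr) < hdr.result_len) return std::nullopt;
    return ParsedResponse{hdr, std::string(buf.data() + sizeof(hdr), hdr.result_len)};
}
```

The socketpair is a byte stream, so a reader would typically append each `read()` to the buffer and retry the parse until it succeeds. A `status_code` other than 0 means `json` holds an error object.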

Configuration Variables

Thread Configuration

Variable	Type	Default	Description
`genai-threads`	int	4	Number of GenAI worker threads (1-256)

Service Endpoints

Variable	Type	Default	Description
`genai-embedding_uri`	string	`http://127.0.0.1:8013/embedding`	Embedding service endpoint
`genai-rerank_uri`	string	`http://127.0.0.1:8012/rerank`	Reranking service endpoint

Timeouts

Variable	Type	Default	Description
`genai-embedding_timeout_ms`	int	30000	Embedding request timeout (100-300000ms)
`genai-rerank_timeout_ms`	int	30000	Reranking request timeout (100-300000ms)
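The documented ranges translate directly into a validation check. The sketch below assumes out-of-range values are rejected; this document does not say whether ProxySQL rejects or clamps them. `kGenAIRanges` and `is_valid_genai_value` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <utility>

// Documented inclusive ranges for the numeric GenAI variables.
const std::map<std::string, std::pair<long, long>> kGenAIRanges = {
    {"genai-threads", {1, 256}},
    {"genai-embedding_timeout_ms", {100, 300000}},
    {"genai-rerank_timeout_ms", {100, 300000}},
};

// Returns true if `value` is a base-10 integer inside the documented range
// for `name`. Unknown names and non-numeric input are rejected.
bool is_valid_genai_value(const std::string& name, const std::string& value) {
    auto it = kGenAIRanges.find(name);
    if (it == kGenAIRanges.end() || value.empty()) return false;
    char* end = nullptr;
    long v = std::strtol(value.c_str(), &end, 10);
    if (*end != '\0') return false;  // trailing garbage such as "8x"
    return v >= it->second.first && v <= it->second.second;
}
```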

Admin Commands

sql

-- Load/Save GenAI variables
LOAD GENAI VARIABLES TO RUNTIME;
SAVE GENAI VARIABLES FROM RUNTIME;
LOAD GENAI VARIABLES FROM DISK;
SAVE GENAI VARIABLES TO DISK;

-- Set variables
SET genai-threads = 8;
SET genai-embedding_uri = 'http://localhost:8080/embed';
SET genai-rerank_uri = 'http://localhost:8081/rerank';

-- View variables
SELECT @@genai-threads;
SHOW VARIABLES LIKE 'genai-%';

-- Checksum
CHECKSUM GENAI VARIABLES;

Query Syntax

GENAI: Query Format

GenAI queries use the special GENAI: prefix followed by JSON:

sql

GENAI: {"type": "embed", "documents": ["text1", "text2"]}
GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
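The payload after `GENAI:` is JSON. Client code that builds these queries from arbitrary text therefore has to escape quotes, backslashes, and control characters; the TAP suite exercises such special characters. Below is a minimal sketch with hypothetical helpers `json_escape` and `make_embed_query`. Non-ASCII UTF-8 bytes pass through unchanged, which JSON permits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Escapes a string for use inside a JSON string literal: quotes,
// backslashes, and control characters are escaped; other bytes are kept.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {
                    char buf[7];  // "\u00XX" plus the terminating NUL
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Builds a GENAI: query for the "embed" operation (hypothetical helper).
std::string make_embed_query(const std::vector<std::string>& docs) {
    std::string q = "GENAI: {\"type\": \"embed\", \"documents\": [";
    for (std::size_t i = 0; i < docs.size(); ++i) {
        if (i > 0) q += ", ";
        q += "\"" + json_escape(docs[i]) + "\"";
    }
    q += "]}";
    return q;
}
```

`make_embed_query({"text1", "text2"})` produces exactly the first query above. The resulting string is then sent as an ordinary query over any MySQL client connection.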

Supported Operations

1. Embedding

Generate vector embeddings for documents:

sql

GENAI: {
    "type": "embed",
    "documents": [
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks."
    ]
}

Response:

+------------------------------------------+
| embedding                                |
+------------------------------------------+
| 0.123, -0.456, 0.789, ...               |
| 0.234, -0.567, 0.890, ...               |
+------------------------------------------+

2. Reranking

Rerank documents by relevance to a query:

sql

GENAI: {
    "type": "rerank",
    "query": "What is machine learning?",
    "documents": [
        "Machine learning is a subset of artificial intelligence.",
        "The capital of France is Paris.",
        "Deep learning uses neural networks."
    ],
    "top_n": 2,
    "columns": 3
}

Parameters:

query (required): Search query text
documents (required): Array of documents to rerank
top_n (optional): Maximum results to return (0 = all, default: all)
columns (optional): 2 = {index, score}, 3 = {index, score, document} (default: 3)
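The interaction of `top_n` and `columns` can be sketched in three steps:

1. Order documents by descending score.
2. Keep the first `top_n` rows (0 = all).
3. Drop the document text when `columns` is 2.

This is an illustrative model of the documented semantics, not the module's code. The document does not say whether the GenAI worker or the rerank service does the ordering. `RerankRow` and `shape_rerank_result` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct RerankRow {
    std::size_t index;     // position of the document in the request
    double score;          // relevance score from the rerank service
    std::string document;  // left empty when columns == 2
};

// Orders documents by descending score, keeps the first top_n rows
// (0 = all), and omits the document text when columns == 2.
std::vector<RerankRow> shape_rerank_result(const std::vector<std::string>& docs,
                                           const std::vector<double>& scores,
                                           std::size_t top_n, int columns) {
    std::vector<RerankRow> rows;
    for (std::size_t i = 0; i < docs.size() && i < scores.size(); ++i)
        rows.push_back({i, scores[i], columns == 2 ? std::string() : docs[i]});

    // stable_sort keeps request order for equal scores.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const RerankRow& a, const RerankRow& b) { return a.score > b.score; });

    if (top_n > 0 && top_n < rows.size()) rows.resize(top_n);
    return rows;
}
```

With illustrative scores `{0.95, 0.10, 0.82}` for the three documents in the request above, `top_n = 2` keeps indices 0 and 2. That matches the example response that follows.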

Response:

+-------+-------+----------------------------------------------+
| index | score | document                                    |
+-------+-------+----------------------------------------------+
| 0     | 0.95  | Machine learning is a subset of AI...        |
| 2     | 0.82  | Deep learning uses neural networks...        |
+-------+-------+----------------------------------------------+

Response Format

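All GenAI queries return their results to the client as a MySQL resultset. The JSON result carried in the GenAI response has two fields:

columns: Array of column names
rows: Array of rows, each an array of values in column order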

Success:

json

{
    "columns": ["index", "score", "document"],
    "rows": [
        [0, 0.95, "Most relevant document"],
        [2, 0.82, "Second most relevant"]
    ]
}

Error:

json

{
    "error": "Error message describing what went wrong"
}

Usage Examples

Basic Embedding

sql

-- Generate embedding for a single document
GENAI: {"type": "embed", "documents": ["Hello, world!"]};

-- Batch embedding for multiple documents
GENAI: {
    "type": "embed",
    "documents": ["doc1", "doc2", "doc3"]
};

Basic Reranking

sql

-- Find most relevant documents
GENAI: {
    "type": "rerank",
    "query": "database optimization techniques",
    "documents": [
        "How to bake a cake",
        "Indexing strategies for MySQL",
        "Python programming basics",
        "Query optimization in ProxySQL"
    ]
};

Top N Results

sql

-- Get only top 3 most relevant documents
GENAI: {
    "type": "rerank",
    "query": "best practices for SQL",
    "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
    "top_n": 3
};

Index and Score Only

sql

-- Get only relevance scores (no document text)
GENAI: {
    "type": "rerank",
    "query": "test query",
    "documents": ["doc1", "doc2"],
    "columns": 2
};

Integration with ProxySQL

Session Lifecycle

Session Start: MySQL session creates genai_epoll_fd_ for monitoring GenAI responses
Query Received: GENAI: query detected in handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()
Async Send: Socketpair created, request sent, returns immediately
Main Loop: check_genai_events() called on each iteration
Response Ready: handle_genai_response() processes response
Result Sent: MySQL result packet sent to client
Cleanup: Socketpair closed, resources freed
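Responses are matched back to requests by `request_id`, and pending requests are kept in a per-session map (see Scalability). Below is a hypothetical sketch of that bookkeeping, covering the Async Send, Response Ready, and Cleanup steps. `PendingRequest` and `PendingRequests` are illustrative names, not the members declared in `include/MySQL_Session.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-session record of one in-flight GENAI: request.
struct PendingRequest {
    int fd;             // session's end of the socketpair
    std::string query;  // original JSON, kept for logging
};

class PendingRequests {
public:
    // "Async Send": remember the request and hand back its correlation ID.
    uint64_t add(int fd, std::string query) {
        uint64_t id = next_id_++;
        pending_[id] = PendingRequest{fd, std::move(query)};
        return id;
    }

    // "Response Ready": match a response header's request_id to its request.
    // Returns false for unknown or already-completed IDs.
    bool complete(uint64_t request_id, PendingRequest* out) {
        auto it = pending_.find(request_id);
        if (it == pending_.end()) return false;
        *out = std::move(it->second);
        pending_.erase(it);  // "Cleanup": the caller then closes out->fd
        return true;
    }

    std::size_t size() const { return pending_.size(); }

private:
    uint64_t next_id_ = 1;
    std::unordered_map<uint64_t, PendingRequest> pending_;
};
```

`complete()` rejects unknown IDs. That guards against a late or duplicated response being delivered to the wrong query.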

Main Loop Integration

The GenAI event checking is integrated into the main MySQL handler loop:

cpp

handler_again:
    switch (status) {
        case WAITING_CLIENT_DATA:
            handler___status_WAITING_CLIENT_DATA();
#ifdef epoll_create1
            // Check for GenAI responses before processing new client data
            if (check_genai_events()) {
                goto handler_again;  // Process more responses
            }
#endif
            break;
    }
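`check_genai_events()` has to report readiness without stalling the MySQL thread. `epoll_wait()` with a timeout of 0 provides exactly that. The Linux-only sketch below models the check with a hypothetical `genai_response_ready()` and walks through one round trip over a socketpair. It is not the actual ProxySQL implementation.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for check_genai_events(): polls the session's epoll
// instance without blocking (timeout 0) and reports whether any registered
// GenAI fd is readable. n == 0 means nothing is ready; n < 0 is an error.
bool genai_response_ready(int epoll_fd) {
    struct epoll_event events[16];
    int n = epoll_wait(epoll_fd, events, 16, /*timeout_ms=*/0);
    return n > 0;
}

// Registers the session's end of a socketpair, lets the "GenAI side" write
// a response, and checks readiness before and after the write.
bool demo_socketpair_epoll() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0) { close(fds[0]); close(fds[1]); return false; }

    struct epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];  // session side (fds[0])
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        close(ep); close(fds[0]); close(fds[1]);
        return false;
    }

    bool before = genai_response_ready(ep);  // nothing written yet
    const char msg[] = "ok";
    bool wrote = write(fds[1], msg, sizeof(msg)) == static_cast<ssize_t>(sizeof(msg));
    bool after = genai_response_ready(ep);   // session side is now readable

    close(ep); close(fds[0]); close(fds[1]);
    return !before && wrote && after;
}
```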

Backend Services

llama-server Integration

The GenAI module is designed to work with llama-server, the HTTP inference server from the llama.cpp project, which serves models in GGUF format.

Starting llama-server

bash

# Start embedding server
./llama-server \
    --model /path/to/nomic-embed-text-v1.5.gguf \
    --port 8013 \
    --embedding \
    --ctx-size 512

# Start reranking server (requires a reranker model; --reranking enables /rerank)
./llama-server \
    --model /path/to/reranker-model.gguf \
    --port 8012 \
    --reranking \
    --ctx-size 512

API Compatibility

The GenAI module expects:

Embedding endpoint: POST /embedding with JSON request
Rerank endpoint: POST /rerank with JSON request

Compatible with:

llama-server
OpenAI-compatible embedding APIs
Custom services with matching request/response format

Testing

TAP Test Suite

Comprehensive TAP tests are available in test/tap/tests/genai_async-t.cpp:

bash

cd test/tap/tests
make genai_async-t
./genai_async-t

Test Coverage:

Single async requests
Sequential requests (embedding and rerank)
Batch requests (10+ documents)
Mixed embedding and rerank
Request/response matching
Error handling (invalid JSON, missing fields)
Special characters (quotes, unicode, etc.)
Large documents (5KB+)
top_n and columns parameters
Concurrent connections

Manual Testing

sql

-- Test embedding
mysql> GENAI: {"type": "embed", "documents": ["test document"]};

-- Test reranking
mysql> GENAI: {
    ->   "type": "rerank",
    ->   "query": "test query",
    ->   "documents": ["doc1", "doc2", "doc3"]
    -> };

Performance Characteristics

Non-Blocking Behavior

MySQL threads: Return immediately after sending request (~1ms)
GenAI workers: Handle blocking HTTP calls (10-100ms typical)
Throughput: Limited by GenAI service capacity and worker thread count

Resource Usage

Per request: 1 socketpair (2 file descriptors)
Memory: Request metadata + pending response storage
Worker threads: Configurable via genai-threads (default: 4)
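Every in-flight request holds one socketpair, so the file-descriptor limit caps concurrency. That is why `ulimit -n` is the first thing to check when socketpair creation fails. Below is a hypothetical back-of-the-envelope helper. `fds_in_use` stands for descriptors used by everything else (client and backend connections, listeners, logs) and must be estimated separately.

```cpp
#include <cassert>
#include <sys/resource.h>

// Each in-flight GenAI request holds one socketpair, i.e. 2 descriptors.
constexpr long kFdsPerRequest = 2;

// Rough upper bound on concurrent GenAI requests for a given fd limit.
long max_inflight_genai_requests(long fd_limit, long fds_in_use) {
    if (fd_limit <= fds_in_use) return 0;
    return (fd_limit - fds_in_use) / kFdsPerRequest;
}

// Reads the soft RLIMIT_NOFILE (what `ulimit -n` reports).
// Returns -1 on error or when the limit is unlimited.
long current_fd_soft_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return -1;
    return static_cast<long>(rl.rlim_cur);
}
```

For example, with a soft limit of 1024 and 24 descriptors already in use, at most (1024 - 24) / 2 = 500 GenAI requests can be in flight at once.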

Scalability

Concurrent requests: Limited by genai-threads and GenAI service capacity
Request queue: Unlimited (pending requests stored in session map)
Recommended: Set genai-threads to match expected concurrency
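Little's law gives a rough sizing aid; this is an estimate, not a benchmark. It relates three quantities:

N: the number of busy workers
λ: the request arrival rate
W: the average time a worker spends on one blocking HTTP call

The estimate assumes the GenAI service itself is not saturated. The figures below use a W of 50 ms, within the 10-100ms range quoted above:

```latex
N \approx \lambda \, W
\qquad\Longrightarrow\qquad
\lambda_{\max} \approx \frac{\texttt{genai-threads}}{W}

% default genai-threads = 4, W = 50 ms:
\lambda_{\max} \approx \frac{4}{0.05\,\mathrm{s}} = 80~\text{requests/s}

% sustaining \lambda = 200 requests/s at W = 50 ms:
N \approx 200\,\mathrm{s}^{-1} \times 0.05\,\mathrm{s} = 10~\text{worker threads}
```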

Error Handling

Common Errors

Error	Cause	Solution
`Failed to create GenAI communication channel`	Socketpair creation failed	Check system limits (ulimit -n)
`Failed to register with GenAI module`	GenAI module not initialized	Run `LOAD GENAI VARIABLES TO RUNTIME`
`Failed to send request to GenAI module`	Write error on socketpair	Check connection stability
`GenAI module not initialized`	GenAI threads not started	Set `genai-threads > 0` and reload

Timeout Handling

Requests exceeding genai-embedding_timeout_ms or genai-rerank_timeout_ms will fail with:

Status code > 0 in response header
Error message in JSON result
Socketpair cleanup

Monitoring

Status Variables

sql

-- Check GenAI module status (not yet implemented, planned)
SHOW STATUS LIKE 'genai-%';

Planned status variables:

genai_threads_initialized: Number of worker threads running
genai_active_requests: Currently processing requests
genai_completed_requests: Total successful requests
genai_failed_requests: Total failed requests

Logging

GenAI operations log at debug level:

sql

-- Enable GenAI debug logging (run in the ProxySQL Admin interface)
SET mysql-debug = 1;

bash

# Check logs
tail -f proxysql.log | grep GenAI

Limitations

Current Limitations

document_from_sql: Not yet implemented (requires MySQL connection handling in workers)
Shared memory: Result pointer field reserved for future optimization
Request size: Limited by socket buffer size (typically 64KB-256KB)

Platform Requirements

Epoll support: Linux systems (kernel 2.6+)
Socketpair: Unix domain sockets
Threading: POSIX threads (pthread)

Future Enhancements

Planned Features

document_from_sql: Execute SQL to retrieve documents for reranking
Shared memory: Zero-copy result transfer for large responses
Connection pooling: Reuse HTTP connections to GenAI services
Metrics: Enhanced monitoring and statistics
Batch optimization: Better support for large document batches
Streaming: Progressive result delivery for large operations

Related Documentation

SQLite3 Server Documentation - SQLite3 backend integration

Source Files

Core Implementation

include/GenAI_Thread.h - GenAI module interface and structures
lib/GenAI_Thread.cpp - Implementation of listener and worker loops
include/MySQL_Session.h - Session integration (GenAI async state)
lib/MySQL_Session.cpp - Async handlers and main loop integration
include/Base_Session.h - Base session GenAI members

Tests

test/tap/tests/genai_module-t.cpp - Admin commands and variables
test/tap/tests/genai_embedding_rerank-t.cpp - Basic embedding/reranking
test/tap/tests/genai_async-t.cpp - Async architecture tests

License

Same as ProxySQL - See LICENSE file for details.

Contributing

For contributions and issues:

GitHub: https://github.com/sysown/proxysql
Branch: v3.1-vec_genAI_module
