# 📚 API Reference (Backend & RAG API)

Last updated: 2025-01-07


## Backend HTTP API (Python `backend/server.py`)

Base URL: `http://localhost:8000`

| Endpoint | Method | Description | Request Body | Success Response |
|---|---|---|---|---|
| `/health` | GET | Health probe incl. Ollama status & DB stats | – | 200 `{ status, ollama_running, available_models, database_stats }` |
| `/chat` | POST | Stateless chat (no session) | `{ message:str, model?:str, conversation_history?:[{role,content}] }` | 200 `{ response:str, model:str, message_count:int }` |
| `/sessions` | GET | List all sessions | – | `{ sessions:ChatSession[], total:int }` |
| `/sessions` | POST | Create session | `{ title?:str, model?:str }` | 201 `{ session:ChatSession, session_id }` |
| `/sessions/<id>` | GET | Get session + messages | – | `{ session, messages }` |
| `/sessions/<id>` | DELETE | Delete session | – | `{ message, deleted_session_id }` |
| `/sessions/<id>/rename` | POST | Rename session | `{ title:str }` | `{ message, session }` |
| `/sessions/<id>/messages` | POST | Session chat (builds history) | ChatRequest + retrieval opts ▼ | `{ response, session, user_message_id, ai_message_id }` |
| `/sessions/<id>/documents` | GET | List uploaded docs | – | `{ files:string[], file_count:int, session }` |
| `/sessions/<id>/upload` | POST (multipart) | Upload docs to session | field `files[]` | `{ message, uploaded_files, processing_results?, session_documents?, total_session_documents? }` |
| `/sessions/<id>/index` | POST | Trigger RAG indexing for session | `{ latechunk?, doclingChunk?, chunkSize?, ... }` | `{ message }` |
| `/sessions/<id>/indexes` | GET | List indexes linked to session | – | `{ indexes, total }` |
| `/sessions/<sid>/indexes/<idxid>` | POST | Link index to session | – | `{ message }` |
| `/sessions/cleanup` | GET | Remove empty sessions | – | `{ message, cleanup_count }` |
| `/models` | GET | List generation / embedding models | – | `{ generation_models:str[], embedding_models:str[] }` |
| `/indexes` | GET | List all indexes | – | `{ indexes, total }` |
| `/indexes` | POST | Create index | `{ name:str, description?:str, metadata?:dict }` | `{ index_id }` |
| `/indexes/<id>` | GET | Get single index | – | `{ index }` |
| `/indexes/<id>` | DELETE | Delete index | – | `{ message, index_id }` |
| `/indexes/<id>/upload` | POST (multipart) | Upload docs to index | field `files[]` | `{ message, uploaded_files }` |
| `/indexes/<id>/build` | POST | Build / rebuild index (RAG) | `{ latechunk?, doclingChunk?, ... }` | 200 `{ response?, message? }` (idempotent) |

## RAG API (Python `rag_system/api_server.py`)

Base URL: `http://localhost:8001`

| Endpoint | Method | Description | Request Body | Success Response |
|---|---|---|---|---|
| `/chat` | POST | Run RAG query with full pipeline | RAG ChatRequest ▼ | `{ answer:str, source_documents:[], reasoning?:str, confidence?:float }` |
| `/chat/stream` | POST | Run RAG query with SSE streaming | Same as `/chat` | Server-Sent Events stream |
| `/index` | POST | Index documents with full configuration | Index Request ▼ | `{ message:str, indexed_files:[], table_name:str }` |
| `/models` | GET | List available models | – | `{ generation_models:str[], embedding_models:str[] }` |
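`/chat/stream` responds with Server-Sent Events, i.e. newline-delimited `data:` lines. A minimal parser is sketched below; it assumes each event payload is a JSON object (the `token` field in the sample is illustrative, not taken from the source).

```python
import json

def parse_sse(raw: str) -> list:
    """Extract JSON payloads from the `data:` lines of an SSE stream."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Example stream fragment (field names are hypothetical):
stream = 'data: {"token": "Hello"}\n\ndata: {"token": " world"}\n\n'
print(parse_sse(stream))  # [{'token': 'Hello'}, {'token': ' world'}]
```

In practice you would read the response body incrementally rather than buffering the whole stream.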

### RAG ChatRequest (Advanced Options)

```jsonc
{
  "query": "string",                    // Required – user question
  "session_id": "string",               // Optional – for session context
  "table_name": "string",               // Optional – specific index table
  "compose_sub_answers": true,          // Optional – compose sub-answers
  "query_decompose": true,              // Optional – decompose complex queries
  "ai_rerank": false,                   // Optional – AI-powered reranking
  "context_expand": false,              // Optional – context expansion
  "verify": true,                       // Optional – answer verification
  "retrieval_k": 20,                    // Optional – number of chunks to retrieve
  "context_window_size": 1,             // Optional – context window size
  "reranker_top_k": 10,                 // Optional – top-k after reranking
  "search_type": "hybrid",              // Optional – "hybrid|dense|fts"
  "dense_weight": 0.7,                  // Optional – dense search weight (0-1)
  "force_rag": false,                   // Optional – bypass triage, force RAG
  "provence_prune": false,              // Optional – sentence-level pruning
  "provence_threshold": 0.8,            // Optional – pruning threshold
  "model": "qwen3:8b"                   // Optional – generation model override
}
```
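Since only `query` is required, a caller can merge overrides into a defaults table mirroring the annotated values above. This is a client-side sketch; whether the server applies these exact defaults is not verified here.

```python
# Defaults restate the annotated JSON example; only "query" is required.
RAG_CHAT_DEFAULTS = {
    "compose_sub_answers": True,
    "query_decompose": True,
    "ai_rerank": False,
    "context_expand": False,
    "verify": True,
    "retrieval_k": 20,
    "context_window_size": 1,
    "reranker_top_k": 10,
    "search_type": "hybrid",
    "dense_weight": 0.7,
    "force_rag": False,
    "provence_prune": False,
    "provence_threshold": 0.8,
}

def make_rag_chat_request(query: str, **overrides) -> dict:
    """Assemble a /chat payload; unspecified options fall back to the defaults."""
    if not query:
        raise ValueError("query is required")
    return {**RAG_CHAT_DEFAULTS, "query": query, **overrides}

payload = make_rag_chat_request("What is late chunking?",
                                search_type="dense", retrieval_k=5)
```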

### Index Request (Document Indexing)

```jsonc
{
  "file_paths": ["path1.pdf", "path2.pdf"],        // Required – files to index
  "session_id": "string",                          // Required – session identifier
  "chunk_size": 512,                               // Optional – chunk size (default: 512)
  "chunk_overlap": 64,                             // Optional – chunk overlap (default: 64)
  "enable_latechunk": true,                        // Optional – enable late chunking
  "enable_docling_chunk": false,                   // Optional – enable Docling chunking
  "retrieval_mode": "hybrid",                      // Optional – "hybrid|dense|fts"
  "window_size": 2,                                // Optional – context window
  "enable_enrich": true,                           // Optional – enable enrichment
  "embedding_model": "Qwen/Qwen3-Embedding-0.6B",  // Optional – embedding model
  "enrich_model": "qwen3:0.6b",                    // Optional – enrichment model
  "overview_model_name": "qwen3:0.6b",             // Optional – overview model
  "batch_size_embed": 50,                          // Optional – embedding batch size
  "batch_size_enrich": 25                          // Optional – enrichment batch size
}
```
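Because `file_paths` and `session_id` are the only required fields, a client can pre-validate a payload before posting it. The checks below simply restate the constraints documented above; the helper itself is illustrative, not part of the API.

```python
def validate_index_request(body: dict) -> list[str]:
    """Return a list of problems with an /index payload (sketch)."""
    errors = []
    if not body.get("file_paths"):
        errors.append("file_paths is required and must be non-empty")
    if not body.get("session_id"):
        errors.append("session_id is required")
    mode = body.get("retrieval_mode", "hybrid")
    if mode not in ("hybrid", "dense", "fts"):
        errors.append(f"invalid retrieval_mode: {mode}")
    return errors

print(validate_index_request({"file_paths": ["doc.pdf"], "session_id": "s1"}))  # []
```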

**Note on CORS** – All endpoints include the `Access-Control-Allow-Origin: *` header.


## Frontend Wrapper (`src/lib/api.ts`)

The React/Next.js frontend calls the backend via a typed wrapper. Important methods & payloads:

| Method | Backend Endpoint | Payload Shape |
|---|---|---|
| `checkHealth()` | `/health` | – |
| `sendMessage({ message, model?, conversation_history? })` | `/chat` | ChatRequest |
| `getSessions()` | `/sessions` | – |
| `createSession(title?, model?)` | `/sessions` | – |
| `getSession(sessionId)` | `/sessions/<id>` | – |
| `sendSessionMessage(sessionId, message, opts)` | `/sessions/<id>/messages` | ChatRequest + retrieval opts |
| `uploadFiles(sessionId, files[])` | `/sessions/<id>/upload` | multipart |
| `indexDocuments(sessionId)` | `/sessions/<id>/index` | opts similar to `buildIndex` |
| `buildIndex(indexId, opts)` | `/indexes/<id>/build` | Index build options |
| `linkIndexToSession` | `/sessions/<sid>/indexes/<idx>` | – |

## Payload Definitions (Canonical)

### ChatRequest (frontend ⇄ backend)

```jsonc
{
  "message": "string",              // Required – raw user text
  "model": "string",                // Optional – generation model id
  "conversation_history": [         // Optional – prior turn list
    { "role": "user|assistant", "content": "string" }
  ]
}
```

### Session Chat Extended Options

```jsonc
{
  "composeSubAnswers": true,
  "decompose": true,
  "aiRerank": false,
  "contextExpand": false,
  "verify": true,
  "retrievalK": 10,
  "contextWindowSize": 5,
  "rerankerTopK": 20,
  "searchType": "fts|hybrid|dense",
  "denseWeight": 0.75,
  "force_rag": false
}
```
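Note that the frontend options are camelCase while the RAG API expects snake_case; presumably `backend/server.py` translates between the two (an assumption from the payload shapes, not verified against the code). A generic conversion looks like this:

```python
import re

def camel_to_snake(name: str) -> str:
    """someKey -> some_key"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def normalise_opts(opts: dict) -> dict:
    """Convert camelCase frontend options to snake_case keys (illustrative;
    the real backend mapping may also rename keys outright, e.g.
    "decompose" corresponds to the RAG API's "query_decompose")."""
    return {camel_to_snake(k): v for k, v in opts.items()}

print(normalise_opts({"retrievalK": 10, "searchType": "hybrid", "force_rag": False}))
# {'retrieval_k': 10, 'search_type': 'hybrid', 'force_rag': False}
```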

### Index Build Options

```jsonc
{
  "latechunk": true,
  "doclingChunk": false,
  "chunkSize": 512,
  "chunkOverlap": 64,
  "retrievalMode": "hybrid|dense|fts",
  "windowSize": 2,
  "enableEnrich": true,
  "embeddingModel": "Qwen/Qwen3-Embedding-0.6B",
  "enrichModel": "qwen3:0.6b",
  "overviewModel": "qwen3:0.6b",
  "batchSizeEmbed": 64,
  "batchSizeEnrich": 32
}
```

This reference is derived from static code analysis of `backend/server.py`, `rag_system/api_server.py`, and `src/lib/api.ts`. Keep it in sync with route or type changes.