Documentation/improvement_plan.md
Revision: 2025-07-05
This document captures high-impact enhancements identified during the July 2025 code review. Items are grouped by theme, and each includes a short rationale plus suggested implementation notes. No code has been changed; this file is planning only.
| ID | Item | Rationale | Notes |
|---|---|---|---|
| 1.1 | Late-chunk result merging | Returned snippets can be single late-chunks → fragmented. | After retrieval, gather sibling chunks (±1) and concatenate before reranking / display. |
| 1.2 | Tiered retrieval (ANN pre-filter) | Large indexes → LanceDB full scan can be slow. | Use in-memory FAISS/HNSW to narrow to top-N, then exact LanceDB search. |
| 1.3 | Dynamic fusion weights | Different corpora favour dense vs BM25 differently. | Learn weight on small validation set; store in index metadata. |
| 1.4 | Query expansion via KG | Use extracted entities to enrich queries. | Requires Graph-RAG path clean-up first. |
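
Item 1.1 (gather sibling chunks ±1 and concatenate before reranking) can be sketched as below. This is a minimal illustration, not the project's implementation: the `Chunk` dataclass, the `store` lookup keyed by `(doc_id, index)`, and the function name `merge_with_siblings` are all hypothetical, and it assumes each chunk records its position within its source document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int  # position of the chunk within its source document (assumed field)
    text: str


def merge_with_siblings(hits, store, window=1):
    """Expand each retrieved chunk to its neighbours and merge contiguous runs.

    hits:   chunks returned by retrieval
    store:  dict mapping (doc_id, index) -> Chunk; a stand-in for the real chunk table
    window: number of sibling chunks to pull in on each side (1 gives the +/-1 rule)
    Returns a list of (doc_id, first_index, last_index, text) passages.
    """
    # Per document, collect every chunk index that should appear in the output.
    wanted = {}
    for hit in hits:
        for i in range(hit.index - window, hit.index + window + 1):
            if (hit.doc_id, i) in store:  # skip neighbours past the document edges
                wanted.setdefault(hit.doc_id, set()).add(i)

    passages = []
    for doc_id, indices in wanted.items():
        run = []
        for i in sorted(indices):
            # A gap in the indices closes the current passage.
            if run and i != run[-1] + 1:
                passages.append(_join(doc_id, run, store))
                run = []
            run.append(i)
        passages.append(_join(doc_id, run, store))
    return passages


def _join(doc_id, run, store):
    """Concatenate the texts of a contiguous run of chunk indices."""
    text = " ".join(store[(doc_id, i)].text for i in run)
    return (doc_id, run[0], run[-1], text)
```

Merging overlapping windows into one passage, rather than emitting one passage per hit, avoids showing the same sibling chunk twice when two hits sit close together in a document.
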

| ID | Item | Rationale |
|---|---|---|
| 2.1 | Embed + cache document overviews | LLM router costs tokens; cosine-similarity pre-check is cheaper. |
| 2.2 | Session-level routing memo | Avoid repeated LLM triage for follow-up queries. |
| 2.3 | Remove legacy pattern rules | Simplifies maintenance once overview & ML routing mature. |
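
The cosine-similarity pre-check in item 2.1 might look roughly like the sketch below. It assumes document overviews have already been embedded and cached as plain float vectors; the function names, the decision labels, and the `accept`/`reject` thresholds are illustrative assumptions, not values taken from the codebase. The idea is that only queries falling between the two thresholds would still need the LLM router.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def precheck_route(query_vec, overview_vecs, accept=0.8, reject=0.3):
    """Decide cheaply whether the LLM router is needed.

    overview_vecs: dict mapping doc_id -> cached overview embedding
    accept/reject: illustrative thresholds, not tuned values
    Returns (decision, best_doc_id), where decision is one of the hypothetical
    labels "use_index", "skip_index", or "ask_llm_router".
    """
    best_id, best_sim = None, -1.0
    for doc_id, vec in overview_vecs.items():
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_id, best_sim = doc_id, sim

    if best_id is not None and best_sim >= accept:
        return ("use_index", best_id)  # clearly about an indexed document
    if best_sim <= reject:
        return ("skip_index", None)  # clearly unrelated to every overview
    return ("ask_llm_router", best_id)  # ambiguous: fall back to the LLM
```

The same decision could be stored per session to serve item 2.2, so that follow-up queries reuse it instead of triggering another triage call.
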

| ID | Item | Rationale |
|---|---|---|
| 3.1 | Parallel document conversion | PDF→MD + chunking is serial today; speed gains possible. |
| 3.2 | Incremental indexing | Re-embedding whole corpus wastes time. |
| 3.3 | Auto GPU dtype selection | Use FP16 on CUDA / MPS for memory and speed. |
| 3.4 | Post-build health check | Catch broken indexes (dim mismatch etc.) early. |
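
One possible shape for item 3.2 (incremental indexing) is a content-hash manifest: record a digest per document at build time, then re-embed only documents whose digest changed. The sketch below is an assumption about how this could work, not a description of existing code; `plan_reindex` and the manifest layout are hypothetical, and persisting the manifest (for example alongside the index metadata) is left out.

```python
import hashlib


def plan_reindex(documents, manifest):
    """Compare current documents against the hashes recorded at the last build.

    documents: dict mapping doc_id -> raw bytes of the current file
    manifest:  dict mapping doc_id -> SHA-256 hex digest from the previous build
    Returns (to_embed, to_delete, new_manifest).
    """
    new_manifest = {
        doc_id: hashlib.sha256(data).hexdigest() for doc_id, data in documents.items()
    }
    # New or modified documents need (re-)chunking and embedding.
    to_embed = sorted(
        doc_id for doc_id, digest in new_manifest.items() if manifest.get(doc_id) != digest
    )
    # Documents that disappeared should have their rows removed from the index.
    to_delete = sorted(doc_id for doc_id in manifest if doc_id not in new_manifest)
    return to_embed, to_delete, new_manifest
```

Hashing file contents rather than comparing modification times keeps the plan stable when files are copied or re-downloaded without changes.
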

- VACUUM when fragmentation > X %.
- `/metrics` endpoint for Prometheus.
- … (`/health/deep`) exercising end-to-end query.
- … `embedding_model_name`, etc.).
- `mypy --strict`, `pylint`, and `black` in CI.
- Reduce complexity and improve maintainability.

Feel free to rearrange based on team objectives and resource availability.