.clinerules/01-basic.md
LightRAG is a mature, production-ready Retrieval-Augmented Generation (RAG) system with comprehensive knowledge graph capabilities. The system has evolved from experimental to production-ready status with extensive functionality across all major components.
space1 for data isolation
Pattern: Always handle both base64 and raw array embedding formats
Location: lightrag/llm/openai.py - openai_embed function
Issue: Custom OpenAI-compatible endpoints return embeddings as raw arrays, not base64 strings
Solution:
np.array(dp.embedding, dtype=np.float32) if isinstance(dp.embedding, list)
else np.frombuffer(base64.b64decode(dp.embedding), dtype=np.float32)
Impact: Document processing fails completely without this dual format support
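The dual-format check above can be wrapped in a small helper. This is a minimal sketch, not the code in lightrag/llm/openai.py: the function name `decode_embedding` is hypothetical, and it assumes the base64 payload holds raw float32 bytes in native byte order.

```python
import base64
from typing import Sequence, Union

import numpy as np


def decode_embedding(raw: Union[str, Sequence[float]]) -> np.ndarray:
    """Return a float32 vector from a base64 string or a plain list of floats."""
    if isinstance(raw, str):
        # Base64 form: decode to bytes, then reinterpret as float32 values.
        return np.frombuffer(base64.b64decode(raw), dtype=np.float32)
    # Raw array form, as returned by some OpenAI-compatible endpoints.
    return np.asarray(raw, dtype=np.float32)
```

Checking for `str` rather than `list` is a design choice in this sketch: it also accepts tuples or arrays as the raw form.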
Pattern: Always await coroutines before calling methods on the result
Common Error: coroutine.method() instead of (await coroutine).method()
Locations: MongoDB implementations, Neo4j operations
Example: await self._data.list_indexes() then await cursor.to_list()
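A self-contained illustration of the await-first rule follows. `FakeCollection` and `FakeCursor` are hypothetical stand-ins for an async database driver; they are not MongoDB or Neo4j classes and only mimic the shape of the calls named above.

```python
import asyncio


class FakeCursor:
    """Hypothetical async cursor."""

    def __init__(self, items):
        self._items = items

    async def to_list(self):
        return list(self._items)


class FakeCollection:
    """Hypothetical collection whose list_indexes() is a coroutine."""

    async def list_indexes(self):
        return FakeCursor([{"name": "_id_"}])


async def index_names(collection) -> list:
    # WRONG: collection.list_indexes().to_list()
    # A coroutine object has no to_list(), so this raises AttributeError.
    cursor = await collection.list_indexes()  # await first
    indexes = await cursor.to_list()          # then await the result's method
    return [ix["name"] for ix in indexes]
```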
Pattern: Always filter deprecated/incompatible fields during deserialization
Common Fields to Remove: content, _id (MongoDB), database-specific fields
Implementation: data.pop('field_name', None) before creating dataclass objects
Locations: All storage implementations (JSON, Redis, MongoDB, PostgreSQL)
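A minimal sketch of this filtering step, assuming a hypothetical `ChunkRecord` dataclass and a hypothetical `load_record` helper (neither name comes from the LightRAG codebase):

```python
from dataclasses import dataclass

# Fields that may still exist in stored records but are no longer
# accepted by the dataclass constructor.
DEPRECATED_FIELDS = ("content", "_id")


@dataclass
class ChunkRecord:
    chunk_id: str
    tokens: int


def load_record(data: dict) -> ChunkRecord:
    cleaned = dict(data)  # copy so the caller's dict is not mutated
    for name in DEPRECATED_FIELDS:
        cleaned.pop(name, None)  # no error if the field is absent
    return ChunkRecord(**cleaned)
```

Without the `pop` calls, `ChunkRecord(**data)` would raise `TypeError` for any record that still carries an old field.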
Pattern: Always sort relationship pairs for consistent lock keys
Implementation: sorted_key_parts = sorted([src, tgt]) then f"{sorted_key_parts[0]}-{sorted_key_parts[1]}"
Impact: Prevents deadlocks in concurrent relationship processing
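As a small sketch, the key construction can live in one helper so every caller derives the same key regardless of argument order. The function name `relation_lock_key` is hypothetical.

```python
def relation_lock_key(src: str, tgt: str) -> str:
    """Build an order-independent lock key for a relationship pair."""
    sorted_key_parts = sorted([src, tgt])
    return f"{sorted_key_parts[0]}-{sorted_key_parts[1]}"
```

With this helper, `relation_lock_key("B", "A")` and `relation_lock_key("A", "B")` both yield `"A-B"`, so two tasks processing the same pair in opposite directions contend for a single lock.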
Pattern: Handle event loop mismatches during shutdown gracefully
Implementation: Timeout + specific RuntimeError handling for "attached to a different loop"
Location: Neo4j storage finalization
Impact: Prevents application shutdown failures
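One way to express the timeout plus targeted RuntimeError handling is sketched below. The helper name `close_quietly` and the assumption that the driver exposes an async `close()` are illustrative; this is not the actual Neo4j finalization code.

```python
import asyncio


async def close_quietly(driver, timeout: float = 5.0) -> bool:
    """Try to close a driver; return False instead of failing shutdown."""
    try:
        await asyncio.wait_for(driver.close(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # Closing took too long; give up so shutdown can continue.
        return False
    except RuntimeError as exc:
        if "attached to a different loop" in str(exc):
            # The driver was created on another event loop; skip the close.
            return False
        raise  # any other RuntimeError is a real bug and should surface
```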
Pattern: Never hold locks across async generator yields - create snapshots instead
Issue: Holding locks while yielding causes deadlock when consumers need the same lock
Location: lightrag/tools/migrate_llm_cache.py - stream_default_caches_json
Solution: Create snapshot of data while holding lock, release lock, then iterate over snapshot
# WRONG - Deadlock prone:
batch = {}
async with storage._storage_lock:
    for key, value in storage._data.items():
        batch[key] = value
        if len(batch) >= batch_size:
            yield batch  # Lock still held!
            batch = {}
# CORRECT - Snapshot approach:
batch = {}
async with storage._storage_lock:
    # Copy the matching items while the lock is held
    matching_items = [(k, v) for k, v in storage._data.items() if condition]
# Lock released here
for key, value in matching_items:
    batch[key] = value
    if len(batch) >= batch_size:
        yield batch  # No lock held
        batch = {}
if batch:
    yield batch  # Flush the final partial batch
Impact: Prevents deadlocks in Json→Json migrations and similar scenarios where source/target share locks
Applicable To: Any async generator that needs to access shared resources while yielding
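A runnable sketch of the shared-lock scenario follows: the consumer takes the same lock the generator used, which only works because the generator released it before yielding. `Store`, `stream_batches`, and `copy_all` are hypothetical names, not the functions in lightrag/tools/migrate_llm_cache.py.

```python
import asyncio


class Store:
    """Hypothetical storage with a dict and a lock guarding it."""

    def __init__(self, data):
        self._data = dict(data)
        self._storage_lock = asyncio.Lock()


async def stream_batches(store, batch_size):
    async with store._storage_lock:
        snapshot = list(store._data.items())  # copy under the lock
    # Lock released here; iterate over the snapshot only.
    batch = {}
    for key, value in snapshot:
        batch[key] = value
        if len(batch) >= batch_size:
            yield batch
            batch = {}
    if batch:
        yield batch


async def copy_all(store, batch_size=2):
    copied = {}
    async for batch in stream_batches(store, batch_size):
        # The consumer needs the same lock. asyncio.Lock is not reentrant,
        # so this would hang if the generator still held it.
        async with store._storage_lock:
            copied.update(batch)
    return copied
```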
Pattern: Pass configuration through object constructors, not direct imports
Example: OllamaAPI receives configuration through LightRAG object
Benefit: Better testability and modularity
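The idea can be sketched with two hypothetical classes. `RagSettings` stands in for the object that carries configuration and `ApiHandler` plays the role of a component such as OllamaAPI; the names and fields are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RagSettings:
    """Hypothetical configuration holder."""

    model_name: str
    top_k: int = 5


class ApiHandler:
    """Hypothetical component that receives its configuration at construction."""

    def __init__(self, rag: RagSettings):
        # No module-level config import: everything arrives via the constructor.
        self.rag = rag

    def describe(self) -> str:
        return f"{self.rag.model_name} (top_k={self.rag.top_k})"
```

A test can then build `ApiHandler(RagSettings("test-model", top_k=3))` directly, with no global state to patch.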
Pattern: Maintain comprehensive memory bank for development continuity
Structure: Core files (projectbrief.md, activeContext.md, progress.md, etc.)
Purpose: Essential for context preservation across development sessions
Pattern: Centralize defaults in constants.py, use environment variables for runtime config
Implementation: Default values in constants, override via .env file
Benefit: Consistent configuration across components
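A minimal sketch of the default-plus-override lookup is shown here. The constant `DEFAULT_EXAMPLE_TOP_K`, the variable name `EXAMPLE_TOP_K`, and the helper `get_int_setting` are all hypothetical, and the sketch assumes the .env file has already been loaded into the process environment.

```python
import os

# Hypothetical default; in the pattern above it would live in constants.py.
DEFAULT_EXAMPLE_TOP_K = 40


def get_int_setting(name: str, default: int) -> int:
    """Read an integer from the environment, falling back to the default."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        return default
    return int(raw)
```

Usage: `get_int_setting("EXAMPLE_TOP_K", DEFAULT_EXAMPLE_TOP_K)` returns the environment value when one is set and the constant otherwise.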
Package Manager: ALWAYS USE BUN - Never use npm or yarn unless Bun is unavailable
Commands:
bun install - Install dependencies
bun run dev - Start development server
bun run build - Build for production
bun run lint - Run linting
bun test - Run tests
bun run preview - Preview production build
Pattern: All frontend operations must use Bun commands
Fallback: Only use npm/yarn if Bun installation fails
Testing: Use bun test for all frontend testing
Pattern: All tests must use pytest markers for proper CI/CD execution
Test Categories:
@pytest.mark.offline - No external dependencies (runs in CI)
@pytest.mark.integration - Requires databases/APIs (skipped by default)
Commands:
pytest tests/ -m offline -v - CI default (~3 seconds for 21 tests)
pytest tests/ --run-integration -v - Full test suite (all 46 tests)
Best Practices:
Configuration:
tests/pytest.ini - Marker definitions and test discovery
tests/conftest.py - Fixtures and custom options
.github/workflows/tests.yml - CI/CD workflow (Python 3.10/3.11/3.12)
Documentation: See memory-bank/testing-guidelines.md for complete testing guidelines
Impact: Ensures all tests run reliably in CI without external services while maintaining comprehensive integration test coverage for local development
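One plausible way to wire up the `--run-integration` option and the skip-by-default behavior is sketched below using standard pytest hooks. This is an assumption about how tests/conftest.py might look, not a copy of it. In a real project the two hooks belong in conftest.py and the test function in a test module; they are shown together only for brevity.

```python
import pytest


def pytest_addoption(parser):
    # Registers the custom command-line flag.
    parser.addoption(
        "--run-integration",
        action="store_true",
        default=False,
        help="also run tests marked as integration",
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-integration"):
        return  # flag given: leave integration tests enabled
    skip_integration = pytest.mark.skip(reason="needs --run-integration")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)


@pytest.mark.offline
def test_sorted_pair_is_order_independent():
    # Illustrative offline test: no database or API access required.
    assert sorted(["b", "a"]) == sorted(["a", "b"])
```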
Pitfall: Assuming all endpoints return base64-encoded embeddings
Solution: Always check format and handle both base64 and raw arrays
Pitfall: Calling methods on coroutines instead of awaited results
Solution: Always await coroutines before accessing their methods
Pitfall: Breaking changes when removing fields from dataclasses
Solution: Filter deprecated fields during deserialization, don't break storage
Pitfall: Inconsistent lock key generation causing deadlocks
Solution: Always sort keys for deterministic lock ordering
Pitfall: Event loop mismatches during shutdown
Solution: Implement timeout and specific error handling for loop issues
The project has evolved from experimental to production-ready status. Key milestones:
The system now supports enterprise-level deployments with comprehensive functionality across all components.