.tasks/core/INDEX-002-five-phase-indexing-pipeline.md
Implement the multi-phase indexing pipeline that breaks filesystem discovery and processing into atomic, resumable stages. The ephemeral engine runs only Phase 1 (Discovery), while the persistent engine runs all five phases with full database writes and content analysis.
Used by: Ephemeral & Persistent
Parallel filesystem walk optimized for raw speed:
- Ignore rules (.git, node_modules, .gitignore)
- DirEntry objects

Implementation: core/src/ops/indexing/phases/discovery.rs
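As a rough illustration of the discovery step, the sketch below walks a tree breadth-first and skips ignored directory names. It is single-threaded for brevity, whereas the phase described above is parallel. The names `discover` and `IGNORED_NAMES` are hypothetical, and `.gitignore` parsing is left out entirely.

```rust
use std::collections::VecDeque;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical ignore list for illustration only; real rules would also
// honor .gitignore files.
const IGNORED_NAMES: &[&str] = &[".git", "node_modules"];

/// Walks `root` breadth-first and returns every non-ignored path below it.
/// Directories are queued rather than recursed into, so the queue alone
/// describes the work that remains.
fn discover(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut dirs_to_walk: VecDeque<PathBuf> = VecDeque::from([root.to_path_buf()]);
    let mut found = Vec::new();

    while let Some(dir) = dirs_to_walk.pop_front() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let name = entry.file_name();
            // Skip ignored names without descending into them.
            if IGNORED_NAMES.iter().any(|ignored| name == *ignored) {
                continue;
            }
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                dirs_to_walk.push_back(path.clone());
            }
            found.push(path);
        }
    }
    Ok(found)
}
```

Keeping the pending directories in an explicit queue, rather than on the call stack, is what would let a walk like this be paused and picked up again later.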
Used by: Persistent Only
Converts discovered entries into database records:
- state.ephemeral_uuids
- directory_paths table for O(1) lookups

Implementation: core/src/ops/indexing/phases/processing.rs
Used by: Persistent Only
Bottom-up recursive statistics calculation:
- aggregate_size - Total bytes including subdirectories
- child_count - Direct children only
- file_count - Recursive file count

Enables instant "True Size" sorting without traversing descendants.
Implementation: core/src/ops/indexing/phases/aggregation.rs
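The three statistics can be illustrated with a small in-memory sketch. The `Node` and `DirStats` types and the `aggregate` function are hypothetical stand-ins; the actual phase presumably operates on database records rather than a tree held in memory.

```rust
/// Hypothetical in-memory tree node, used only for this illustration.
enum Node {
    File { size: u64 },
    Dir { children: Vec<Node> },
}

#[derive(Debug, Default, PartialEq)]
struct DirStats {
    aggregate_size: u64, // total bytes, including subdirectories
    child_count: u64,    // direct children only
    file_count: u64,     // files at any depth
}

/// Computes the statistics for a directory from its children.
/// Subdirectories are resolved first, so totals flow from the leaves upward.
fn aggregate(children: &[Node]) -> DirStats {
    let mut stats = DirStats {
        child_count: children.len() as u64,
        ..Default::default()
    };
    for child in children {
        match child {
            Node::File { size } => {
                stats.aggregate_size += *size;
                stats.file_count += 1;
            }
            Node::Dir { children } => {
                let sub = aggregate(children);
                stats.aggregate_size += sub.aggregate_size;
                stats.file_count += sub.file_count;
                // sub.child_count is deliberately not added: child_count is non-recursive.
            }
        }
    }
    stats
}
```

Once these values are stored per directory, sorting by total size only needs to read `aggregate_size` for each row instead of walking its descendants.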
Used by: Persistent Only
Content addressable storage via BLAKE3 hashing:
- kind_id and mime_type_id
- content_identity table

Implementation: core/src/ops/indexing/phases/content.rs
Used by: Persistent Only
Post-processing and processor dispatch:
Implementation: Handled in core/src/ops/indexing/job.rs
- core/src/ops/indexing/phases/discovery.rs - Phase 1
- core/src/ops/indexing/phases/processing.rs - Phase 2
- core/src/ops/indexing/phases/aggregation.rs - Phase 3
- core/src/ops/indexing/phases/content.rs - Phase 4
- core/src/ops/indexing/phases/mod.rs - Phase enum and orchestration
- core/src/ops/indexing/job.rs - IndexerJob runs phases sequentially
- core/src/ops/indexing/state.rs - IndexerState tracks current phase and progress
- core/src/ops/indexing/progress.rs - Progress reporting per phase

The pipeline supports three depth modes:
| Mode | Phases Run | Speed | Use Case |
|---|---|---|---|
| Shallow | 1, 2, 3 | Fast | UI navigation, quick scan |
| Content | 1, 2, 3, 4 | Medium | Normal indexing with dedup |
| Deep | 1, 2, 3, 4, 5 | Slow | Media libraries with thumbnails |
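One way to encode the depth-mode table is to map each mode to the highest phase it runs. This is a sketch only: the `IndexMode` enum and its methods are hypothetical names, and phases are represented by their numbers rather than by the real Phase enum.

```rust
use std::ops::RangeInclusive;

/// Hypothetical enum mirroring the three depth modes in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexMode {
    Shallow,
    Content,
    Deep,
}

impl IndexMode {
    /// Highest phase number the mode runs, taken from the table.
    fn last_phase(self) -> u8 {
        match self {
            IndexMode::Shallow => 3,
            IndexMode::Content => 4,
            IndexMode::Deep => 5,
        }
    }

    /// Phase numbers run by this mode, in order.
    fn phases(self) -> RangeInclusive<u8> {
        1..=self.last_phase()
    }

    /// Whether the given phase number runs in this mode.
    fn runs_phase(self, phase: u8) -> bool {
        self.phases().contains(&phase)
    }
}
```

Because every mode runs a prefix of the same phase sequence, a single upper bound is enough to describe it.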
| Scope | Behavior | Use Case |
|---|---|---|
| Current | Index immediate directory only | Responsive UI navigation |
| Recursive | Index entire tree | Full location indexing |
| Configuration | Performance | Notes |
|---|---|---|
| Current + Shallow | <500ms | No subdirectories |
| Recursive + Shallow | ~10K files/sec | Metadata only |
| Recursive + Content | ~1K files/sec | With BLAKE3 hashing |
| Recursive + Deep | ~100 files/sec | Full analysis + thumbnails |
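To get a feel for what these rates mean in practice, the helper below turns a file count and a throughput figure into a rough duration. It assumes the rates in the table hold as constant averages, which is a simplification. The function name `estimated_seconds` is hypothetical.

```rust
/// Rough duration in whole seconds, rounded up, for indexing `file_count`
/// files at a constant `files_per_sec`. Returns None for a zero rate.
fn estimated_seconds(file_count: u64, files_per_sec: u64) -> Option<u64> {
    if files_per_sec == 0 {
        return None;
    }
    // Ceiling division without floating point.
    Some(file_count / files_per_sec + u64::from(file_count % files_per_sec != 0))
}
```

Under that assumption, a location with 1,000,000 files would take roughly 100 seconds at ~10K files/sec, about 1,000 seconds at ~1K files/sec, and about 10,000 seconds (close to three hours) at ~100 files/sec.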
Each phase stores sufficient state in IndexerState to resume:
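```rust
use std::collections::{HashMap, VecDeque};
use std::path::PathBuf;
use uuid::Uuid;

// Phase, DirEntry, and IndexerStats are defined elsewhere in the crate.
pub struct IndexerState {
    pub phase: Phase,                            // phase to resume from
    pub dirs_to_walk: VecDeque<PathBuf>,         // directories not yet walked
    pub entry_batches: Vec<Vec<DirEntry>>,       // batches of discovered entries
    pub entry_id_cache: HashMap<PathBuf, i32>,   // path -> cached entry ID
    pub ephemeral_uuids: HashMap<PathBuf, Uuid>, // path -> ephemeral UUID
    pub stats: IndexerStats,
}
```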
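The idea of resuming from stored state can be sketched with a heavily simplified model. Everything here is hypothetical: `SketchPhase`, `SketchState`, and `run` are illustrative names, only two working phases are modeled, and a work budget stands in for an interruption.

```rust
use std::collections::VecDeque;
use std::path::PathBuf;

/// Hypothetical, reduced phase list for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchPhase {
    Discovery,
    Processing,
    Done,
}

/// Hypothetical, reduced version of the state: just enough to resume a walk.
struct SketchState {
    phase: SketchPhase,
    dirs_to_walk: VecDeque<PathBuf>,
    walked: Vec<PathBuf>,
}

/// Performs at most `budget` units of work and then returns.
/// All progress lives in `state`, so a later call continues where this one stopped.
fn run(state: &mut SketchState, mut budget: usize) {
    while budget > 0 && state.phase != SketchPhase::Done {
        match state.phase {
            SketchPhase::Discovery => match state.dirs_to_walk.pop_front() {
                Some(dir) => {
                    state.walked.push(dir);
                    budget -= 1;
                }
                // Queue drained: advance to the next phase.
                None => state.phase = SketchPhase::Processing,
            },
            SketchPhase::Processing => {
                // Stand-in for real processing work.
                budget -= 1;
                state.phase = SketchPhase::Done;
            }
            SketchPhase::Done => {}
        }
    }
}
```

Calling `run` with a small budget and then again with a larger one yields the same final state as a single uninterrupted call, which is the property that resumable stages rely on.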
When interrupted: