.tasks/core/INDEX-011-rules-free-ephemeral-scan.md
The ephemeral and persistent indexers share the same RuleToggles configuration, which filters out node_modules, .git, gitignored files, and other development artifacts by default. For browsing this is correct. For file sync and smart copy it's wrong — these operations need to see every file on disk to guarantee completeness.
This task adds a "complete scan" mode to ephemeral indexing that bypasses all filtering rules. Operations like file sync and path intersection request this mode when they need full filesystem coverage.
- Filtered indexes omit node_modules, dist/, and gitignored files
- sync_conduit schema has use_index_rules and index_mode_override fields but neither is wired into anything
// core/src/ops/indexing/rules.rs
impl RuleToggles {
    /// All rules disabled. Indexes every file on disk.
    /// Used by file sync, smart copy, and path intersection operations.
    pub fn complete() -> Self {
        Self {
            no_system_files: false,
            no_hidden: false,
            no_git: false,
            gitignore: false,
            only_images: false,
            no_dev_dirs: false,
        }
    }
}
The ephemeral indexer already accepts RuleToggles via IndexerJobConfig. Complete scan mode is just a configuration option, not a new code path.
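To illustrate why no new code path is needed, here is a minimal sketch in which one filter function serves both modes and only the toggles differ. `Toggles`, `Toggles::browse()`, and `should_index` are hypothetical stand-ins invented for this example; the real `RuleToggles` has more fields, and the real rule evaluation may look quite different.

```rust
use std::path::Path;

/// Simplified stand-in for `RuleToggles` (assumption: only three rules modelled).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Toggles {
    pub no_hidden: bool,
    pub no_git: bool,
    pub no_dev_dirs: bool,
}

impl Toggles {
    /// Hypothetical browsing defaults: every modelled rule enabled.
    pub fn browse() -> Self {
        Self { no_hidden: true, no_git: true, no_dev_dirs: true }
    }

    /// Mirrors the idea of `RuleToggles::complete()`: every rule disabled.
    pub fn complete() -> Self {
        Self { no_hidden: false, no_git: false, no_dev_dirs: false }
    }
}

/// Hypothetical filter: returns true when `path` should be indexed.
/// Both scan modes call this same function with different toggles.
pub fn should_index(path: &Path, toggles: Toggles) -> bool {
    for component in path.components() {
        let name = component.as_os_str().to_string_lossy();
        if toggles.no_git && name == ".git" {
            return false;
        }
        if toggles.no_dev_dirs && (name == "node_modules" || name == "dist") {
            return false;
        }
        // "." and ".." are navigation components, not hidden files.
        if toggles.no_hidden && name.starts_with('.') && name != "." && name != ".." {
            return false;
        }
    }
    true
}
```

With `Toggles::browse()` a path under node_modules is rejected; with `Toggles::complete()` the same call accepts it, so the scan logic itself is unchanged.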
// core/src/ops/indexing/input.rs
impl IndexerJobConfig {
    /// Ephemeral scan with no filtering rules.
    /// Returns complete filesystem state for sync and diff operations.
    pub fn complete_scan(path: SdPath, scope: IndexScope) -> Self {
        Self {
            persistence: IndexPersistence::Ephemeral,
            rule_toggles: RuleToggles::complete(),
            is_volume_indexing: false,
            ..Self::ephemeral_browse(path, scope, false)
        }
    }
}
When file sync or path intersection needs complete coverage, it requests a complete scan instead of a regular ephemeral browse:
// In SyncResolver or PathIntersection operation
let config = IndexerJobConfig::complete_scan(
    source_path.clone(),
    IndexScope::Recursive,
);
// Submit indexing job and wait for completion
let job_id = ctx.job_manager().submit(IndexerJob::new(config)).await?;
ctx.job_manager().wait_for(job_id).await?;
// Now the ephemeral cache has complete filesystem state for source_path
A path can be indexed with rules (for browsing) and without rules (for sync) in the same session. The ephemeral index is additive: a complete scan adds the entries that an earlier filtered scan skipped and does not remove existing entries. Entries already present from a filtered scan keep their UUIDs.
// core/src/ops/indexing/ephemeral/index.rs
impl EphemeralIndex {
    // add_entry() already skips duplicates:
    // "Only adds if path not already indexed (prevents duplicates)"
    // So a complete scan after a filtered scan fills gaps without overwriting.
}
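The additive behaviour can be sketched with a small map-backed index. This is an illustration only, not the real `EphemeralIndex`: `SketchIndex`, its fields, and `id_of` are hypothetical, and a `u64` counter stands in for UUIDs so the example has no dependencies.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `EphemeralIndex`: maps each path to an ID.
#[derive(Default)]
pub struct SketchIndex {
    entries: HashMap<PathBuf, u64>,
    next_id: u64,
}

impl SketchIndex {
    /// Adds `path` only if it is not already indexed.
    /// Returns the ID associated with the path, whether existing or new.
    pub fn add_entry(&mut self, path: &Path) -> u64 {
        if let Some(id) = self.entries.get(path) {
            return *id; // duplicate: keep the original ID untouched
        }
        let id = self.next_id;
        self.next_id += 1;
        self.entries.insert(path.to_path_buf(), id);
        id
    }

    /// Number of indexed paths.
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// ID for `path`, if it has been indexed.
    pub fn id_of(&self, path: &Path) -> Option<u64> {
        self.entries.get(path).copied()
    }
}
```

Replaying a filtered scan followed by a complete scan against this sketch leaves the first scan's IDs unchanged and only grows the map by the previously filtered paths, which is the property the real index needs to preserve.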
Wire the existing use_index_rules column on sync_conduit to control whether the resolver requests a complete scan or uses the existing filtered index:
// core/src/service/file_sync/resolver.rs
impl SyncResolver {
    async fn ensure_index_coverage(
        &self,
        conduit: &sync_conduit::Model,
        path: &SdPath,
    ) -> Result<()> {
        if !conduit.use_index_rules {
            // Request complete ephemeral scan
            let config = IndexerJobConfig::complete_scan(
                path.clone(),
                IndexScope::Recursive,
            );
            self.job_manager.submit_and_wait(IndexerJob::new(config)).await?;
        }
        // If use_index_rules is true, use whatever is already indexed
        Ok(())
    }
}
RuleToggles::complete() Constructor
Single method on the existing struct. No structural changes needed.
File: core/src/ops/indexing/rules.rs
IndexerJobConfig::complete_scan() Constructor
New constructor that sets RuleToggles::complete() and ephemeral persistence. Follows the same pattern as existing ephemeral_browse() and persistent_index() constructors.
File: core/src/ops/indexing/input.rs
use_index_rules in SyncResolver
When sync_conduit.use_index_rules is false, the resolver triggers a complete ephemeral scan before calculating operations. This ensures the ephemeral cache has full filesystem state.
File: core/src/service/file_sync/resolver.rs
Confirm that running a complete scan on an already-indexed path adds new entries (previously filtered) without removing or duplicating existing ones. The current add_entry() logic already skips duplicates, but verify this works correctly when a filtered scan happened first.
File: core/src/ops/indexing/ephemeral/index.rs (verification, may not need changes)
Modified Files:
- core/src/ops/indexing/rules.rs - Add RuleToggles::complete()
- core/src/ops/indexing/input.rs - Add IndexerJobConfig::complete_scan()
- core/src/service/file_sync/resolver.rs - Wire use_index_rules to trigger complete scans

- RuleToggles::complete() disables all filtering rules
- IndexerJobConfig::complete_scan() creates ephemeral config with no rules
- sync_conduit.use_index_rules = false triggers complete scan in resolver

The ephemeral index is already a single shared instance. Creating a separate "complete" index would double memory usage and complicate UUID management. The additive approach (fill gaps in the existing index) is simpler and uses the same UUID reconciliation from INDEX-010.
A complete scan of a large project directory will index more entries than a filtered scan (potentially 10-100x more for projects with heavy node_modules). The ephemeral index handles this well at ~50 bytes/entry, but memory usage should be monitored for volume-level complete scans.
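As a rough worked estimate, the helpers below apply the ~50 bytes/entry figure to an entry count. The function names are hypothetical, and the entry counts in the usage note are illustrative, not measurements.

```rust
/// Approximate per-entry cost quoted above (~50 bytes/entry).
pub const BYTES_PER_ENTRY: u64 = 50;

/// Estimated ephemeral index size in bytes for `entries` entries.
pub fn estimated_index_bytes(entries: u64) -> u64 {
    entries.saturating_mul(BYTES_PER_ENTRY)
}

/// Same estimate in mebibytes, for logging or threshold checks.
pub fn estimated_index_mib(entries: u64) -> f64 {
    estimated_index_bytes(entries) as f64 / (1024.0 * 1024.0)
}
```

For example, if a filtered scan of a project produced 20,000 entries (about 1 MB), a complete scan at 100x would produce 2,000,000 entries, or 100,000,000 bytes (roughly 95 MiB). That is manageable for one project but worth tracking once several volumes are scanned this way.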
Even in complete mode, the OS kernel virtual filesystems (/dev, /sys, /proc) should probably still be excluded since they contain pseudo-files that can cause hangs. Consider keeping a minimal NEVER_INDEX rule that can't be disabled for truly dangerous paths.
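One possible shape for such a guard is sketched below, assuming a hard-coded list checked before any toggle is consulted. `NEVER_INDEX` as a constant and `is_never_indexed` are hypothetical names; the list holds only the three paths mentioned above.

```rust
use std::path::Path;

/// Hypothetical hard-coded exclusions that no toggle can disable.
pub const NEVER_INDEX: &[&str] = &["/dev", "/sys", "/proc"];

/// Returns true when `path` is one of the excluded roots or lies beneath one.
/// `Path::starts_with` compares whole components, so `/devices` is not
/// matched by `/dev`.
pub fn is_never_indexed(path: &Path) -> bool {
    NEVER_INDEX.iter().any(|root| path.starts_with(root))
}
```

This sketch compares paths lexically, so it does not resolve symlinks or relative paths; a real guard would likely need to canonicalize first or check the filesystem type.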