Rules-Free Ephemeral Scan Mode

Description

The ephemeral and persistent indexers share the same RuleToggles configuration, which filters out node_modules, .git, gitignored files, and other development artifacts by default. For browsing this is correct. For file sync and smart copy it is wrong: these operations need to see every file on disk to guarantee completeness.

This task adds a "complete scan" mode to ephemeral indexing that bypasses all filtering rules. Operations like file sync and path intersection request this mode when they need full filesystem coverage.

Problem

File sync between two locations silently skips files excluded by indexer rules
A user syncing their project folder won't get node_modules, dist/, or gitignored files
There's no way to distinguish "file doesn't exist on target" from "file was filtered by rules"
The sync_conduit schema has use_index_rules and index_mode_override fields but neither is wired into anything

Design

RuleToggles Addition

rust

// core/src/ops/indexing/rules.rs

impl RuleToggles {
    /// All rules disabled. Indexes every file on disk.
    /// Used by file sync, smart copy, and path intersection operations.
    pub fn complete() -> Self {
        Self {
            no_system_files: false,
            no_hidden: false,
            no_git: false,
            gitignore: false,
            only_images: false,
            no_dev_dirs: false,
        }
    }
}
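
To make the effect of disabling every toggle concrete, here is a minimal sketch. It models only three of the toggles, and `browse_defaults()`, `allows()`, and the list of dev directory names are hypothetical simplifications for illustration, not the actual rules engine (which also covers gitignore, system files, and image filtering).

```rust
use std::path::{Component, Path};

/// Simplified stand-in for the real RuleToggles (three toggles only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RuleToggles {
    pub no_hidden: bool,
    pub no_git: bool,
    pub no_dev_dirs: bool,
}

impl RuleToggles {
    /// Hypothetical browsing defaults for this sketch: every filter on.
    pub fn browse_defaults() -> Self {
        Self { no_hidden: true, no_git: true, no_dev_dirs: true }
    }

    /// Every filter off, as in the constructor above.
    pub fn complete() -> Self {
        Self { no_hidden: false, no_git: false, no_dev_dirs: false }
    }

    /// Hypothetical helper: true if no component of `path` is filtered out.
    pub fn allows(&self, path: &Path) -> bool {
        for component in path.components() {
            if let Component::Normal(os_name) = component {
                let lossy = os_name.to_string_lossy();
                let name: &str = &lossy;
                if self.no_git && name == ".git" {
                    return false;
                }
                // Assumed dev directory names; the real list may differ.
                if self.no_dev_dirs && (name == "node_modules" || name == "target") {
                    return false;
                }
                if self.no_hidden && name.starts_with('.') {
                    return false;
                }
            }
        }
        true
    }
}
```

Under these assumptions, `RuleToggles::complete().allows(..)` returns true for any path, while the browsing defaults reject anything under node_modules or .git.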

IndexerJobConfig Integration

The ephemeral indexer already accepts RuleToggles via IndexerJobConfig. Complete scan mode is just a configuration option, not a new code path.

rust

// core/src/ops/indexing/input.rs

impl IndexerJobConfig {
    /// Ephemeral scan with no filtering rules.
    /// Returns complete filesystem state for sync and diff operations.
    pub fn complete_scan(path: SdPath, scope: IndexScope) -> Self {
        Self {
            persistence: IndexPersistence::Ephemeral,
            rule_toggles: RuleToggles::complete(),
            is_volume_indexing: false,
            ..Self::ephemeral_browse(path, scope, false)
        }
    }
}
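
The constructor above relies on Rust's struct update syntax: fields listed explicitly win, and everything else is taken from the `ephemeral_browse()` base. The following self-contained sketch shows that pattern with mock types. The field set, the `&str` path, and the single-argument `ephemeral_browse()` are simplifications assumed for illustration and do not match the real signatures.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum IndexPersistence {
    Ephemeral,
    Persistent,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RuleToggles {
    pub no_hidden: bool,
    pub no_dev_dirs: bool,
}

/// Mock config; `path` is a String standing in for SdPath.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexerJobConfig {
    pub path: String,
    pub persistence: IndexPersistence,
    pub rule_toggles: RuleToggles,
    pub is_volume_indexing: bool,
}

impl IndexerJobConfig {
    /// Mock browse constructor: ephemeral, with filters enabled.
    pub fn ephemeral_browse(path: &str) -> Self {
        Self {
            path: path.to_owned(),
            persistence: IndexPersistence::Ephemeral,
            rule_toggles: RuleToggles { no_hidden: true, no_dev_dirs: true },
            is_volume_indexing: false,
        }
    }

    /// Overrides only the toggles; remaining fields come from the base value.
    pub fn complete_scan(path: &str) -> Self {
        Self {
            rule_toggles: RuleToggles { no_hidden: false, no_dev_dirs: false },
            ..Self::ephemeral_browse(path)
        }
    }
}
```

In this mock, the two configs differ only in `rule_toggles`, which is the sense in which complete scan is a configuration option and not a separate code path.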

Invocation from File Sync / Smart Copy

When file sync or path intersection needs complete coverage, it requests a complete scan instead of a regular ephemeral browse:

rust

// In SyncResolver or PathIntersection operation

let config = IndexerJobConfig::complete_scan(
    source_path.clone(),
    IndexScope::Recursive,
);

// Submit indexing job and wait for completion
let job_id = ctx.job_manager().submit(IndexerJob::new(config)).await?;
ctx.job_manager().wait_for(job_id).await?;

// Now the ephemeral cache has complete filesystem state for source_path

Coexistence with Filtered Indexes

A path can be indexed with rules (for browsing) and without rules (for sync) in the same session. The ephemeral index is additive: a complete scan adds entries that were previously filtered and does not remove existing ones. Entries already present from a filtered scan keep their UUIDs.

rust

// core/src/ops/indexing/ephemeral/index.rs

impl EphemeralIndex {
    // add_entry() already skips duplicates:
    // "Only adds if path not already indexed (prevents duplicates)"
    // So a complete scan after a filtered scan fills gaps without overwriting.
}
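
The additive behavior can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the real `EphemeralIndex`: a `u64` counter stands in for UUIDs, and the `add_entry()` return value, `id_of()`, and `len()` are hypothetical.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Toy index mapping paths to identifiers (u64 stands in for a UUID).
pub struct EphemeralIndex {
    entries: HashMap<PathBuf, u64>,
    next_id: u64,
}

impl EphemeralIndex {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), next_id: 1 }
    }

    /// Adds `path` only if it is not already indexed.
    /// Returns the entry's id and whether a new entry was created.
    pub fn add_entry(&mut self, path: impl Into<PathBuf>) -> (u64, bool) {
        let path = path.into();
        if let Some(&existing) = self.entries.get(&path) {
            // Already present from an earlier scan: keep the original id.
            return (existing, false);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.entries.insert(path, id);
        (id, true)
    }

    pub fn id_of(&self, path: &Path) -> Option<u64> {
        self.entries.get(path).copied()
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Running a filtered scan and then a complete scan against this model leaves the first scan's identifiers untouched and only grows the entry count by the previously filtered paths.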

Sync Conduit Schema Wiring

Wire the existing use_index_rules column on sync_conduit to control whether the resolver requests a complete scan or uses the existing filtered index:

rust

// core/src/service/file_sync/resolver.rs

impl SyncResolver {
    async fn ensure_index_coverage(
        &self,
        conduit: &sync_conduit::Model,
        path: &SdPath,
    ) -> Result<()> {
        if !conduit.use_index_rules {
            // Request complete ephemeral scan
            let config = IndexerJobConfig::complete_scan(
                path.clone(),
                IndexScope::Recursive,
            );
            self.job_manager.submit_and_wait(IndexerJob::new(config)).await?;
        }
        // If use_index_rules is true, use whatever is already indexed
        Ok(())
    }
}
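
One way to keep this branch easy to unit test without an async runtime is to separate the decision from the job submission. The sketch below is only an illustration of that idea; `CoveragePlan` and `plan_coverage()` are hypothetical names that do not appear in the codebase, and the path is a plain string instead of an SdPath.

```rust
/// Outcome of the coverage check (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CoveragePlan {
    /// Rules stay in effect: reuse whatever is already indexed.
    UseExistingIndex,
    /// Rules are bypassed: run a recursive complete scan of `path` first.
    CompleteScan { path: String },
}

/// Pure decision function mirroring the `if !conduit.use_index_rules` branch.
pub fn plan_coverage(use_index_rules: bool, path: &str) -> CoveragePlan {
    if use_index_rules {
        CoveragePlan::UseExistingIndex
    } else {
        CoveragePlan::CompleteScan { path: path.to_owned() }
    }
}
```

A resolver built this way would match on the returned plan and submit an indexing job only for the `CompleteScan` variant.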

Implementation Steps

1. Add `RuleToggles::complete()` Constructor

Single method on the existing struct. No structural changes needed.

File: core/src/ops/indexing/rules.rs

2. Add `IndexerJobConfig::complete_scan()` Constructor

New constructor that sets RuleToggles::complete() and ephemeral persistence. Follows the same pattern as existing ephemeral_browse() and persistent_index() constructors.

File: core/src/ops/indexing/input.rs

3. Wire `use_index_rules` in SyncResolver

When sync_conduit.use_index_rules is false, the resolver triggers a complete ephemeral scan before calculating operations. This ensures the ephemeral cache has full filesystem state.

File: core/src/service/file_sync/resolver.rs

4. Verify Additive Behavior

Confirm that running a complete scan on an already-indexed path adds new entries (previously filtered) without removing or duplicating existing ones. The current add_entry() logic already skips duplicates, but verify this works correctly when a filtered scan happened first.

File: core/src/ops/indexing/ephemeral/index.rs (verification, may not need changes)

Files to Create/Modify

Modified Files:

core/src/ops/indexing/rules.rs - Add RuleToggles::complete()
core/src/ops/indexing/input.rs - Add IndexerJobConfig::complete_scan()
core/src/service/file_sync/resolver.rs - Wire use_index_rules to trigger complete scans

Acceptance Criteria

RuleToggles::complete() disables all filtering rules
IndexerJobConfig::complete_scan() creates ephemeral config with no rules
Complete scan indexes files that would be filtered by default rules (node_modules, .git, etc.)
Complete scan after a filtered scan adds missing entries without duplicating existing ones
Existing ephemeral UUIDs are preserved when a complete scan fills gaps
sync_conduit.use_index_rules = false triggers complete scan in resolver
Integration test: complete scan includes node_modules directory
Integration test: complete scan includes hidden files and .git
Integration test: complete scan after filtered scan preserves original UUIDs

Technical Notes

Why Not a Separate Cache?

The ephemeral index is already a single shared instance. Creating a separate "complete" index would double memory usage and complicate UUID management. The additive approach (fill gaps in the existing index) is simpler and uses the same UUID reconciliation from INDEX-010.

Performance Expectation

A complete scan of a large project directory will index more entries than a filtered scan (potentially 10-100x more for projects with heavy node_modules). The ephemeral index handles this well at ~50 bytes/entry, but memory usage should be monitored for volume-level complete scans.
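
As a rough worked example of that estimate, the sketch below multiplies an entry count by the ~50 bytes/entry figure quoted above. The entry counts in the comments are hypothetical, and the function ignores allocator and hash-table overhead.

```rust
/// Approximate per-entry cost quoted in this section.
pub const BYTES_PER_ENTRY: u64 = 50;

/// Back-of-the-envelope memory estimate for an index of `entries` entries.
pub fn estimated_index_bytes(entries: u64) -> u64 {
    entries.saturating_mul(BYTES_PER_ENTRY)
}

// Hypothetical project: 20,000 entries when filtered    -> 1,000,000 bytes (~1 MB)
// Same project at 100x after a complete scan: 2,000,000 -> 100,000,000 bytes (~100 MB)
```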

System Files Exception

Even in complete mode, kernel virtual filesystems (/dev, /sys, /proc) should probably still be excluded, since they contain pseudo-files that can cause a scan to hang. Consider keeping a minimal NEVER_INDEX rule that cannot be disabled for truly dangerous paths.
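
A minimal sketch of such a guard, assuming Unix-style absolute paths, could look like the following. The `NEVER_INDEX` contents come from the paths named above; `is_never_indexed()` is a hypothetical helper, and where it would be called in the indexer is left open.

```rust
use std::path::Path;

/// Roots that stay excluded even when every toggle is off.
pub const NEVER_INDEX: &[&str] = &["/dev", "/sys", "/proc"];

/// True if `path` is one of the excluded roots or lies beneath one.
/// `Path::starts_with` compares whole components, so "/devices" does not
/// match "/dev".
pub fn is_never_indexed(path: &Path) -> bool {
    NEVER_INDEX.iter().any(|root| path.starts_with(*root))
}
```

Because the check is component-based, a project directory that merely happens to be named dev (for example /home/user/project/dev) is not affected.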

Related Tasks

INDEX-005 - Indexer Rules Engine (the rules system this extends)
INDEX-010 - Bidirectional UUID Reconciliation (reconcile complete scan UUIDs with persistent)
FSYNC-003 - FileSyncService Core (primary consumer of complete scans)
FILE-006 - Path Intersection & Smart Diff (needs complete coverage for accuracy)
