Back to Sapling

Repeated Large Traversal

eden/.llms/rules/ACR_repeated_large_traversal.md

latest3.7 KB
Original Source

Repeated Large Traversal

Severity: CRITICAL

What to Look For

  • Collecting file changes from a commit then iterating over the collection multiple times for different purposes instead of a single pass
  • Walking the same commit graph segment or manifest tree twice when one traversal suffices
  • Sequential manifest loading across a commit range -- loading manifests one by one in a loop instead of using batch derivation
  • Re-fetching data in an inner loop that was already available from an outer scope

When to Flag

  • let changes: Vec<_> = cs.file_changes().collect() followed by multiple for change in &changes or .iter().filter(...) passes that could be merged
  • Two separate traversals of the same commit range (e.g., one to collect IDs, another to load data for those IDs)
  • for cs_id in commit_range { derive::<RootManifestId>(ctx, repo, cs_id).await? } -- sequential manifest derivation instead of batch
  • Loading a changeset or manifest inside an inner loop when the same data was fetched (or could have been fetched) in an outer loop or earlier scope
  • Calling .file_changes() or .changed_files() on the same bonsai changeset multiple times

Do NOT Flag

  • Multiple passes over a small, fixed-size collection (e.g., 2-3 elements)
  • Separate traversals that genuinely need different starting points or parameters
  • Cases where the second traversal depends on results computed during the first (true data dependency)
  • Test code

Examples

BAD (multiple passes over file changes):

rust
let changes: Vec<_> = bonsai.file_changes().collect();

// Pass 1: check for large files
for (path, change) in &changes {
    if let Some(fc) = change.simplify() {
        check_size(fc.size())?;
    }
}

// Pass 2: check for restricted paths
for (path, change) in &changes {
    check_restricted_path(path)?;
}

GOOD (single pass):

rust
for (path, change) in bonsai.file_changes() {
    check_restricted_path(path)?;
    if let Some(fc) = change.simplify() {
        check_size(fc.size())?;
    }
}

BAD (sequential manifest derivation -- matches real performance issues):

rust
for cs_id in &commit_ids {
    let mf_id = repo
        .derive::<RootManifestId>(ctx, *cs_id)
        .await?;
    process_manifest(mf_id).await?;
}

GOOD (batch derivation):

rust
let mf_ids = repo
    .derive_batch::<RootManifestId>(ctx, commit_ids.clone())
    .await?;
for mf_id in mf_ids {
    process_manifest(mf_id).await?;
}

Or for linear stacks:

rust
derive_manifests_for_simple_stack_of_commits(ctx, repo, commit_ids).await?;

BAD (re-fetching in inner loop):

rust
for cs_id in &ancestors {
    let bonsai = cs_id.load(ctx, repo.repo_blobstore()).await?;
    for (path, _change) in bonsai.file_changes() {
        // Fetches the same manifest for cs_id again inside the loop
        let mf = repo.derive::<RootManifestId>(ctx, *cs_id).await?;
        let entry = mf.find_entry(ctx, repo.repo_blobstore(), path.clone()).await?;
        process(entry)?;
    }
}

GOOD (fetch once, reuse):

rust
for cs_id in &ancestors {
    let bonsai = cs_id.load(ctx, repo.repo_blobstore()).await?;
    let mf = repo.derive::<RootManifestId>(ctx, *cs_id).await?;
    let paths: Vec<_> = bonsai.file_changes().map(|(p, _)| p.clone()).collect();
    let entries = mf.find_entries(ctx, repo.repo_blobstore(), paths).await?;
    for entry in entries {
        process(entry)?;
    }
}

Recommendation

Merge multiple passes over the same collection into a single loop. Use derive_batch or derive_manifests_for_simple_stack_of_commits instead of deriving manifests one at a time in a loop. Hoist data fetches out of inner loops when they depend only on outer-loop variables. Use find_entries for batch path lookup instead of calling find_entry in a loop.