eden/.llms/rules/ACR_changeset_path_scaling.md
Severity: CRITICAL
Operations that scale as O(changeset_paths): iterating, collecting, or processing every path in a changeset. Large directories and bulk commits (e.g., codemod commits touching hundreds of thousands of files) make this a production risk.
BAD (collecting all paths into memory):
```rust
let all_paths: Vec<_> = changeset.file_changes().collect();
for path in &all_paths {
    check_hook(path).await?;
}
```
BAD (per-path DB lookup without batching):
```rust
for path in changeset.file_changes() {
    let metadata = db.get_file_metadata(path).await?;
    // ...
}
```
GOOD (streaming with bounded concurrency):
```rust
// Assumes file_changes() can be consumed as a TryStream;
// try_for_each_concurrent (futures::TryStreamExt) caps in-flight checks at 100.
changeset
    .file_changes()
    .try_for_each_concurrent(100, |path| async move {
        check_hook(path).await
    })
    .await?;
```
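The concurrency limit (100 here) bounds how many check_hook calls are in flight at once, so memory and downstream load stay roughly constant no matter how many paths the changeset touches.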
GOOD (batched DB lookups):
```rust
// Chunk the path iterator (itertools-style) and issue one batched DB call per chunk.
let chunks = changeset.file_changes().chunks(1000);
for chunk in &chunks {
    let batch: Vec<_> = chunk.collect();
    let metadata = db.get_file_metadata_batch(&batch).await?;
    // ...
}
```
Some repositories contain commits that touch hundreds of thousands of paths (codemod commits, large directory moves). Code that is O(changeset_paths) without streaming, batching, or pagination will OOM, time out, or saturate downstream services when it hits these commits. Large directories (e.g., fbcode/third-party) compound the problem, since a single directory listing can return millions of entries.
Always assume a changeset can touch an unbounded number of paths. Use streaming or pagination over collecting. Batch downstream calls. Apply bounded concurrency. If you must collect paths, add a size check and fail early with a clear error rather than OOMing.
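If collection is unavoidable, a guard along these lines keeps the failure mode explicit. This is an illustrative sketch, not an existing helper: collect_paths_capped, the limit value, and the use of the anyhow crate are all assumptions.
```rust
use anyhow::{bail, Result};

// Illustrative guard: materialize at most `limit` paths, and fail with a
// clear error instead of OOMing on a pathological changeset.
fn collect_paths_capped<P>(paths: impl Iterator<Item = P>, limit: usize) -> Result<Vec<P>> {
    let mut out = Vec::with_capacity(limit.min(1024));
    for path in paths {
        if out.len() >= limit {
            bail!("changeset touches more than {limit} paths; refusing to collect them all");
        }
        out.push(path);
    }
    Ok(out)
}
```
A caller would pick a limit matching what the consumer can actually handle, e.g. collect_paths_capped(changeset.file_changes(), 100_000)?.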