eden/.llms/rules/ACR_sequential_blobstore_fetches.md
Severity: HIGH
This rule flags sequential storage operations performed one at a time in a loop when a concurrent or bulk alternative exists:

- `.load()` calls in a loop instead of concurrent loading via `buffer_unordered`
- `blobstore.get()` calls in a loop instead of concurrent fetching via `buffer_unordered`
- `derive()` calls when `derive_batch` / `derive_exactly_batch` / `fetch_batch` exists
- `fetch_many_edges` calls one at a time instead of batching changeset IDs
- single-ID mapping lookups (`get_hg_from_bonsai`, `get_git_sha1_from_bonsai`) in a loop instead of using the bulk `.get()` method with all IDs at once

Note: The `Blobstore` and `Loadable` traits do NOT have bulk get/put methods. The only way to batch raw blobstore and loadable operations is `stream::iter(...).buffer_unordered(N)`. However, the mapping traits (`BonsaiHgMapping`, `BonsaiGitMapping`, `BonsaiGlobalrevMapping`, `BonsaiSvnrevMapping`) DO have bulk APIs -- their `.get()` method accepts multiple IDs, and they have `bulk_add` / `bulk_import` for writes.
Detection patterns:

- `for id in ids { id.load(ctx, blobstore).await? }` -- sequential load in a loop
- `for key in keys { blobstore.get(ctx, &key).await? }` -- sequential blobstore get in a loop (no bulk API exists; use `buffer_unordered`)
- `for cs_id in cs_ids { derive::<T>(ctx, cs_id).await? }` -- sequential derivation when a batch API exists
- `for cs_id in cs_ids { commit_graph.fetch_many_edges(ctx, &[cs_id]).await? }` -- single-element batch calls in a loop
- `for cs_id in cs_ids { mapping.get_hg_from_bonsai(ctx, cs_id).await? }` -- sequential single-ID mapping lookup when the bulk `.get()` accepts all IDs at once
- a `for` loop or `map` + `collect` that awaits a blobstore/derivation/mapping future on each iteration

NOT violations:

- `stream::iter(...).map(...).buffer_unordered(N)` -- this IS the correct pattern
- `derive_batch` / `derive_exactly_batch` / `fetch_batch` calls -- these ARE bulk APIs

BAD (sequential load in loop):
```rust
let mut changesets = Vec::new();
for cs_id in &changeset_ids {
    let bonsai = cs_id.load(ctx, repo.repo_blobstore()).await?;
    changesets.push(bonsai);
}
```
GOOD (buffer_unordered):
```rust
let changesets: Vec<_> = stream::iter(changeset_ids)
    .map(|cs_id| async move { cs_id.load(ctx, repo.repo_blobstore()).await })
    .buffer_unordered(100)
    .try_collect()
    .await?;
```
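If the output order must match the input order, `buffered(N)` from the `futures` crate runs the same futures with the same concurrency limit but yields results in input order. A minimal sketch, reusing the bindings from the example above:

```rust
// Same concurrency limit as buffer_unordered, but results arrive in the
// order of `changeset_ids` (at a slight throughput cost if load times vary).
let changesets: Vec<_> = stream::iter(changeset_ids)
    .map(|cs_id| async move { cs_id.load(ctx, repo.repo_blobstore()).await })
    .buffered(100)
    .try_collect()
    .await?;
```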
BAD (sequential blobstore get):
```rust
let mut blobs = Vec::new();
for key in &keys {
    let blob = blobstore.get(ctx, key).await?;
    blobs.push(blob);
}
```
GOOD (buffer_unordered):
```rust
let blobs: Vec<_> = stream::iter(keys)
    .map(|key| async move { blobstore.get(ctx, &key).await })
    .buffer_unordered(100)
    .try_collect()
    .await?;
```
BAD (sequential derive):
```rust
let mut derived = Vec::new();
for cs_id in &cs_ids {
    let d = repo.derive::<SkeletonManifestId>(ctx, *cs_id).await?;
    derived.push(d);
}
```
GOOD (batch derive):
```rust
let derived = repo
    .derive_batch::<SkeletonManifestId>(ctx, cs_ids.clone())
    .await?;
```
BAD (single-element fetch_many_edges in a loop):
```rust
for cs_id in &cs_ids {
    let edges = commit_graph.fetch_many_edges(ctx, &[*cs_id]).await?;
    process(edges)?;
}
```
GOOD (batch all IDs at once):
```rust
let all_edges = commit_graph.fetch_many_edges(ctx, &cs_ids).await?;
for (_cs_id, edges) in all_edges {
    process(edges)?;
}
```
BAD (sequential single-ID mapping lookup):
```rust
let mut hg_ids = Vec::new();
for cs_id in &changeset_ids {
    let hg_id = bonsai_hg_mapping.get_hg_from_bonsai(ctx, *cs_id).await?;
    hg_ids.push(hg_id);
}
```
GOOD (bulk mapping lookup):
```rust
let entries = bonsai_hg_mapping
    .get(ctx, changeset_ids.clone().into())
    .await?;
let hg_ids: HashMap<_, _> = entries
    .into_iter()
    .map(|entry| (entry.bcs_id, entry.hg_cs_id))
    .collect();
```
The same pattern applies to `BonsaiGitMapping::get(ctx, BonsaisOrGitShas::Bonsai(ids))`, `BonsaiGlobalrevMapping::get(ctx, BonsaisOrGlobalrevs::Bonsai(ids))`, and `BonsaiSvnrevMapping::get(ctx, BonsaisOrSvnrevs::Bonsai(ids))` -- all accept multiple IDs. Avoid calling the single-ID convenience methods (`get_hg_from_bonsai`, `get_git_sha1_from_bonsai`, `get_bonsai_from_globalrev`, etc.) in a loop.
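For example, a bulk bonsai-to-git lookup could look like the following sketch (it assumes the returned entry type exposes `bcs_id` and `git_sha1` fields, by analogy with the bonsai-hg example above):

```rust
let entries = bonsai_git_mapping
    .get(ctx, BonsaisOrGitShas::Bonsai(changeset_ids.clone()))
    .await?;
// Assumed entry fields, analogous to the bonsai-hg entry above.
let git_ids: HashMap<_, _> = entries
    .into_iter()
    .map(|entry| (entry.bcs_id, entry.git_sha1))
    .collect();
```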
For raw blobstore operations (`Blobstore::get`, `Blobstore::put`, `Loadable::load`, `Storable::store`), there are no bulk/batch methods on these traits -- the only way to batch is `stream::iter(...).map(...).buffer_unordered(N)`. Choose `N` based on downstream capacity: 64-100 for blobstore operations, 10-50 for SQL-backed stores.
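Writes follow the same shape. A sketch for concurrent puts, assuming `put` takes an owned key and a `BlobstoreBytes` value:

```rust
// No bulk put exists on the Blobstore trait, so issue the puts concurrently.
// `blobs` is assumed here to be a Vec<(String, BlobstoreBytes)>.
stream::iter(blobs)
    .map(|(key, value)| async move { blobstore.put(ctx, key, value).await })
    .buffer_unordered(64)
    .try_collect::<Vec<_>>()
    .await?;
```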
For higher-level operations, prefer dedicated batch APIs when available:
- `BonsaiDerivable::derive_batch` / `fetch_batch` / `store_mapping_batch` -- batch derived data operations (note: `fetch_batch`'s default impl itself uses `buffer_unordered` internally)
- `CommitGraphStorage::fetch_many_edges` -- batch commit graph edge fetching
- `derive_manifests_for_simple_stack_of_commits` -- batch manifest derivation for linear stacks
- `find_entries` -- batch path lookup in a manifest tree

For ID mapping lookups, use the bulk `.get()` method instead of single-ID convenience methods in a loop:
- `BonsaiHgMapping::get(ctx, BonsaiOrHgChangesetIds)` + `bulk_add` -- bulk read/write of bonsai-to-hg mappings
- `BonsaiGitMapping::get(ctx, BonsaisOrGitShas)` + `bulk_add` -- bulk read/write of bonsai-to-git mappings
- `BonsaiGlobalrevMapping::get(ctx, BonsaisOrGlobalrevs)` + `bulk_import` -- bulk read/write of bonsai-to-globalrev mappings
- `BonsaiSvnrevMapping::get(ctx, BonsaisOrSvnrevs)` + `bulk_import` -- bulk read/write of bonsai-to-svnrev mappings

These mapping traits' `.get()` methods accept multiple IDs in a single SQL query. The convenience methods (`get_hg_from_bonsai`, `get_git_sha1_from_bonsai`, etc.) are wrappers that pass a single ID -- calling them in a loop issues N separate SQL queries instead of one.
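On the write side, the same reasoning applies. A hedged sketch, assuming `bulk_add` accepts a slice of `BonsaiHgMappingEntry` values with the `bcs_id` / `hg_cs_id` fields used in the read example:

```rust
// `pairs` is assumed to be a Vec<(ChangesetId, HgChangesetId)>.
// Build all entries up front, then issue one bulk write instead of N adds.
let entries: Vec<BonsaiHgMappingEntry> = pairs
    .into_iter()
    .map(|(bcs_id, hg_cs_id)| BonsaiHgMappingEntry { hg_cs_id, bcs_id })
    .collect();
bonsai_hg_mapping.bulk_add(ctx, &entries).await?;
```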