etc/plan/performance.md
Imported on: 2026-04-30
Fully generated as an overview; the initial motivation is to keep track of gix-index performance and possible improvements.
Source basis:
- TODO(perf) markers in this checkout.
- TODO(performance) markers, as they are equivalent in intent.
- /Users/byron/dev/github.com/git/git.

Working assumption: priorities are based on ordinary interactive and networked Git use, not on synthetic microbenchmarks alone.
These workflows define the impact ordering:
1. git status, git add, and editor integrations that repeatedly refresh the index, classify the worktree, and scan untracked paths.
2. git commit, git checkout, git switch, git merge, and git reset --hard, which depend on fast index reads, tree/index conversion, checkout, and content merge.
3. git diff, git log -- <path>, and git blame, which traverse trees, objects, and diffs repeatedly.
4. git fetch, git pull, and git clone, which stress negotiation, object lookup, pack ingestion, duplicate avoidance, and pack/index refresh.
5. git push, git gc, git repack, and object verification, which stress pack traversal, delta reconstruction, bitmaps, and multi-pack metadata.
6. git archive and export-style commands, which stream trees with attributes and filters.

index-lookup-accelerator: Make index path lookup acceleration a default fast path, not only an ignore-case helper.
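The exact-then-ignore-case lookup order that this item wants as a default fast path can be sketched as follows. This is a minimal single-threaded sketch: NameLookup and its methods are hypothetical names, not the gix-index API, and ASCII lowercasing stands in for real case folding.

```rust
use std::collections::HashMap;

/// Hypothetical lookup accelerator over index paths; not the gix-index API.
pub struct NameLookup<'a> {
    /// Exact path bytes mapped to the entry position.
    exact: HashMap<&'a [u8], usize>,
    /// ASCII-lowercased path mapped to the first entry position with that folded name.
    folded: HashMap<Vec<u8>, usize>,
}

impl<'a> NameLookup<'a> {
    /// Build both tables in one pass over the index paths.
    pub fn new(paths: &'a [&'a [u8]]) -> Self {
        let mut exact = HashMap::with_capacity(paths.len());
        let mut folded = HashMap::with_capacity(paths.len());
        for (idx, path) in paths.iter().enumerate() {
            exact.insert(*path, idx);
            folded.entry(path.to_ascii_lowercase()).or_insert(idx);
        }
        NameLookup { exact, folded }
    }

    /// Try the exact match first, then fall back to the folded table if requested.
    pub fn find(&self, path: &[u8], ignore_case: bool) -> Option<usize> {
        if let Some(idx) = self.exact.get(path) {
            return Some(*idx);
        }
        if ignore_case {
            return self.folded.get(&path.to_ascii_lowercase()).copied();
        }
        None
    }
}
```

Case-sensitive callers only ever touch the exact table, so the same structure could serve both modes; building the tables from multiple threads is deliberately left out here.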
- gix-dir/src/walk/classify.rs:396: build a multi-threaded hash table so lookups are always accelerated, even for case-sensitive paths.
- crate-status.md:844: always use multi-threaded initialization of the case-insensitive hash table to accelerate index lookups.
- gix-index/src/access/mod.rs:144: multi-threaded insertion needs a raw table with multiple bucket locks.
- status, add, checkout collision checks, untracked filtering, and pathspec-heavy commands in large worktrees.
- name-hash.c keeps index_state.name_hash and dir_hash for index lookups. It uses lazy initialization, a per-thread threshold (LAZY_THREAD_COST), bucket-derived locks, a two-phase directory/name hash build, and exact-then-ignore-case comparison. Git's read-cache.c wires name-hash updates into index entry insertion.
- gix-index::AccelerateLookup into a general name/directory lookup accelerator usable for case-sensitive and case-insensitive callers.
- status and dirwalk on large repos with core.ignoreCase true and false.

status-refresh-heuristics: Close the status refresh heuristic gap.
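The Git-side heuristic cited in the notes for this item (about 500 entries per thread, capped at 20 preload threads) can be sketched as a small function. The name refresh_thread_count and the available_threads cap are assumptions for illustration, not existing gix-status API.

```rust
/// Minimum number of entries that justifies one more thread (Git's THREAD_COST).
const THREAD_COST: usize = 500;
/// Upper bound on preload threads (Git's MAX_PARALLEL).
const MAX_PARALLEL: usize = 20;

/// Hypothetical helper: how many threads to use when refreshing `num_entries`
/// index entries. Returns 1 when parallelization is unlikely to pay off.
pub fn refresh_thread_count(num_entries: usize, available_threads: usize) -> usize {
    let wanted = num_entries / THREAD_COST;
    if wanted < 2 {
        // Fewer than two threads' worth of work: stay single-threaded.
        return 1;
    }
    // Cap by the Git constant and by what the machine offers (assumed extra cap).
    wanted.min(MAX_PARALLEL).min(available_threads.max(1))
}
```

With these constants, an index of 1000 entries uses two threads and anything from 10000 entries upward is limited to 20, or to the available thread count if that is lower.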
- gix-status/src/index_as_worktree/function.rs:144: decide when parallelization is not worth it; Git uses about 500 entries per thread, capped at 20 preload threads.
- src/plumbing/main.rs:433: make thread-limit tuning configurable; macOS and Linux scale differently.
- gix/src/status/mod.rs:165: make Repository::is_dirty() a dedicated early-stop implementation with parallelism.
- git status, and dirty checks in automation.
- preload-index.c uses MAX_PARALLEL = 20, THREAD_COST = 500, skips staged/submodule/up-to-date/skip-worktree/fsmonitor-valid entries, and marks entries up to date after lstat.
- is_dirty() path that stops after the first tree/index or index/worktree change.

untracked-fsmonitor-cache: Use untracked-cache and fsmonitor data aggressively where valid.
- crate-status.md:846: accelerated walk with the UNTR extension.
- gix-index/src/extension/untracked_cache.rs:29 and :33: understand directory stat data and extension semantics fully.
- gix-status/src/index_as_worktree/traits.rs:105: make streaming I/O interruptible.
- status in large repositories with many ignored/untracked files.
- wt-status.c passes istate->untracked into fill_directory() and avoids full scans when valid; read-cache.c also tweaks fsmonitor and untracked-cache state after index load.
- UNTR semantics.
- gix-dir walks and status collection.

index-decode-storage: Fix index decode storage costs before adding more threads.
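Of the storage options this item lists, the pre-sized variant is the simplest to illustrate. The sketch below keeps all paths in one buffer reserved up front and hands out byte ranges; PathBacking here is a hypothetical stand-in, not the gix-index type, and whether this removes the observed memmove cost would still need measurement.

```rust
use std::ops::Range;

/// Hypothetical path arena: all paths live in one buffer, entries keep byte ranges.
pub struct PathBacking {
    bytes: Vec<u8>,
}

impl PathBacking {
    /// Reserve the whole buffer once, for example from the size of the
    /// on-disk entry block, so that pushing paths does not reallocate.
    pub fn with_capacity(total_path_bytes: usize) -> Self {
        PathBacking { bytes: Vec::with_capacity(total_path_bytes) }
    }

    /// Append one path and return the range under which it can be read back.
    pub fn push(&mut self, path: &[u8]) -> Range<usize> {
        let start = self.bytes.len();
        self.bytes.extend_from_slice(path);
        start..self.bytes.len()
    }

    /// Read a previously stored path.
    pub fn get(&self, range: Range<usize>) -> &[u8] {
        &self.bytes[range]
    }

    /// Current capacity, useful to assert that no growth happened.
    pub fn capacity(&self) -> usize {
        self.bytes.capacity()
    }
}
```

Checking that capacity() is unchanged after decoding is a cheap way to confirm the reservation was large enough before looking for other sources of copying.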
- gix-index/src/decode/entries.rs:185: path_backing.extend_from_slice() causes large memmove time despite apparent capacity.
- crate-status.md:856: threaded index read spends most time storing paths and currently has little benefit.
- gix-index/src/decode/entries.rs:118: entries behave like an intrusive path-keyed collection; this likely affects ignore-case and lookup.
- gix-index/src/entry/flags.rs:11: use persisted path length to save in-memory entry size.
- read-cache.c memory-maps the index, allocates cache entries from mempools, uses the EOIE and IEOT extensions for threaded extension and entry loading, and stores path bytes inline in struct cache_entry (name[FLEX_ARRAY]). It also records an index entry offset table when threaded reads are requested.
- path_backing growth path and replace it with either pre-sized storage, chunked storage, or per-thread path arenas merged without repeated moves.

tree-index-sort: Avoid post-sort work when initializing an index from a tree.
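One way to avoid an unconditional sort is a linear order check that only falls back to sorting when the input is out of order. The sketch below assumes plain byte-wise path comparison and a hypothetical sort_if_needed helper; it is not how gix-index implements remove_file_directory_conflicts().

```rust
/// Sort `paths` only if they are not already in order.
/// Returns true if a sort was needed, which would hint at an unusual tree.
pub fn sort_if_needed(paths: &mut [Vec<u8>]) -> bool {
    // One linear pass; valid trees are expected to pass this check.
    let already_sorted = paths.windows(2).all(|pair| pair[0] <= pair[1]);
    if !already_sorted {
        paths.sort();
    }
    !already_sorted
}
```

For input that is already ordered this costs one comparison per adjacent pair and no moves.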
- gix-index/src/init.rs:107: remove_file_directory_conflicts() sorts only to protect against invalid trees; typical valid trees already compare in order.
- cache-tree.c) validates tree/index order and can prime an index's cache tree directly from a tree. read-cache.c and cache-tree.c keep sorted index invariants central.
- HEAD.

worktree-stream-traversal: Reduce double traversal in worktree streaming and archive creation.
- gix/src/repository/worktree.rs:88: use the index at HEAD if possible.
- gix/src/repository/worktree.rs:89: non-HEAD trees are effectively traversed twice; object-cache sharing across copied ODB handles is not trivial.
- archive, export, and tooling that streams a tree with attributes.
- HEAD through an existing index plus cache-tree where available.

odb-refresh-wait: Replace spin/yield waiting during dynamic pack-index refresh.
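One way to replace spin/yield waiting is a mutex plus condition variable, so that callers sleep while another caller performs the load. The sketch below uses only std::sync; IndexSlot, LoadState, and ensure_loaded are hypothetical names, and the real gix-odb state handling is more involved.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical load state for one pack index; not the gix-odb type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum LoadState {
    Unloaded,
    Loading,
    Loaded,
}

pub struct IndexSlot {
    state: Mutex<LoadState>,
    changed: Condvar,
}

impl IndexSlot {
    pub fn new() -> Self {
        IndexSlot { state: Mutex::new(LoadState::Unloaded), changed: Condvar::new() }
    }

    /// Ensure the slot is loaded. Exactly one caller runs `load`; the others
    /// block on the condition variable instead of spinning or yielding.
    /// Returns true if this caller performed the load.
    pub fn ensure_loaded(&self, load: impl FnOnce()) -> bool {
        let mut state = self.state.lock().unwrap();
        loop {
            let current = *state;
            match current {
                LoadState::Loaded => return false,
                // Sleep until the loading caller signals a state change.
                LoadState::Loading => state = self.changed.wait(state).unwrap(),
                LoadState::Unloaded => break,
            }
        }
        *state = LoadState::Loading;
        drop(state); // do not hold the lock during slow I/O
        load();
        *self.state.lock().unwrap() = LoadState::Loaded;
        self.changed.notify_all();
        true
    }
}
```

The sketch does not handle a panicking or failing load; a real implementation would need to reset the state and wake waiters in that case.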
- gix-odb/src/store_impls/dynamic/load_index.rs:152: a potentially hot loop should probably be a condition variable.
- packfile.c, and pack loading/re-preparation is explicit. It avoids leaving multiple callers in a spin loop while another caller is loading the same pack metadata.

loose-object-single-lookup: Avoid duplicate loose-object lookup work.
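The cost of the contains() plus try_find() pattern named in this item can be made visible with a probe counter. The sketch below uses a hypothetical in-memory LooseDb, not the gix-odb type; on disk, each probe would correspond to filesystem access.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical loose object database that counts probes; not the gix-odb type.
pub struct LooseDb {
    objects: HashMap<[u8; 20], Vec<u8>>,
    probes: Cell<usize>,
}

impl LooseDb {
    pub fn new(objects: HashMap<[u8; 20], Vec<u8>>) -> Self {
        LooseDb { objects, probes: Cell::new(0) }
    }

    /// Number of lookups performed so far.
    pub fn probes(&self) -> usize {
        self.probes.get()
    }

    pub fn contains(&self, id: &[u8; 20]) -> bool {
        self.probes.set(self.probes.get() + 1);
        self.objects.contains_key(id)
    }

    pub fn try_find(&self, id: &[u8; 20]) -> Option<&[u8]> {
        self.probes.set(self.probes.get() + 1);
        self.objects.get(id).map(|data| data.as_slice())
    }
}

/// Double lookup: a hit costs one `contains()` and one `try_find()`.
pub fn find_double<'a>(db: &'a LooseDb, id: &[u8; 20]) -> Option<&'a [u8]> {
    if db.contains(id) { db.try_find(id) } else { None }
}

/// Single lookup: `try_find()` already reports absence as `None`.
pub fn find_single<'a>(db: &'a LooseDb, id: &[u8; 20]) -> Option<&'a [u8]> {
    db.try_find(id)
}
```

A hit through find_double costs two probes, while find_single costs one for hits and misses alike.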
- gix-odb/src/store_impls/dynamic/find.rs:292: remove loose DB contains() plus try_find() double lookup.
- gix-odb/src/store_impls/dynamic/header.rs:166: same double lookup for headers.

pack-delta-base-cache: Revisit pack delta reconstruction and base caching.
- gix-pack/src/data/file/decode/entry.rs:381: optimize memory-intensive delta-chain reconstruction after more tests exist.
- packfile.c uses a delta_base_cache keyed by pack and base offset with an LRU size limit, a small preallocated delta stack, and delayed insertion into the cache to avoid races while unpacking.

fetch-duplicate-objects: Investigate duplicate received objects during fetch.
- gix/tests/gix/remote/fetch.rs:128: tests observe substantial duplication when receiving objects.

blame-rename-diff-cache: Avoid duplicated tree diff work in blame rename detection.
- gix-blame/src/file/function.rs:582: rename tracking repeats tree diff work after the no-rewrite pass.
- git blame, especially with -M, -C, and path history over rename-heavy repositories.
- blame.c first looks for unchanged origins, then renames, then optionally moves/copies. It carries blame_origin, blame_entry, score thresholds, and optional blame Bloom data rather than blindly repeating the same tree walk for every case.

merge-conflict-structures: Audit merge tree conflict data structures.
- gix-merge/src/tree/mod.rs:376: a better data structure may be needed for some directory/file conflict representation.
- diffcore-rename.c.

diff-preprocess-rescans: Remove avoidable rescans in diff preprocessing.
- gix-imara-diff/src/myers/preprocess.rs:132: do not unnecessarily rescan lines.
- gix-imara-diff already carries Git-inspired histogram behavior, so this should remain benchmark-led.

packed-ref-lookup: Implement packed-buffer aware reference lookup in general handles.
- gix-ref/src/store/general/handle/find.rs:28: implement lookup with packed-buffer handling.

literal-path-normalization: Avoid path normalization through pattern machinery for single literal paths.
- gitoxide-core/src/repository/blame.rs:29, gitoxide-core/src/repository/merge/file.rs:37, tests/it/src/commands/blame_copy_royal.rs:58: normalize paths without going through patterns.

hex-prefix-stack-decode: Remove heap allocation from odd-length hex prefix parsing.
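Decoding an odd-length hex prefix into a fixed-size stack buffer, with the trailing nibble zero-padded, can be sketched as below. MAX_HASH_LEN and decode_prefix are hypothetical names, and 32 bytes is an assumed upper bound on the hash size; this is not the gix-hash implementation.

```rust
/// Largest supported hash size in bytes (assumption: 32).
const MAX_HASH_LEN: usize = 32;

/// Convert one ASCII hex digit into its 4-bit value.
fn nibble(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Decode a hex prefix of any length (odd or even) into a zero-padded,
/// fixed-size buffer on the stack. Returns the bytes and the prefix length
/// in hex digits, or `None` for empty, invalid, or overlong input.
pub fn decode_prefix(hex: &[u8]) -> Option<([u8; MAX_HASH_LEN], usize)> {
    if hex.is_empty() || hex.len() > MAX_HASH_LEN * 2 {
        return None;
    }
    let mut out = [0u8; MAX_HASH_LEN];
    for (i, &c) in hex.iter().enumerate() {
        let n = nibble(c)?;
        // Even positions fill the high nibble, odd positions the low nibble.
        out[i / 2] |= if i % 2 == 0 { n << 4 } else { n };
    }
    Some((out, hex.len()))
}
```

For the three-digit prefix abc this yields the bytes 0xab and 0xc0 followed by zeros, together with a hex length of 3, without any heap allocation.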
- gix-hash/src/prefix.rs:120: decode odd hex prefixes without heap allocation.
- object-name.c and packfile.c works against fixed-size object IDs and pack fanout searches rather than allocating per prefix.
- Kind::longest() and copy directly into ObjectId.

small-attribute-values: Use a small byte-string representation for attribute values.
- gix-attributes/src/state.rs:8: a small byte string could provide an estimated 5 percent improvement.
- smallvec-backed byte string with display/serde wrappers.

signed-data-reader: Stream signed commit data without allocation.
- gix-object/src/commit/mod.rs:33: implement std::io::Read for SignedData.
- BString.

packetline-borrow-decode: Remove extra packet-line decoding caused by borrow-checker workarounds.
- gix-packetline/src/blocking_io/read.rs:110: avoid additional decoding of the internal buffer.

tempfile-registry-map: Re-evaluate tempfile registry maps.
- gix-tempfile/src/lib.rs:75: use a gix-hashtable slot-map once available.
- hp-hashmap feature.

git-date-token-parser: Replace brute-force Git date parsing.
- gix-date/src/parse/git.rs:11: learn from Git's parser instead of generated brute-force parsing.
- date.c uses a token scanner (parse_date_basic) with alpha, digit, and timezone matchers, then falls back to approxidate.

sparse-index-parity: Sparse-index parity for worktree-heavy commands.
- sparse-index.c can collapse full index ranges into sparse directory entries and expand on demand.
- gix-status, gix-dir, checkout, and merge code paths force full-index behavior.

commit-graph-bloom: Commit-graph generation and Bloom-filter use in history walks.
- revision.c uses commit-graph generation numbers to bound traversal, and blame can initialize Bloom data.
- log -- <path>, merge-base, blame, and negotiation.

midx-revindex-bitmaps: Multi-pack-index reverse-index and bitmap-driven object enumeration.
- midx.c, pack-revindex.c, and packfile.c use MIDX lookup, reverse-index chunks, and prefixed-object iteration across MIDX before falling back to individual packs.
- gix-pack and gix-odb MIDX support against Git's lookup order and reverse-index cache behavior.

pack-reuse-delta-policy: Pack reuse and path-based delta compression policies.
- builtin/pack-objects.c uses pack reuse, bitmap reuse, delta islands, path-based regions, threaded delta search, and delta-cache limits.
- GIT_TRACE_PERFORMANCE=true and GIT_TRACE2_PERF=1 baselines on the same repositories as gix measurements.
- git/git, Linux-sized history, a many-untracked-files worktree, a sparse checkout, and a many-pack repository.
- gix status on large case-sensitive and case-insensitive worktrees and compare against Git's name-hash/preload behavior.
- path_backing triggers memmove, then choose arena/chunked storage before expanding threaded decode.
- status and dirty checks scale with index size and untracked-file count within an agreed multiplier of Git on the benchmark corpus.