sync/src/README.md
This is a small Rust library for efficiently keeping a codebase index up to date.
Important definition: a tag is a (`workspace`, `branch`, `provider_id`) triple that uniquely identifies an index. Because the index uses content-based addressing, files with identical contents are stored once and shared across tags.
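To make the tag definition concrete, here is a minimal Rust sketch. `Tag` and `SharedIndex` are hypothetical names, not the library's types. `SharedIndex` only illustrates content-based addressing: entries are keyed by content hash, so the same contents indexed under two tags occupy a single entry that carries both tags.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical type: a tag uniquely identifies one index.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Tag {
    pub workspace: String,
    pub branch: String,
    pub provider_id: String,
}

impl Tag {
    pub fn new(workspace: &str, branch: &str, provider_id: &str) -> Self {
        Tag {
            workspace: workspace.to_string(),
            branch: branch.to_string(),
            provider_id: provider_id.to_string(),
        }
    }
}

/// Hypothetical content-addressed store: one entry per content hash,
/// shared by every tag that contains a file with that content.
#[derive(Default)]
pub struct SharedIndex {
    entries: HashMap<String, HashSet<Tag>>,
}

impl SharedIndex {
    /// Records that `tag` contains content with `hash`.
    /// Returns true if the content was new to the store.
    pub fn insert(&mut self, hash: &str, tag: &Tag) -> bool {
        let is_new = !self.entries.contains_key(hash);
        self.entries.entry(hash.to_string()).or_default().insert(tag.clone());
        is_new
    }

    /// Number of distinct contents stored.
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Tags that currently reference `hash`, if any.
    pub fn tags_for(&self, hash: &str) -> Option<&HashSet<Tag>> {
        self.entries.get(hash)
    }
}
```

Indexing the same file on a second branch adds a tag to the existing entry rather than creating a new one.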
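The output of the `sync_results` function is a list of 4 lists of tuples. Each tuple contains a file path and a hash of the file contents. The 4 lists are:

- `add_label`: files whose contents are already indexed, so only the current tag needs to be added as a label
- `compute`: files whose contents must be computed and added to the index
- `delete`: files whose entries should be deleted from the index
- `remove_label`: files whose contents stay in the index, but without the current tag's label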
The labels (the tags an item belongs to) let us filter by tag when retrieving results from an index like Meilisearch or Chroma. The id of every item in these indices is the hash of the file contents, possibly with a chunk index appended.
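The result shape and the id scheme can be sketched as follows, assuming paths and hashes are plain strings. `SyncResults` is a hypothetical struct that names the four lists for readability; the function itself returns them as an unnamed list of lists. `item_id` is also hypothetical, and its `hash:chunk` separator is an assumption rather than the library's documented format.

```rust
/// A file path paired with the hash of its contents.
pub type PathAndHash = (String, String);

/// Hypothetical named view of the four lists returned by a sync.
#[derive(Debug, Default, PartialEq)]
pub struct SyncResults {
    /// Contents already indexed: only add the current tag's label.
    pub add_label: Vec<PathAndHash>,
    /// Contents to compute and add to the index.
    pub compute: Vec<PathAndHash>,
    /// Entries to delete from the index.
    pub delete: Vec<PathAndHash>,
    /// Contents to keep, minus the current tag's label.
    pub remove_label: Vec<PathAndHash>,
}

/// Hypothetical id helper: the content hash, optionally suffixed with a
/// chunk index. The ":" separator is an assumption.
pub fn item_id(hash: &str, chunk: Option<usize>) -> String {
    match chunk {
        Some(i) => format!("{hash}:{i}"),
        None => hash.to_string(),
    }
}
```

Because ids derive only from content, two files with identical contents map to the same item no matter which path or tag they come from.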
The first time a tag is synced, a Merkle tree of the codebase folder is constructed, ignoring any files matched by `.gitignore` or `.continueignore`. Every file found is returned as needing to be computed and added to the index.
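A Merkle tree over a folder can be sketched like this. The sketch assumes an in-memory list of `(relative path, contents)` pairs instead of a directory walk, and it skips ignore-file handling. It uses the standard library's `DefaultHasher` purely as a stand-in: that hasher is neither cryptographic nor guaranteed stable across Rust releases, so a real index would use a proper content hash. `Node`, `build`, and `digest` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// A Merkle tree node: a file leaf or a directory with named children.
#[derive(Debug)]
pub enum Node {
    File { hash: u64 },
    Dir { hash: u64, children: BTreeMap<String, Node> },
}

impl Node {
    /// The node's hash.
    pub fn digest(&self) -> u64 {
        match self {
            Node::File { hash } | Node::Dir { hash, .. } => *hash,
        }
    }
}

/// Stand-in content hash (not for production use).
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Builds a tree from `(relative path, contents)` pairs, using `/` as separator.
pub fn build(files: &[(&str, &[u8])]) -> Node {
    let mut children: BTreeMap<String, Node> = BTreeMap::new();
    let mut subdirs: BTreeMap<String, Vec<(&str, &[u8])>> = BTreeMap::new();

    for &(path, contents) in files {
        match path.split_once('/') {
            // Nested entry: group it under the subdirectory named by its first component.
            Some((dir, rest)) => subdirs.entry(dir.to_string()).or_default().push((rest, contents)),
            // Direct child file: hash its contents.
            None => {
                children.insert(path.to_string(), Node::File { hash: hash_bytes(contents) });
            }
        }
    }
    for (name, entries) in subdirs {
        children.insert(name, build(&entries));
    }

    // A directory's hash covers its children's names and hashes in sorted order.
    let mut h = DefaultHasher::new();
    for (name, child) in &children {
        name.hash(&mut h);
        child.digest().hash(&mut h);
    }
    Node::Dir { hash: h.finish(), children }
}
```

Editing one file changes only the hashes on its path up to the root, so when two trees are compared, any subtree whose hash matches can be skipped without visiting its files.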
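On every later sync, the last computed Merkle tree for the tag is loaded and compared against a freshly built one, and each changed file is sorted into the `add_label`, `compute`, `delete`, or `remove_label` list.

Several files are stored and updated on disk in the `~/.continue/index` folder to keep track of indexed files: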
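One plausible way to turn the comparison into the four lists is sketched below. It is an assumption, not the library's documented logic: it diffs flat `path -> hash` maps (the leaves of the old and new trees) rather than walking the trees, and it uses an in-memory stand-in for `rev_tags` (hash to tag names) to decide whether content is still needed by another tag. `classify` and `used_elsewhere` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// A file path paired with the hash of its contents.
type PathAndHash = (String, String);

#[derive(Debug, Default)]
pub struct SyncResults {
    pub add_label: Vec<PathAndHash>,
    pub compute: Vec<PathAndHash>,
    pub delete: Vec<PathAndHash>,
    pub remove_label: Vec<PathAndHash>,
}

/// True if `hash` is currently indexed under some tag other than `tag`.
fn used_elsewhere(rev_tags: &HashMap<String, HashSet<String>>, hash: &str, tag: &str) -> bool {
    rev_tags
        .get(hash)
        .map_or(false, |tags| tags.iter().any(|t| t.as_str() != tag))
}

/// Classifies changes between the stored (`old`) and fresh (`new`) path -> hash maps.
pub fn classify(
    tag: &str,
    old: &HashMap<String, String>,
    new: &HashMap<String, String>,
    rev_tags: &HashMap<String, HashSet<String>>,
) -> SyncResults {
    // Compare by content hash: in this sketch, content that only moved paths produces no work.
    let old_hashes: HashSet<&String> = old.values().collect();
    let new_hashes: HashSet<&String> = new.values().collect();
    let mut seen: HashSet<&String> = HashSet::new(); // report each hash once
    let mut out = SyncResults::default();

    // Content that is new to this tag.
    for (path, hash) in new {
        if old_hashes.contains(hash) || !seen.insert(hash) {
            continue;
        }
        let item = (path.clone(), hash.clone());
        if used_elsewhere(rev_tags, hash, tag) {
            out.add_label.push(item); // already indexed for another tag
        } else {
            out.compute.push(item); // not indexed anywhere yet
        }
    }
    // Content this tag no longer contains.
    for (path, hash) in old {
        if new_hashes.contains(hash) || !seen.insert(hash) {
            continue;
        }
        let item = (path.clone(), hash.clone());
        if used_elsewhere(rev_tags, hash, tag) {
            out.remove_label.push(item); // still needed by another tag
        } else {
            out.delete.push(item); // no other tag uses it
        }
    }
    out
}
```

Checking other tags before deleting is what makes sharing safe: content leaves the index only when no tag references it anymore.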
- `~/.continue/index/tags/<dir>/<branch>/<provider_id>/merkle_tree`: the last computed Merkle tree of the codebase for a given tag
- `~/.continue/index/tags/<dir>/<branch>/<provider_id>/.last_sync`: the last time the tag was synced
- `~/.continue/index/.index_cache`: the global cache (a flat file of hashes)
- `~/.continue/index/tags/<dir>/<branch>/<provider_id>/.index_cache`: the tag-specific cache (a flat file of hashes)
- `~/.continue/index/rev_tags`: a mapping from each hash to the tags it is currently indexed for. This is a directory of files, one per 2-character hash prefix; each file is a JSON mapping from hash to a list of tags.

The code is organized as follows:

- `lib.rs` contains just the top-level function called by the Python bindings
- `sync/merkle.rs` contains the Merkle tree implementation (for building and comparing trees)
- `sync/mod.rs` contains the main sync logic, which maintains the on-disk database of which hashes are included in which tags
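The on-disk layout can be expressed as a few path helpers. This is a sketch with hypothetical function names, under three assumptions: `<dir>` is already a path-safe name (an absolute workspace path would need escaping, since `Path::join` with an absolute path replaces the base), each `rev_tags` file is named exactly by the 2-character prefix, and `.index_cache` holds one hash per line.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Per-tag state directory: <root>/tags/<dir>/<branch>/<provider_id>.
/// Assumes `dir` is already escaped into a single path-safe component.
pub fn tag_dir(root: &Path, dir: &str, branch: &str, provider_id: &str) -> PathBuf {
    root.join("tags").join(dir).join(branch).join(provider_id)
}

/// Global cache of indexed hashes: <root>/.index_cache.
pub fn global_index_cache(root: &Path) -> PathBuf {
    root.join(".index_cache")
}

/// The rev_tags shard holding `hash`, keyed by its first 2 characters.
pub fn rev_tags_shard(root: &Path, hash: &str) -> PathBuf {
    // `get` returns None for short or non-ASCII-boundary input; fall back to the whole hash.
    let prefix = hash.get(..2).unwrap_or(hash);
    root.join("rev_tags").join(prefix)
}

/// Parses a flat .index_cache file, assumed to hold one hash per line.
pub fn parse_index_cache(contents: &str) -> HashSet<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}
```

Sharding `rev_tags` by hash prefix keeps each JSON file small, so updating the tags for one hash rewrites only a single shard.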