src/net/disk_cache/sql/README.md
This directory contains an experimental SQL-based implementation of the disk
cache (disk_cache::Backend). It uses SQLite to store cache entries and is
designed to be more robust and performant than the default cache backends
(block file backend on Windows and simple cache backend on other OSes),
especially in scenarios with a large number of small entries.
The implementation is sharded to improve concurrency, with each shard managing its own SQLite database file.
SqlBackendImpl: This is the main entry point and public-facing class
for the SQL-based disk cache. It implements the disk_cache::Backend
interface, handling requests from the network stack to open, create, doom,
and enumerate cache entries. It owns and coordinates the
SqlPersistentStore.
SqlEntryImpl: Implements the disk_cache::Entry interface,
representing a single entry in the cache. It manages the data streams
(header and body) and metadata (last_used time, key, etc.) for an entry.
All I/O operations are passed to the SqlBackendImpl, which then delegates
them to the persistence layer.
SqlPersistentStore: This class serves as the primary interface to the
persistence layer, abstracting away the details of database sharding and
asynchronous operations. It owns multiple BackendShard instances and
distributes operations among them based on the cache entry's key.
SqlPersistentStore::BackendShard: Manages a single shard of the cache.
Each shard has its own SQLite database file, a dedicated background task
runner for database operations, and an in-memory index of its entries. It
owns a SequenceBound<SqlPersistentStore::Backend>.
SqlPersistentStore::Backend: This class encapsulates all direct
interactions with a single SQLite database. It runs entirely on a
background sequence to avoid blocking the main network thread. Its
responsibilities include:
CacheEntryKey: A memory-efficient wrapper for the cache key string.
Since cache keys can be long and are stored in multiple in-memory data
structures, this class uses a scoped_refptr<base::RefCountedString> to
share the underlying string data, reducing memory overhead.
SqlPersistentStoreInMemoryIndex: An in-memory index that maps a hash
of a CacheEntryKey to its ResId (the primary key in the database). This
allows for fast, synchronous checks to see if an entry is likely to exist in
the cache without needing to perform a slow, asynchronous database query. It
is highly optimized for memory usage.
ExclusiveOperationCoordinator: A synchronization primitive that
serializes access to resources. It ensures that "exclusive" operations (like
cache-wide eviction or cleanup) do not run concurrently with "normal"
operation (like reading or writing a single entry). Normal operations on
different cache keys can run in parallel.
EvictionCandidateAggregator: A thread-safe helper class used during
cache eviction. Each shard independently generates a list of its least
recently used entries as eviction candidates. This class aggregates these
lists from all shards, performs a final sort, and selects the global set of
entries to be evicted to bring the cache size back under its limit.
InFlightEntryModification: A mechanism to queue metadata updates
(last_used time, headers, body size) for a cache entry that is not
currently active (i.e., not held open as a SqlEntryImpl object). When an
operation modifies an entry, it records the change as an in-flight
modification. If the entry is opened again before the background database
write completes, these queued modifications are applied to the entry's data
as it is read from disk. This ensures that the in-memory representation of
an entry is always consistent with pending operations, even with fully
asynchronous database writes.
Initialization: SqlBackendImpl creates a SqlPersistentStore, which
in turn creates a number of BackendShard instances (e.g., 3), each with
its own background task runner. Each shard initializes its SQLite database.
Entry Operations (Create/Open):
SqlBackendImpl.BackendShard is
responsible for it.SqlPersistentStore::Backend for that shard executes the necessary
SQL commands to find or create the entry in its database.SqlEntryImpl or an error) is returned to the main thread
via a callback.Data I/O: Reading and writing data to a SqlEntryImpl follows a similar
pattern, with operations being posted to the appropriate shard's background
task runner.
Eviction:
SqlBackendImpl
initiates eviction.SqlPersistentStore.EvictionCandidateAggregator collects these lists, selects the
entries to be removed, and sends the list of doomed entries back to each
shard to be deleted from the database.Coordination: The ExclusiveOperationCoordinator ensures that
operations like eviction, which affect the entire cache, do not conflict
with ongoing reads and writes to individual entries. When an exclusive
operation is requested, the coordinator waits for all active normal
operations to complete, runs the exclusive operation, and then resumes
queued normal operations.
Each shard of the SQL disk cache uses a SQLite database with the following schema:
resources: Stores the main metadata for each cache entry.
res_id (INTEGER, PRIMARY KEY AUTOINCREMENT): Unique ID for the
resource.last_used (INTEGER): Timestamp for LRU.body_end (INTEGER): End offset of the body.bytes_usage (INTEGER): Total bytes consumed by the entry.doomed (INTEGER): Flag for entries pending deletion (0 for live, 1 for
doomed).check_sum (INTEGER): The checksum crc32(head + cache_key_hash).cache_key_hash (INTEGER): The hash of cache_key.cache_key (TEXT): The full cache key string.head (BLOB): Serialized response headers.blobs: Stores the data chunks of the cached body.
blob_id (INTEGER, PRIMARY KEY AUTOINCREMENT): Unique ID for the blob.res_id (INTEGER): Foreign key to resources.res_id.start (INTEGER): Start offset of this blob chunk.end (INTEGER): End offset of this blob chunk.check_sum (INTEGER): The checksum crc32(blob + cache_key_hash).blob (BLOB): The actual data chunk.index_resources_cache_key_hash_doomed:
ON resources(cache_key_hash,doomed)
doomed=0) by cache_key_hash.
Crucial for OpenEntry and similar operations.index_live_resources_last_used_bytes_usage:
ON resources(last_used, bytes_usage) WHERE doomed=0
last_used and bytes_usage for live entries.
Essential for efficient eviction logic, which targets the least recently
used entries without needing to access the resources table directly.index_blobs_res_id_start: (UNIQUE) ON blobs(res_id, start)
(res_id, start) in the blobs table. Ensures quick
retrieval of data blobs for a given entry at a specific offset and
maintains data integrity by preventing overlapping blobs for the same
entry.