src/docs/rfcs/005-all-docs-index.md
This document describes how to maintain an index of all the documents in a database backed by FoundationDB, one sufficient to power the _all_docs endpoint. It also addresses the individual metadata fields included in the response to a GET /dbname request.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Normal requests to the _all_docs index will be powered by a dedicated subspace
containing a single key for each document in the database that has at least one
deleted=false entry in the revisions subspace. This dedicated subspace can be
populated by blind writes on each update transaction, as the revisions subspace
ensures proper coordination of concurrent writers trying to modify the same
document. The structure of the keys in this space looks like
(?BY_ID, DocID) = (ValueFormat, RevPosition, RevHash)
where the individual elements are defined as follows:
If a transaction deletes the last "live" edit branch of a document, it must also clear the corresponding entry for the document from this subspace.
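The maintenance rule above can be sketched with a plain Python dict standing in for the ?BY_ID subspace. The function name `update_by_id`, the `VALUE_FORMAT` constant, and the idea that the value records the winning live revision are assumptions made for illustration; none of them is defined by this RFC.

```python
# In-memory stand-in for the ?BY_ID subspace:
#   DocID -> (ValueFormat, RevPosition, RevHash)
# A real implementation would issue set/clear calls inside the update
# transaction instead of mutating a dict.
VALUE_FORMAT = 1  # hypothetical format version, not specified by this RFC


def update_by_id(by_id, doc_id, winner):
    """Apply one document update to the index.

    `winner` is the (assumed) winning live revision as a
    (rev_position, rev_hash) pair, or None when the update deleted the
    last deleted=false edit branch of the document.
    """
    if winner is None:
        # No live edit branch remains, so the document leaves _all_docs.
        by_id.pop(doc_id, None)
    else:
        rev_position, rev_hash = winner
        # Blind write: the previous index entry is never read.
        by_id[doc_id] = (VALUE_FORMAT, rev_position, rev_hash)


by_id = {}
update_by_id(by_id, "foo", (1, b"\xde\xad\xbe\xef"))  # create foo
update_by_id(by_id, "foo", (2, b"\x00\x01\x02\x03"))  # update foo
update_by_id(by_id, "bar", (1, b"\xca\xfe"))          # create bar
update_by_id(by_id, "bar", None)                      # delete bar's last live branch
```

After these four updates only `foo` remains in the index, pointing at its second revision.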
A request that specifies include_docs=true can be implemented either by
performing a range request against this subspace and then N additional range
requests explicitly specifying the full revision information in the ?DOCS
subspace, or by doing a full range scan directly against that subspace,
discarding conflict bodies and any user data associated with deleted revisions.
As the implementation choice there has no bearing on the actual data model we
leave it unspecified in this RFC.
The so-called "dbinfo" JSON object contains various bits of metadata about a database. Here's how we'll carry those forward:
db_name: should be trivially accessible.
doc_count: this will be maintained as a single key mutated using
FoundationDB's atomic operations. Transactions that create a new document or
re-create one where all previous edit branches had been deleted should increment
the counter by 1.
doc_del_count: as above, this is a key mutated using atomic operations.
Transactions that tombstone the last deleted=false edit branch on a document
should increment it by 1. Transactions that add a new deleted=false edit branch
to a document where all previous edit branches were deleted must decrement it by
1.
The revisions model ensures that every transaction has enough information to know whether it needs to modify either or both of the above counters.
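As a rough illustration of the counter rules above, the sketch below derives the two deltas from a document's state before and after a transaction. The state names and the function `counter_deltas` are hypothetical. The decrement of doc_count on deletion is an assumption that the text above does not state explicitly, and the case of a brand-new document created directly as deleted is left untouched because the text does not cover it. `encode_delta` shows the 8-byte little-endian form that an atomic add operand would typically take.

```python
import struct


def counter_deltas(before, after):
    """Return (doc_count_delta, doc_del_count_delta) for one update.

    before: "missing" (no revisions), "deleted" (all edit branches
            deleted) or "live" (at least one deleted=false branch).
    after:  "deleted" or "live".
    """
    doc_count = 0
    doc_del_count = 0
    if after == "live" and before in ("missing", "deleted"):
        # New document, or re-creation of a fully deleted one.
        doc_count += 1
        if before == "deleted":
            doc_del_count -= 1
    elif after == "deleted" and before == "live":
        # The last deleted=false edit branch was tombstoned.
        doc_del_count += 1
        doc_count -= 1  # assumption: not spelled out in the RFC text
    return doc_count, doc_del_count


def encode_delta(delta):
    """Encode a signed delta as an 8-byte little-endian integer."""
    return struct.pack("<q", delta)
```

Because both deltas depend only on the before and after states, a transaction that has already read the revisions subspace needs no further reads to update the counters.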
update_seq: the most efficient way to retrieve this value is to execute a
get_key operation using a last_less_than KeySelector on the end of the
?CHANGES subspace, so no additional writes are required.
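The lookup described above can be modeled without a cluster by treating the keyspace as a sorted list of byte strings. In this sketch `CHANGES_PREFIX`, `last_less_than`, and `current_update_seq` are hypothetical names, and the prefix is a stand-in for however the ?CHANGES subspace is actually encoded. It also assumes no sequence key begins with the byte 0xFF, so that prefix + 0xFF sorts after every entry in the subspace.

```python
import bisect

CHANGES_PREFIX = b"changes/"  # hypothetical stand-in for the ?CHANGES prefix


def last_less_than(sorted_keys, key):
    """Model of a last_less_than key selector.

    Returns the greatest stored key strictly less than `key`, or None.
    """
    i = bisect.bisect_left(sorted_keys, key)
    return sorted_keys[i - 1] if i > 0 else None


def current_update_seq(sorted_keys):
    """Return the raw sequence bytes of the newest change, or None."""
    end_of_changes = CHANGES_PREFIX + b"\xff"
    key = last_less_than(sorted_keys, end_of_changes)
    # The selector may resolve to a key in a preceding subspace when
    # ?CHANGES is empty, so the prefix has to be checked.
    if key is None or not key.startswith(CHANGES_PREFIX):
        return None
    return key[len(CHANGES_PREFIX):]


keys = sorted([
    b"by_id/foo",
    CHANGES_PREFIX + b"\x00\x01",
    CHANGES_PREFIX + b"\x00\x07",
    b"docs/foo",
])
```

The prefix check matters in practice: a key selector resolves against the whole keyspace, not just one subspace.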
purge_seq: TBD, pending a more detailed design for purge. If purge ends up being
entirely transactional then this could be fixed to update_seq or dropped
entirely.
There are three distinct sizes that we currently track for every database:
sizes.external: described as the "number of bytes that would be required to
represent the contents outside of the database".
sizes.active: a theoretical minimum number of bytes to store this database
on disk.
sizes.file: the current number of bytes on disk.
The relationship between sizes.active and sizes.file is used to guide
decisions on database compaction. FoundationDB doesn't require compaction, and
any distinction that might exist between these two quantities (e.g. from storage
engine compression) is not surfaced up to the clients, so it probably doesn't
make sense to have both.
The current implementation of sizes.external does not measure the length of
a JSON representation of the data, but rather the size of an uncompressed Erlang
term representation of the JSON. This is a somewhat awkward choice as the
internal Erlang term representation is liable to change over time (e.g. with the
introduction of Maps in newer Erlang releases, or plausibly even a JSON decoder
that directly emits the format defined in the document storage RFC).
Assuming we can agree on a set of sizes and how they should be calculated, the implementation will require two pieces: a single key for each size, mutated by atomic operations, and a record of the size of each revision in the ?REVISIONS subspace so that a transaction can compute the delta for each document.
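One way to picture the two pieces is the sketch below, where a dict stands in for the per-revision sizes recorded in the ?REVISIONS subspace and an integer stands in for the atomically mutated size key. The function name `record_revision_size` and the data shapes are hypothetical, and the sketch deliberately takes sizes as inputs because how a size should be calculated is still open.

```python
def record_revision_size(total, rev_sizes, doc_id, rev, new_size):
    """Update the recorded size of one revision and return the new total.

    total:     current value of the database-wide size counter
    rev_sizes: {(doc_id, rev): size} as recorded alongside each revision
    new_size:  size of the revision after the update, or None if the
               revision is being removed
    """
    key = (doc_id, rev)
    old_size = rev_sizes.get(key, 0)
    if new_size is None:
        rev_sizes.pop(key, None)
        delta = -old_size
    else:
        rev_sizes[key] = new_size
        delta = new_size - old_size
    # In FoundationDB this would be an atomic add of `delta` to the size key.
    return total + delta


rev_sizes = {}
total = 0
total = record_revision_size(total, rev_sizes, "foo", "1-abc", 120)
total = record_revision_size(total, rev_sizes, "foo", "2-def", 150)
total = record_revision_size(total, rev_sizes, "foo", "1-abc", None)
```

Because the old size is stored with the revision, the transaction never needs to re-read or re-measure the previous document body to compute the delta.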
The r, w, q, and n values in the cluster object were introduced in
CouchDB 2.x to describe the topology of a database and the default quorum
settings for operations against it. If we wanted to bring these forward, here's
how they'd be defined:
r: always fixed at 1
w: interpreted as the number of transaction logs that record a commit, this
is dependent on the redundancy mode for the underlying FoundationDB database
n: interpreted as the number of storage servers that host a key, this is also
dependent on the redundancy mode for the underlying FoundationDB database
q: the closest analogue here would be to use the get_boundary_keys API and
report the number of distinct ranges implied by the boundary keys
This interpretation could lead to some surprises, though. For example, "r=1,
w=4, n=3" is a popular configuration, but this is nonsensical for someone
expecting to see Dynamo-style numbers. Ignoring backwards compatibility, the
sensible thing is to point users toward the actual FoundationDB configuration
information, and to deprecate this entire cluster object. Open for discussion.
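For the q analogue described above, the counting step might look like the following sketch. It assumes the boundary keys are the start keys of contiguous single-server ranges and that they all fall within [begin, end) for the database's key range. The function name `count_ranges` is hypothetical, and the call to fetch the boundary keys is omitted.

```python
def count_ranges(begin, boundary_keys):
    """Count the contiguous ranges that overlap [begin, end).

    boundary_keys: keys k with begin <= k < end, each marking the start
    of a contiguous range stored on a single server.
    """
    boundaries = sorted(set(boundary_keys))
    if boundaries and boundaries[0] == begin:
        # The first range starts exactly at `begin`.
        return len(boundaries)
    # Otherwise `begin` falls inside a range that started earlier.
    return len(boundaries) + 1
```

Unlike a fixed q, this number changes over time as FoundationDB splits and merges ranges, which is another argument for pointing users at the underlying configuration instead.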
The underlying transaction in FoundationDB must complete within 5 seconds, which implicitly limits the number of results that can be returned in a single _all_docs invocation.
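One consequence is that a client wanting every row has to page through the index across several requests. The sketch below models that loop against a sorted list of document IDs. The function names and the continuation scheme (returning the next start key with each page) are assumptions for illustration and are not part of this RFC.

```python
import bisect


def all_docs_page(sorted_ids, start_key=None, limit=2):
    """Return (rows, next_start_key) for one bounded request.

    next_start_key is None once the end of the index has been reached.
    """
    i = 0 if start_key is None else bisect.bisect_left(sorted_ids, start_key)
    rows = sorted_ids[i:i + limit]
    has_more = i + limit < len(sorted_ids)
    next_start_key = sorted_ids[i + limit] if has_more else None
    return rows, next_start_key


def iterate_all(sorted_ids, limit=2):
    """Collect every ID by issuing one bounded request after another."""
    collected = []
    start_key = None
    while True:
        rows, start_key = all_docs_page(sorted_ids, start_key, limit)
        collected.extend(rows)
        if start_key is None:
            return collected


ids = ["a", "b", "c", "d", "e"]
```

Each page would run in its own transaction, so the pages taken together are not a single consistent snapshot of the database.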
TBD depending on exact code layout going forward.
None.
The total_rows and offset fields are removed from the response to
_all_docs, which now has the simpler form
{"rows": [
{"id":"foo", "key":"foo", "value":{"rev":"1-deadbeef..."}},
...
]}
The following fields are removed from the dbinfo response:
compact_running
disk_format_version: this is a tricky one. We define "format versions" for
every single type of key we're storing in FoundationDB, and those versions
could vary on a key-by-key basis, so listing a single number for an entire
database is sort of ill-posed.
The following fields are already marked as deprecated and can be removed in the next major release, independent of the FoundationDB work:
instance_start_time
other
data_size
disk_size
None have been identified.