eden/mononoke/jobs/walker/src/README.md
The Mononoke Graph Walker (henceforth just the walker) defines a graph schema for Mononoke across both its SQL and Blobstore stored types, and uses it to provide the functionality described below.
In the future it is intended to provide other operations over the Mononoke graph as well.
The walker represents Mononoke as a graph schema of Nodes connected by edges. Nodes are kept small, holding just their identities, with NodeData representing the payload. A node also has a NodeType, typically used for filtering out a class of nodes.
The graph is dynamically discovered, so edges are represented as an OutgoingEdge pointing to a target and annotated by an EdgeType and optionally a WrappedPath (which represents a path in the repo). There is currently no type representing a full edge; if one is needed, a Node plus an OutgoingEdge would suffice.
The types are represented as enum variants so they can be passed in heterogeneous collections, in particular the queues used by bounded_traversal_stream, which does the dynamic unfolding of the graph.
Note that due to backlinks from file to commit graph (e.g. HgFileNodeToHgChangeset) the graph has cycles so tracking visits is important to be able to break cycles.
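To make the shape of the schema concrete, here is a minimal, hypothetical sketch of these types; the real definitions in the walker have many more variants and richer payloads, so the variant names below are illustrative only.

```rust
// Simplified stand-ins for the walker's schema types. A Node carries only its
// identity; the loaded payload lives in NodeData.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Node {
    BonsaiChangeset(String),                   // identity is a hash
    HgFileNode { path: String, hash: String }, // identity is hash + repo path
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum NodeType {
    BonsaiChangeset,
    HgFileNode,
}

// Payload loaded for a node; in the walker this is Some(..) iff the load succeeded.
#[allow(dead_code)]
enum NodeData {
    BonsaiChangeset(Vec<u8>),
    HgFileNode(Vec<u8>),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum EdgeType {
    BonsaiChangesetToHgFileNode,
    HgFileNodeToBonsaiChangeset, // a backlink: this is why the graph has cycles
}

// A dynamically discovered edge: a target plus an annotation, with no source.
struct OutgoingEdge {
    label: EdgeType,
    target: Node,
}

// NodeType lets a whole class of nodes be filtered in one match.
fn node_type(n: &Node) -> NodeType {
    match n {
        Node::BonsaiChangeset(_) => NodeType::BonsaiChangeset,
        Node::HgFileNode { .. } => NodeType::HgFileNode,
    }
}
```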
A walk is said to be deep if it will cover the entire history of the repo, or shallow if it will only look at data (e.g. FileContent) accessible at a given commit.
When extending the walker to cover a step to a new type of Node, one needs to:

* Add new variants to Node, NodeType, NodeData and EdgeType to describe the transitions. Node should hold the minimal key needed to look up this node (e.g. hash, or hash + repo path). NodeData should hold the data loaded via the Node that is used to derive the children; the key thing is that it should be Some() if the load succeeded. NodeData is currently populated for all Nodes.
* Add new _step() functions in walk.rs that expand the Node to its children.
* Update the existing _step() functions that are sources of edges to your Node to produce OutgoingEdge with your Node listed.
* Update WalkStateCHashmap to track visits to the new Node type.

There are often multiple valid routes from A to B, and because the graph is dynamically unfolded and evaluated in parallel from async IO, which route is chosen can vary between runs. One way this is visible is in the number of checks for re-visits done per NodeType. If two nodes A and B both expand to content X then at most one will visit it, but both will record a check.
Most Mononoke data represented on the graph is immutable, so visit tracking can prevent any re-visits at all.
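The check-vs-visit distinction above can be sketched with a toy tracker; WalkStateCHashmap uses a concurrent hash map keyed per NodeType, but a plain HashSet (an assumption for brevity) shows the same behaviour.

```rust
use std::collections::HashSet;

// Toy version of visit tracking: every call is a "check", but a node is only
// "visited" the first time it is seen. Two routes to the same content each
// record a check, yet only one triggers a visit.
#[derive(Default)]
struct VisitTracker {
    seen: HashSet<String>,
    checks: usize, // number of re-visit checks performed
}

impl VisitTracker {
    // Returns true if the node should be visited (i.e. first time seen).
    fn needs_visit(&mut self, node_key: &str) -> bool {
        self.checks += 1;
        self.seen.insert(node_key.to_string())
    }
}
```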
For bookmarks the solution is that published bookmark data is resolved once per walk and then referred to during the walk to ensure it is consistent within a given tailing run.
However for a subset of data, currently BonsaiHgMapping and BonsaiPhaseMapping, the graph sees the data in more than one state. In this case the data is too large to efficiently reload in a snapshot approach on each tail iteration, so instead the WalkStateCHashmap state tracking will allow revisits until the Node has received its NodeData in the expected terminal state.
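The "allow revisits until the terminal state is seen" rule can be sketched as follows. This is a hypothetical simplification: the phase names mirror Mononoke's draft/public phases, with public treated as the terminal state, but the real WalkStateCHashmap logic is richer.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Phase {
    Draft,
    Public, // the expected terminal state in this sketch
}

// Tracks whether each node has been observed in its terminal state; until
// then, re-visits are permitted so later walks can pick up the final data.
#[derive(Default)]
struct TerminalStateTracker {
    done: HashMap<String, bool>,
}

impl TerminalStateTracker {
    // Returns true if the node still needs visiting.
    fn needs_visit(&mut self, key: &str, phase: Phase) -> bool {
        let done = self.done.entry(key.to_string()).or_insert(false);
        if *done {
            return false;
        }
        // Record completion once the loaded data is in the terminal state.
        *done = phase == Phase::Public;
        true
    }
}
```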
The WalkVisitor trait is called at the start and end of the unfolding of a graph node: start_node() gives it a chance to set up CoreContext for sampling blobstore actions (and maybe, in the future, for sampling SQL actions) before the walk attempts to load the target node, and visit() holds the bulk of the functionality, where it sees the unfolded outgoing edges and has a chance to filter them to remove re-visits (e.g. WalkStateCHashmap) and to validate them (e.g. ValidatingVisitor).
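A heavily simplified sketch of that trait shape, with strings standing in for the real Node, CoreContext and OutgoingEdge types (the method names follow the description above; the signatures here are assumptions):

```rust
use std::cell::RefCell;
use std::collections::HashSet;

// Simplified WalkVisitor shape: start_node() runs before the target node is
// loaded; visit() sees the unfolded outgoing edges and returns those to follow.
trait WalkVisitor {
    // E.g. tag the context so blobstore actions for this node can be sampled.
    fn start_node(&self, node: &str);

    // Filter/validate the unfolded outgoing edges, dropping re-visits.
    fn visit(&self, node: &str, outgoing: Vec<String>) -> Vec<String>;
}

// A toy visitor that suppresses edges to targets it has already emitted.
struct DedupVisitor {
    seen: RefCell<HashSet<String>>,
}

impl WalkVisitor for DedupVisitor {
    fn start_node(&self, _node: &str) {}

    fn visit(&self, _node: &str, outgoing: Vec<String>) -> Vec<String> {
        outgoing
            .into_iter()
            .filter(|target| self.seen.borrow_mut().insert(target.clone()))
            .collect()
    }
}
```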
Memory usage by the graph representation is one of the key design constraints of the walker.
There are several possible further memory usage improvements to consider:

* Reducing the number of WrappedPath, Node and NodeData objects required. Currently we return NodeData unconditionally; however, these usually don't dominate, except for very large manifests (large files return a stream which is conditionally consumed).
* Avoiding the intermediate Vec's from collect().
* Tuning bounded_traversal_stream to reduce the size of pending queues as much as possible, based on the expected unfold size of a Node's NodeType.

To handle large repos, the walker supports sampling by node hash and repo path so that actions can be run on a subset of the data. Sampling may be based purely on Node identity (e.g. scrub), or on the route taken through the graph (e.g. the WrappedPath tracking used in compression-benefit).
This is combined with the SamplingBlobstore to support sampling low-level blobstore actions/data. The SamplingWalkVisitor connects the high-level Nodes being sampled with the lower-level blobstore sampling via the CoreContext's SamplingKey.
Combined with the in-review --sampling-offset functionality, sampling with --sample-rate provides a way to operate on a slice of a repo, e.g. 1/100th or 1/1000th of it at a time; by incrementing the offset the entire repo can be covered a slice at a time. When sampling by Node hash this is stable; when sampling by repo path there are potentially multiple paths for a FileContent, so the slice assigned will depend on walk order.
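A plausible reading of the rate/offset slicing is modular selection on the node hash; the function below is an assumption based on the --sample-rate / --sampling-offset description, not the walker's actual implementation.

```rust
// A node is in the current slice when its hash falls in the residue class
// selected by the offset. Stepping the offset through 0..sample_rate then
// covers every node exactly once (stable when keyed by Node hash).
fn in_slice(node_hash: u64, sample_rate: u64, sampling_offset: u64) -> bool {
    node_hash % sample_rate == sampling_offset % sample_rate
}
```

With sample_rate = 100 each run touches roughly 1/100th of the nodes, and 100 runs with offsets 0..100 partition the whole repo.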
Currently sampling is used only to restrict the output stage, e.g. which objects are attempted to be compressed or dumped to disk. It could also be used to restrict the walk, e.g. into batches of commits. Likely we'd still keep the WalkStateCHashmap or its equivalent populated between slices to avoid re-visits.
The walker's main production monitoring is via ODS metrics, with scuba sampling of Nodes with problems. Scuba logs can also optionally include route information, e.g. validate reports the source Node that a step originated from.
When running locally or in integration tests, glog progress logging provides the stats, with scuba logging optionally written to a file or dropped.
This walker can dump out a corpus of blobs from a repo via the corpus subcommand. This is intended to allow shell level investigation and experimentation with the corpus, e.g. trying various compression techniques.
To form a corpus it associates an in-repo WrappedPath with Nodes that either have a path as part of their identity (e.g. Node::HgManifest), can be associated with a path as part of an edge (e.g. the OutgoingEdge for EdgeType::BonsaiChangesetToFileContent), or can have a path associated based on the route taken through the graph (e.g. Node::FileContentMetadata).
The output directory is specified with --output-dir, and the dump can be sampled by path-hash or path-regex.
The blobs are dumped out in the output dir in the layout NodeType/<repo_path>/.mononoke,/aa/bb/cc/blob_key, where repo_path and blob_key are both percent-encoded and aa/bb/cc etc. are a prefix of the hash, used to prevent any one directory becoming too large. For types without any in-repo path (e.g. BonsaiChangeset) the repo_path component is omitted. The repo paths and key are percent-encoded so that they can't match the magic .mononoke, directory name: it contains a comma, and commas are percent-encoded. This property can be used in shell operations to separate the blob structure from the in-repo path.
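The layout and the comma-escaping property can be sketched as below. This is an illustrative reconstruction: the exact set of characters left unescaped, and the helper names, are assumptions rather than the walker's actual encoding rules.

```rust
// Percent-encode everything outside an assumed safe set; note ',' is NOT in
// the set, so no encoded path component can ever equal ".mononoke,".
// '/' is kept so repo path components map to directories.
fn percent_encode(s: &str) -> String {
    s.bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'.' | b'_' | b'-' | b'/' => {
                (b as char).to_string()
            }
            _ => format!("%{:02X}", b),
        })
        .collect()
}

// Build NodeType/<repo_path>/.mononoke,/aa/bb/cc/blob_key; the aa/bb/cc
// sharding from the leading hash bytes keeps directories small.
fn corpus_path(node_type: &str, repo_path: Option<&str>, hash: &str, key: &str) -> String {
    let shard = format!("{}/{}/{}", &hash[0..2], &hash[2..4], &hash[4..6]);
    match repo_path {
        Some(p) => format!(
            "{}/{}/.mononoke,/{}/{}",
            node_type,
            percent_encode(p),
            shard,
            percent_encode(key)
        ),
        None => format!("{}/.mononoke,/{}/{}", node_type, shard, percent_encode(key)),
    }
}
```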
e.g. to find all blobs and list them:

```
$ find . -type d -name .mononoke, | xargs -i find {} -type f | head -1
./FileContent/root/foo/bar/baz/space_usage.json/.mononoke,/49/f0/c1/repo2103.content.blake2.49f0c1d254974a7eb43ad25e93426e738aa27e3f5e391301a1402e26643542c6
```

vs. to find all files we have dumped blobs for:

```
$ find . -type d -name .mononoke, | xargs dirname | head -1
./FileContent/root/foo/bar/baz/space_usage.json
```
The walker can check, and optionally repair, storage durability via the scrub subcommand. This checks that each component of a multiplexed blobstore has data for each key, so that we could run on one side of the multiplex if necessary.
Example issues this could correct:
Scrub makes sure that the above are detectable, and in the event of a component store failure we can run on one component store.
The scrub visits all graph nodes, with the underlying ScrubBlobstore providing a callback that is used when issues are detected.
The walker can check data validity via the validate subcommand.
Example issues this could detect:
The compression-benefit subcommand provides a tool to measure the effective compression ratio for a repo if we were to zstd-compress each blob individually.