Code-map.md
Do you want to get familiar with the Vespa code base but don't know where to start?
Vespa consists of about 1.7 million lines of code, about equal parts Java and C++. Since it is mostly written by a team of developers selected for their ability to do this kind of thing unusually well, who have been given time to dedicate themselves to it for a long time, it is mostly easy to work with. However, one thing we haven't done is create a module structure friendly to newcomers: the code is simply organized in a flat structure of about 150 modules.
This document maps the functional elements of Vespa to the most important modules in the flat module structure of the code base on GitHub.
It covers the modules you are most likely to encounter as a developer. The rest are either small and needed for technical reasons, do one thing that should be self-explanatory, or implement the cloud service run by the Vespa team, which we don't expect anybody else to run or be interested in changing.
When a request is made to Vespa, it first enters a stateless container cluster, called jDisc. This consists of:
The stateless container is implemented in Java.
jDisc core modules:
jDisc container modules, layered on jDisc core:
Search container layered on jDisc container:
Document operation modules:
Content nodes store all data in Vespa, maintain reverse and forward indexes, and perform the distributed parts of query execution - matching, ranking and grouping/aggregation. This is written in C++.
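To make the division of labor between the stateless container and the content nodes more concrete, here is a minimal scatter-gather sketch in plain Java. It is not Vespa code and does not use any Vespa API: the names `ScatterGatherSketch`, `Hit`, `searchNode` and `search` are hypothetical, the "matching" step is reduced to a score threshold, and grouping/aggregation is left out. It only illustrates the general idea that each content node matches and ranks its own documents, and that the results are then merged into one list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScatterGatherSketch {

    /** Hypothetical hit type: a document id and a relevance score. */
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    /** Orders hits by descending score. */
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> Double.compare(b.score, a.score);

    /** Stand-in for one content node: match and rank its local documents, return the local top k. */
    static List<Hit> searchNode(List<Hit> localDocs, double minScore, int k) {
        List<Hit> matched = new ArrayList<>();
        for (Hit hit : localDocs) {
            if (hit.score >= minScore) matched.add(hit); // simplified "matching"
        }
        matched.sort(BY_SCORE_DESC);                     // simplified "ranking"
        return new ArrayList<>(matched.subList(0, Math.min(k, matched.size())));
    }

    /** Stand-in for the stateless container: query every node, then merge into a global top k. */
    static List<Hit> search(List<List<Hit>> nodes, double minScore, int k) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> node : nodes) {
            merged.addAll(searchNode(node, minScore, k));
        }
        merged.sort(BY_SCORE_DESC);
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        List<Hit> nodeA = Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        List<Hit> nodeB = Arrays.asList(new Hit("b1", 0.7), new Hit("b2", 0.5));
        for (Hit hit : search(Arrays.asList(nodeA, nodeB), 0.3, 2)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

In this sketch each node only returns its local top k, so the merging side never handles more than k hits per node, which is the usual reason for doing the matching and ranking work where the data is stored.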
The third major subsystem in Vespa is responsible for managing configuration, clusters, application deployment and similar. It is implemented in Java.
Libraries used throughout the code.