ARCHITECTURE.md
PrestoDB is an open-source distributed SQL query engine designed for fast analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. It was originally developed by Facebook and later made available to the broader community. It is governed by the Presto Foundation, a member of the Linux Foundation.
The primary mission of PrestoDB is to enable fast, efficient data processing for analytic and batch workloads at scale. It aims to provide a single, unified query system that can access and process data stored in various formats and storage systems. Key aspects of its mission include:
Presto aims to accomplish the above goals for users by building a broad, powerful, and collaborative open-source community that holds itself to high standards in database engineering and design.
The Presto project believes that while excellent code is table stakes, how the project develops that code matters even more. For more information, see Presto Community.
Presto follows a distributed system model with a coordinator and multiple worker nodes. See Presto Concepts for more information.
Long-term initiatives in Presto have corresponding project boards, which give a rough picture of current progress.
Presto aims to be the top-performing system for data lakes. The project's top priority is to move fully onto a native evaluation engine, specifically Velox, while meeting the same expectations for users outlined in the vision, with an emphasis on user-friendliness and connectivity.
Motivations for this effort are numerous:
Presto’s optimizer can learn from long-established techniques in more mature systems to provide better plans for users. We are working on bringing these mature optimization techniques into Presto.
Presto originally interacted with other systems in unique, proprietary ways. Today, libraries like Velox and DataFusion are standardizing execution, Ibis is standardizing dataframes, Substrait is standardizing an intermediate representation of query plans, and Arrow is standardizing data exchange. These standardizations point to a future that emphasizes interoperability with other data infrastructure. We believe that if Presto does not adapt to this future, it could be left behind.
In principle and design, we favor interoperability. We are first movers in this area through our early adoption of Velox. We believe interoperability can both differentiate the project and let it focus on features that make users happy.
Presto runs on some of the largest data lakes in the world. To support this mission, it must be extremely reliable, which means the project must invest in testing infrastructure whenever it is needed.