Prestissimo - C++ Presto Worker Implementation using Velox

presto-native-execution/README.md


Prestissimo implements the Presto Worker REST API using Velox.

Build from Source

  • Clone the Presto repository

git clone https://github.com/prestodb/presto.git

  • Run cd presto/presto-native-execution && make submodules

Dependency installation

Dependency installation scripts based on the operating system are available inside presto/presto-native-execution/scripts.

  • macOS: setup-macos.sh
  • CentOS Stream 9: setup-centos.sh
  • Ubuntu: setup-ubuntu.sh
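As a rough sketch of how the list above maps to a platform, the following picks a script name from a platform identifier. The helper names `pick_setup_script` and `detect_platform` are hypothetical and not part of the repository; the sketch assumes `uname -s` prints `Darwin` on macOS and that `/etc/os-release` reports `ID=ubuntu` or `ID=centos` on the Linux distributions listed.

```shell
# Map a platform identifier to the setup script listed above.
# Hypothetical helper, not part of the repository.
pick_setup_script() {
  case "$1" in
    Darwin) echo "scripts/setup-macos.sh" ;;
    centos) echo "scripts/setup-centos.sh" ;;
    ubuntu) echo "scripts/setup-ubuntu.sh" ;;
    *) echo "no setup script for platform: $1" >&2; return 1 ;;
  esac
}

# Detect the current platform: "Darwin" on macOS, otherwise the ID field
# from /etc/os-release when that file is readable.
detect_platform() {
  if [ "$(uname -s)" = "Darwin" ]; then
    echo "Darwin"
  elif [ -r /etc/os-release ]; then
    ( . /etc/os-release && echo "${ID:-unknown}" )
  else
    echo "unknown"
  fi
}

# Only print the suggestion; running the script is left to the reader.
if script="$(pick_setup_script "$(detect_platform)" 2>/dev/null)"; then
  echo "Run ./$script from presto/presto-native-execution"
else
  echo "No matching setup script; see the list above."
fi
```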

The above setup scripts use the DEPENDENCY_DIR environment variable to set the location to download and build packages. This defaults to deps-download in the current working directory.

Use INSTALL_PREFIX to set the install directory of the packages. This defaults to deps-install in the current working directory on macOS and to the default install location (for example, /usr/local) on Linux. Using the default install location /usr/local on macOS is discouraged because this location is owned by root.

Manually add the INSTALL_PREFIX value to the IDE or shell environment so that subsequent Prestissimo builds can use the installed packages. For example, add export INSTALL_PREFIX=/Users/$USERNAME/presto/presto-native-execution/deps-install to ~/.zshrc.
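One way to make both locations explicit before running a setup script is sketched below. It assumes the current directory is presto/presto-native-execution and simply pins the documented default directory names; the variable `rc_line` is only an illustration, and nothing is written to your shell profile.

```shell
# Assumption: the current directory is presto/presto-native-execution.
# Keep any value already set in the environment, otherwise use the
# documented default directory names under the current directory.
export DEPENDENCY_DIR="${DEPENDENCY_DIR:-$PWD/deps-download}"
export INSTALL_PREFIX="${INSTALL_PREFIX:-$PWD/deps-install}"

echo "Packages are downloaded and built in: $DEPENDENCY_DIR"
echo "Packages are installed to:            $INSTALL_PREFIX"

# Print the line to add to ~/.zshrc (or your shell's equivalent) so that
# later Prestissimo builds find the installed packages.
rc_line="export INSTALL_PREFIX=$INSTALL_PREFIX"
echo "$rc_line"
```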

The following libraries are installed by the above setup scripts. The Velox library installs other dependencies not listed below.

Name      Version
Velox     Latest
CMake     Minimum 3.10
gperf     v3.1
proxygen  v2024.07.01.00

Prestissimo sources the Velox scripts, so the Velox configuration for the installation location and other settings also applies to Prestissimo. Please make sure to also review the Velox README.

For build issues refer to the troubleshooting section in this document.

Supported architectures, operating systems, and compilers

The supported architectures are x86_64 (avx, sse) and AArch64 (apple-m1+crc, neoverse-n1).

Prestissimo can be built with a variety of compilers and versions, but not all of them. Compilers and versions not listed below either are known not to work or have not been tried.

Minimum required

OS               Compiler
Ubuntu 22.04     gcc11
macOS            clang15
CentOS 9/RHEL 9  gcc11

Recommended

OS               Compiler
CentOS 9/RHEL 9  gcc12
Ubuntu 22.04     gcc11
macOS            clang15 (or later)

Build Prestissimo

Parquet and S3 Support

Parquet support is enabled by default. To disable it, add -DPRESTO_ENABLE_PARQUET=OFF to the EXTRA_CMAKE_FLAGS environment variable.

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_ENABLE_PARQUET=OFF"

To enable S3 support, add -DPRESTO_ENABLE_S3=ON to the EXTRA_CMAKE_FLAGS environment variable.

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_ENABLE_S3=ON"

S3 support needs the AWS SDK C++ library. This dependency can be installed by running the target platform build script from the presto/presto-native-execution directory.

./velox/scripts/setup-centos9.sh install_aws_deps

or

./velox/scripts/setup-ubuntu.sh install_aws_deps

JWT Authentication

To enable JWT authentication support, add -DPRESTO_ENABLE_JWT=ON to the EXTRA_CMAKE_FLAGS environment variable.

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_ENABLE_JWT=ON"

JWT authentication support needs the JWT CPP library. This dependency can be installed by running the script below from the presto/presto-native-execution directory.

./scripts/setup-adapters.sh jwt
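The optional features above are all enabled the same way, by appending a flag to EXTRA_CMAKE_FLAGS. When those export lines live in a shell profile that is sourced repeatedly, the same flag can accumulate more than once. A minimal sketch of an append-once helper follows; `add_cmake_flag` is a hypothetical name and not part of the Prestissimo build scripts, and the example resets EXTRA_CMAKE_FLAGS only to keep the demonstration self-contained.

```shell
# Append a flag to EXTRA_CMAKE_FLAGS unless it is already present.
# Hypothetical helper, not part of the repository.
add_cmake_flag() {
  case " ${EXTRA_CMAKE_FLAGS:-} " in
    *" $1 "*) ;;  # flag already present, nothing to do
    *) EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS:+$EXTRA_CMAKE_FLAGS }$1" ;;
  esac
  export EXTRA_CMAKE_FLAGS
}

# Demonstration only: start from an empty value.
EXTRA_CMAKE_FLAGS=""
add_cmake_flag -DPRESTO_ENABLE_S3=ON
add_cmake_flag -DPRESTO_ENABLE_JWT=ON
add_cmake_flag -DPRESTO_ENABLE_S3=ON   # duplicate, ignored
echo "$EXTRA_CMAKE_FLAGS"
```

The flags shown are the ones documented above; the matching dependencies still have to be installed before running make.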

Worker Metrics Collection

To enable worker-level metrics collection and the REST API endpoint v1/info/metrics, follow these steps:

Pre-build setup: ./scripts/setup-adapters.sh prometheus

CMake flags: PRESTO_STATS_REPORTER_TYPE=PROMETHEUS

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_STATS_REPORTER_TYPE=PROMETHEUS"

Runtime configuration: runtime-metrics-collection-enabled=true
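The runtime step can be sketched as follows, assuming the property belongs in the worker's config.properties (for a local setup, presto/presto-native-execution/etc/config.properties). The sketch writes to a temporary stand-in file unless CONFIG_FILE is set, and WORKER_HOST and WORKER_HTTP_PORT in the commented curl call are placeholders for your worker's address.

```shell
# Stand-in for the worker's config.properties so the sketch can run
# anywhere; set CONFIG_FILE to the real file to apply the change.
CONFIG_FILE="${CONFIG_FILE:-$(mktemp)}"

# Append the property only if no line already sets it. An existing line
# with a different value is left untouched and must be edited by hand.
if ! grep -q '^runtime-metrics-collection-enabled=' "$CONFIG_FILE"; then
  echo 'runtime-metrics-collection-enabled=true' >> "$CONFIG_FILE"
fi
cat "$CONFIG_FILE"

# With a worker built with the PROMETHEUS reporter and running, the
# endpoint can then be queried (placeholders, adjust to your setup):
# curl -s "http://${WORKER_HOST}:${WORKER_HTTP_PORT}/v1/info/metrics"
```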

  • After installing the above dependencies, from the presto/presto-native-execution directory, run make
  • For development, use make debug to build a non-optimized debug version.
  • Use make unittest to build and run tests.

Arrow Flight Connector

To enable Arrow Flight connector support, add -DPRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR=ON to the EXTRA_CMAKE_FLAGS environment variable.

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR=ON"

The Arrow Flight connector requires the Arrow Flight library. You can install this dependency by running the following script from the presto/presto-native-execution directory:

./scripts/setup-adapters.sh arrow_flight

Nvidia cuDF GPU Support

To enable cuDF support, add -DPRESTO_ENABLE_CUDF=ON to the EXTRA_CMAKE_FLAGS environment variable.

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_ENABLE_CUDF=ON"

In some environments, the CUDA_ARCHITECTURES and CUDA_COMPILER location must be explicitly set. The make command will look like:

CUDA_ARCHITECTURES=80 CUDA_COMPILER=/usr/local/cuda/bin/nvcc EXTRA_CMAKE_FLAGS=" -DPRESTO_ENABLE_CUDF=ON" make

The required dependencies are bundled from the Velox setup scripts.

Spatial type and function support

Spatial type and function support is enabled by default. To disable it, add -DPRESTO_ENABLE_SPATIAL=OFF to the EXTRA_CMAKE_FLAGS environment variable.

export EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DPRESTO_ENABLE_SPATIAL=OFF"

The spatial support adds new types (OGC geometry types) and functionality for spatial calculations.

Makefile Targets

A reminder of the available Makefile targets can be obtained using make help

    make help
    all                     Build the release version
    clean                   Delete all build artifacts
    cmake                   Use CMake to create a Makefile build system
    build                   Build the software based in BUILD_DIR and BUILD_TYPE variables
    debug                   Build with debugging symbols
    release                 Build the release version
    unittest                Build with debugging and run unit tests
    format-fix              Fix formatting issues in the current branch
    format-check            Check for formatting issues on the current branch
    header-fix              Fix license header issues in the current branch
    header-check            Check for license header issues on the current branch
    tidy-fix                Fix clang-tidy issues in the current branch
    tidy-check              Check clang-tidy issues in the current branch
    help                    Show the help messages

Build using Dockerfile

Information on how to build a dependency and runtime image of Prestissimo can be found here.

Development

Setup Presto with IntelliJ IDEA and Prestissimo with CLion

Clone the whole Presto repository. Close IntelliJ and CLion if running.

From the Presto repo run the commands below:

  • git fetch upstream
  • git checkout upstream/master
  • mvn clean install -DskipTests -T1C -pl -presto-docs

Run IntelliJ IDEA:

Run HiveExternalWorkerQueryRunner:

  • Edit/Create HiveExternalWorkerQueryRunner Application Run/Debug Configuration (alter paths accordingly).
    • Main class: com.facebook.presto.nativeworker.HiveExternalWorkerQueryRunner.
    • VM options: -ea -Xmx5G -XX:+ExitOnOutOfMemoryError -Duser.timezone=America/Bahia_Banderas -Dhive.security=legacy.
    • Working directory: $MODULE_DIR$
    • Environment variables: PRESTO_SERVER=/Users/<user>/git/presto/presto-native-execution/cmake-build-debug/presto_cpp/main/presto_server;DATA_DIR=/Users/<user>/Desktop/data;WORKER_COUNT=0
    • Use classpath of module: choose presto-native-execution module.

Run IcebergExternalWorkerQueryRunner:

  • Edit/Create IcebergExternalWorkerQueryRunner Application Run/Debug Configuration (alter paths accordingly).
    • Main class: com.facebook.presto.nativeworker.IcebergExternalWorkerQueryRunner.

    • VM options: -ea -Xmx5G -XX:+ExitOnOutOfMemoryError -Duser.timezone=America/Bahia_Banderas -Dhive.security=legacy.

    • Working directory: $MODULE_DIR$

    • Environment variables:

      • PRESTO_SERVER: Absolute path to the native worker binary. For example: /Users/<user>/git/presto/presto-native-execution/cmake-build-debug/presto_cpp/main/presto_server
      • DATA_DIR: Base data directory for test data and catalog warehouses. For example: /Users/<user>/Desktop/data
      • WORKER_COUNT: Number of native workers to launch (default: 4)
      • CATALOG_TYPE: Iceberg catalog type to use. One of HADOOP | HIVE (default: HIVE)

      Example: PRESTO_SERVER=/Users/<user>/git/presto/presto-native-execution/cmake-build-debug/presto_cpp/main/presto_server;DATA_DIR=/Users/<user>/Desktop/data;WORKER_COUNT=1;CATALOG_TYPE=HIVE

    • Use classpath of module: choose presto-native-execution module.
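The semicolon-separated value for the Environment variables field can be assembled with a small helper, sketched below. `iceberg_runner_env` is a hypothetical name; the defaults (WORKER_COUNT=4, CATALOG_TYPE=HIVE) and the allowed catalog types are taken from the list above, and the paths in the example call are placeholders.

```shell
# Build the IntelliJ "Environment variables" string for
# IcebergExternalWorkerQueryRunner. Hypothetical helper.
# Usage: iceberg_runner_env PRESTO_SERVER DATA_DIR [WORKER_COUNT] [CATALOG_TYPE]
iceberg_runner_env() {
  presto_server="$1"
  data_dir="$2"
  worker_count="${3:-4}"      # documented default
  catalog_type="${4:-HIVE}"   # documented default

  # Only the two documented catalog types are accepted.
  case "$catalog_type" in
    HADOOP|HIVE) ;;
    *) echo "CATALOG_TYPE must be HADOOP or HIVE" >&2; return 1 ;;
  esac

  printf 'PRESTO_SERVER=%s;DATA_DIR=%s;WORKER_COUNT=%s;CATALOG_TYPE=%s\n' \
    "$presto_server" "$data_dir" "$worker_count" "$catalog_type"
}

# Placeholder paths; substitute your own build and data directories.
iceberg_runner_env /path/to/presto_server /path/to/data 1 HADOOP
```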

Run NativeSidecarPluginQueryRunner:

  • Edit/Create NativeSidecarPluginQueryRunner Application Run/Debug Configuration (alter paths accordingly).
    • Main class: com.facebook.presto.sidecar.NativeSidecarPluginQueryRunner.
    • VM options: -ea -Xmx5G -XX:+ExitOnOutOfMemoryError -Duser.timezone=America/Bahia_Banderas -Dhive.security=legacy.
    • Working directory: $MODULE_DIR$
    • Environment variables: PRESTO_SERVER=/Users/<user>/git/presto/presto-native-execution/cmake-build-debug/presto_cpp/main/presto_server;DATA_DIR=/Users/<user>/Desktop/data;WORKER_COUNT=0
    • Use classpath of module: choose presto-native-sidecar-plugin module.

Run CLion:

  • File->Close Project if any is open.
  • Open presto/presto-native-execution directory as CMake project and wait till CLion loads/generates cmake files, symbols, etc.
  • Edit configuration for presto_server module (alter paths accordingly).
    • Program arguments: --logtostderr=1 --v=1 --etc_dir=/Users/<user>/git/presto/presto-native-execution/etc
    • Working directory: /Users/<user>/git/presto/presto-native-execution
  • For the sidecar, edit the configuration for the presto_server module (alter paths accordingly).
    • Program arguments: --logtostderr=1 --v=1 --etc_dir=/Users/<user>/git/presto/presto-native-execution/etc_sidecar
    • Working directory: /Users/<user>/git/presto/presto-native-execution
  • Edit menu CLion->Preferences->Build, Execution, Deployment->CMake
    • CMake options: -DVELOX_BUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Debug
    • Build options: -- -j 12
    • Optional CMake options to enable Parquet and S3: -DPRESTO_ENABLE_PARQUET=ON -DPRESTO_ENABLE_S3=ON
  • Edit menu CLion->Preferences->Editor->Code Style->C/C++
    • Scheme: Project
  • To enable ClangFormat:
    • Open any .h or .cpp file in the editor and select Enable ClangFormat by clicking the "4 spaces" rectangle in the status bar (bottom right), next to the UTF-8 indicator.

Setup Presto C++ with dev containers using CLion

See How to develop Presto C++ with dev-containers in CLion.

Run Presto Coordinator + Worker

  • Note that everything below can be done without using IDEs by running command line commands (not in this readme).
  • Run the QueryRunner of your choice:
    • For Hive, run HiveExternalWorkerQueryRunner from IntelliJ and wait until it starts (======== SERVER STARTED ======== is displayed in the log output).
    • For Iceberg, run IcebergExternalWorkerQueryRunner from IntelliJ and wait until it starts (======== SERVER STARTED ======== is displayed in the log output).
  • Scroll up the log output and find Discovery URL http://127.0.0.1:50555. The port changes randomly with every start.
  • Copy that port (or the whole URL) to the discovery.uri field in presto/presto-native-execution/etc/config.properties for the worker to announce itself to the Coordinator.
  • In CLion, run the "presto_server" module. A successful connection is indicated by the line Announcement succeeded: 202 in the log output.
  • See Run Presto Client to start executing queries on the running local setup.
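Copying the discovery URL by hand on every restart gets tedious, so the step can be scripted. The sketch below assumes the log line has the form shown above and that discovery.uri sits on its own line in config.properties. `set_discovery_uri` is a hypothetical helper, and the demonstration runs against a temporary file with made-up contents rather than the real configuration.

```shell
# Set discovery.uri in a properties file, replacing an existing entry or
# appending one if none exists. Hypothetical helper.
set_discovery_uri() {
  file="$1"
  uri="$2"
  tmp="$(mktemp)"
  if grep -q '^discovery\.uri=' "$file"; then
    sed "s|^discovery\.uri=.*|discovery.uri=$uri|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    echo "discovery.uri=$uri" >> "$tmp"
  fi
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Sample log line in the form described above; in practice this would be
# copied from the QueryRunner log output.
log_line='Discovery URL http://127.0.0.1:50555'
discovery_url="$(printf '%s\n' "$log_line" \
  | sed -n 's|.*Discovery URL \(http://[^ ]*\).*|\1|p')"

# Demonstration on a temporary file with a made-up stale entry; point
# this at presto/presto-native-execution/etc/config.properties for real use.
config="$(mktemp)"
printf 'discovery.uri=http://127.0.0.1:1\n' > "$config"
set_discovery_uri "$config" "$discovery_url"
cat "$config"
```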

Run Presto Coordinator + Sidecar

  • Note that everything below can be done without using IDEs by running command line commands (not in this readme).
  • Add a property presto.default-namespace=native.default to presto-native-execution/etc/config.properties.
  • Run NativeSidecarPluginQueryRunner from IntelliJ and wait until it starts (======== SERVER STARTED ======== is displayed in the log output).
  • Scroll up the log output and find Discovery URL http://127.0.0.1:50555. The port changes randomly with every startup.
  • Copy that port (or the whole URL) to the discovery.uri field in presto/presto-native-execution/etc_sidecar/config.properties for the sidecar to announce itself to the Coordinator.
  • In CLion, run the "presto_server" module. A successful connection is indicated by the line Announcement succeeded: 202 in the log output.
  • See Run Presto Client to start executing queries on the running local setup.

Run Presto Client

  • Run the following command from the presto root directory to start the Presto client:
    java -jar presto-cli/target/presto-cli-*-executable.jar --catalog hive --schema tpch
    
  • You can start with the show tables; and describe table; queries, then execute more queries as needed.

Run Integration (End to End or E2E) Tests

  • Note that everything below can be done without using IDEs by running command line commands (not in this readme).
  • In IntelliJ, open a test file that contains the test(s) you want to run from the presto/presto-native-execution/src/test/java/com/facebook/presto/nativeworker path.
  • Click the green arrow to the left of the test class declaration and choose whether to Run or Debug. This runs all tests in the class.
  • Alternatively, click the green arrow to the left of a test method declaration and choose whether to Run or Debug. This runs only that test method.
  • The framework launches a single Coordinator and four native workers to run the queries.
  • Similarly, the unit tests of Velox and presto_cpp can be run from CLion.

Code formatting, headers, and clang-tidy

Code formatting, license headers, and other checks are handled by pre-commit.

The pre-commit configuration in .pre-commit-config.yaml provides Git hooks that run automatically before commits and pushes to check and fix formatting and license headers.

GitHub Actions run pre-commit checks as part of our continuous integration. Using pre-commit hooks locally ensures pull requests pass these checks before they have the chance to fail. When pre-commit automatically fixes issues on commit, it is a good idea to manually check the modified files to ensure pre-commit did not make unintended changes.

To install the pre-commit hooks, first ensure your Python version is 3.9 or higher. Then run:

pip install pre-commit
pre-commit install --allow-missing-config

The option --allow-missing-config will allow commits and pushes to succeed locally if the config is missing (e.g. you are working on an older branch).
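The Python version requirement can be checked before installing the hooks. The sketch below is one way to do that in plain shell; `python_version_ok` is a hypothetical helper, and the install commands are left commented out so the example has no side effects.

```shell
# Succeeds when the given version string is 3.9 or higher.
# Hypothetical helper; expects a string such as "3.11.4".
python_version_ok() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}

# Demonstration with fixed version strings.
for v in 3.8.10 3.9.0 3.12.1; do
  if python_version_ok "$v"; then
    echo "$v: new enough for pre-commit"
  else
    echo "$v: too old"
  fi
done

# Typical use against the local interpreter:
# python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
#   && pip install pre-commit \
#   && pre-commit install --allow-missing-config
```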

In addition to the Git hooks, pre-commit can be run manually on changed files using pre-commit run or on all files using pre-commit run -a. To run a specific hook, use pre-commit run [hook-id] and refer to a specific hook id in .pre-commit-config.yaml.

The clang-tidy hook is not run locally or in CI by default, but can be run manually for optional checks using pre-commit run --hook-stage manual clang-tidy.

Create Pull Request

  • Submit PRs as usual following Presto repository guidelines.
  • Prestissimo follows the Velox coding style.
  • Add [native] prefix in the title as well as to the commit message for PRs modifying anything in presto-native-execution.
  • PRs that only change files in presto-native-execution should be approved by a Code Owner (team-velox) to have merging enabled.

Advance Velox Version

To have Prestissimo use a newer Velox version, run the following from the Presto repository root:

  • git -C presto-native-execution/velox checkout main
  • git -C presto-native-execution/velox pull
  • git add presto-native-execution/velox
  • Build and run tests (including E2E) to ensure everything works.
  • Submit a PR, get it approved and merged.

Functional test using containers

To build container images and do functional tests, see Prestissimo: Functional Testing Using Containers.

Troubleshooting

For known build issues check the wiki page Troubleshooting known build issues.