hudi-notebooks/README.md
This project provides a ready-to-use Docker Compose environment for running Apache Spark with Hudi, Hive Metastore, and MinIO (S3-compatible storage) for data lake development and testing. JupyterLab is included for interactive development.
- `Dockerfile.spark` / `Dockerfile.hive`: Custom Dockerfiles for Spark and Hive
- `build.sh`: Build all Docker images
- `run_spark_hudi.sh`: Start/stop/restart the stack
- `conf/`: Configuration files for Spark, Hive, and Hudi
- `notebooks/`: Jupyter notebooks (mounted in Spark container)
- `data/`: Persistent data for MinIO, Spark event logs, etc.

Build all Docker images:

```shell
./build.sh
```

Start, stop, or restart the stack:

```shell
./run_spark_hudi.sh start
./run_spark_hudi.sh stop
./run_spark_hudi.sh restart
```
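
For a first run, the build and start commands can be chained so that a failed build does not start a half-built stack. The following is a minimal sketch, not part of this project: the `run` and `first_run` helpers and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and it assumes the scripts are invoked from the directory that contains them.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: build the images, start the stack, then list containers.
# Stops at the first step that fails.
first_run() {
  run ./build.sh || return 1
  run ./run_spark_hudi.sh start || return 1
  run docker-compose ps
}
```

After sourcing these functions, calling `first_run` runs the three steps in order. Setting `DRY_RUN=1` beforehand prints the commands without executing them, which is a cheap way to check the sequence.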
Credentials: username `admin`, password `password`.

Configuration files are kept in `conf/` and automatically copied into containers.

To remove all containers and volumes:

```shell
docker-compose down -v
```
- Spark Quick Start Guide
- Python/Rust Quick Start Guide
Please check out our contribution guide to learn more about how to contribute. For code contributions, please refer to the developer setup.