examples/projects/spark-data-preparation/README.md
This repository contains the Dockerized example code for the "Data Preparation for Spark Analytics" showcase.
To run this example, follow these steps:
1. Set the `personal_access_token` field in the `./github-config.yaml` file. There is a comment there for guidance.
2. Build the Docker image:

   ```bash
   docker build --no-cache -t spark-data-preparation .
   ```

3. Provide the `PATHWAY_LICENSE_KEY` variable when launching the Docker container. Use the following command:

   ```bash
   docker run -e PATHWAY_LICENSE_KEY=YOUR_LICENSE_KEY -t spark-data-preparation
   ```

You can run this example similarly for the S3 case. To enable S3 output, pass the `AWS_S3_OUTPUT_PATH` environment variable to the container.
Additionally, you need to specify the following environment variables:
- `AWS_S3_ACCESS_KEY`: Your S3 access key
- `AWS_S3_SECRET_ACCESS_KEY`: Your S3 secret access key
- `AWS_BUCKET_NAME`: The name of your S3 bucket
- `AWS_REGION`: The region of your S3 bucket

The launch command will look like this:
```bash
docker run \
  -e PATHWAY_LICENSE_KEY=YOUR_LICENSE_KEY \
  -e AWS_S3_OUTPUT_PATH=YOUR_OUTPUT_PATH_IN_S3_BUCKET \
  -e AWS_S3_ACCESS_KEY=YOUR_S3_ACCESS_KEY \
  -e AWS_S3_SECRET_ACCESS_KEY=YOUR_S3_SECRET_ACCESS_KEY \
  -e AWS_BUCKET_NAME=YOUR_BUCKET_NAME \
  -e AWS_REGION=YOUR_AWS_REGION \
  -t spark-data-preparation
```
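To make the role of these variables concrete, here is a minimal sketch of how code inside the container might decide between local and S3 output. The variable names match this README, but the function itself (`resolve_output_config`) is hypothetical and not taken from the example's actual source:

```python
import os

def resolve_output_config() -> dict:
    """Pick local or S3 output based on environment variables.

    Illustrative sketch only: the env var names follow this README,
    but this function is a hypothetical stand-in for the real code.
    """
    s3_path = os.environ.get("AWS_S3_OUTPUT_PATH")
    if s3_path is None:
        # No S3 path supplied: fall back to local output.
        return {"mode": "local"}
    # When S3 output is requested, the remaining settings must be present.
    required = [
        "AWS_S3_ACCESS_KEY",
        "AWS_S3_SECRET_ACCESS_KEY",
        "AWS_BUCKET_NAME",
        "AWS_REGION",
    ]
    missing = [name for name in required if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing S3 settings: {', '.join(missing)}")
    return {
        "mode": "s3",
        "path": s3_path,
        "bucket": os.environ["AWS_BUCKET_NAME"],
        "region": os.environ["AWS_REGION"],
    }
```

Because every setting arrives via `docker run -e`, no credentials need to be baked into the image itself.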