python/example_code/emr/README.md
Shows how to use the AWS SDK for Python (Boto3) to work with Amazon EMR.
<!--custom.overview.start--> <!--custom.overview.end-->Amazon EMR is a web service that makes it easy to process vast amounts of data efficiently using Apache Hadoop and services offered by Amazon Web Services.
For prerequisites, see the README in the python folder.
Install the packages required by these examples by running the following in a virtual environment:
python -m pip install -r requirements.txt
Code excerpts that show you how to call individual service functions.
Code examples that show you how to accomplish a specific task by calling multiple functions within the same service.
<!--custom.examples.start--> <!--custom.examples.end-->This example shows you how to create a short-lived Amazon EMR cluster that runs a step and automatically terminates after the step completes.
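As a rough illustration of the idea above, the following sketch builds the arguments for the Boto3 EMR client's `run_job_flow` call. The helper names, release label, instance types, instance count, and the `--output_uri` script argument are assumptions made for illustration and are not taken from this example's source. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster terminate once its steps finish.

```python
def build_short_lived_cluster_request(name, log_uri, script_uri, output_uri):
    """Build keyword arguments for the Boto3 EMR client's run_job_flow call.

    All concrete values below (release label, instance types, counts, and
    the script's --output_uri argument) are illustrative assumptions.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any release that includes Spark
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption
            "SlaveInstanceType": "m5.xlarge",   # assumption
            "InstanceCount": 3,                 # assumption
            # False asks Amazon EMR to terminate the cluster when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "estimate-pi",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode",
                        "cluster",
                        script_uri,
                        "--output_uri",  # hypothetical argument of the job script
                        output_uri,
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def start_cluster(emr_client, request):
    """Start the cluster with the given request and return its cluster ID."""
    response = emr_client.run_job_flow(**request)
    return response["JobFlowId"]
```

To try it against a real account, you could pass `boto3.client("emr")` as `emr_client`. This assumes that the default EMR roles already exist in the account and that the script has been uploaded to the given S3 location.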
<!--custom.scenario_prereqs.emr_Scenario_ShortLivedEmrCluster.start-->Start the example by running the following at a command prompt:
<!--custom.scenario_prereqs.emr_Scenario_ShortLivedEmrCluster.end--> <!--custom.scenarios.emr_Scenario_ShortLivedEmrCluster.start-->Shows how to write a job step that uses Apache Spark to estimate the value of pi by performing a large number of parallelized calculations on cluster instances. Results are written to Amazon EMR logs and also to an S3 bucket.
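The calculation behind the job step can be sketched without a cluster. The following is a plain-Python stand-in, not the Spark job itself: it uses the common Monte Carlo approach of sampling random points in the unit square and counting how many land inside the quarter circle. The function names are hypothetical, and the loop over partitions runs sequentially here, whereas on a cluster each partition would be counted in parallel and the counts then summed.

```python
import random


def count_hits_in_circle(samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # seeded so that each partition is reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(partitions, samples_per_partition):
    """Estimate pi by combining the hit counts from every partition.

    Each partition is independent, which is what makes the work easy to
    spread across cluster instances.
    """
    total_hits = sum(
        count_hits_in_circle(samples_per_partition, seed)
        for seed in range(partitions)
    )
    total_samples = partitions * samples_per_partition
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / total_samples
```

Calling `estimate_pi(4, 50000)` returns a value close to pi. The estimate tightens slowly as the sample count grows, which is why the job benefits from many parallel workers.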
<!--custom.scenarios.emr_Scenario_ShortLivedEmrCluster.end-->This example shows you how to use AWS Systems Manager to run a shell script on Amazon EMR instances that installs additional libraries. This way, you can automate instance management instead of running commands manually through an SSH connection.
<!--custom.scenario_prereqs.emr_Usage_InstallLibrariesWithSsm.start--> <!--custom.scenario_prereqs.emr_Usage_InstallLibrariesWithSsm.end-->Start the example by running the following at a command prompt:
python install_libraries.py
To install additional libraries on running cluster instances, run the following command at a command prompt:
python install_libraries.py CLUSTER_ID SHELL_SCRIPT_PATH
This example is intended to be run as part of the "Installing Additional Kernels and Libraries" tutorial. The cluster specified by CLUSTER_ID must be set up to work with Systems Manager. You must also have previously uploaded a shell script to the Amazon S3 location specified by SHELL_SCRIPT_PATH.
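A minimal sketch of how such a script might use Systems Manager follows. It assumes the libraries belong on the core instances and that the shell script should be copied to `/home/hadoop`. The helper names and that path are illustrative assumptions, not a description of what `install_libraries.py` actually does. The calls used are the Boto3 EMR client's `list_instances` and the Systems Manager client's `send_command` with the `AWS-RunShellScript` document.

```python
import posixpath


def build_install_commands(script_path):
    """Return shell commands that download and run the script from Amazon S3."""
    script_name = posixpath.basename(script_path)
    local_path = f"/home/hadoop/{script_name}"  # assumption: target directory
    return [
        f"aws s3 cp {script_path} {local_path}",
        f"chmod +x {local_path}",
        local_path,
    ]


def install_libraries(emr_client, ssm_client, cluster_id, script_path):
    """Run the shell script on the cluster's core instances and return the command ID."""
    # Only the first page of results is read; a large cluster would need
    # to follow the Marker value returned by list_instances.
    response = emr_client.list_instances(
        ClusterId=cluster_id, InstanceGroupTypes=["CORE"]
    )
    instance_ids = [inst["Ec2InstanceId"] for inst in response["Instances"]]
    if not instance_ids:
        raise RuntimeError(f"No core instances found for cluster {cluster_id}.")

    command = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": build_install_commands(script_path)},
        TimeoutSeconds=3600,  # assumption: one hour is enough for the install
    )
    return command["Command"]["CommandId"]
```

With real clients, this would be called as `install_libraries(boto3.client("emr"), boto3.client("ssm"), CLUSTER_ID, SHELL_SCRIPT_PATH)`. The returned command ID can then be used to check the status of the run.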
<!--custom.scenarios.emr_Usage_InstallLibrariesWithSsm.end-->⚠ Running tests might result in charges to your AWS account.
To find instructions for running these tests, see the README in the python folder.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: Apache-2.0