This scenario demonstrates how to work with Amazon SageMaker pipelines and geospatial jobs.
A SageMaker pipeline is a series of interconnected steps that automate machine learning workflows. Shared parameters make these workflows repeatable and let you customize them for your specific use case. You can create and run pipelines from SageMaker Studio using Python, or you can use AWS SDKs in other languages. With the SDKs, you can create, run, and monitor SageMaker pipelines.
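As a rough illustration of the monitoring part, the sketch below polls a status source until the execution leaves an in-progress state. It uses only the Java standard library: the `Supplier<String>` stands in for whatever SDK call returns the execution status, and the `ExecutionMonitor` class, its method, and the attempt and delay values are hypothetical names chosen for this sketch. The status strings are assumed to match the pipeline execution statuses that SageMaker reports.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

final class ExecutionMonitor {
    // Statuses assumed to mean the pipeline execution has not finished yet.
    private static final Set<String> IN_PROGRESS = Set.of("Executing", "Stopping");

    // Polls statusSource until it reports a finished status or maxAttempts is reached.
    // Returns the last status seen.
    static String waitForCompletion(Supplier<String> statusSource, int maxAttempts, long delayMillis) {
        String status = statusSource.get();
        for (int attempt = 1; attempt < maxAttempts && IN_PROGRESS.contains(status); attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return status;
            }
            status = statusSource.get();
        }
        return status;
    }

    public static void main(String[] args) {
        // A canned sequence of statuses stands in for real SDK responses.
        Iterator<String> fake = List.of("Executing", "Executing", "Succeeded").iterator();
        System.out.println(waitForCompletion(fake::next, 10, 10L));
    }
}
```

In a real application, the supplier would wrap the SDK call that describes the pipeline execution, and the delay would likely be several seconds rather than milliseconds.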
This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon SageMaker pipeline. The pipeline itself executes a geospatial job to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket.
When you run the example console application, you can execute the following steps:
Pipeline steps define the actions and relationships of the pipeline operations. The pipeline in this example includes an AWS Lambda step and a callback step. Both steps are processed by the same example Lambda function.
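One way a single function can serve both steps is to inspect the shape of the incoming event. The sketch below is a simplified illustration, not the example's actual handler. It assumes that a callback step reaches the function through the Amazon SQS queue, so the event arrives with a top-level `Records` list, while a Lambda step invokes the function directly with its own payload. The `StepDispatcher` class, the `Source` enum, and the sample payload keys are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class StepDispatcher {
    enum Source { CALLBACK_QUEUE, LAMBDA_STEP }

    // Queue-triggered invocations carry a non-empty "Records" list;
    // anything else is treated as a direct invocation from a Lambda step.
    static Source classify(Map<String, Object> event) {
        Object records = event.get("Records");
        boolean fromQueue = records instanceof List && !((List<?>) records).isEmpty();
        return fromQueue ? Source.CALLBACK_QUEUE : Source.LAMBDA_STEP;
    }

    // Returns a short description of the branch taken; a real handler would do the work here.
    static String handle(Map<String, Object> event) {
        switch (classify(event)) {
            case CALLBACK_QUEUE:
                int count = ((List<?>) event.get("Records")).size();
                return "callback step: " + count + " queued message(s) to process";
            default:
                return "Lambda step: direct payload with keys " + event.keySet();
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample events for each path.
        Map<String, Object> queued =
                Map.of("Records", List.of(Map.of("body", "{\"token\":\"example-token\"}")));
        Map<String, Object> direct = Map.of("example_key", "example_value");
        System.out.println(handle(queued));
        System.out.println(handle(direct));
    }
}
```

Keeping the classification in its own method makes the routing easy to test without deploying the function.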
The Lambda function handler is included as part of the example, with the following functionality:
The example pipeline uses parameters that you can reference throughout the steps. You can also use the parameters to change values between runs and to control the input and output settings. In this example, the parameters set the Amazon S3 locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline. The example demonstrates how to set and access these parameters before executing the pipeline using an SDK.
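The idea of declaring parameters once and changing their values between runs can be sketched with plain Java collections. This is a minimal illustration under stated assumptions, not the SDK's parameter API: the `PipelineParameters` class, the parameter names, and the bucket, role, and queue values are all hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PipelineParameters {
    // Parameters declared for the pipeline, with their default values.
    private final Map<String, String> defaults = new LinkedHashMap<>();

    // Declares a parameter and its default; returns this so calls can be chained.
    PipelineParameters declare(String name, String defaultValue) {
        defaults.put(name, defaultValue);
        return this;
    }

    // Applies per-run overrides on top of the defaults, rejecting undeclared names.
    Map<String, String> resolve(Map<String, String> overrides) {
        Map<String, String> resolved = new LinkedHashMap<>(defaults);
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            if (!defaults.containsKey(entry.getKey())) {
                throw new IllegalArgumentException("Unknown pipeline parameter: " + entry.getKey());
            }
            resolved.put(entry.getKey(), entry.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // All names and values below are placeholders for illustration.
        PipelineParameters params = new PipelineParameters()
                .declare("input_location", "s3://example-bucket/input/")
                .declare("output_location", "s3://example-bucket/output/")
                .declare("execution_role", "example-role-arn")
                .declare("queue_url", "example-queue-url");

        // A second run writes to a different output prefix; everything else keeps its default.
        Map<String, String> run = params.resolve(
                Map.of("output_location", "s3://example-bucket/output/run-2/"));
        run.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Rejecting undeclared names catches typos before a run starts, which is easier to debug than a step that fails on a missing value.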
A SageMaker pipeline can be used for model training, setup, testing, or validation. This example uses a simple job for demonstration purposes: a Vector Enrichment Job (VEJ) that processes a set of coordinates to produce human-readable addresses powered by Amazon Location Service. Other types of jobs can be substituted in the pipeline instead.
To use this tutorial, you need the following:
To view pipelines in SageMaker Studio, you need to set up an Amazon SageMaker Domain. To use geospatial capabilities, you need to use a supported Region.
You must download and use these files to successfully run this code example:
These files are located on GitHub in this folder.
To run this example, you must first create the Java SageMaker Lambda function. You can find this project here: Create the SageMaker geospatial Lambda function using the Lambda Java runtime API. This project creates a JAR file that is input to this code example.
After you create the Java Lambda project, build the required JAR file by running the mvn package command. Maven writes the JAR file to the target folder. Use this JAR file as input to this code example.
You can run this Java code example from within your Java IDE.
This example shows you how to do the following:
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.