Create and run a SageMaker geospatial pipeline using the SDK for Kotlin

Overview

This scenario demonstrates how to work with Amazon SageMaker pipelines and geospatial jobs.

Amazon SageMaker Pipelines

A SageMaker pipeline is a series of interconnected steps that automate a machine learning workflow. Steps share parameters, which supports repeatable workflows that you can customize for your specific use case. You can create and run pipelines from SageMaker Studio by using Python, but you can also do so with the AWS SDKs in other languages. Using the SDKs, you can create and run SageMaker pipelines and also monitor pipeline operations.
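Creating and running a pipeline with the SDK for Kotlin can be sketched as follows. This is a minimal, illustrative sketch: the pipeline name, Region, and role are placeholders, and the JSON definition is assumed to have been loaded from a file such as GeoSpatialPipeline.json.

```kotlin
import aws.sdk.kotlin.services.sagemaker.SageMakerClient

// Placeholder name and Region; substitute your own values.
suspend fun createAndStartPipeline(definitionJson: String, executionRoleArn: String) {
    SageMakerClient { region = "us-west-2" }.use { sageMaker ->
        // Create the pipeline from its JSON definition.
        sageMaker.createPipeline {
            pipelineName = "MyGeoPipeline"
            pipelineDefinition = definitionJson
            roleArn = executionRoleArn
        }
        // Start a run; the returned ARN identifies the execution you monitor.
        val execution = sageMaker.startPipelineExecution {
            pipelineName = "MyGeoPipeline"
        }
        println("Started execution: ${execution.pipelineExecutionArn}")
    }
}
```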

Explore the scenario

This example scenario demonstrates using AWS Lambda and Amazon Simple Queue Service (Amazon SQS) as part of an Amazon SageMaker pipeline. The pipeline itself executes a geospatial job to reverse geocode a sample set of coordinates into human-readable addresses. Input and output files are located in an Amazon Simple Storage Service (Amazon S3) bucket.

When you run the example console application, you can execute the following steps:

  • Create the AWS resources and roles needed for the pipeline.
  • Create the AWS Lambda function.
  • Create the SageMaker pipeline.
  • Upload an input file into an S3 bucket.
  • Execute the pipeline and monitor its status.
  • Display some output from the output file.
  • Clean up the pipeline resources.

Pipeline steps

Pipeline steps define the actions and relationships of the pipeline operations. The pipeline in this example includes an AWS Lambda step and a callback step. Both steps are processed by the same example Lambda function.
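The relationship between the two steps can be seen in an abbreviated pipeline-definition fragment. This is an illustrative sketch only: the step names, parameter name, and function ARN are placeholders, and the complete definition used by this scenario is in GeoSpatialPipeline.json.

```json
{
  "Version": "2020-12-01",
  "Parameters": [
    { "Name": "parameter_queue_url", "Type": "String" }
  ],
  "Steps": [
    {
      "Name": "lambda-step",
      "Type": "Lambda",
      "Arguments": {
        "FunctionArn": "arn:aws:lambda:REGION:ACCOUNT:function:SageMakerExampleFunction"
      }
    },
    {
      "Name": "callback-step",
      "Type": "Callback",
      "Arguments": {
        "SqsQueueUrl": { "Get": "Parameters.parameter_queue_url" }
      },
      "OutputParameters": [
        { "OutputName": "callback_output", "OutputType": "String" }
      ]
    }
  ]
}
```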

The Lambda function handler is included as part of the example, with the following functionality:

  • Starts a SageMaker Vector Enrichment Job with the provided job configuration.
  • Processes Amazon SQS queue messages from the SageMaker pipeline.
  • Starts the export function with the provided export configuration.
  • Completes the pipeline when the export is complete.
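Because one function backs both steps, the handler must tell a direct Lambda-step invocation apart from an SQS callback message. A minimal sketch of that routing, assuming illustrative key names ("Records" is how SQS events arrive; "vej_name" stands in for the pipeline's job arguments and is not the example's actual payload contract):

```kotlin
// Route an incoming event to the appropriate handler path.
fun routeInvocation(event: Map<String, Any?>): String =
    when {
        // SQS events wrap their messages in a "Records" array.
        event.containsKey("Records") -> "queue-message"
        // A direct Lambda-step invocation passes the pipeline's arguments.
        event.containsKey("vej_name") -> "start-job"
        else -> "unknown"
    }

fun main() {
    println(routeInvocation(mapOf("Records" to listOf<Any>())))      // queue-message
    println(routeInvocation(mapOf("vej_name" to "reverse-geocode"))) // start-job
}
```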

Pipeline parameters

The example pipeline uses parameters that you can reference throughout the steps. You can also use the parameters to change values between runs and control the input and output settings. In this example, the parameters set the Amazon S3 locations for the input and output files, along with the identifiers for the role and queue to use in the pipeline. The example demonstrates how to set and access these parameters before executing the pipeline using an SDK.
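Assembling those run-time values before starting the pipeline can be sketched in plain Kotlin. The parameter names and the `PipelineParam` helper below are illustrative assumptions, not the names the scenario's pipeline definition actually declares:

```kotlin
// Simple name/value pair, standing in for the SDK's pipeline Parameter type.
data class PipelineParam(val name: String, val value: String)

// Build the override list passed when starting a pipeline execution.
fun buildRunParams(
    inputUri: String,
    outputUri: String,
    roleArn: String,
    queueUrl: String,
): List<PipelineParam> {
    require(inputUri.startsWith("s3://") && outputUri.startsWith("s3://")) {
        "Input and output locations must be S3 URIs"
    }
    return listOf(
        PipelineParam("parameter_vej_input_config", inputUri),
        PipelineParam("parameter_vej_export_config", outputUri),
        PipelineParam("parameter_role", roleArn),
        PipelineParam("parameter_queue", queueUrl),
    )
}
```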

Geospatial jobs

A SageMaker pipeline can be used for model training, setup, testing, or validation. This example uses a simple job for demonstration purposes: a Vector Enrichment Job (VEJ) that processes a set of coordinates to produce human-readable addresses, powered by Amazon Location Service. You can substitute other types of jobs in the pipeline.
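For illustration, the job's input is a CSV file of coordinate pairs. The column names and rows below are hypothetical examples of that shape, not the contents of the provided latlongtest.csv; the job's output file carries the same rows with reverse-geocoded address fields appended.

```csv
Latitude,Longitude
40.7484,-73.9857
37.8199,-122.4783
```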

⚠ Important

  • Running this code might result in charges to your AWS account.
  • Running the tests might result in charges to your AWS account.
  • We recommend that you grant your code least privilege. At most, grant only the minimum permissions required to perform the task. For more information, see Grant least privilege.
  • This code is not tested in every AWS Region. For more information, see AWS Regional Services.

Scenario

Prerequisites

To use this tutorial, you need the following:

  • An AWS account.
  • A Java IDE (this tutorial uses IntelliJ IDEA).
  • Java JDK 1.8.
  • Gradle 6.8 or higher.
  • You must set up your development environment. For more information, see Setting up the AWS SDK for Kotlin.

To view pipelines in SageMaker Studio, you need to set up an Amazon SageMaker Domain.

To use geospatial capabilities, you need to use a supported Region.

To successfully run this code example, you must download and use the following files:

  • GeoSpatialPipeline.json
  • latlongtest.csv

These files are located on GitHub in this folder.

Kotlin Lambda function

To successfully run this example, you need to create the Kotlin SageMaker Lambda function by creating a .jar file and then placing the .jar file into an S3 bucket. You can find this project here: Create the SageMaker geospatial Lambda function using the AWS SDK for Kotlin.

Instructions

You can run this Kotlin code example from within your IDE.

Get started with geospatial jobs and pipelines

This example shows you how to do the following:

  • Set up resources for a pipeline.
  • Set up a pipeline that runs a geospatial job.
  • Start a pipeline run.
  • Monitor the status of the run.
  • View the output of the pipeline.
  • Clean up resources.
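The monitoring step can be sketched as a polling loop. The status provider is injected here so the sketch runs without AWS calls; in the real scenario it would wrap the DescribePipelineExecution operation, whose terminal statuses include Succeeded, Failed, and Stopped:

```kotlin
// Poll a pipeline execution until it reaches a terminal status.
fun waitForPipeline(
    pollStatus: () -> String,
    maxAttempts: Int = 60,
    delayMillis: Long = 5_000,
): String {
    repeat(maxAttempts) {
        when (val status = pollStatus()) {
            "Succeeded", "Failed", "Stopped" -> return status
            else -> Thread.sleep(delayMillis) // still Executing; wait and retry
        }
    }
    return "Timeout"
}

fun main() {
    // Fake status provider standing in for DescribePipelineExecution.
    val statuses = ArrayDeque(listOf("Executing", "Executing", "Succeeded"))
    println(waitForPipeline({ statuses.removeFirst() }, delayMillis = 0)) // Succeeded
}
```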

Additional resources


Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.