# Amazon SageMaker Provider Example

This example (`examples/provider-amazon-sagemaker`) demonstrates how to evaluate models deployed on Amazon SageMaker AI endpoints using promptfoo.
You can run this example with:

```sh
npx promptfoo@latest init --example provider-amazon-sagemaker
cd provider-amazon-sagemaker
```
This example requires the AWS SDK SageMaker runtime client, installed globally:

```sh
npm install -g @aws-sdk/client-sagemaker-runtime
```
This example requires the following environment variables:

- `AWS_ACCESS_KEY_ID` - Your AWS access key
- `AWS_SECRET_ACCESS_KEY` - Your AWS secret key
- `AWS_REGION` - Optional, can also be specified in the configuration

You can set these in a `.env` file or directly in your environment.
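For example, to set them directly in your shell (placeholder values, not real credentials):

```sh
# Placeholder values — replace with your own AWS credentials
export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
export AWS_REGION=us-east-1
```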
This example includes multiple configuration files demonstrating different SageMaker integration patterns:

```sh
# Run a specific configuration
promptfoo eval -c promptfooconfig.jumpstart.yaml
```
This directory includes a test script to validate your SageMaker AI endpoint configuration before running a full evaluation:

```sh
# Basic test for an OpenAI-compatible endpoint
node test-sagemaker-provider.js --endpoint=my-endpoint --model-type=openai

# Test with an embedding endpoint
node test-sagemaker-provider.js --endpoint=my-embedding-endpoint --embedding=true

# Test with transforms
node test-sagemaker-provider.js --endpoint=my-endpoint --model-type=llama --transform=true

# Test with a custom transform file
node test-sagemaker-provider.js --endpoint=my-endpoint --transform=true --transform-file=transform.js
```
The SageMaker provider supports transforming prompts before they're sent to the endpoint, which is particularly useful for formatting prompts according to specific model requirements.
```yaml
providers:
  - id: sagemaker:llama:your-endpoint
    config:
      region: us-west-2
      modelType: llama
      # Apply an inline transform
      transform: |
        return `<s>[INST] ${prompt} [/INST]`;
```
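The inline transform above wraps each prompt in Llama's instruction format. As a rough sketch of what evaluating that transform string produces (the use of `new Function` here is illustrative, not promptfoo's actual mechanism):

```js
// Sketch: applying an inline transform string to a prompt.
// The transform body is evaluated with `prompt` in scope.
const transformBody = 'return `<s>[INST] ${prompt} [/INST]`;';
const transform = new Function('prompt', transformBody);

const result = transform('What is the capital of France?');
console.log(result);
// <s>[INST] What is the capital of France? [/INST]
```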
This example includes a sample transform file (transform.js) that shows how to create reusable transformations:
```yaml
providers:
  - id: sagemaker:jumpstart:your-endpoint
    config:
      region: us-west-2
      modelType: jumpstart
      # Reference an external transform file
      transform: file://transform.js
```
The transform function receives the prompt and a context object containing the provider configuration:
```js
module.exports = function (prompt, context) {
  // Access config values
  const maxTokens = context.config?.maxTokens || 256;

  // Return transformed input
  return {
    inputs: prompt,
    parameters: {
      max_new_tokens: maxTokens,
      temperature: 0.7,
    },
  };
};
```
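You can exercise a transform like this directly in Node to check its output before wiring it into a config. The context shape below is an assumption based on the snippet above:

```js
// Stand-in for the function exported by transform.js above.
const transform = function (prompt, context) {
  const maxTokens = context.config?.maxTokens || 256;
  return {
    inputs: prompt,
    parameters: { max_new_tokens: maxTokens, temperature: 0.7 },
  };
};

// Simulate the context promptfoo would pass, with a custom maxTokens.
const payload = transform('Hello, world', { config: { maxTokens: 512 } });
console.log(JSON.stringify(payload, null, 2));
// parameters.max_new_tokens is 512; without config.maxTokens it falls back to 256
```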
JumpStart models require a specific input/output format. The provider handles this automatically when modelType: jumpstart is specified:
```yaml
providers:
  - id: sagemaker:jumpstart:your-jumpstart-endpoint
    config:
      region: us-west-2
      modelType: jumpstart
      maxTokens: 256
      responseFormat:
        path: 'json.generated_text'
```
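The `responseFormat.path` value is a dot/bracket path into the parsed JSON response body. A rough sketch of that extraction, assuming a typical JumpStart response shape (promptfoo's actual resolver may differ):

```js
// Minimal path extraction sketch. "json" refers to the parsed
// response body; the remaining segments are a property path.
function extractByPath(json, path) {
  const segments = path
    .replace(/^json\.?/, '') // drop the leading "json" segment
    .split(/[.[\]]/)         // split on dots and brackets
    .filter(Boolean);
  return segments.reduce((value, key) => value?.[key], json);
}

// Example JumpStart-style response body (shape is an assumption).
const body = { generated_text: 'Paris is the capital of France.' };
console.log(extractByPath(body, 'json.generated_text'));
// Paris is the capital of France.

// Array-wrapped variant, matching the "json[0].generated_text" form:
const arrayBody = [{ generated_text: 'Paris.' }];
console.log(extractByPath(arrayBody, 'json[0].generated_text'));
// Paris.
```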
For better rate limiting with SageMaker endpoints, you can add delays between API calls:
```yaml
providers:
  - id: sagemaker:your-endpoint
    config:
      region: us-west-2
      delay: 500 # Add a 500ms delay between API calls
```
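Conceptually, the delay inserts a fixed pause between consecutive endpoint invocations. A minimal sketch of that throttling pattern, independent of promptfoo's internals:

```js
// Sketch: run async calls sequentially with a fixed pause between them.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runWithDelay(calls, delayMs) {
  const results = [];
  for (const call of calls) {
    results.push(await call());
    await sleep(delayMs); // e.g. 500ms, mirroring the config above
  }
  return results;
}
```

With `delay: 500` and ten test cases, an evaluation adds roughly five seconds of total pause time, which can help avoid throttling on rate-limited endpoints.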
After running the evaluation, you can review the results in the web viewer with `promptfoo view`.
If you encounter "Batch inference failed" errors:

- Add a `delay` parameter (at least 500ms recommended)
- Verify that you've set the correct `modelType` for your endpoint:
  - For JumpStart models, use `modelType: jumpstart`
  - For Hugging Face models, use `modelType: huggingface`
- Set `contentType` and `acceptType` as `"application/json"`

If you're getting unusual responses or missing output:

- Try `responseFormat.path: "json.generated_text"`
- For array-wrapped responses, try `responseFormat.path: "json[0].generated_text"`

If transforms aren't working correctly:

- Run the test script with `--transform=true` to debug transform behavior

If you're still experiencing errors even with the correct configuration, check the endpoint's CloudWatch logs in the AWS console for server-side error details.