hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_upgrade.md
This document explains the upcoming work for upgrading S3A to AWS SDK V2. This work is tracked in HADOOP-18073.
The SDK V2 for S3 is very different from SDK V1, and brings breaking changes for S3A. A complete list of the changes can be found in the Changelog.
### `aws-java-sdk-bundle-1.12.x.jar` becomes `bundle-2.x.y.jar`

As the module name is lost, Hadoop releases now include a large JAR file simply named "bundle" in the distribution. This is the AWS V2 SDK shaded artifact.
The new and old SDKs can co-exist; the only place where the Hadoop code may still use the original SDK is when a non-standard V1 AWS credential provider is declared.

Any deployment of the S3A connector must include this JAR or the subset of non-shaded `aws-*` JARs needed for communication with S3 and any other services used. As before: the exact set of dependencies used by the S3A connector is neither defined nor covered by any commitment to the stability or compatibility of dependent libraries.
The change in interface means that custom credential providers will need to be updated to
implement `software.amazon.awssdk.auth.credentials.AwsCredentialsProvider` instead of
`com.amazonaws.auth.AWSCredentialsProvider`.
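As a sketch of the shape of that migration: the provider class below and its secret source are purely illustrative, not part of the S3A API, and the two small interfaces at the top are local stand-ins for the real `software.amazon.awssdk.auth.credentials` types so that the sketch is self-contained; real code imports the SDK interfaces instead.

```java
// Stand-ins for the real software.amazon.awssdk.auth.credentials types,
// declared locally only so this sketch compiles on its own.
interface AwsCredentials {
  String accessKeyId();
  String secretAccessKey();
}

interface AwsCredentialsProvider {
  AwsCredentials resolveCredentials();
}

// Illustrative custom provider, migrated to the V2 interface shape.
class ExampleCredentialsProvider implements AwsCredentialsProvider {

  // V2 convention: a static create() factory method, used by S3A in
  // preference to a constructor.
  public static ExampleCredentialsProvider create() {
    return new ExampleCredentialsProvider();
  }

  // V1's getCredentials() becomes resolveCredentials(); there is no
  // refresh() method to implement any more.
  @Override
  public AwsCredentials resolveCredentials() {
    final String key =
        System.getenv().getOrDefault("EXAMPLE_ACCESS_KEY", "anonymous");
    final String secret =
        System.getenv().getOrDefault("EXAMPLE_SECRET_KEY", "");
    return new AwsCredentials() {
      @Override
      public String accessKeyId() { return key; }
      @Override
      public String secretAccessKey() { return secret; }
    };
  }
}
```

If the class implements `Closeable` or `AutoCloseable`, its `close()` method will be invoked when the provider chain is shut down.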
HADOOP-18980 introduces an extended version of
the credential provider remapping. `fs.s3a.aws.credentials.provider.mapping` takes a
comma-separated list of key=value pairs, each mapping a credential provider class name
to its replacement.
A key can then be used in the `fs.s3a.aws.credentials.provider` or
`fs.s3a.assumed.role.credentials.provider` options, where it will be translated into
the credential provider class named as its value in
`fs.s3a.aws.credentials.provider.mapping`.
For example, if `fs.s3a.aws.credentials.provider.mapping` is set with value:
```xml
<property>
  <name>fs.s3a.aws.credentials.provider.mapping</name>
  <value>
    com.amazonaws.auth.AnonymousAWSCredentials=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider,
    com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper=org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider,
    com.amazonaws.auth.InstanceProfileCredentialsProvider=org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  </value>
</property>
```
and if `fs.s3a.aws.credentials.provider` is set with:

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.AnonymousAWSCredentials</value>
</property>
```
`com.amazonaws.auth.AnonymousAWSCredentials` will be internally remapped to
`org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` by S3A while preparing
the AWS credential provider list.
Similarly, if `fs.s3a.assumed.role.credentials.provider` is set with:

```xml
<property>
  <name>fs.s3a.assumed.role.credentials.provider</name>
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```
`com.amazonaws.auth.InstanceProfileCredentialsProvider` will be internally
remapped to `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` by
S3A while preparing the assumed role AWS credential provider list.
### `AWSCredentialsProvider` interface

Note how the interface begins with the capitalized "AWS" acronym. The V2 interface starts with "Aws". This is a very subtle change for developers to spot. Compilers will detect and report the type mismatch.
```java
package com.amazonaws.auth;

public interface AWSCredentialsProvider {

  public AWSCredentials getCredentials();

  public void refresh();

}
```
The interface binding also supported a factory method, `AWSCredentialsProvider instance()`, which,
if available, would be invoked in preference to using any constructor.
If the implementation class implemented `Closeable` or `AutoCloseable`, these would
be invoked when the provider chain was being shut down.
### `AwsCredentialsProvider` interface

```java
package software.amazon.awssdk.auth.credentials;

public interface AwsCredentialsProvider {

  AwsCredentials resolveCredentials();

}
```
There is no `refresh()` method any more, and `getCredentials()` has become
`resolveCredentials()`. If the implementation class implements `Closeable` or
`AutoCloseable`, these will be invoked when the provider chain is being shut down.
A static factory method `create()` which returns an `AwsCredentialsProvider` or
subclass will be used in preference to a constructor.

### `AWSCredentialProviderList` is now a V2 credential provider

The class `org.apache.hadoop.fs.s3a.AWSCredentialProviderList` has moved from
being a V1 to a V2 credential provider; even if an instance can be created with
existing code, the V1 methods will not resolve:
```
java.lang.NoSuchMethodError: org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials()Lcom/amazonaws/auth/AWSCredentials;
  at org.apache.hadoop.fs.store.diag.S3ADiagnosticsInfo.validateFilesystem(S3ADiagnosticsInfo.java:903)
```
### Configuration option `fs.s3a.aws.credentials.provider`

Before, `fs.s3a.aws.credentials.provider` took a list of V1 credential providers, containing:

1. Credential providers implemented in the `hadoop-aws` module.
2. Credential providers from the `aws-sdk-bundle` library.

And here is how they change:

1. The `hadoop-aws` credential providers have migrated to V2.
2. Commonly used `aws-sdk-bundle` credential providers are automatically remapped to their V2 equivalents.
3. Other credential providers can only be used if the V1 `aws-sdk-bundle` JAR is on the classpath.

Because of (1) and (2), standard `fs.s3a.aws.credentials.provider` configurations
should seamlessly upgrade. This also means that the same provider list, if restricted to
those classes, will work across versions.
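For example, a provider list restricted to the Hadoop module classes listed below works unchanged on both SDK generations:

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>
    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
  </value>
</property>
```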
### `hadoop-aws` credential providers migration to V2

All the `fs.s3a` credential providers have the same name and functionality as before.
| Hadoop module credential provider | Authentication Mechanism |
|-----------------------------------|--------------------------|
| `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider` | Session Credentials in configuration |
| `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` | Simple name/secret credentials in configuration |
| `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` | Anonymous Login |
| `org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider` | Assumed Role credentials |
| `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` | EC2/k8s instance credentials |
### `aws-sdk-bundle` credential provider remapping

The commonly-used set of V1 credential providers are automatically remapped to V2 equivalents.
| V1 Credential Provider | Remapped V2 substitute |
|------------------------|------------------------|
| `com.amazonaws.auth.AnonymousAWSCredentials` | `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider` |
| `com.amazonaws.auth.EnvironmentVariableCredentialsProvider` | `software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider` |
| `com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper` | `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` |
| `com.amazonaws.auth.InstanceProfileCredentialsProvider` | `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` |
| `com.amazonaws.auth.profile.ProfileCredentialsProvider` | `software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider` |
There are still a number of trouble spots here:

#### Other `com.amazonaws.auth.` AWS providers

There should be equivalents in the new SDK, but as well as being renamed
they are likely to have moved to different factory/builder mechanisms.
Identify the changed classes and use their
names in the `fs.s3a.aws.credentials.provider` option.

If a V2 equivalent is not found then, provided the V1 SDK is added to the classpath,
it should still be possible to use the existing classes.
Adding a V2 equivalent is the recommended long-term solution.
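For example (the choice of provider is illustrative), the V2 profile provider from the remapping table above can be named directly:

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider</value>
</property>
```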
Because all the standard Hadoop credential providers have been upgraded, any subclasses of these will not link or work.
These will need to be manually migrated to be V2 credential providers.
The major changes and how they affect S3A are listed below.

SDK V2 classes are in the package `software.amazon.awssdk`; SDK V1 classes
were under `com.amazonaws`. Adapter classes under `org.apache.hadoop.fs.s3a.adapter`
bridge V1 credential providers to the V2 API. The V2 response classes no longer
implement `Serializable`.

Most of these changes simply create what will feel to be gratuitous migration effort;
the removal of the `Serializable` nature from all message response classes can
potentially break applications, such as anything passing them between Spark workers.
See AWS SDK V2 issue *Simplify Modeled Message Marshalling #82*;
note that it was filed in 2017, then implement your own workaround pending that issue
being resolved.
Any code making use of V1 SDK classes through the `S3AFileSystem` class and associated
classes will fail once `hadoop-aws` is declared as a dependency.
The sole solution to these problems is "move to the V2 SDK".
### Some `S3AUtils` methods are deleted

```
cannot find symbol
[ERROR]   symbol:   method createAwsConf(org.apache.hadoop.conf.Configuration,java.lang.String)
[ERROR]   location: class org.apache.hadoop.fs.s3a.S3AUtils
```
The signature and superclass of `AWSCredentialProviderList` have changed, which can surface in different
ways.

Signature mismatch:

```
cannot find symbol
[ERROR]   symbol:   method getCredentials()
[ERROR]   location: variable credentials of type org.apache.hadoop.fs.s3a.AWSCredentialProviderList
```

It is no longer a V1 credential provider, so cannot be used to pass credentials to a V1 SDK class:

```
incompatible types: org.apache.hadoop.fs.s3a.AWSCredentialProviderList cannot be converted to com.amazonaws.auth.AWSCredentialsProvider
```
### `AmazonS3` replaced by `S3Client`; factory and accessor changed

The V1 S3 client class `com.amazonaws.services.s3.AmazonS3` has been superseded by
`software.amazon.awssdk.services.s3.S3Client`.

The `S3ClientFactory` interface has been replaced by one that creates a V2 `S3Client`.
The `InconsistentS3ClientFactory` class has been deleted.

### `S3AFileSystem` method changes: `S3AInternals`

The low-level S3 operations/client accessors have been moved into a new interface,
`org.apache.hadoop.fs.s3a.S3AInternals`, which must be accessed via the
`S3AFileSystem.getS3AInternals()` method.
They have also been updated to return V2 SDK classes.
```java
@InterfaceStability.Unstable
@InterfaceAudience.LimitedPrivate("testing/diagnostics")
public interface S3AInternals {

  S3Client getAmazonS3V2Client(String reason);

  S3AStore getStore();

  @Retries.RetryTranslated
  @AuditEntryPoint
  String getBucketLocation() throws IOException;

  @AuditEntryPoint
  @Retries.RetryTranslated
  String getBucketLocation(String bucketName) throws IOException;

  @AuditEntryPoint
  @Retries.RetryTranslated
  HeadObjectResponse getObjectMetadata(Path path) throws IOException;

  AWSCredentialProviderList shareCredentials(final String purpose);

  @AuditEntryPoint
  @Retries.RetryTranslated
  HeadBucketResponse getBucketMetadata() throws IOException;

  boolean isMultipartCopyEnabled();

  @AuditEntryPoint
  @Retries.RetryTranslated
  long abortMultipartUploads(Path path) throws IOException;
}
```
### `S3AFileSystem.getAmazonS3ClientForTesting(String)` moved and return type changed

The `S3AFileSystem.getAmazonS3ClientForTesting()` method has been deleted.

Compilation:

```
cannot find symbol
[ERROR]   symbol:   method getAmazonS3ClientForTesting(java.lang.String)
[ERROR]   location: variable fs of type org.apache.hadoop.fs.s3a.S3AFileSystem
```

It has been replaced by an `S3AInternals` equivalent which returns the V2 `S3Client`
of the filesystem instance.

Before:

```java
((S3AFileSystem)fs).getAmazonS3ClientForTesting("testing")
```

After:

```java
((S3AFileSystem)fs).getS3AInternals().getAmazonS3Client("testing")
```
### `S3AFileSystem.getObjectMetadata(Path)` moved to `S3AInternals`; return type changed

The `getObjectMetadata(Path)` call has been moved to the `S3AInternals` interface,
and an instance of the `software.amazon.awssdk.services.s3.model.HeadObjectResponse` class is
returned.
The original `S3AFileSystem` method has been deleted.

Before:

```java
((S3AFileSystem)fs).getObjectMetadata(path)
```

After:

```java
((S3AFileSystem)fs).getS3AInternals().getObjectMetadata(path)
```
### `AWSCredentialProviderList.shareCredentials(String)` moved to `S3AInternals`

The operation to share a reference-counted access to the AWS credentials used
by the S3A FS has been moved to `S3AInternals`.

This is very much an implementation method, used to allow extension modules to share an authentication chain into other AWS SDK client services (DynamoDB, etc.).

`AWSCredentialProviderList` has been upgraded to the V2 API:

* It still has a `refresh()` method, but this is now a deprecated no-op.
* It implements `Closeable`; its `close()` method iterates through all entries in
  the list; if they are `Closeable` or `AutoCloseable` then their `close()` method is invoked.

### Custom signers

Interface change: `com.amazonaws.auth.Signer` has been replaced by `software.amazon.awssdk.core.signer.Signer`.
The change in signers means that custom signers will need to be updated to implement the new interface.
There is no support to assist in this migration.

The callbacks from the SDK have all changed, as has
the interface `org.apache.hadoop.fs.s3a.audit.AWSAuditEventCallbacks`.
Examine the interface and associated implementations to see how to migrate.
The option `fs.s3a.audit.request.handlers` to declare a list of V1 SDK
`com.amazonaws.handlers.RequestHandler2` implementations to include
in the AWS request chain is no longer supported: a warning is printed
and the value ignored.

The V2 SDK equivalent, classes implementing `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`,
can be declared in the configuration option `fs.s3a.audit.execution.interceptors`.
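For example, an interceptor implementation can be named in the option as below; the class name here is purely illustrative, not a class shipped with Hadoop:

```xml
<property>
  <name>fs.s3a.audit.execution.interceptors</name>
  <value>org.example.audit.LoggingExecutionInterceptor</value>
</property>
```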