Troubleshooting Amazon S3

Common problems that you may encounter while working with Amazon S3 include:

  1. Classpath Related Errors
  2. Errors When Deleting Files
  3. Authentication Failures
  4. S3 Inconsistency Side-Effects

Classpath is usually the first problem. The Hadoop S3 filesystem clients need the Hadoop-specific filesystem client JARs, third-party S3 client libraries compatible with the Hadoop code, and any dependent libraries compatible with Hadoop and the specific JVM.

The classpath must be set up for the process talking to S3. If this is code running in the Hadoop cluster, then the JARs must be on that classpath. This includes distcp.
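
One quick check, assuming the hadoop command is on the PATH of the host in question, is to list the classpath Hadoop reports and look for the hadoop-aws and aws-java-sdk JARs:

hadoop classpath | tr ':' '\n' | grep -E 'hadoop-aws|aws-java-sdk'

If nothing is printed, those JARs are not on the classpath of processes launched through the Hadoop scripts.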

ClassNotFoundException Errors

ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
ClassNotFoundException: org.apache.hadoop.fs.s3native.NativeS3FileSystem
ClassNotFoundException: org.apache.hadoop.fs.s3.S3FileSystem

These are the Hadoop classes, found in the hadoop-aws JAR. An exception reporting that one of these classes is missing means that this JAR is not on the classpath.

Similarly, this error

ClassNotFoundException: com.amazonaws.services.s3.AmazonS3Client

or similar errors related to another com.amazonaws class means that one or more of the aws-*-sdk JARs are missing.

Solution: Add the missing JARs to the classpath.
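
How to do this depends on how the process is launched. One sketch for command-line tools such as distcp (the JAR location shown is hypothetical and varies by distribution) is to extend HADOOP_CLASSPATH before running the command:

# Hypothetical location; substitute the directory holding hadoop-aws and the aws-*-sdk JARs.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/opt/hadoop/share/hadoop/tools/lib/*"
hadoop distcp hdfs:///data/logs s3a://my-bucket/logs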

Missing Method in com.amazonaws Class

This can be triggered by incompatibilities between the AWS SDK on the classpath and the version with which Hadoop was compiled.

The AWS SDK JARs often change their method signatures between releases, so the only way to safely update the AWS SDK version is to recompile Hadoop against the later version.

There is nothing the Hadoop team can do here: if you get this problem, then you are on your own. The Hadoop developer team did look at using reflection to bind to the SDK, but there were too many changes between versions for this to work reliably. All it did was postpone version compatibility problems until the specific codepaths were executed at runtime. This was actually a backward step in terms of fast detection of compatibility problems.

Missing Method in a Jackson Class

This is usually caused by version mismatches between Jackson JARs on the classpath. All Jackson JARs on the classpath must be of the same version.
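
To see which versions are in play, assuming the hadoop command is on the PATH, list the Jackson JARs on the classpath and check that the version numbers match:

hadoop classpath | tr ':' '\n' | grep -i jackson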

"Bad Request" Exception when Working with AWS S3 Frankfurt, Seoul, or Elsewhere

S3 Frankfurt and Seoul only support the V4 authentication API. Consequently, any requests using the V2 API will be rejected with 400 Bad Request.

$ bin/hadoop fs -ls s3a://frankfurt/
WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: 400 Bad Request; Bad Request (retryable)

com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 923C5D9E75E44C06), S3 Extended Request ID: HDwje6k+ANEeDsM6aJ8+D5gUmNAMguOk2BvZ8PH3g9z0gpH+IuwT7N19oQOnIr5CIx7Vqb/uThE=
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:307)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:284)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2793)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:101)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:373)
ls: doesBucketExist on frankfurt-new: com.amazonaws.services.s3.model.AmazonS3Exception:
  Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request;

This happens when you are trying to work with any S3 service which only supports the "V4" signing API — and the client is configured to use the default S3A service endpoint.

Solution: The S3A client needs to be given the endpoint to use via the fs.s3a.endpoint property:

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>

Error Message "The bucket you are attempting to access must be addressed using the specified endpoint"

This surfaces when fs.s3a.endpoint is configured to use an S3 service endpoint which is neither the original AWS one (s3.amazonaws.com) nor the one where the bucket is hosted.

org.apache.hadoop.fs.s3a.AWSS3IOException: purging multipart uploads on landsat-pds:
 com.amazonaws.services.s3.model.AmazonS3Exception:
  The bucket you are attempting to access must be addressed using the specified endpoint.
  Please send all future requests to this endpoint.
   (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect; Request ID: 5B7A5D18BE596E4B),
    S3 Extended Request ID: uE4pbbmpxi8Nh7rycS6GfIEi9UH/SWmJfGtM9IeKvRyBPZp/hN7DbPyz272eynz3PEMM2azlhjE=:

    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3738)
    at com.amazonaws.services.s3.AmazonS3Client.listMultipartUploads(AmazonS3Client.java:2796)
    at com.amazonaws.services.s3.transfer.TransferManager.abortMultipartUploads(TransferManager.java:1217)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initMultipartUploads(S3AFileSystem.java:454)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:289)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2715)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:96)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2749)
    at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2737)
    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:430)

Solution:

  1. Use the specific endpoint of the bucket's S3 service.
  2. If not using "V4" authentication, the original S3 endpoint can be used:

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.amazonaws.com</value>
</property>

Using the explicit endpoint for the region is recommended for speed and the ability to use the V4 signing API.

Errors When Deleting Files

MultiObjectDeleteException During Delete or Rename of Files

Exception in thread "main" com.amazonaws.services.s3.model.MultiObjectDeleteException: 
    Status Code: 0, AWS Service: null, AWS Request ID: null, AWS Error Code: null,
    AWS Error Message: One or more objects could not be deleted, S3 Extended Request ID: null
  at com.amazonaws.services.s3.AmazonS3Client.deleteObjects(AmazonS3Client.java:1745)

This error happens when you are trying to delete multiple objects but one of the objects cannot be deleted. Typically this is because the caller lacks permission to delete them.

This error should not occur just because the objects are missing.

Solution:

Consult the log to see which objects could not be deleted, and determine whether you have permission to delete them.

If the operation is failing for reasons other than the caller lacking permissions:

  1. Try setting fs.s3a.multiobjectdelete.enable to false (see the configuration example after this list).
  2. Consult HADOOP-11572 for up-to-date advice.
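
Disabling multi-object deletes switches the client to issuing individual DELETE requests; this is slower, but avoids the batched failure mode:

<property>
  <name>fs.s3a.multiobjectdelete.enable</name>
  <value>false</value>
</property>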

Authentication Failures

If Hadoop cannot authenticate with the S3 service endpoint, the client retries a number of times before eventually failing. When it finally gives up, it will report a message about signature mismatch:

com.amazonaws.services.s3.model.AmazonS3Exception:
 The request signature we calculated does not match the signature you provided.
 Check your key and signing method.
  (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch,

The likely cause is that you have the wrong credentials for the authentication mechanism(s) in use, or that the credentials were not readable on the host attempting to read or write the S3 bucket.

Enabling debug logging for the package org.apache.hadoop.fs.s3a can help provide more information.
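
For example, adding the following line to the client's Log4j configuration (typically etc/hadoop/log4j.properties for the Hadoop command-line tools) turns on debug logging for the S3A client:

log4j.logger.org.apache.hadoop.fs.s3a=DEBUG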

Another common cause is an error in the configuration properties. There are also a couple of system configuration problems (JVM version, system clock) that you should check.

First Troubleshooting Steps

  1. Make sure that the name of the bucket is the correct one. That is, check the URL.

  2. Make sure the property names are correct. For S3A, they are fs.s3a.access.key and fs.s3a.secret.key. You cannot just copy the S3N properties and replace s3n with s3a.

  3. Make sure that the properties are visible to the process attempting to talk to the object store. Placing them in core-site.xml is the standard mechanism (a sample entry is shown after this list).

  4. If using session authentication, the session may have expired. Generate a new session token and secret.

  5. If using environment variable-based authentication, make sure that the relevant variables are set in the environment in which the process is running.
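
For reference, a minimal core-site.xml entry for static credentials looks like the following; the values are placeholders, not real keys:

<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>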

Checking Environment Variables

The standard first step is to try to list the bucket with the same credentials from the Hadoop command line, through a command such as:

hadoop fs -ls s3a://my-bucket/

Note the trailing "/" here; without that the shell thinks you are trying to list your home directory under the bucket, which will only exist if explicitly created.

Attempting to list a bucket using inline credentials is a means of verifying that the key and secret can access a bucket:

hadoop fs -ls s3a://key:secret@my-bucket/

Do escape any + or / symbols in the secret, as discussed below. Never share the URL or logs generated using it, and never use such an inline authentication mechanism in production.

Finally, if you set the environment variables, you can take advantage of S3A's support of environment-variable authentication by attempting the same ls operation. That is, unset the fs.s3a secrets and rely on the environment variables.
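
A sketch of that check, using the standard AWS environment variable names recognized by the SDK's environment-variable credential provider (the values are placeholders):

export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
hadoop fs -ls s3a://my-bucket/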

Authentication Failure Due to Clock Skew

The timestamp is used in signing requests to S3, so as to defend against replay attacks. If the system clock is too far behind or ahead of Amazon's, requests will be rejected.

This can surface as the situation where read requests are allowed, but operations which write to the bucket are denied.

Solution: Check the system clock.
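
One quick check, assuming a Unix-like host with curl installed, is to compare the local clock (in UTC) with the Date header returned by the S3 endpoint:

date -u
curl -sI https://s3.amazonaws.com/ | grep -i '^date:'

The two timestamps should agree to within a few minutes; a larger gap is a likely cause of the rejected requests.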

Authentication Failure When Using URLs with Embedded Secrets

If you are using the (strongly discouraged) mechanism of including the AWS Key and secret in a URL, make sure that both "+" and "/" symbols are encoded in the URL. As many AWS secrets include these characters, encoding problems are not uncommon.

Use this table for conversion:

Symbol   Encoded Value
+        %2B
/        %2F

For example, a URL for an S3 bucket with AWS ID user1 and secret a+b/c will be represented as

s3a://user1:a%2Bb%2Fc@bucket

You only need to use this technique when placing secrets in the URL. Again, this is something we strongly advise against.

Authentication Failures When Running on Java 8u60+

A change in the Java 8 JVM broke some of the toString() string generation of Joda Time 2.8.0, which stopped the Amazon S3 client from being able to generate authentication headers suitable for validation by S3.

Solution: Make sure that the version of Joda Time is 2.8.1 or later, or use a new version of Java 8.

Visible S3 Inconsistency

Amazon S3 is an eventually consistent object store. That is, it is not a filesystem.

It offers read-after-create consistency, which means that a newly created file is immediately visible. Except for one small quirk: a negative GET may be cached, so that even after an object has been created, the fact that there "wasn't" an object may still be remembered.

That means the following sequence on its own will be consistent:

touch(path) -> getFileStatus(path)

But this sequence may be inconsistent:

getFileStatus(path) -> touch(path) -> getFileStatus(path)

A common source of visible inconsistencies is that the S3 metadata database — the part of S3 which serves list requests — is updated asynchronously. Newly added or deleted files may not be visible in the index, even though direct operations on the object (HEAD and GET) succeed.

In S3A, that means that the getFileStatus() and open() operations are more likely to be consistent with the state of the object store than any directory list operations (listStatus(), listFiles(), listLocatedStatus(), listStatusIterator()).

FileNotFoundException, Even Though the File Was Just Written

This can be a sign of consistency problems. It may also surface if some asynchronous file write operation is still in progress in the client: the call has returned, but the write has not yet completed. While the S3A client code does block during the close() operation, we suspect that asynchronous writes may be taking place somewhere in the stack; this could explain why parallel tests fail more often than serialized tests.

File Not Found in a Directory Listing, Even Though getFileStatus() Finds It

Or — a deleted file is found in listing, even though getFileStatus() reports that it is not there.

This is a visible sign of the S3 metadata database lagging behind the state of the underlying object store.

File Not Visible/Saved

The files in an object store are not visible until the write has been completed. In-progress writes are simply saved to a local file/cached in RAM and only uploaded at the end of a write operation. If a process terminated unexpectedly, or failed to call the close() method on an output stream, the pending data will have been lost.
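
The corollary is to make sure the output stream is always closed, ideally with try-with-resources. A minimal sketch (the bucket and path are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class S3AWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical destination; the data is buffered locally until close().
    Path dest = new Path("s3a://my-bucket/example/data.txt");
    FileSystem fs = dest.getFileSystem(conf);
    // try-with-resources guarantees close() is called, which is when the
    // buffered data is actually uploaded to S3 and becomes visible.
    try (FSDataOutputStream out = fs.create(dest, true)) {
      out.write("hello, s3a".getBytes(StandardCharsets.UTF_8));
    }
  }
}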

File flush() and hflush() Calls Do Not Save Data to S3A

Again, this is due to the fact that the data is cached locally until the close() operation. The S3A filesystem cannot be used as a store of data if it is required that the data is persisted durably after every flush()/hflush() call. This includes resilient logging, HBase-style journaling and the like. The standard strategy here is to save to HDFS and then copy to S3.
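
A sketch of that strategy from the command line (the paths are hypothetical): write the data to HDFS as it is generated, then copy the completed files to S3 in a separate step:

# Stage the completed file in HDFS first...
hdfs dfs -put events.log hdfs:///staging/events.log
# ...then copy it to S3 once it is closed.
hadoop distcp hdfs:///staging/events.log s3a://my-bucket/events/events.log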

Connectivity Problems

Unable to execute HTTP request: Read timed out

A read timeout means that the S3A client could not talk to the S3 service, and eventually gave up trying:

Unable to execute HTTP request: Read timed out
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.createTunnelToTarget(DefaultRequestDirector.java:902)
    at org.apache.http.impl.client.DefaultRequestDirector.establishRoute(DefaultRequestDirector.java:821)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:647)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1111)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)

This is not uncommon in Hadoop client applications — there is a whole wiki entry dedicated to possible causes of the error.

For S3 connections, key causes are: