Working with Encrypted Amazon S3 Data

Amazon S3 supports a number of encryption mechanisms to better secure the data in S3. These divide broadly into Server-Side Encryption ("SSE"), where S3 encrypts the data as it is stored, and Client-Side Encryption ("CSE"), where the data is encrypted before it is uploaded.

Apache Hadoop supports only Server-Side Encryption; Client-Side Encryption is not supported.

For this server-side encryption to work, the S3 servers require secret keys to encrypt data, and the same secret keys to decrypt it. These keys can be managed in three ways:

  1. SSE-S3: the keys are managed entirely by Amazon S3.
  2. SSE-KMS: the keys are managed by the AWS Key Management Service (KMS).
  3. SSE-C: the keys are supplied by the client with each request.

In general, the encryption mechanism is selected via the property fs.s3a.server-side-encryption-algorithm in core-site.xml. However, some encryption options require extra settings.

SSE-S3: Amazon S3-Managed Encryption Keys

In SSE-S3, all keys and secrets are managed inside S3. This is the simplest encryption mechanism.

Enabling SSE-S3

To write SSE-S3 encrypted files, set the value of fs.s3a.server-side-encryption-algorithm in core-site to the encryption mechanism to use; currently only AES256 is supported.

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>

Once set, all new data will be uploaded encrypted. There is no need to set this property when downloading data — the data will be automatically decrypted when read using the Amazon S3-managed key.

To learn more, refer to Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3) in AWS documentation.

SSE-KMS: Amazon S3-KMS Managed Encryption Keys

Amazon offers a pay-per-use key management service, AWS KMS. This service can be used to encrypt data on S3 using keys which can be centrally managed and assigned to specific roles and IAM accounts.

The AWS KMS can be used by S3 to encrypt uploaded data. When uploading data encrypted with SSE-KMS, the named key that was used to encrypt the data is retrieved from the KMS service and used to encrypt the per-object secret, which in turn encrypts the uploaded data. To read the data back, the same key must be retrieved from KMS and used to decrypt the per-object secret key, which is then used to decrypt the actual file.

KMS keys can be managed by an organization's administrators in AWS, including having access permissions assigned and removed from specific users, groups, and IAM roles. Only those "principals" with granted rights to a key may access it, hence only they may encrypt data with the key, and decrypt data encrypted with it. This allows KMS to be used to provide a cryptographically secure access control mechanism for data stored in S3.

The AWS KMS service is not related to the Key Management Service built into Hadoop (Hadoop KMS), which primarily manages keys for HDFS Transparent Encryption. Similarly, HDFS encryption is unrelated to S3 data encryption.

Enabling SSE-KMS

To enable SSE-KMS, the property fs.s3a.server-side-encryption-algorithm must be set to SSE-KMS in core-site:

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>

The ID of the specific key used to encrypt the data should also be set in the property fs.s3a.server-side-encryption.key:

<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <value>arn:aws:kms:us-west-2:360379543683:key/071a86ff-8881-4ba0-9230-95af6d01ca01</value>
</property>

If your account is set up with a default KMS key and fs.s3a.server-side-encryption.key is unset, the default key will be used whenever SSE-KMS encryption is chosen.

Note

Because the AWS Key Management Service (KMS) is pay-per-use, working with data encrypted via KMS keys incurs extra charges during data I/O.

To learn more, refer to Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys (SSE-KMS) in AWS documentation.

SSE-C: Server-Side Encryption with Customer-Provided Encryption Keys

In SSE-C, the client supplies the secret key needed to read and write data.

Note

SSE-C integration with Hadoop is still stabilizing; issues related to it are still surfacing. It is already clear that SSE-C with a common key must be used exclusively within a bucket if it is to be used at all. This is the only way to ensure that path and directory listings do not fail with "Bad Request" errors.

Enabling SSE-C

To use SSE-C, the configuration option fs.s3a.server-side-encryption-algorithm must be set to SSE-C, and a base64-encoded version of the secret key must be placed in fs.s3a.server-side-encryption.key.

<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-C</value>
</property>

<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <value>RG8gbm90IGV2ZXIgbG9nIHRoaXMga2V5IG9yIG90aGVyd2lzZSBzaGFyZSBpdA==</value>
</property>

This property can be set in a Hadoop JCEKS credential file, which is significantly more secure than embedding secrets in the XML configuration file.
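SSE-C expects a 256-bit AES key, supplied base64-encoded. As an illustrative sketch (not part of the S3A connector), a suitable key can be generated with standard-library tooling; note that a base64-encoded 256-bit key is always 44 characters long:

```python
import base64
import secrets

# Generate a random 256-bit (32-byte) key and base64-encode it for use as
# the value of fs.s3a.server-side-encryption.key. Never log or share the key.
key = secrets.token_bytes(32)
encoded = base64.b64encode(key).decode("ascii")

print(len(key))      # 32 raw bytes
print(len(encoded))  # 44 base64 characters
```

Keep the generated value in a JCEKS credential file rather than embedding it in core-site.xml.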

Configuring Encryption for Specific Buckets

S3A's per-bucket configuration mechanism can be used to configure the encryption mechanism and credentials for specific buckets. For example, to access the bucket called "production" using SSE-KMS with the key ID arn:aws:kms:us-west-2:360379543683:key/071a86ff-8881-4ba0-9230-95af6d01ca01, the settings are:

<property>
  <name>fs.s3a.bucket.production.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>

<property>
  <name>fs.s3a.bucket.production.server-side-encryption.key</name>
  <value>arn:aws:kms:us-west-2:360379543683:key/071a86ff-8881-4ba0-9230-95af6d01ca01</value>
</property>

Per-bucket configuration does not apply to secrets kept in JCEKS files; those must use the core configuration property names (for example, fs.s3a.server-side-encryption.key), with the path to the JCEKS file itself configured per-bucket:

<property>
  <name>fs.s3a.bucket.production.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>

<property>
  <name>fs.s3a.bucket.production.security.credential.provider.path</name>
  <value>hdfs://common/production.jceks</value>
</property>
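The per-bucket option names follow a simple pattern: the fs.s3a. prefix is replaced by fs.s3a.bucket.BUCKETNAME. A small helper (hypothetical, not part of S3A) makes the mapping explicit:

```python
def per_bucket_property(bucket: str, option: str) -> str:
    """Map a base fs.s3a.* option name to its per-bucket form
    (hypothetical helper illustrating the naming convention)."""
    prefix = "fs.s3a."
    if not option.startswith(prefix):
        raise ValueError(f"not an fs.s3a option: {option}")
    return f"fs.s3a.bucket.{bucket}.{option[len(prefix):]}"

print(per_bucket_property("production", "fs.s3a.server-side-encryption-algorithm"))
# fs.s3a.bucket.production.server-side-encryption-algorithm
```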

To learn more, refer to Protecting Data Using Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) in AWS documentation.

Performance Impact of Encryption

Server-side encryption slightly slows down performance when reading data from S3, both when reading data during query execution and when scanning files prior to the actual scheduling of work.

Amazon throttles reads and writes of data encrypted with SSE-KMS, which results in significantly lower throughput than normal S3 I/O requests. The default rate, 600 requests/minute, means that at most ten objects per second can be read or written using SSE-KMS, across an entire Hadoop cluster (or indeed, the entire customer account). The default limits may be suitable during development, but in large-scale production applications the limits may rapidly be reached. Contact Amazon to increase capacity.
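The arithmetic behind these figures is straightforward; the numbers below are illustrative only, since the actual quota for an account may differ:

```python
# Default KMS request quota cited above: 600 requests/minute.
requests_per_minute = 600
objects_per_second = requests_per_minute / 60  # at most 10 objects/second

# Illustrative: time to touch 100,000 SSE-KMS objects at the default quota.
minutes_for_100k_objects = 100_000 / requests_per_minute

print(objects_per_second)                  # 10.0
print(round(minutes_for_100k_objects, 1))  # 166.7
```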

Mandating Encryption for an S3 Bucket

To mandate that all data uploaded to a bucket is encrypted, it is possible to set a bucket policy declaring that clients must provide encryption information with all data uploaded.

Mandating encryption across a bucket offers significant benefits:

  1. It guarantees that all clients uploading data have encryption enabled; there is no need (or indeed, easy mechanism) to test this within a client.
  2. It guarantees that the same encryption mechanism is used by all clients.
  3. If applied to an empty bucket, it guarantees that all data in the bucket is encrypted.

We recommend selecting an encryption policy for a bucket when the bucket is created, and setting it in the bucket policy. This stops misconfigured clients from unintentionally uploading unencrypted data.

Note

Mandating an encryption mechanism on newly uploaded data does not encrypt existing data; existing data retains whatever encryption (if any) was applied at the time of creation.

Here is a policy to mandate SSE-S3/AES256 encryption on all data uploaded to a bucket. This covers uploads as well as the copy operations which take place when file/directory rename operations are mimicked.

{
  "Version": "2012-10-17",
  "Id": "EncryptionPolicy",
  "Statement": [
    {
      "Sid": "RequireEncryptionHeaderOnPut",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": true
        }
      }
    },
    {
      "Sid": "RequireAESEncryptionOnPut",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
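Before pasting a policy into the console, it can be useful to sanity-check it. The following sketch (a hypothetical helper, using only the standard library) verifies that a policy contains a statement denying s3:PutObject requests that lack the s3:x-amz-server-side-encryption header:

```python
import json

# A trimmed copy of the first statement of the policy above;
# "BUCKET" is a placeholder, exactly as in the policy itself.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireEncryptionHeaderOnPut",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::BUCKET/*",
      "Condition": {"Null": {"s3:x-amz-server-side-encryption": true}}
    }
  ]
}
""")

def denies_unencrypted_put(policy):
    """Return True if some statement denies s3:PutObject whenever the
    s3:x-amz-server-side-encryption header is absent."""
    for stmt in policy["Statement"]:
        null_cond = stmt.get("Condition", {}).get("Null", {})
        if (stmt.get("Effect") == "Deny"
                and "s3:PutObject" in stmt.get("Action", [])
                and null_cond.get("s3:x-amz-server-side-encryption") in (True, "true")):
            return True
    return False

print(denies_unencrypted_put(policy))  # True
```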

To use SSE-KMS, a different restriction must be defined:

{
  "Version": "2012-10-17",
  "Id": "EncryptionPolicy",
  "Statement": [
    {
      "Sid": "RequireEncryptionHeaderOnPut",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": true
        }
      }
    },
    {
      "Sid": "RequireKMSEncryptionOnPut",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "SSE-KMS"
        }
      }
    }
  ]
}

To use one of these policies:

  1. Replace BUCKET with the specific name of the bucket being secured.
  2. Locate the bucket in the AWS console S3 section.
  3. Select the "Permissions" tab.
  4. Select the "Bucket Policy" tab in the permissions section.
  5. Paste the edited policy into the editor.
  6. Save the policy.

Troubleshooting S3-SSE

AccessDeniedException When Creating Directories and Files

Operations such as creating directories (mkdir()/innerMkdirs()) or files fail when the bucket policy requires encryption of a specific type and the client is not configured to use that encryption mechanism.

To resolve the issue, you must configure the client to use the encryption mechanism that you specified in the bucket permissions.

java.nio.file.AccessDeniedException: /test: innerMkdirs on /test: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 398EB3738450B416), S3 Extended Request ID: oOcNg+RvbS5YaJ7EQNXVZnHOF/7fwwhCzyRCjFF+UKLRi3slkobphLt/M+n4KPw5cljSSt2f6/E=
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1528)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2216)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 398EB3738450B416)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1586)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1254)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1712)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:133)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:125)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

AES256 Is Enabled but an Encryption Key Was Set in fs.s3a.server-side-encryption.key

You will see this error when the encryption mechanism is set to SSE-S3/AES256 but the configuration also declares an encryption key. The error occurs because user-supplied keys are not supported in SSE-S3. Remove the fs.s3a.server-side-encryption.key setting or switch to SSE-KMS encryption.

testEncryption(org.apache.hadoop.fs.s3a.ITestS3AEncryptionSSECBlockOutputStream)  Time elapsed: 0.103 sec  <<< ERROR!
java.io.IOException: AES256 is enabled but an encryption key was set in fs.s3a.server-side-encryption.key (key length = 44)
  at org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:758)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:467)
  at org.apache.hadoop.fs.contract.AbstractBondedFSContract.init(AbstractBondedFSContract.java:72)
  at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:177)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
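The check that produces this error can be sketched as follows; this is a hypothetical re-creation of the client-side validation, not the actual S3AUtils code:

```python
def check_encryption_settings(algorithm, key):
    """Reject the invalid combination: SSE-S3 (AES256) manages its own keys,
    so a user-supplied encryption key is an error (hypothetical sketch)."""
    if algorithm == "AES256" and key:
        raise ValueError(
            "AES256 is enabled but an encryption key was set in "
            f"fs.s3a.server-side-encryption.key (key length = {len(key)})")
    return algorithm

# SSE-KMS with a key is valid; AES256 with a key is not.
check_encryption_settings("SSE-KMS", "arn:aws:kms:us-west-2:360379543683:key/071a86ff-8881-4ba0-9230-95af6d01ca01")
try:
    check_encryption_settings("AES256", "x" * 44)
except ValueError as err:
    print(err)  # AES256 is enabled but an encryption key was set in fs.s3a.server-side-encryption.key (key length = 44)
```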

Unknown Server Side Encryption Algorithm SSE_C

This error means that the algorithm is unknown or mistyped; here "SSE_C" was used rather than "SSE-C":

java.io.IOException: Unknown Server Side Encryption algorithm SSE_C
  at org.apache.hadoop.fs.s3a.S3AEncryptionMethods.getMethod(S3AEncryptionMethods.java:67)
  at org.apache.hadoop.fs.s3a.S3AUtils.getSSEncryptionAlgorithm(S3AUtils.java:760)

Make sure to enter the correct algorithm name.
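Validating the configured name against the set of supported algorithms catches such typos early; a hypothetical sketch of that check:

```python
SUPPORTED_ALGORITHMS = {"AES256", "SSE-KMS", "SSE-C"}

def parse_algorithm(name):
    """Accept only known server-side encryption algorithm names
    (hypothetical sketch of the S3A validation)."""
    if name not in SUPPORTED_ALGORITHMS:
        raise IOError(f"Unknown Server Side Encryption algorithm {name}")
    return name

print(parse_algorithm("SSE-C"))  # SSE-C
try:
    parse_algorithm("SSE_C")     # underscore instead of hyphen
except IOError as err:
    print(err)  # Unknown Server Side Encryption algorithm SSE_C
```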