S3 Encryption Addons

14 September 2015

I wanted to share an overview of a new library named amazonka-s3-encryption, which was created to supplement amazonka-s3 with client-side encryption. Client-side encryption allows transmission and storage of sensitive information (Data in Motion), whilst ensuring that Amazon never receives any of your unencrypted data. Previously amazonka-s3 only supported server-side encryption (Data at Rest), which requires transmission of unencrypted data to S3. The cryptographic techniques used within the library are modeled as closely as possible upon the official AWS SDKs, specifically the Java AWS SDK. Haddock documentation is available here.

Chunked Encoding
Encryption and Decryption
Instruction Files
Compatibility and Status

Chunked Encoding

The version 4 signing algorithm supports two modes for signing requests when communicating with S3. The first requires a SHA256 hash of the payload to calculate the request signature and the second allows incremental signature calculation for fixed or variable chunks of the payload. Up until now, amazonka (and all other SDKs excepting Java) only supported the first method.

This poses a problem for encryption, where the need to calculate the SHA256 hash of the encrypted contents requires the use of a temporary file or another buffering mechanism. For example, the aws-sdk-ruby library performs the following procedure to send an encrypted PutObject request:

Copy and encrypt the payload to a temporary file.
Obtain the SHA256 hash and file size of the encrypted file.
Stream the file contents to the socket during transmission.

This means whatever the payload size is, you have to stream/encrypt a complete copy of the payload contents to a temporary file before sending.

To avoid this same pitfall, amazonka-s3 now uses streaming signature calculation when sending requests. This removes the need for the pre-calculated SHA256 hash and allows the encryption and signing to be performed incrementally as the request is sent.

Unfortunately, despite the documentation claiming that Transfer-Encoding: chunked is supported - it appears that you need to estimate the encrypted Content-Length (including metadata) and send this without the Transfer-Encoding header, otherwise the signature calculations simply fail with the usual obtuse S3 403 response.

The smart constructors emitted by the generation step for all amazonka-* operations now take into account streaming signature support and you’re likely to encounter the following parameters for operations utilising a streaming request body:

HashedBody - A request body requiring a pre-calculated SHA256 hash.
ChunkedBody - A request body which supports streaming signature calculation.
RqBody - A request body supporting any signing method.

ToHashedBody and ToBody type classes are provided to make it easy to convert values such as JSON, ByteString, etc into the appropriate request body. amazonka itself exports functions such as hashedFile, chunkedFile and others to assist in constructing streaming request bodies.

All regular S3 PutObject and UploadPart operations now take advantage of streaming signature calculation with the default chunk size set to 128 KB. This seems to be a decent trade off between streaming and the expense of incrementally performing signature calculations, but I’d recommend profiling for your particular use-case if performance and allocations are a concern.

The above information is available in a more context sensitive format within the documentation.

Encryption and Decryption

Client-side encryption of S3 objects is used to securely and safely store sensitive data in S3. When using client-side encryption, the data is encrypted before it is sent to S3, meaning Amazon does not receive your unencrypted object data. Unfortunately the object metadata (headers) still leak, so any sensitive information should be stored within the payload itself.

The procedure for encryption is as follows:

A one-time-use symmetric key a.k.a. a data encryption key (or data key) and initialisation vector (IV) are generated locally. This data key and IV are used to encrypt the data of a single S3 object using an AES256 cipher in CBC mode, with PKCS7 padding. (For each object sent, a completely separate data key and IV are generated.)
The generated data encryption key used above is encrypted using a symmetric AES256 cipher in ECB mode, asymmetric RSA, or KMS facilities, depending on the client-side master key you provided.
The encrypted data is uploaded and the encrypted data key and material description are attached as object metadata (either headers or a separate instruction file). If KMS is used, the material description helps determine which client-side master key is later used for decryption, otherwise the configured client-side key at time of decryption is used.

For decryption:

The encrypted object is downloaded from Amazon S3 along with any metadata. If KMS was used to encrypt the data then the master key id is taken from the metadata material description, otherwise the client-side master key in the current environment is used to decrypt the data key, which in turn is used to decrypt the object data.

If you’re unsure about which key mechanism to use, I’d recommend using KMS initially to avoid having to store and manage your own master keys.

Instruction Files

By default, the metadata (known as an envelope) required for encryption (except for the master key itself) is stored as S3 object metadata on the encrypted object. Due to user-defined S3 metadata being limited to 8KB when sending a PUT request, if you are utilising object metadata for another purpose which exceeds this limit, an alternative method of storing the encryption envelope in an adjacent S3 object is provided. This method removes the metadata overhead at the expense of an additional HTTP request to perform encryption/decryption. By default the library will store and retrieve a <your-object-key>.instruction object if the related *Instruction suffixed functions are used.

Compatibility and Status

Metadata and instruction envelopes are designed to be compatible with the official Java AWS SDK (both V1 and V2 formats), but only a limited set of the possible encryption options are supported. Therefore assuming defaults, objects stored with this library should be retrievable by any of the other official SDKs, and vice versa. Support for other cryptographic configurations will be added in future, as needed.

amazonka-s3-encryption can currently be considered an initial preview release. Despite this, it’s tied to the greater release process for the other amazonka-* libraries and therefore life will start somewhere after version 1.3.1. It is separated from amazonka-s3 proper, there are extra dependencies not desirable within the main S3 package, such as amazonka-kms and conduit-combinators. This way those using unencrypted S3 operations do not inadvertantly end up with an amazonka-kms dependency.

The library is currently being used in a limited capacity and the release to Hackage will be delayed until I’m confident of correctness, robustness and compatibility aspects. If you’re brave enough to experiment, it’s contained within the greater amazonka project on GitHub. Please open an issue with any problems/suggestions or drop into the Amazonka Gitter chat if you have questions.

Brendan Hay

Blog

S3 Encryption Addons

Contents

Chunked Encoding

Encryption and Decryption

Instruction Files

Compatibility and Status

Brendan Hay

Blog

S3 Encryption Addons

Contents

Chunked Encoding

Encryption and Decryption

Instruction Files

Compatibility and Status

Related Posts