Using HDCloud with Amazon S3
The following sections will help you get started using Amazon S3 with Hortonworks Data Cloud:
- Data Storage on Amazon S3 describes core features, use cases, concepts, and limitations that you should be aware of before working with Amazon S3.
- Using the S3A Filesystem Client lists supported and unsupported hadoop fs shell commands and explains how to run them against data on Amazon S3 using the S3A connector (a brief command sketch follows this list).
- Copying Data Between a Cluster and Amazon S3 explains how to copy data between HDFS and Amazon S3 using the DistCp utility.
- Using Apache Spark with Amazon S3 describes issues specific to working with Spark.
- Using Apache Hive with Amazon S3 describes issues specific to working with Hive.
- Authenticating with Amazon S3 explains how to set up access to Amazon S3 buckets.
- Configuring S3A lists the available S3A configuration properties and explains how to set them on a per-bucket basis or globally for all buckets.
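The sketch below illustrates the kinds of operations these sections cover, assuming credentials are already configured as described in Authenticating with Amazon S3. The bucket name my-bucket, the paths, and the endpoint value are placeholders; fs.s3a.bucket.&lt;bucket&gt;.endpoint is the per-bucket form of the standard fs.s3a.endpoint property.

```
# List a directory in an S3 bucket through the hadoop fs shell and the S3A connector.
hadoop fs -ls s3a://my-bucket/datasets/

# Copy a directory from HDFS to Amazon S3 with DistCp.
hadoop distcp /user/hdfs/mydata s3a://my-bucket/mydata

# S3A properties can also be passed on the command line with -D; per-bucket
# overrides take the form fs.s3a.bucket.<bucket>.<property>:
hadoop fs \
  -D fs.s3a.bucket.my-bucket.endpoint=s3.eu-west-1.amazonaws.com \
  -ls s3a://my-bucket/
```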
The following sections include additional tips for working with Amazon S3:
- Improving Amazon S3 Performance lists configuration parameters that you can tune to achieve better performance when working with Amazon S3 (see the sketch after this list).
- Troubleshooting Amazon S3 helps you troubleshoot issues related to working with Amazon S3, such as classpath-related errors, authentication failures, and side effects of S3's eventual consistency.
- Encrypting Data on Amazon S3 with S3-SSE explains how to encrypt files using S3 server-side encryption.
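For example, a tuning property and server-side encryption can both be set for a single job on the command line. The sketch below is illustrative only: fs.s3a.connection.maximum and fs.s3a.server-side-encryption-algorithm are standard S3A properties, but the values, paths, and bucket name are assumptions; the AES256 value selects encryption with S3-managed keys.

```
# Raise the S3A connection pool for a copy-heavy job (illustrative value) and
# request server-side encryption so that the objects written are encrypted at rest.
hadoop distcp \
  -D fs.s3a.connection.maximum=100 \
  -D fs.s3a.server-side-encryption-algorithm=AES256 \
  /user/hdfs/mydata s3a://my-bucket/mydata
```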
If you are looking for sample data sets to experiment with, you can use the Landsat 8 data sets made available by AWS in a public Amazon S3 bucket called "landsat-pds". For more information, refer to Landsat on AWS.
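Because the bucket is public, you can browse it without AWS credentials. The sketch below assumes the S3A connector is on your classpath; org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider is the S3A credentials provider for anonymous access.

```
# List the public Landsat 8 bucket anonymously; no access keys are needed
# because the anonymous credentials provider skips authentication entirely.
hadoop fs \
  -D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
  -ls s3a://landsat-pds/
```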
This chapter is meant to serve as a general reference for working with Amazon S3, so you will find it relevant even if your clusters were not launched through HDCloud for AWS.