Using Apache Spark

Spark Tools

To access Spark command line tools such as Spark Shell, SSH to any cluster node.

When you start the Spark Shell, it reports which version of Spark is used by default:

[cloudbreak@ip-10-0-1-180 ~]$ spark-shell
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default

To set your preferred version as default, exit the shell and set SPARK_MAJOR_VERSION to 1 or 2. For example:
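
A minimal sketch, assuming a bash-compatible shell (as the prompt above suggests) and picking Spark 1 to match the message shown next; substitute 2 to select Spark 2. The echo line is only an illustrative check and is not required:

```shell
# Assumption: bash-compatible shell on the cluster node.
# Select Spark 1 for this session; use 2 instead to select Spark 2.
export SPARK_MAJOR_VERSION=1

# Optional check that the variable is set in the current session.
echo "SPARK_MAJOR_VERSION=$SPARK_MAJOR_VERSION"
```

An export like this lasts only for the current shell session. To keep the setting across logins, it would typically be added to a shell startup file such as ~/.bashrc.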


When you start spark-shell again, you will see a message similar to:

[cloudbreak@ip-10-0-1-180 ~]$ spark-shell
SPARK_MAJOR_VERSION is set to 1, using Spark1

Amazon S3 + Spark

If you are planning to access Amazon S3 data in Spark, refer to Using Apache Spark with Amazon S3.

Learn More

For general information about Apache Spark, refer to the Apache Spark Component Guide and the Apache documentation.