Using Apache Spark
To access Spark command line tools such as Spark Shell, SSH to any cluster node.
After starting the Spark Shell, you will be notified which version of Spark will be used by default:
[cloudbreak@ip-10-0-1-180 ~]$ spark-shell
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
To set your preferred version as the default, exit the shell and set the SPARK_MAJOR_VERSION environment variable to 1 or 2. For example:

export SPARK_MAJOR_VERSION=1
When you log in to spark-shell again, you will see a message similar to:
[cloudbreak@ip-10-0-1-180 ~]$ spark-shell
SPARK_MAJOR_VERSION is set to 1, using Spark1
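An exported variable only lasts for the current session. To avoid setting it after every login, the export can be appended to the user's shell profile; the sketch below assumes a bash login shell, and the ~/.bashrc path and the choice of version 1 are illustrative:

```shell
# Persist the preferred Spark major version across logins.
# Assumes a bash login shell; the profile path is illustrative.
profile="${HOME}/.bashrc"
grep -q 'SPARK_MAJOR_VERSION' "$profile" 2>/dev/null \
  || echo 'export SPARK_MAJOR_VERSION=1' >> "$profile"

# Apply the choice to the current session as well.
export SPARK_MAJOR_VERSION=1
echo "SPARK_MAJOR_VERSION=${SPARK_MAJOR_VERSION}"
```

With the variable persisted, subsequent spark-shell launches pick the chosen Spark version without the warning shown above.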
Amazon S3
If you are planning to access Amazon S3 data in Spark, refer to Using Apache Spark with Amazon S3.