Using Apache Hive

Depending on your needs, you can access data in Hive using the following tools:

  • Beeline: Beeline is a JDBC client that communicates with Hive using the Hive Thrift APIs. You can use Beeline installed on the cluster master node (SSH to the master node and run beeline), or from a local machine (if you have installed the Beeline libraries locally). Refer to Beeline below for more information.
  • Hive JDBC: JDBC is useful for accessing Hive from many popular BI tools. Refer to Hive JDBC below for more information.
  • Hive CLI: Hive CLI is a thick client and command-line tool for accessing Hive. SSH to a cluster node and run the Hive client.
  • Hive View: Hive View is a UI that enables you to access Hive from Ambari. You can access Hive View from the links in the cloud controller UI.
  • Zeppelin UI: Zeppelin UI is a web-based notebook for interactive data analytics. If your selected cluster configuration includes Zeppelin, you can access Zeppelin from the links in the cloud controller UI.

Amazon S3 + Hive

If you are planning to access Amazon S3 data from Hive, refer to Using Apache Hive with Amazon S3.

Learn More

For general information about accessing data using Apache Hive, refer to the Data Access Guide, Apache Hive Performance Tuning, and the Apache documentation.

Download SSL Certificate

By default, the gateway is configured with a self-signed certificate to protect the Hive endpoint over SSL. Therefore, before you can use Hive via the JDBC or Beeline client, you must download the SSL certificate from the gateway and add it to your truststore.

Alternatively, you can configure your own trusted certificate. If you choose this option, do not perform the SSL certificate download steps listed below. Instead, follow the instructions for Using Your Own SSL Certificate.

On Linux or macOS, you can download the self-signed SSL certificate using the following commands:

export GATEWAY_HOSTNAME=PUBLIC_HOSTNAME_CONTROLLER_NODE
export GATEWAY_ADDRESS=PUBLIC_IP_ADDRESS_CONTROLLER_NODE
export GATEWAY_JKS_PASSWORD=GATEWAY_PASSWORD
openssl s_client -servername ${GATEWAY_HOSTNAME} -connect ${GATEWAY_ADDRESS}:443 -showcerts </dev/null | openssl x509 -outform PEM > gateway.pem
keytool -import -alias gateway-identity -file gateway.pem -keystore gateway.jks -storepass ${GATEWAY_JKS_PASSWORD}

Where:
  • GATEWAY_HOSTNAME - Set this to the public hostname of the controller node instance.
  • GATEWAY_ADDRESS - Set this to the public IP address of the controller node instance.
  • GATEWAY_JKS_PASSWORD - Create a password for the truststore that will hold the self-signed certificate. The password must be at least 6 characters long.

For example:

export GATEWAY_HOSTNAME=ec2-52-86-252-73.compute-1.amazonaws.com
export GATEWAY_ADDRESS=52.88.147.214
export GATEWAY_JKS_PASSWORD=Hdcjks123!
openssl s_client -servername ${GATEWAY_HOSTNAME} -connect ${GATEWAY_ADDRESS}:443 -showcerts </dev/null | openssl x509 -outform PEM > gateway.pem
keytool -import -alias gateway-identity -file gateway.pem -keystore gateway.jks -storepass ${GATEWAY_JKS_PASSWORD}

After you run these commands, the gateway.pem and gateway.jks files are created in the directory where you ran the commands.
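
To verify that the certificate was imported successfully, you can optionally list the truststore entry (this check uses the same alias and password as the import command above):

# Optional: confirm that the gateway certificate is present in the truststore
keytool -list -keystore gateway.jks -storepass ${GATEWAY_JKS_PASSWORD} -alias gateway-identity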

Beeline

To use Beeline on a node in the cluster:

  1. SSH to the master node in the cluster.
  2. Download the SSL certificate.
  3. Run the Beeline client and connect to Hive.

For example:

export CLUSTER_USER=admin
export CLUSTER_PASSWORD=Admin123!
export CLUSTER_NAME=myllap
beeline -u "jdbc:hive2://${GATEWAY_HOSTNAME}:443/;ssl=true;sslTrustStore=gateway.jks;trustStorePassword=${GATEWAY_JKS_PASSWORD};transportMode=http;httpPath=${CLUSTER_NAME}/services/hive" -d org.apache.hive.jdbc.HiveDriver -n ${CLUSTER_USER} -p ${CLUSTER_PASSWORD}
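
Once connected, you can run HiveQL statements at the Beeline prompt. You can also pass a statement directly on the command line using Beeline's -e option; for example, to list the available databases:

beeline -u "jdbc:hive2://${GATEWAY_HOSTNAME}:443/;ssl=true;sslTrustStore=gateway.jks;trustStorePassword=${GATEWAY_JKS_PASSWORD};transportMode=http;httpPath=${CLUSTER_NAME}/services/hive" -d org.apache.hive.jdbc.HiveDriver -n ${CLUSTER_USER} -p ${CLUSTER_PASSWORD} -e "SHOW DATABASES;"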

Hive JDBC

Hive can be accessed via JDBC through the Protected Gateway that is automatically installed and configured in your cluster. If your cluster configuration includes Hive LLAP, then Hive LLAP is configured with the gateway; otherwise, HiveServer2 is configured. In either case, the transport mode is “http” and the gateway path to Hive is "${CLUSTER_NAME}/services/hive".

Before you can start using Hive JDBC, you must download the SSL certificate to your truststore. After downloading the SSL certificate, the Hive JDBC endpoint is:

jdbc:hive2://${GATEWAY_HOSTNAME}:443/;ssl=true;sslTrustStore=gateway.jks;trustStorePassword=${GATEWAY_JKS_PASSWORD};transportMode=http;httpPath=${CLUSTER_NAME}/services/hive
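
For example, using the sample values shown earlier in this guide (your hostname, truststore password, and cluster name will differ), the expanded endpoint would be:

jdbc:hive2://ec2-52-86-252-73.compute-1.amazonaws.com:443/;ssl=true;sslTrustStore=gateway.jks;trustStorePassword=Hdcjks123!;transportMode=http;httpPath=myllap/services/hive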

Here are two examples of connecting tools to Hive through the gateway: SQL Workbench/J (via JDBC) and Tableau (via ODBC).

Example: SQL Workbench/J

SQL Workbench/J is a cross-platform SQL tool that can be used to access database systems. In this example, we provide a high-level overview of the steps required to set up SQL Workbench to access Hive via JDBC.

Prerequisite: Download the SSL Certificate

Step 1: Set up SQL Workbench and Hive JDBC Driver

  1. Download and install SQL Workbench. Refer to http://www.sql-workbench.net/getting-started.html.
  2. Download the Hortonworks JDBC Driver for Apache Hive from https://hortonworks.com/downloads/#addons. Next, extract the driver package.
  3. Launch SQL Workbench.
  4. From the Select Connection Profile window, click Manage Drivers. The Manage drivers window will appear.
  5. Create a new driver entry and enter "Hortonworks Hive JDBC" as the Name.
  6. For the driver Library, browse to the Hortonworks JDBC Driver for Apache Hive package that you downloaded and extracted earlier, and then select the JDBC driver JAR files in the package.
  7. When prompted, select the “com.simba.hive.jdbc41.HS2Driver” driver.
  8. For the Sample URL, enter: jdbc:hive2://${GATEWAY_HOSTNAME}:443/
  9. Click OK to save the driver.

Step 2: Create a Connection to Hive

  1. From the Select Connection Profile window, select "Hortonworks Hive JDBC" from the Driver drop-down.
  2. For URL, enter the URL to the controller, such as jdbc:hive2://ec2-52-86-252-73.compute-1.amazonaws.com:443/ (where ec2-52-86-252-73.compute-1.amazonaws.com is the public hostname of your controller node).
  3. For Username and Password, enter the credentials that you created when creating your cluster.
  4. Click Extended Properties and add the following properties:

    Property            Value
    ssl                 1
    transportMode       http
    httpPath            ${CLUSTER_NAME}/services/hive
    sslTrustStore       The path to the gateway.jks file that was generated when you downloaded the SSL certificate
    trustStorePassword  The GATEWAY_JKS_PASSWORD that you specified when you downloaded the SSL certificate

    After performing these steps, your configuration should look similar to:

  5. Click OK to save the properties.

  6. Click Test to confirm a connection can be established.
  7. Click OK to make the connection and start using SQL Workbench to query Hive.

Example: Tableau

Tableau is a business intelligence tool for interacting with and visualizing data via SQL. Connecting Tableau to Hive requires the use of an ODBC driver. In this example, we provide the high-level steps required to set up Tableau to access Hive.

Prerequisite: Download the SSL Certificate

Step 1: Set up ODBC Driver

  1. Download the Hortonworks ODBC Driver for Apache Hive from https://hortonworks.com/downloads/#addons. Next, extract and install the driver.

Step 2: Launch Tableau and Connect to Hive

  1. Launch Tableau. If you do not already have Tableau, you can download a trial version from https://www.tableau.com/trial/download-tableau.
  2. In Tableau, create a connection to a “Hortonworks Hadoop Hive” server. Enter the following:

    Property        Value
    Server          The public hostname of your controller node instance
    Port            443
    Type            HiveServer2
    Authentication  Username and Password
    Transport       HTTP
    Username        The cluster username that you created when creating your cluster
    Password        The cluster password that you created when creating your cluster
    HTTP Path       ${CLUSTER_NAME}/services/hive
  3. Check the Require SSL checkbox.

  4. Click the link underneath the checkbox to add a configuration file.
  5. Specify that a custom SSL certificate should be used, and then browse to the gateway.pem file that was generated when you downloaded the SSL certificate as a prerequisite.
  6. Click OK.

    After performing these steps, your configuration should look similar to:

  7. Click Sign In and you will be connected to Hive.