Managing Shared Metastores

When creating a cluster, you can either have a Hive metastore database created with the cluster or use an external Hive metastore backed by Amazon RDS. Using an external Amazon RDS database as the metastore lets you preserve the metastore metadata and reuse it across clusters.

Additionally, when creating an HDP 2.6 cluster with the BI configuration, you can either have a Druid metastore database created with the cluster or use an external Druid metastore backed by Amazon RDS.

A Druid metastore can only be used with HDP 2.6 clusters that use the BI: Druid configuration.

There are two ways to register an external metastore: using the cloud controller UI or using the CLI.

Important

Registering a Hive or Druid metastore does not create the Amazon RDS instance. You must create your Amazon RDS instance and database prior to registering the database in the cloud controller UI. For high-level steps to create an RDS instance, refer to Creating an Amazon RDS Instance.

Registering a Shared Metastore

You must create an RDS instance and database before you can register them as a metastore. Only the PostgreSQL engine is supported at this time. For instructions, refer to Creating an Amazon RDS Instance. Then register the metastore in the cloud controller UI.

When creating the RDS instance, apply the guidelines in the following section.

Creating an Amazon RDS Instance

Important

This section provides high-level guidelines for creating an Amazon RDS instance compatible with the requirements of the controller. Refer to the Amazon RDS documentation for detailed information about creating and managing RDS instances.

Follow these steps to create an RDS instance:

  1. Navigate to the RDS Dashboard in the AWS Management Console. In the top right corner, select the region in which you want to create your DB instance.

    Although not required, we recommend that you create your DB instance and the cloud controller in the same region. See AWS Regions for a list of supported regions.

  2. In the RDS Dashboard navigation pane, click Instances, and then click Launch DB Instance to open the Launch DB Instance Wizard.

  3. For Step 1: Select Engine, select the PostgreSQL engine.

    Only the PostgreSQL engine is supported at this time.

  4. For Step 2: Production?, choose Production or Dev/Test, depending on your requirements.

  5. For Step 3: Specify DB Details, select DB Engine Version 9.4 or later. There is no minimum requirement for the DB Instance Class.

    Only PostgreSQL 9.4 or later is supported at this time.

  6. For Step 4: Configure Advanced Settings, in the Network & Security section, select the VPC in which the RDS instance should be started. The most important setting here is the security group configuration:

    Scenario | Public Access | Security Group
    The cloud controller runs in the same VPC as the RDS instance | "Publicly Accessible" can be set to "No". | The security group can be closed to the outside and configured to allow access only from the internal network.
    Any other scenario | "Publicly Accessible" must be set to "Yes" so that the RDS instance has a public IP address. | The security group must be open to the cloud controller.

    There may be additional scenarios involving advanced AWS network setups. In those cases, the general rule is that the RDS instance must be reachable from the cloud controller.

    To make sure that the chosen security group has the required access rules, check the Connection Information in the right pane and, if needed, click the security group link to update its inbound rules so that the cloud controller can connect to the RDS instance.

  7. In the Database Options section, enter Database Name.

    This parameter is optional, but if you don't provide it, you will have to manually create a database on the RDS instance before launching the controller.

  8. Click Launch DB Instance to create your RDS instance. When the RDS instance is ready, proceed to the next step.

  9. When launching the controller, use the ADVANCED CloudFormation template, which allows you to provide the RDS instance URL, username, password, and database name.

    To get the RDS endpoint, copy the Endpoint value from RDS Dashboard > Instances.
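If you prefer to script the wizard choices above, the following is a minimal sketch using the AWS SDK for Python (boto3), which is not part of the cloud controller. It maps the documented requirements (PostgreSQL engine, version 9.4 or later, VPC security groups, the Publicly Accessible setting, and the optional database name) onto RDS `CreateDBInstance` arguments, opens the PostgreSQL port to the cloud controller, and returns the endpoint. The function names, instance identifier, instance class, storage size, credentials, CIDR ranges, and security group IDs are hypothetical placeholders.

```python
"""Sketch: script the Launch DB Instance Wizard choices with boto3.

Assumptions (not part of the cloud controller): boto3 is installed, AWS
credentials are configured, and every identifier, size, CIDR, and ID is a
hypothetical placeholder.
"""
from typing import Any, Dict, List, Optional, Tuple

MIN_POSTGRES_VERSION: Tuple[int, int] = (9, 4)  # PostgreSQL 9.4 or later is required
POSTGRES_PORT = 5432  # default PostgreSQL port; change it if your instance differs


def parse_version(version: str) -> Tuple[int, int]:
    """Turn "9.6.3" into (9, 6) and "10" into (10, 0) for numeric comparison."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def build_create_params(
    identifier: str,
    username: str,
    password: str,
    engine_version: str,
    security_group_ids: List[str],
    publicly_accessible: bool,
    db_name: Optional[str] = None,
    subnet_group: Optional[str] = None,
    instance_class: str = "db.t2.medium",  # placeholder; no minimum class is required
    storage_gb: int = 20,  # placeholder size
) -> Dict[str, Any]:
    """Build keyword arguments for the RDS CreateDBInstance call."""
    if parse_version(engine_version) < MIN_POSTGRES_VERSION:
        raise ValueError(f"PostgreSQL {engine_version} is older than 9.4")
    params: Dict[str, Any] = {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",  # only the PostgreSQL engine is supported
        "EngineVersion": engine_version,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,
        "MasterUsername": username,
        "MasterUserPassword": password,
        "VpcSecurityGroupIds": security_group_ids,
        "PubliclyAccessible": publicly_accessible,
    }
    if db_name:  # optional; without it you must create the database yourself
        params["DBName"] = db_name
    if subnet_group:  # the DB subnet group determines which VPC the instance uses
        params["DBSubnetGroupName"] = subnet_group
    return params


def ingress_permission(controller_cidr: str, port: int = POSTGRES_PORT) -> Dict[str, Any]:
    """Inbound rule that lets the cloud controller's address range reach PostgreSQL."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": controller_cidr, "Description": "cloud controller"}],
    }


def create_instance(
    params: Dict[str, Any], region: str, group_id: str, controller_cidr: str
) -> str:
    """Open the security group, create the instance, wait for it, and return HOST:PORT."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    boto3.client("ec2", region_name=region).authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[ingress_permission(controller_cidr)]
    )
    rds = boto3.client("rds", region_name=region)
    rds.create_db_instance(**params)
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=params["DBInstanceIdentifier"]
    )
    instance = rds.describe_db_instances(
        DBInstanceIdentifier=params["DBInstanceIdentifier"]
    )["DBInstances"][0]
    return f"{instance['Endpoint']['Address']}:{instance['Endpoint']['Port']}"
```

For the same-VPC scenario in the table above, pass `publicly_accessible=False` and the VPC's internal address range as `controller_cidr`; otherwise pass `True` and a range that covers the cloud controller's public IP address. The returned HOST:PORT is the endpoint you copy in step 9.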

Registering a Shared Metastore in the Cloud Controller UI

  1. Create an Amazon RDS instance with the PostgreSQL DB engine, as described in Creating an Amazon RDS Instance.

    Only the PostgreSQL engine is supported at this time.

  2. From the controller UI navigation menu, select SHARED SERVICES.

  3. The list of registered shared services is displayed.

  4. Click +NEW and select Hive Metastore or Druid Metastore. The registration form is displayed.

  5. Enter the following parameters:

    Parameter | Description
    Name | Enter the name to use when registering this metastore with the cloud controller. Allowed characters are lowercase letters, numbers, and dashes. This is not the database name.
    HDP Version | Select the version of HDP that this metastore can be used with.
    JDBC Connection | Select the database type (PostgreSQL) and enter the JDBC connection string in the format HOST:PORT/DB_NAME.
    Authentication | Enter the JDBC connection username and password.

  6. Click Test connection to validate and test the RDS connection information. If you experience connection issues, refer to the Amazon RDS Troubleshooting documentation.

  7. After the connection test succeeds, click REGISTER HIVE METASTORE or REGISTER DRUID METASTORE to save the metastore. The metastore then appears in the list of available metastores when you create a cluster.
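Before clicking Test connection, you can check the form values locally. The following Python sketch, which is not part of the cloud controller, validates the Name field against the documented character rule, formats the HOST:PORT/DB_NAME value for the JDBC Connection field, and checks that the endpoint accepts TCP connections. The function names are hypothetical, and the reachability check does not verify the username, password, or database name, so Test connection in the UI is still the real check.

```python
"""Sketch: pre-check the metastore registration form values.

Assumptions: the name rule comes from the form description (lowercase letters,
numbers, and dashes); the reachability check only proves that HOST:PORT accepts
TCP connections and does not verify credentials or the database name.
"""
import re
import socket

NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_metastore_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, numbers, and dashes."""
    return NAME_PATTERN.fullmatch(name) is not None


def connection_string(host: str, port: int, db_name: str) -> str:
    """Format the value for the JDBC Connection field (HOST:PORT/DB_NAME)."""
    return f"{host}:{port}/{db_name}"


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

A failed reachability check usually points to the network configuration (the security group, the Publicly Accessible setting, or a wrong host or port) rather than to the credentials.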

Viewing Existing Shared Metastores

  1. From the controller UI navigation menu, select SHARED SERVICES.

  2. The list of registered shared services is displayed.

  3. Select HIVE METASTORES or DRUID METASTORES to view the registered metastores.

  4. To view the details of a specific metastore, click its entry.

Deleting an Existing Shared Metastore

  1. From the controller UI navigation menu, select SHARED SERVICES.

  2. The list of registered shared services is displayed.

  3. Select HIVE METASTORES or DRUID METASTORES to view the registered metastores.

  4. Click the control icon next to the metastore name.

  5. Click DELETE METASTORE, and then click YES, DELETE METASTORE. This unregisters the metastore from the cloud controller but does not terminate the RDS instance.
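Because unregistering leaves the RDS instance running, you may want to delete it separately once no cluster needs its metadata. This happens outside the cloud controller; the following minimal boto3 sketch builds the RDS `DeleteDBInstance` arguments, keeping a final snapshot when you name one, and waits for the deletion to finish. It assumes boto3 is installed and configured, and the function names and identifiers are hypothetical. Deleting the instance removes the Hive or Druid metadata it holds.

```python
"""Sketch: delete the RDS instance behind an unregistered metastore with boto3.

Assumptions: boto3 is installed and AWS credentials are configured; the
function names and identifiers are hypothetical placeholders.
"""
from typing import Any, Dict, Optional


def build_delete_params(identifier: str, final_snapshot_id: Optional[str] = None) -> Dict[str, Any]:
    """Build DeleteDBInstance arguments; keep a final snapshot if an ID is given."""
    if final_snapshot_id:
        return {
            "DBInstanceIdentifier": identifier,
            "SkipFinalSnapshot": False,
            "FinalDBSnapshotIdentifier": final_snapshot_id,
        }
    # Without a snapshot ID, the metadata is gone once the instance is deleted.
    return {"DBInstanceIdentifier": identifier, "SkipFinalSnapshot": True}


def delete_instance(params: Dict[str, Any], region: str) -> None:
    """Delete the instance and block until RDS reports it as deleted."""
    import boto3  # imported lazily so build_delete_params works without boto3

    rds = boto3.client("rds", region_name=region)
    rds.delete_db_instance(**params)
    rds.get_waiter("db_instance_deleted").wait(
        DBInstanceIdentifier=params["DBInstanceIdentifier"]
    )
```

Keeping a final snapshot lets you restore the metastore later if a cluster turns out to still need it.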

Managing Metastores via CLI

You can view existing metastores and register new ones using the CLI.