Working with the Hive Data Source

Foresight Analytics Platform can use the Hive infrastructure as a data source. Hive is a data management system that uses an SQL-like query language (HiveQL) and runs on top of the Hadoop data storage and processing framework.

To start working with the Hive data source, execute the following steps.

Step 1. Application Installation

To create a connection to the Hive data source, an appropriate ODBC driver is required.

The Hortonworks Hive ODBC Driver is recommended. Download the distribution file and install the driver by executing the command:

sudo dpkg -i ./clouderahiveodbc*.deb

IMPORTANT. The bitness of the installed driver must match the bitness of Foresight Analytics Platform.

Step 2. Setting Up ODBC Driver

Open the /opt/cloudera/hiveodbc/lib/64/cloudera.hiveodbc.ini file for editing and find the parameters:

An example of the cloudera.hiveodbc.ini file.
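As a rough guide, the [Driver] section of cloudera.hiveodbc.ini typically contains the following parameters. This is a sketch only; exact keys and defaults vary between driver versions, so verify them against your installed file:

```ini
; Typical [Driver] section (verify against your installation)
[Driver]
; Use UTF-16 with the unixODBC driver manager (which provides isql)
DriverManagerEncoding=UTF-16
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/
ODBCInstLib=libodbcinst.so
; Set LogLevel to a value from 1 to 6 to enable driver logging
LogLevel=0
LogPath=
SwapFilePath=/tmp
```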

Step 3. Setting Up Connection

For details about connection setup see the guide.

There are two main methods for connection setup:

Full-String Connection Setup

  1. Prepare the /etc/odbcinst.ini file. An example of the file contents:

[ODBC Drivers]
Cloudera ODBC Driver for Apache Hive 32-bit=Installed
Cloudera ODBC Driver for Apache Hive 64-bit=Installed
UsageCount=1

[Cloudera ODBC Driver for Apache Hive 32-bit]
Description=Cloudera ODBC Driver for Apache Hive (32-bit)
Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so
UsageCount=1

[Cloudera ODBC Driver for Apache Hive 64-bit]
Description=Cloudera ODBC Driver for Apache Hive (64-bit)
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
UsageCount=1

  2. Check the connection using the command:

isql -v -k "Driver={Cloudera ODBC Driver for Apache Hive 64-bit};Host=10.0.2.118;Port=10000;AuthMech=3;UID=root;PWD=hadoop"
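The same full-string check can be run from Python. This is a sketch only: the pyodbc package is not part of the platform and must be installed separately, and the host and credentials are the example values from this guide:

```python
def hive_conn_str(host, port=10000, uid="root", pwd="hadoop",
                  driver="Cloudera ODBC Driver for Apache Hive 64-bit"):
    """Assemble the same full connection string that isql -k receives."""
    parts = {
        "Driver": "{" + driver + "}",
        "Host": host,
        "Port": port,
        "AuthMech": 3,  # 3 = user name and password authentication
        "UID": uid,
        "PWD": pwd,
    }
    return ";".join("%s=%s" % (k, v) for k, v in parts.items())

def check_connection(host):
    """Open the connection and run a trivial query to verify it works."""
    import pyodbc  # assumed installed separately; not bundled with the platform
    with pyodbc.connect(hive_conn_str(host), autocommit=True) as conn:
        return conn.cursor().execute("SHOW DATABASES").fetchall()
```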

DSN-Based Connection Setup

  1. Change the /etc/odbc.ini file according to the guide. An example of the file contents:

[ODBC Data Sources]
HiveDSN=Cloudera

[HiveDSN]
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HiveServerType=2
ServiceDiscoveryMode=0
Host=10.0.2.118
Port=10000
UID=root
PWD=root

  2. Check the connection using the command:

isql -v "HiveDSN"
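The DSN-based check can likewise be run from Python. Again a sketch only, assuming the pyodbc package is installed separately and /etc/odbc.ini already defines the HiveDSN data source:

```python
def dsn_conn_str(dsn, uid=None, pwd=None):
    """Build a DSN-based connection string; UID/PWD override odbc.ini values."""
    parts = ["DSN=" + dsn]
    if uid is not None:
        parts.append("UID=" + uid)
    if pwd is not None:
        parts.append("PWD=" + pwd)
    return ";".join(parts)

def check_connection(dsn="HiveDSN"):
    """Open the DSN and run a trivial HiveQL query to verify connectivity."""
    import pyodbc  # imported here so dsn_conn_str works without pyodbc installed
    with pyodbc.connect(dsn_conn_str(dsn), autocommit=True) as conn:
        return conn.cursor().execute("SHOW DATABASES").fetchall()
```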

Step 4. Setting Up ODBC Data Source

Before working with Hive it is necessary to set up a data source:

  1. Open the Control Panel on the client computer and go to the Administration > ODBC Data Sources section.

  2. In the ODBC Data Source Administrator dialog box, go to the User DSN tab and click the Add button.

  3. In the Create New Data Source dialog box select the Hortonworks Hive ODBC Driver:

  4. The Hortonworks Hive ODBC Driver DSN Setup dialog box opens:

Specify the following settings:

Step 5. Setting Up Database in Foresight Analytics Platform

To set up a database in Foresight Analytics Platform:

  1. Create a database in the repository.

  2. Specify the OLE DB (ODBC HIVE) DBMS client type in the database properties.

  3. Specify the name of the created data source (Data Source Name) as the server name.

  4. Select automatic connection.

  5. Fill in the User Name and Password fields: specify the credentials of an account created in your Hadoop/Hive configuration.

After the required parameters are specified, the database creation wizard window looks as follows:

After all settings are configured, the new connection can be used to connect Hive tables to the platform repository and to create queries in the HiveQL language.

Step 6. Setting Up Platform Work with Hive Data Source via JDBC Driver

To connect to the Hive data source correctly, set up the JDBC driver:

  1. Make sure that the BI server is set up and integration with the Python programming language is configured.

  2. Install pip3 with default parameters by executing the command:

sudo apt install python3-pip

  3. Install the jaydebeapi library:

pip3 install jaydebeapi

  4. Set up integration with the Java programming language.

  5. Create the /opt/jdbc directory:

sudo mkdir /opt/jdbc

  6. Copy the HiveJDBC.jar JDBC driver file to the /opt/jdbc folder.

  7. Make the www-data user the owner of the /opt/jdbc directory:

sudo chown -R www-data:www-data /opt/jdbc

  8. Open the update file in the update manager.
    The example of the update file: Update_pefx.zip.

The file contains the objects:

  9. Run the Install Callback for Cube module included in the update file. Set the following parameters in the module:

Name     Description
CUBE_ID  Identifier of the cube to be populated with data.
MOD_ID   Identifier of the module executed when working with the cube.
  10. Set the following parameters in the config.ini file included in the update file:

Name           Description
TABLE_DATA_ID  Identifier of the table from which data should be obtained.
PY_MOD_ID      Python module identifier.
LINUX_PATH     Path to the JDBC driver.

JDBC connection parameters (Hive):

JDBC_CLASS     Driver class.
JAR_NAME       Name of the JDBC driver file in the operating system.
JDBC_URL       Service URL.
DB_USER        User name.
DB_PSWD        User password.

After all these steps are completed, data will be loaded into the cube from the Hive data source each time the cube is opened.
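As an illustration of what a Python module driven by these parameters might do, the sketch below opens a JDBC connection with jaydebeapi and reads a table. It is not the platform's actual module: the table name and credentials are placeholders, and the driver class and URL format are the standard ones for HiveServer2 (org.apache.hive.jdbc.HiveDriver, jdbc:hive2://...), assumed to match the JDBC_CLASS and JDBC_URL settings above:

```python
def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL (the JDBC_URL parameter)."""
    return "jdbc:hive2://%s:%d/%s" % (host, port, database)

def fetch_table(table, host, user, password,
                jdbc_class="org.apache.hive.jdbc.HiveDriver",  # JDBC_CLASS
                jar_path="/opt/jdbc/HiveJDBC.jar"):            # LINUX_PATH + JAR_NAME
    """Read all rows of `table` from Hive over JDBC."""
    import jaydebeapi  # requires Java integration (see step 4 above)
    conn = jaydebeapi.connect(jdbc_class, hive_jdbc_url(host),
                              [user, password], jar_path)
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM " + table)
        return cur.fetchall()
    finally:
        conn.close()
```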

See also:

Connecting to Relational Databases