How to set up HBase (Part I)

HBase is used for real-time read/write access to very large amounts of data. The goal of the project is to host very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Below is the basic procedure for setting up HBase in pseudo-distributed mode. The first step is to download HBase from the following link: Apache Download HBase.

For this entry, we are going to work with the stable version, which is HBase 1.0.0.

A. HBase setup

First, we unpack the tarball with the following command:

[undercloud@localhost Downloads]$ tar xfz hbase-1.0.0-bin.tar.gz

Then, move the extracted folder to /usr/local. Remember to switch to the root user so you have permission to move the folder.

[undercloud@localhost Downloads]$ su -
[root@localhost Downloads]# mv hbase-1.0.0 /usr/local

Change the ownership of the hbase-1.0.0 folder to the undercloud user:

[root@localhost Downloads]# chown -R undercloud /usr/local/hbase-1.0.0
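Optionally, you can also expose the HBase scripts on the undercloud user's PATH so they can be invoked from any directory. This is a minimal sketch, assuming the /usr/local/hbase-1.0.0 location used above; the lines would typically go in ~/.bashrc:

```shell
# Optional convenience: make the HBase scripts available on PATH
# (assumes the /usr/local/hbase-1.0.0 location used in this guide)
export HBASE_HOME=/usr/local/hbase-1.0.0
export PATH="$PATH:$HBASE_HOME/bin"
```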

 

B. Configuring HBase

The next step is to configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and Zookeeper) runs as a separate process.

Configuring JAVA_HOME environment variable

It is required to set the JAVA_HOME environment variable before starting HBase. HBase provides a central mechanism to configure the JAVA_HOME variable – conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java.

[root@localhost hbase-1.0.0]# exit
logout
[undercloud@localhost Downloads]$ cd /usr/local/hbase-1.0.0
[undercloud@localhost hbase-1.0.0]$ nano conf/hbase-env.sh

Uncomment the JAVA_HOME line and set it to your Java installation directory:

export JAVA_HOME=/usr/java/jdk1.7.0_67
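If you are not sure where Java is installed, the directory can usually be derived from the java binary on the PATH by stripping the trailing bin/java. The sketch below illustrates the string manipulation with the sample path used above; on a live system you would start from `java_bin="$(readlink -f "$(command -v java)")"` instead:

```shell
# Illustration using the sample path from this guide; on a real system, start from:
#   java_bin="$(readlink -f "$(command -v java)")"
java_bin="/usr/java/jdk1.7.0_67/bin/java"
java_home="$(dirname "$(dirname "$java_bin")")"   # strip the trailing /bin/java
echo "$java_home"
```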

Next, it is necessary to edit the conf/hbase-site.xml file to specify custom configuration, such as hbase.rootdir, the directory where HBase writes its data, and hbase.zookeeper.property.dataDir, the directory where ZooKeeper writes its data. The content of our conf/hbase-site.xml file is shown below.

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/undercloud/zookeeper</value>
  </property>
</configuration>
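Note that the hdfs://localhost:9000 portion of hbase.rootdir must match the NameNode address configured for Hadoop; otherwise HBase cannot reach HDFS. In a Hadoop 2.6.0 pseudo-distributed setup this is the fs.defaultFS property in etc/hadoop/core-site.xml, which would look like the fragment below (shown for reference; adjust if your NameNode listens on a different host or port):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```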

C. Starting HBase

First, it is necessary to start the Hadoop Distributed File System (HDFS) and YARN services as follows:

[undercloud@localhost hadoop-2.6.0]$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-secondarynamenode-localhost.localdomain.out
[undercloud@localhost hadoop-2.6.0]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-undercloud-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-undercloud-nodemanager-localhost.localdomain.out
[undercloud@localhost hadoop-2.6.0]$

Next, we can start HBase with the following commands:

[undercloud@localhost hadoop-2.6.0]$ cd /usr/local/hbase-1.0.0
[undercloud@localhost hbase-1.0.0]$ bin/start-hbase.sh
localhost: starting zookeeper, logging to /usr/local/hbase-1.0.0/bin/../logs/hbase-undercloud-zookeeper-localhost.localdomain.out
starting master, logging to /usr/local/hbase-1.0.0/bin/../logs/hbase-undercloud-master-localhost.localdomain.out
starting regionserver, logging to /usr/local/hbase-1.0.0/bin/../logs/hbase-undercloud-1-regionserver-localhost.localdomain.out
[undercloud@localhost hbase-1.0.0]$
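At this point you can confirm that all three daemons are running with jps, the JVM process lister that ships with the JDK. The expected process names are HMaster, HRegionServer, and HQuorumPeer (the ZooKeeper instance HBase manages). The sketch below filters a captured jps listing; the sample output is illustrative, not taken from a real run:

```shell
# On a live system: jps | grep -E 'HMaster|HRegionServer|HQuorumPeer'
# Sample listing below is for illustration only:
jps_output='2101 HMaster
2214 HRegionServer
2043 HQuorumPeer
2350 Jps'
printf '%s\n' "$jps_output" | grep -E 'HMaster|HRegionServer|HQuorumPeer'
```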

Now, we can start working with HBase through its shell, as follows:

[undercloud@localhost hbase-1.0.0]$ bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.0.0/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.0, r6c98bff7b719efdb16f71606f3b7d8229445eb81, Sat Feb 14 19:49:22 PST 2015

hbase(main):001:0> list
TABLE                                                                           
0 row(s) in 0.4580 seconds

=> []
hbase(main):002:0>
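From here, a quick smoke test is to create a table, write a cell, read it back, and clean up. The commands below are a sketch using standard HBase shell commands; the table name test and column family cf are arbitrary examples:

```
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
get 'test', 'row1'
disable 'test'
drop 'test'
```

Note that a table must be disabled before it can be dropped.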

D. Stopping HBase

Now, we can stop the HBase service with the command bin/stop-hbase.sh, but first it is necessary to quit the HBase shell with the exit command, as follows:

hbase(main):002:0> exit
[undercloud@localhost hbase-1.0.0]$ bin/stop-hbase.sh
stopping hbase.......................
localhost: stopping zookeeper.
[undercloud@localhost hbase-1.0.0]$