HBase is used for real-time read/write access to large amounts of data. The goal of the project is to host very large tables, billions of rows by millions of columns, atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Next, we present the basic procedure to set up HBase in a pseudo-distributed fashion. The first step is to download HBase from the following link: Apache Download HBase.
For this entry we are going to work with the stable version, which at the time of writing is HBase 1.0.0.
A. HBase setup
First, we unpack the .tar.gz archive by means of the following command:
[undercloud@localhost Downloads]$ tar xfz hbase-1.0.0-bin.tar.gz
Then, move the extracted folder to /usr/local. Remember to switch to the root user in order to have permission to move the folder.
[underclud@localhost Downloads]$ su - [root@localhost Downloads]# mv hbase-1.0.0 /usr/loca
Change the ownership of the hbase-1.0.0 folder to the undercloud user:
[root@localhost Downloads]# chown -R undercloud /usr/local/hbase-1.0.0
B. Configuring HBase
The next step is to configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and Zookeeper) runs as a separate process.
Configuring JAVA_HOME environment variable
It is required to set the JAVA_HOME environment variable before starting HBase. HBase provides a central place to configure it: conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should point to a directory that contains the executable file bin/java.
[root@localhost hbase-1.0.0]# exit
logout
[undercloud@localhost Downloads]$ cd /usr/local/hbase-1.0.0
[undercloud@localhost hbase-1.0.0]$ nano conf/hbase-env.sh
Uncomment the JAVA_HOME line and set it to the location of your Java installation.
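For example, the uncommented line in conf/hbase-env.sh might look like the following. The path shown is only illustrative; replace it with the directory on your system that contains bin/java.

```shell
# Illustrative JAVA_HOME line for conf/hbase-env.sh.
# The JDK path below is an example only; adjust it to your installation.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_75
```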
Next, it is necessary to edit the conf/hbase-site.xml file to specify custom configuration, such as hbase.rootdir, the directory where HBase writes its data, and hbase.zookeeper.property.dataDir, the directory where ZooKeeper writes its data. Next, we present the content of our conf/hbase-site.xml file.
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/undercloud/zookeeper</value>
  </property>
</configuration>
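Note that the host and port in hbase.rootdir must match the fs.defaultFS value configured in Hadoop's core-site.xml; otherwise HBase will not be able to reach HDFS. Assuming HDFS listens on localhost:9000 as above, the corresponding fragment of core-site.xml would look like this:

```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
```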
C. Starting HBase
First, it is necessary to start the Hadoop Distributed File System (HDFS) and YARN services as follows:
[undercloud@localhost hadoop-2.6.0]$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-secondarynamenode-localhost.localdomain.out
[undercloud@localhost hadoop-2.6.0]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-undercloud-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-undercloud-nodemanager-localhost.localdomain.out
[undercloud@localhost hadoop-2.6.0]$
Next, we are able to start HBase by means of the following command:
[undercloud@localhost hadoop-2.6.0]$ cd /usr/local/hbase-1.0.0
[undercloud@localhost hbase-1.0.0]$ bin/start-hbase.sh
localhost: starting zookeeper, logging to /usr/local/hbase-1.0.0/bin/../logs/hbase-undercloud-zookeeper-localhost.localdomain.out
starting master, logging to /usr/local/hbase-1.0.0/bin/../logs/hbase-undercloud-master-localhost.localdomain.out
starting regionserver, logging to /usr/local/hbase-1.0.0/bin/../logs/hbase-undercloud-1-regionserver-localhost.localdomain.out
[undercloud@localhost hbase-1.0.0]$
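To verify that every daemon came up as a separate process, the jps command shipped with the JDK can be used. In pseudo-distributed mode we should see HMaster, HRegionServer, and HQuorumPeer (the ZooKeeper process) alongside the Hadoop daemons; the process IDs below are illustrative and will differ on your machine:

```
[undercloud@localhost hbase-1.0.0]$ jps
3120 NameNode
3245 DataNode
3410 SecondaryNameNode
3587 ResourceManager
3702 NodeManager
4011 HQuorumPeer
4089 HMaster
4176 HRegionServer
4350 Jps
```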
Now, we can start working with HBase through its shell, as follows:
[undercloud@localhost hbase-1.0.0]$ bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.0.0/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.0, r6c98bff7b719efdb16f71606f3b7d8229445eb81, Sat Feb 14 19:49:22 PST 2015

hbase(main):001:0> list
TABLE
0 row(s) in 0.4580 seconds

=> []
hbase(main):002:0>
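As a quick smoke test of the installation, we can create a small table from the shell, insert a cell, and read it back. The table name 'test' and column family 'cf' below are arbitrary examples, not names required by HBase:

```
hbase(main):002:0> create 'test', 'cf'
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):004:0> scan 'test'
hbase(main):005:0> get 'test', 'row1'
hbase(main):006:0> disable 'test'
hbase(main):007:0> drop 'test'
```

Note that a table must be disabled before it can be dropped.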
D. Stopping HBase
Now, we can stop the HBase service with the command bin/stop-hbase.sh, but first it is necessary to quit the HBase shell by means of the exit command, as follows:
hbase(main):002:0> exit
[undercloud@localhost hbase-1.0.0]$ bin/stop-hbase.sh
stopping hbase.......................
localhost: stopping zookeeper.
[undercloud@localhost hbase-1.0.0]$