How to install Hadoop 2.3.0 (YARN)

This entry presents the procedure to set up Hadoop 2.3.0 (YARN). Apache Hadoop 2.3.0 includes many improvements over previous releases. Hadoop 2.3.0, also known as Hadoop YARN, is a breakthrough in the Hadoop architecture because it provides a more general processing platform that goes beyond MapReduce. The main feature of Hadoop YARN is that resource management is no longer focused only on MapReduce, but is designed to run multiple types of applications on Hadoop.

You can find more information about the new Hadoop 2.3.0 architecture in the following links:

Official website of Apache Hadoop 2.3.0

Hadoop YARN – Hortonworks
MapReduce 2.0 in Apache Hadoop 0.23 – Cloudera

Note: The following procedure has been updated for Hadoop 2.6.0 (January 2015)

The software used to set up Apache Hadoop 2.6.0 is:

CentOS 7 x86_64
Java JDK 7 update 67
Apache Hadoop 2.6.0

1 Installing JDK 7 on CentOS

1.1 The first step is to switch to the root user

[undercloud@localhost ~]$ su -
Password: 

1.2 Install Java JDK 7 package

[root@localhost Downloads]# rpm -Uvh jdk-7u67-linux-x64.rpm

1.3 Install the JDK java, javaws, libjavaplugin.so, javac and jar commands with the alternatives --install command

[root@localhost Downloads]# alternatives --install /usr/bin/java java /usr/java/latest/jre/bin/java 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javaws javaws /usr/java/latest/jre/bin/javaws 200000
[root@localhost Downloads]# alternatives --install /usr/lib64/mozilla/plugins/libjavaplugin.so libjavaplugin.so.x86_64 /usr/java/latest/jre/lib/amd64/libnpjp2.so 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javac javac /usr/java/latest/bin/javac 200000
[root@localhost Downloads]# alternatives --install /usr/bin/jar jar /usr/java/latest/bin/jar 200000
[root@localhost Downloads]#

1.4 Alternatively, use the absolute Java JDK version path (/usr/java/jdk1.7.0_67) instead of /usr/java/latest

[root@localhost Downloads]# alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_67/jre/bin/java 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_67/jre/bin/javaws 200000
[root@localhost Downloads]# alternatives --install /usr/lib64/mozilla/plugins/libjavaplugin.so libjavaplugin.so.x86_64 /usr/java/jdk1.7.0_67/jre/lib/amd64/libnpjp2.so 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_67/bin/javac 200000
[root@localhost Downloads]# alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_67/bin/jar 200000
[root@localhost Downloads]#

1.5 Check java version

 
[root@localhost Downloads]# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
[root@localhost Downloads]#

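1.6 Finally, add the JAVA_HOME environment variable to the /etc/profile file or the $HOME/.bash_profile file. The three lines below are alternatives (latest JDK link, absolute JDK path, absolute JRE path); set only one of them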

[root@localhost Downloads]# export JAVA_HOME="/usr/java/latest"
[root@localhost Downloads]# export JAVA_HOME="/usr/java/jdk1.7.0_67"
[root@localhost Downloads]# export JAVA_HOME="/usr/java/jre1.7.0_67"
[root@localhost Downloads]#
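
An export typed at the prompt only lasts for the current shell session, so the line still has to be written into the profile file. As a minimal sketch, assuming a Bourne-compatible shell, the helper below appends the export line only when it is not already present. The name persist_java_home is hypothetical and chosen for illustration; it is not part of Java or CentOS.

```shell
# Hypothetical helper: append an export line for JAVA_HOME to a profile file,
# unless an identical line is already there.
persist_java_home() {
    profile_file=$1   # e.g. "$HOME/.bash_profile" or /etc/profile
    java_home=$2      # e.g. /usr/java/jdk1.7.0_67
    line="export JAVA_HOME=\"$java_home\""
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$line" "$profile_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile_file"
    fi
}
```

It could be called as persist_java_home "$HOME/.bash_profile" /usr/java/jdk1.7.0_67, after which a new login shell (or sourcing the file) picks up the variable.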

2 Installing OpenSSH-Server

2.1 Install OpenSSH-Server

[root@localhost Downloads]# yum install openssh-server

2.2 Verify the service status of OpenSSH-Server

[root@localhost Downloads]# service sshd status
sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled)
Active: inactive (dead)
[root@localhost Downloads]#

2.3 If the service is stopped, start it with the following command

[root@localhost Downloads]# service sshd start
Redirecting to /bin/systemctl start sshd.service
[root@localhost Downloads]# service sshd status
Redirecting to /bin/systemctl status sshd.service
sshd.service - OpenSSH server daemon
 Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled)
 Active: active (running) since Wed 2015-01-21 12:02:30 CST; 25s ago
 Process: 5464 ExecStartPre=/usr/sbin/sshd-keygen (code=exited, status=0/SUCCESS)
 Main PID: 5466 (sshd)
 CGroup: /system.slice/sshd.service
 └─5466 /usr/sbin/sshd -D

Jan 21 12:02:30 localhost.localdomain systemd[1]: Started OpenSSH server daemon.
Jan 21 12:02:31 localhost.localdomain sshd[5466]: Server listening on 0.0.0.0...
Jan 21 12:02:31 localhost.localdomain sshd[5466]: Server listening on :: port...
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost Downloads]#

2.4 Set the service to start automatically when the system boots

[root@localhost Downloads]#  systemctl enable sshd.service
ln -s '/usr/lib/systemd/system/sshd.service' '/etc/systemd/system/multi-user.target.wants/sshd.service'
[root@localhost Downloads]#

2.5 Configure SSH to log in without a passphrase prompt. First, make sure you are NOT logged in as the root user

[root@localhost Downloads]# exit
logout
[undercloud@localhost ~]$ 

2.6 Then, generate a DSA key by means of the following command:

[undercloud@localhost ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/undercloud/.ssh'.
Your identification has been saved in /home/undercloud/.ssh/id_dsa.
Your public key has been saved in /home/undercloud/.ssh/id_dsa.pub.
The key fingerprint is:
f5:e4:fd:1b:f4:dc:04:5f:e4:64:94:0b:99:13:0d:bb undercloud@localhost.localdomain
The key's randomart image is:
+--[ DSA 1024]----+
|             o*.*|
|             =.B |
|          . ..+ +|
|         . + ..+.|
|        S   oE..o|
|              .+o|
|               .=|
|                o|
|               . |
+-----------------+
[undercloud@localhost ~]$ 

2.7 Configure the SSH service without passphrase by means of the following commands:

[undercloud@localhost ~]$ chmod 755 ~/.ssh
[undercloud@localhost ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[undercloud@localhost ~]$ chmod 644 ~/.ssh/authorized_keys
[undercloud@localhost ~]$ ssh-add
Identity added: /home/undercloud/.ssh/id_dsa (/home/undercloud/.ssh/id_dsa)
[undercloud@localhost ~]$
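
The permissions used above are accepted by sshd, but a stricter, owner-only variant (700 for the directory, 600 for authorized_keys) is a common choice. The following is a minimal sketch of that variant under stated assumptions: install_public_key is a hypothetical helper name, and the key file is assumed to contain a single public key line.

```shell
# Hypothetical helper: add a public key to authorized_keys inside the given
# .ssh directory, using owner-only permissions (700 / 600).
install_public_key() {
    ssh_dir=$1    # e.g. "$HOME/.ssh"
    pub_key=$2    # e.g. "$HOME/.ssh/id_dsa.pub"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Append the key only if that exact line is not already authorized
    if ! grep -qxF -f "$pub_key" "$ssh_dir/authorized_keys"; then
        cat "$pub_key" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

For the setup in this entry the call would be install_public_key ~/.ssh ~/.ssh/id_dsa.pub. Running it twice leaves a single copy of the key in the file.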

2.8 You can verify the SSH connection without a passphrase as follows:

[undercloud@localhost ~]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 48:b3:71:8b:c9:f8:1c:16:8c:64:8a:b0:1a:35:42:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Sat Jan 18 02:06:53 2015 from 192.168.1.114
[undercloud@localhost ~]$ exit
logout
Connection to localhost closed.
[undercloud@localhost ~]$ ssh localhost
Last login: Wed Jan 21 11:28:56 2015
[undercloud@localhost ~]$ 

3 Installing Hadoop 2.6.0 (YARN)

3.1 Extract the content from hadoop-2.6.0.tar.gz file and move the extracted folder to the /usr/local directory

[undercloud@localhost Downloads]$ tar vxzf hadoop-2.6.0.tar.gz
[undercloud@localhost Downloads]$ su -
Password: 
[root@localhost ~]# mv /home/undercloud/Downloads/hadoop-2.6.0 /usr/local
[root@localhost ~]#

3.2 Change the ownership of the hadoop-2.6.0 directory to the undercloud user

[root@localhost ~]# chown -R undercloud /usr/local/hadoop-2.6.0
[root@localhost ~]# exit
[undercloud@localhost Downloads]$ 

3.3 Set up Hadoop environment variables, adding the following variables at the end of the .bashrc file

#Hadoop variables
export JAVA_HOME=/usr/java/jdk1.7.0_67
export HADOOP_INSTALL=/usr/local/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_YARN_HOME=$HADOOP_INSTALL

3.4 Modify the JAVA_HOME variable in the /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh file

export JAVA_HOME=/usr/java/jdk1.7.0_67
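
If you prefer not to edit the file by hand, the change can be scripted. This is only a sketch, assuming GNU sed (for the -i option) and assuming the file contains at most one line beginning with export JAVA_HOME=. The function name set_hadoop_java_home is hypothetical.

```shell
# Hypothetical helper: point JAVA_HOME in a hadoop-env.sh file at a fixed JDK path.
set_hadoop_java_home() {
    env_file=$1    # e.g. /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
    java_home=$2   # e.g. /usr/java/jdk1.7.0_67
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing line; "|" is the sed delimiter because the path contains "/"
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
    else
        # No such line yet: add one at the end of the file
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
    fi
}
```

Example call: set_hadoop_java_home /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh /usr/java/jdk1.7.0_67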

4 Configure Hadoop 2.6.0

4.1 The first step is to configure the core-site.xml file located in /usr/local/hadoop-2.6.0/etc/hadoop

<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property> 
</configuration>

4.2 Second, configure the yarn-site.xml file as follows

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

4.3 Third, rename the mapred-site.xml.template file to mapred-site.xml and then edit the content as follows

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
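
No command is shown for creating mapred-site.xml from its template, so here is one possible way to do it. The sketch copies rather than moves the template, which keeps the original around for reference, and it refuses to overwrite a file you have already edited. The name create_mapred_site is hypothetical.

```shell
# Hypothetical helper: create mapred-site.xml from its template without
# overwriting a file that already exists.
create_mapred_site() {
    conf_dir=$1   # e.g. /usr/local/hadoop-2.6.0/etc/hadoop
    if [ ! -e "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}
```

Example call: create_mapred_site /usr/local/hadoop-2.6.0/etc/hadoop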

4.4 Create the folders where NameNode and DataNode data will be stored

[undercloud@localhost hadoop]$ cd ~
[undercloud@localhost ~]$ mkdir -p hadoopData/hdfs/namenode
[undercloud@localhost ~]$ mkdir -p hadoopData/hdfs/datanode
[undercloud@localhost ~]$

4.5 Configure the hdfs-site.xml file

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/home/undercloud/hadoopData/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/home/undercloud/hadoopData/hdfs/datanode</value>
   </property>
</configuration>
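
A typo in one of these paths is easy to miss, so it can help to confirm that every directory named in the file really exists before starting HDFS. The sketch below is a rough text-based check, not an XML parser: it assumes each file: value sits on its own line, as in the listing above. The function name missing_hdfs_dirs is hypothetical.

```shell
# Hypothetical helper: print every file: directory named in a <value> element
# of the given XML file that does not exist on disk.
missing_hdfs_dirs() {
    site_file=$1   # e.g. /usr/local/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
    # Assumes one <value>file:/path</value> per line
    sed -n 's|.*<value>file:\(/[^<]*\)</value>.*|\1|p' "$site_file" |
    while IFS= read -r dir; do
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
}
```

Calling missing_hdfs_dirs /usr/local/hadoop-2.6.0/etc/hadoop/hdfs-site.xml prints nothing when all directories are in place.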

5 Starting Hadoop YARN services

5.1 First, it is necessary to format the HDFS NameNode as usual

[undercloud@localhost hadoop]$ cd /usr/local/hadoop-2.6.0
[undercloud@localhost hadoop-2.6.0]$ bin/hdfs namenode -format
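
Formatting discards the metadata of an existing file system, so on a machine that may already have been set up it is worth checking first. A formatted NameNode directory contains a current/VERSION file, and the sketch below relies on that. The function name namenode_is_formatted is hypothetical.

```shell
# Hypothetical helper: succeed if the NameNode directory already holds a
# formatted file system (detected through its current/VERSION file).
namenode_is_formatted() {
    name_dir=$1   # e.g. "$HOME/hadoopData/hdfs/namenode"
    [ -f "$name_dir/current/VERSION" ]
}
```

A guarded format could then look like: namenode_is_formatted ~/hadoopData/hdfs/namenode || bin/hdfs namenode -format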

5.2 Start the HDFS services

[undercloud@localhost hadoop-2.6.0]$  sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-undercloud-secondarynamenode-localhost.localdomain.out
[undercloud@localhost hadoop-2.6.0]$

5.3 Start the YARN services

[undercloud@localhost hadoop-2.6.0]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-undercloud-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-undercloud-nodemanager-localhost.localdomain.out
[undercloud@localhost hadoop-2.6.0]$ 

5.4 You can see the running services by means of the jps command

[undercloud@localhost bin]$ cd /usr/local/hadoop-2.6.0
[undercloud@localhost hadoop-2.6.0]$ /usr/java/default/bin/jps
5485 NameNode
5660 ResourceManager
6101 Jps
5568 DataNode
5986 JobHistoryServer
[undercloud@localhost hadoop-2.6.0]$
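
Reading the jps list by eye makes it easy to overlook a daemon that failed to start. The start scripts above report launching a NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager, so those are the names one would expect to find. As a minimal sketch, the hypothetical helper missing_daemons reads jps output on standard input and prints each expected name that is absent. It compares the whole process name, so NameNode is not confused with SecondaryNameNode.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Field 2 of each jps line is the process name; require an exact match
        if ! printf '%s\n' "$jps_output" |
             awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage (requires a JDK on the PATH):
#   jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
```

Any name it prints is a daemon whose log file under /usr/local/hadoop-2.6.0/logs is worth inspecting.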

5.5 You can see the ResourceManager web page at http://localhost:8088

5.6 You can also see the NameNode overview at http://localhost:50070

5.7 Finally, you should be up and running. You can run the pi example as follows:

[undercloud@localhost hadoop-2.6.0]$ bin/hadoop jar $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5
Number of Maps  = 2
Samples per Map = 5
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
14/05/10 14:42:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/05/10 14:42:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/10 14:42:31 INFO input.FileInputFormat: Total input paths to process : 2
14/05/10 14:42:31 INFO mapreduce.JobSubmitter: number of splits:2
14/05/10 14:42:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399745002298_0003
14/05/10 14:42:32 INFO impl.YarnClientImpl: Submitted application application_1399745002298_0003
14/05/10 14:42:32 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1399745002298_0003/
14/05/10 14:42:32 INFO mapreduce.Job: Running job: job_1399745002298_0003