How to install Hadoop 2.3.0 (YARN)

This entry presents the procedure to set up Apache Hadoop 2.3.0 with YARN. Hadoop 2.3.0 includes many improvements over previous releases. The most important one, and the main change in the Hadoop 2.x line, is YARN (Yet Another Resource Negotiator). YARN is a breakthrough in the Hadoop architecture because it turns Hadoop into a more general processing platform beyond MapReduce. With YARN, resource management is no longer tied to MapReduce, so a single cluster can run multiple types of applications.

You can find more information about the new Hadoop 2.3.0 architecture at the following links:

Official website of Apache Hadoop 2.3.0

Hadoop YARN – Hortonworks
MapReduce 2.0 in Apache Hadoop 0.23 – Cloudera

The software used to set up Apache Hadoop 2.3.0 is:

CentOS 6.5 x86_64
Java JDK 7 update 55
Apache Hadoop 2.3.0

1 Installing JDK 7 on CentOS

1.1 First, switch to the root user

[undercloud@localhost ~]$ su -

1.2 Install the Java JDK 7 package

[root@localhost Downloads]# rpm -Uvh jdk-7u55-linux-x64.rpm

1.3 Register the JDK commands (java, javaws, javac, jar) and the browser plugin with the alternatives --install command

[root@localhost Downloads]# alternatives --install /usr/bin/java java /usr/java/latest/jre/bin/java 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javaws javaws /usr/java/latest/jre/bin/javaws 200000
[root@localhost Downloads]# alternatives --install /usr/lib64/mozilla/plugins/ /usr/java/latest/jre/lib/amd64/ 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javac javac /usr/java/latest/bin/javac 200000
[root@localhost Downloads]# alternatives --install /usr/bin/jar jar /usr/java/latest/bin/jar 200000
[root@localhost Downloads]#

1.4 Alternatively, to pin the exact release, register the absolute JDK path (/usr/java/jdk1.7.0_55) instead of /usr/java/latest

[root@localhost Downloads]# alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_55/jre/bin/java 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_55/jre/bin/javaws 200000
[root@localhost Downloads]# alternatives --install /usr/lib64/mozilla/plugins/ /usr/java/jdk1.7.0_55/jre/lib/amd64/ 200000
[root@localhost Downloads]# alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_55/bin/javac 200000
[root@localhost Downloads]# alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_55/bin/jar 200000
[root@localhost Downloads]#
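The repetitive alternatives commands above can also be generated in a loop. The following is a minimal sketch, not the author's method: the function name register_jdk_alternatives and the DRY_RUN switch are hypothetical conveniences, and the browser-plugin entry is deliberately left out. With DRY_RUN=1 it only prints the commands, so you can review them without root privileges.

```shell
#!/bin/sh
# Sketch: register the JDK command-line tools with alternatives in a loop.
# Hypothetical helper; set DRY_RUN=1 to print the commands instead of running them.
# The browser-plugin entry from the steps above is intentionally not handled here.
# Usage: register_jdk_alternatives <jdk_dir> [priority]
register_jdk_alternatives() {
  jdk_dir=$1
  priority=${2:-200000}
  for tool in jre/bin/java jre/bin/javaws bin/javac bin/jar; do
    name=${tool##*/}   # strip the directory part: jre/bin/java -> java
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "alternatives --install /usr/bin/$name $name $jdk_dir/$tool $priority"
    else
      alternatives --install "/usr/bin/$name" "$name" "$jdk_dir/$tool" "$priority"
    fi
  done
}
```

For example, DRY_RUN=1 register_jdk_alternatives /usr/java/jdk1.7.0_55 prints the four commands from step 1.4 (without the plugin line); run it as root without DRY_RUN to execute them.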

1.5 Check the Java version

[root@localhost Downloads]# java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
[root@localhost Downloads]#

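1.6 Finally, add the JAVA_HOME environment variable to the /etc/profile file or the $HOME/.bash_profile file. The three paths below are alternatives, so set only one of them (each export overrides the previous one); later steps in this guide use /usr/java/jdk1.7.0_55.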

[root@localhost Downloads]# export JAVA_HOME="/usr/java/latest"
[root@localhost Downloads]# export JAVA_HOME="/usr/java/jdk1.7.0_55"
[root@localhost Downloads]# export JAVA_HOME="/usr/java/jre1.7.0_55"
[root@localhost Downloads]#
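Before settling on one of these paths, it can help to check what it actually points to. The sketch below assumes only that a JDK home contains both bin/java and bin/javac while a JRE home contains bin/java alone; java_home_kind is a hypothetical helper name. A JDK is preferable here because later steps also use the JDK's jps tool.

```shell
#!/bin/sh
# Sketch: report whether a candidate JAVA_HOME is a JDK, a JRE, or neither.
# A JDK ships bin/javac (and bin/jps); a JRE ships only bin/java.
# Usage: java_home_kind <dir>   prints "jdk", "jre" or "invalid"
java_home_kind() {
  dir=$1
  if [ -x "$dir/bin/java" ] && [ -x "$dir/bin/javac" ]; then
    echo jdk
  elif [ -x "$dir/bin/java" ]; then
    echo jre
  else
    echo invalid
    return 1   # non-zero status so scripts can stop on a bad path
  fi
}
```

For example, java_home_kind /usr/java/jdk1.7.0_55 should print jdk on a machine prepared as above.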

2 Installing OpenSSH-Server

2.1 Install OpenSSH-Server

[root@localhost Downloads]# yum install openssh-server

2.2 Check the status of the OpenSSH server

[root@localhost Downloads]# service sshd status
openssh-daemon is stopped
[root@localhost Downloads]#

2.3 If it is stopped, start it with the following command

[root@localhost Downloads]# service sshd start
Generating SSH1 RSA host key:                              [  OK  ]
Generating SSH2 RSA host key:                              [  OK  ]
Generating SSH2 DSA host key:                              [  OK  ]
Starting sshd:                                             [  OK  ]
[root@localhost Downloads]#

2.4 Configure the service to start automatically at boot (runlevels 3, 4 and 5)

[root@localhost Downloads]# chkconfig --list sshd
sshd           	0:off	1:off	2:off	3:off	4:off	5:off	6:off
[root@localhost Downloads]# chkconfig --level 345 sshd on
[root@localhost Downloads]# chkconfig --list sshd
sshd           	0:off	1:off	2:off	3:on	4:on	5:on	6:off
[root@localhost Downloads]#

2.5 Set up SSH logins without a passphrase. First, make sure you are NOT logged in as the root user

[root@localhost Downloads]# exit
[undercloud@localhost ~]$ 

2.6 Then generate a DSA key pair with an empty passphrase (-P ''). Newer OpenSSH releases disable DSA keys by default; on such systems use -t rsa instead.

[undercloud@localhost ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/undercloud/.ssh'.
Your identification has been saved in /home/undercloud/.ssh/id_dsa.
Your public key has been saved in /home/undercloud/.ssh/
The key fingerprint is:
f5:e4:fd:1b:f4:dc:04:5f:e4:64:94:0b:99:13:0d:bb undercloud@localhost.localdomain
The key's randomart image is:
+--[ DSA 1024]----+
|             o*.*|
|             =.B |
|          . ..+ +|
|         . + ..+.|
|        S   oE..o|
|              .+o|
|               .=|
|                o|
|               . |
[undercloud@localhost ~]$ 

2.7 Authorize the new key for passwordless logins with the following commands:

[undercloud@localhost ~]$ chmod 755 ~/.ssh
[undercloud@localhost ~]$ cat ~/.ssh/ >> ~/.ssh/authorized_keys
[undercloud@localhost ~]$ chmod 644 ~/.ssh/authorized_keys
[undercloud@localhost ~]$ ssh-add
Identity added: /home/undercloud/.ssh/id_dsa (/home/undercloud/.ssh/id_dsa)
[undercloud@localhost ~]$
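If the passwordless login in the next step still asks for a password, a common cause is file permissions: with StrictModes enabled (the OpenSSH default), sshd refuses to use authorized_keys when it or ~/.ssh is group- or world-writable. The sketch below checks only those write bits (sshd also checks ownership and the home directory) and assumes GNU stat, as shipped with CentOS; check_ssh_perms is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that ~/.ssh and authorized_keys are not group- or world-writable.
# Only the write bits are checked; sshd also verifies ownership and $HOME.
# Assumes GNU stat (-c '%a' prints the octal mode, e.g. 755).
# Usage: check_ssh_perms [home_dir]   returns 0 if OK, 1 otherwise
check_ssh_perms() {
  home_dir=${1:-$HOME}
  rc=0
  for path in "$home_dir/.ssh" "$home_dir/.ssh/authorized_keys"; do
    mode=$(stat -c '%a' "$path") || return 1
    # 022 = group-write | other-write; the leading 0 makes the shell read octal
    if [ $(( 0$mode & 022 )) -ne 0 ]; then
      echo "too permissive: $path ($mode)"
      rc=1
    fi
  done
  return $rc
}
```

With the 755 and 644 modes used above, check_ssh_perms returns 0; a mode such as 664 on authorized_keys would be reported.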

2.8 Verify that you can connect without a passphrase:

[undercloud@localhost ~]$ ssh localhost
The authenticity of host 'localhost (' can't be established.
RSA key fingerprint is 48:b3:71:8b:c9:f8:1c:16:8c:64:8a:b0:1a:35:42:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Sat May 10 02:06:53 2014 from
[undercloud@localhost ~]$ exit
Connection to localhost closed.
[undercloud@localhost ~]$ ssh localhost
Last login: Sat May 10 02:27:10 2014 from localhost.localdomain
[undercloud@localhost ~]$ 

3 Installing Hadoop 2.3.0 (YARN)

3.1 Extract the contents of the hadoop-2.3.0.tar.gz file and move the extracted folder to the /usr/local directory

[undercloud@localhost Downloads]$ tar vxzf hadoop-2.3.0.tar.gz
[undercloud@localhost Downloads]$ su -
[root@localhost ~]# mv /home/undercloud/Downloads/hadoop-2.3.0 /usr/local
[root@localhost ~]#

3.2 Change the ownership of the hadoop-2.3.0 directory to the undercloud user

[root@localhost ~]# chown -R undercloud /usr/local/hadoop-2.3.0
[root@localhost ~]# exit
[undercloud@localhost Downloads]$ 

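3.3 Set up the Hadoop environment variables by adding the following lines to the end of the ~/.bashrc file, then reload it with source ~/.bashrc. HADOOP_MAPRED_HOME is needed by the pi example at the end of this guide.

#Hadoop variables
export JAVA_HOME=/usr/java/jdk1.7.0_55
export HADOOP_INSTALL=/usr/local/hadoop-2.3.0
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL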
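Appending these lines by hand is easy to do twice by accident. The following sketch appends them only once, using the "#Hadoop variables" comment as a marker to detect an earlier run. It also exports HADOOP_MAPRED_HOME, which the pi example at the end of this guide references. The helper name add_hadoop_env is hypothetical.

```shell
#!/bin/sh
# Sketch: append the Hadoop variables to a shell startup file only once.
# The "#Hadoop variables" marker line detects an earlier run, so re-running is safe.
# Usage: add_hadoop_env [rc_file]   (defaults to ~/.bashrc)
add_hadoop_env() {
  rc_file=${1:-$HOME/.bashrc}
  marker='#Hadoop variables'
  # grep -qxF: quiet, whole-line, fixed-string match on the marker
  if [ -f "$rc_file" ] && grep -qxF "$marker" "$rc_file"; then
    return 0
  fi
  # Quoted 'EOF' keeps $HADOOP_INSTALL literal so it expands when .bashrc runs
  cat >> "$rc_file" <<'EOF'
#Hadoop variables
export JAVA_HOME=/usr/java/jdk1.7.0_55
export HADOOP_INSTALL=/usr/local/hadoop-2.3.0
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
EOF
}
```

Run add_hadoop_env as the undercloud user, then source ~/.bashrc. If the marker is already present from a manual edit, the function leaves the file unchanged.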

3.4 Set the JAVA_HOME variable in the /usr/local/hadoop-2.3.0/etc/hadoop/ file as well

export JAVA_HOME=/usr/java/jdk1.7.0_55
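The edit above can be scripted with sed. This is a minimal sketch: because the file name is not spelled out in the step above, the helper takes the file path as an argument. It assumes GNU sed (for -i) and a JDK path without | or & characters. set_hadoop_java_home is a hypothetical name.

```shell
#!/bin/sh
# Sketch: pin JAVA_HOME in a Hadoop environment script under etc/hadoop.
# The target file is passed in explicitly. Requires GNU sed for -i.
# Usage: set_hadoop_java_home <env_file> <jdk_dir>
set_hadoop_java_home() {
  env_file=$1
  jdk_dir=$2
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Replace the existing assignment in place; | is the sed delimiter
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file"
  else
    # No assignment yet: append one at the end of the file
    printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
  fi
}
```

Call it with the full path of the environment file in /usr/local/hadoop-2.3.0/etc/hadoop and /usr/java/jdk1.7.0_55 as the second argument.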

4 Configure Hadoop 2.3.0

4.1 First, configure the core-site.xml file located in /usr/local/hadoop-2.3.0/etc/hadoop
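The original contents of core-site.xml are not shown on this page. As a hedged single-node sketch, the function below writes a minimal core-site.xml that sets fs.defaultFS, the default file system URI. The value hdfs://localhost:9000 is an assumption, not necessarily the author's setting. Note that the function overwrites any existing core-site.xml; write_core_site is a hypothetical name.

```shell
#!/bin/sh
# Sketch: write a minimal single-node core-site.xml (overwrites the file).
# ASSUMPTION: the NameNode address hdfs://localhost:9000; adjust as needed.
# Usage: write_core_site <hadoop_conf_dir>
write_core_site() {
  conf_dir=$1
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Default file system URI used by HDFS clients and daemons -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

For this guide's layout: write_core_site /usr/local/hadoop-2.3.0/etc/hadoop.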

4.2 Next, configure the yarn-site.xml file as follows. This file declares the shuffle service that needs to be set for MapReduce to run.
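The yarn-site.xml contents are also missing from this page. The sketch below writes a minimal version that enables the MapReduce shuffle as a NodeManager auxiliary service, using the property names and handler class of the Hadoop 2.2+ releases. It overwrites any existing yarn-site.xml; write_yarn_site is a hypothetical name.

```shell
#!/bin/sh
# Sketch: write a minimal yarn-site.xml that enables the MapReduce shuffle
# auxiliary service on the NodeManager (overwrites the file).
# Usage: write_yarn_site <hadoop_conf_dir>
write_yarn_site() {
  conf_dir=$1
  cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Shuffle service that needs to be set for MapReduce to run -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- Class that implements the mapreduce_shuffle service -->
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF
}
```

For this guide's layout: write_yarn_site /usr/local/hadoop-2.3.0/etc/hadoop.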

4.3 Then move the mapred-site.xml.template file to mapred-site.xml and edit its contents as follows
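As a sketch of what mapred-site.xml needs on a YARN cluster, the function below sets mapreduce.framework.name to yarn. The default value of this property is local, which runs jobs in-process instead of on YARN. The function writes mapred-site.xml directly, overwriting it, so it works whether or not the template was moved first; write_mapred_site is a hypothetical name.

```shell
#!/bin/sh
# Sketch: write a minimal mapred-site.xml that submits MapReduce jobs to YARN
# (overwrites the file).
# Usage: write_mapred_site <hadoop_conf_dir>
write_mapred_site() {
  conf_dir=$1
  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Run MapReduce jobs on YARN instead of the local job runner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```

For this guide's layout: write_mapred_site /usr/local/hadoop-2.3.0/etc/hadoop.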

4.4 Create the directories where the NameNode and DataNode data will be stored

[undercloud@localhost hadoop]$ cd ~
[undercloud@localhost ~]$ mkdir -p hadoopData/hdfs/namenode
[undercloud@localhost ~]$ mkdir -p hadoopData/hdfs/datanode
[undercloud@localhost ~]$

4.5 Configure the hdfs-site.xml file
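The hdfs-site.xml contents are not shown either. A minimal single-node sketch points the NameNode and DataNode storage at the hadoopData directories created just above and sets the replication factor to 1, since a single DataNode cannot hold the default three replicas. The data_root default and the function name write_hdfs_site are assumptions; data_root must be an absolute path. The function overwrites any existing hdfs-site.xml.

```shell
#!/bin/sh
# Sketch: write a minimal single-node hdfs-site.xml (overwrites the file).
# ASSUMPTION: data lives under $HOME/hadoopData/hdfs unless another absolute
# path is given as the second argument.
# Usage: write_hdfs_site <hadoop_conf_dir> [data_root]
write_hdfs_site() {
  conf_dir=$1
  data_root=${2:-$HOME/hadoopData/hdfs}
  # Unquoted EOF so that $data_root is expanded into the file
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Single node: keep one copy of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$data_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$data_root/datanode</value>
  </property>
</configuration>
EOF
}
```

Run as the undercloud user, write_hdfs_site /usr/local/hadoop-2.3.0/etc/hadoop would point HDFS at /home/undercloud/hadoopData/hdfs/namenode and /home/undercloud/hadoopData/hdfs/datanode.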


5 Starting the Hadoop YARN services

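5.1 First, format the HDFS NameNode. Do this only once, before the first start, because formatting erases any existing HDFS metadata. In Hadoop 2.x, bin/hdfs namenode -format is the preferred form; bin/hadoop namenode still works but prints a deprecation warning.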

[undercloud@localhost hadoop]$ cd /usr/local/hadoop-2.3.0
[undercloud@localhost hadoop-2.3.0]$ bin/hadoop namenode -format

5.2 Start the HDFS services

[undercloud@localhost hadoop-2.3.0]$ sbin/ start namenode
starting namenode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-undercloud-namenode-localhost.localdomain.out
[undercloud@localhost hadoop-2.3.0]$ sbin/ start datanode
starting datanode, logging to /usr/local/hadoop-2.3.0/logs/hadoop-undercloud-datanode-localhost.localdomain.out
[undercloud@localhost hadoop-2.3.0]$

5.3 Start the YARN services (ResourceManager and NodeManager) and the MapReduce JobHistory server

[undercloud@localhost hadoop-2.3.0]$ sbin/ start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop-2.3.0/logs/yarn-undercloud-resourcemanager-localhost.localdomain.out
[undercloud@localhost hadoop-2.3.0]$ sbin/ start nodemanager
starting nodemanager, logging to /usr/local/hadoop-2.3.0/logs/yarn-undercloud-nodemanager-localhost.localdomain.out
[undercloud@localhost hadoop-2.3.0]$ sbin/ start historyserver
starting historyserver, logging to /usr/local/hadoop-2.3.0/logs/mapred-undercloud-historyserver-localhost.localdomain.out
[undercloud@localhost hadoop-2.3.0]$ 

5.4 Check the running daemons with the jps command

[undercloud@localhost bin]$ cd /usr/local/hadoop-2.3.0
[undercloud@localhost hadoop-2.3.0]$ /usr/java/default/bin/jps
5485 NameNode
5660 ResourceManager
6101 Jps
5568 DataNode
5986 JobHistoryServer
[undercloud@localhost hadoop-2.3.0]$
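Note that NodeManager does not appear in the jps output above even though it was started, which suggests it exited after launch. Its log is the file named in the "starting nodemanager" line. The sketch below reads jps output on standard input and prints each expected daemon that is missing. The daemon list is an assumption matching the services started in this guide, and missing_daemons is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read `jps` output on stdin and print each expected daemon that is absent.
# Returns 1 if any daemon is missing, 0 if all are running.
# Usage: jps | missing_daemons
missing_daemons() {
  jps_out=$(cat)
  status=0
  for daemon in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    # jps prints "<pid> <MainClass>"; compare the class name in the second field
    if ! printf '%s\n' "$jps_out" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
      status=1
    fi
  done
  return $status
}
```

Fed the jps output shown above, the function prints NodeManager and returns 1.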

5.5 The ResourceManager web UI is available at http://localhost:8088


5.6 The NameNode overview page is available at http://localhost:50070

5.7 Finally, with all services up and running, you can run the pi example as follows:

[undercloud@localhost hadoop-2.3.0]$ bin/hadoop jar $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 2 5
Number of Maps  = 2
Samples per Map = 5
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.3.0/lib/native/ which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
14/05/10 14:42:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/05/10 14:42:30 INFO client.RMProxy: Connecting to ResourceManager at /
14/05/10 14:42:31 INFO input.FileInputFormat: Total input paths to process : 2
14/05/10 14:42:31 INFO mapreduce.JobSubmitter: number of splits:2
14/05/10 14:42:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399745002298_0003
14/05/10 14:42:32 INFO impl.YarnClientImpl: Submitted application application_1399745002298_0003
14/05/10 14:42:32 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1399745002298_0003/
14/05/10 14:42:32 INFO mapreduce.Job: Running job: job_1399745002298_0003
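The two arguments to pi are the number of map tasks and the number of samples per map, as the "Number of Maps = 2" and "Samples per Map = 5" lines confirm. The example estimates π with a quasi-Monte Carlo method: it counts how many sample points in a square fall inside the inscribed circle and scales that fraction by 4. Under that description, the arithmetic for this run works out as:

```latex
\hat{\pi} \;=\; 4\,\frac{N_{\text{inside}}}{N_{\text{maps}} \cdot N_{\text{samples}}}
\;=\; 4\,\frac{N_{\text{inside}}}{2 \cdot 5}
\;=\; 0.4\,N_{\text{inside}}
```

With only 10 points, the result can only be a multiple of 0.4. The run is a smoke test of the cluster rather than a precise estimate; larger arguments give a finer estimate at the cost of more work.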