Never Ending Security


Hadoop 2.2 Multi Node Cluster Setup for Linux


If you are using PuTTY to access your Linux box remotely, install the OpenSSH server by running the command below. It also makes configuring SSH access easier later in the installation:

sudo apt-get install openssh-server


  1. Installing Java v1.7
  2. Adding dedicated Hadoop system user.
  3. Configuring SSH access.
  4. Disabling IPv6.

Before installing any applications or software, make sure your list of packages from all repositories and PPAs is up to date. If it is not, update it by using this command:

sudo apt-get update

Installing Java v1.7:

Hadoop requires Java 1.7 or later.

Download the latest Oracle Java release for Linux from the Oracle website by using this command:


If the download fails, try the following command, which avoids having to pass a username and password:

wget --no-cookies --no-check-certificate --header "Cookie:" ""

Unpack the compressed Java binaries in the download directory:

sudo tar xvzf jdk-7u45-linux-x64.tar.gz

Create a Java directory under /usr/local/ using mkdir, and change to /usr/local/Java, by using these commands:

sudo mkdir -p /usr/local/Java
cd /usr/local/Java

Copy the Oracle Java binaries from the directory where you unpacked them into the /usr/local/Java directory:

sudo cp -r jdk1.7.0_45 /usr/local/Java

Edit the system-wide profile file /etc/profile and add the following variables to it:

sudo nano /etc/profile    or  sudo gedit /etc/profile

Scroll down to the end of the file using your arrow keys and add the following lines at the end of your /etc/profile file:

export JAVA_HOME
export PATH
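
The two export lines above only mark the variables for export; the assignments themselves are not shown. A minimal sketch of complete lines, assuming the JDK lives in /usr/local/Java/jdk1.7.0_45 (the path used later in this post), might look like this:

```shell
# Assumed JDK location; adjust it if your JDK directory differs.
JAVA_HOME=/usr/local/Java/jdk1.7.0_45
# Append the JDK's bin directory so that java and javac are found.
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME
export PATH
```

Appending to PATH rather than overwriting it keeps the existing system commands reachable.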

Inform your Ubuntu Linux system where your Oracle Java JDK/JRE is located. This will tell the system that the new Oracle Java version is available for use.

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/Java/jdk1.7.0_45/bin/javac" 1
sudo update-alternatives --set javac /usr/local/Java/jdk1.7.0_45/bin/javac

These commands notify the system that the Oracle Java JDK is available for use.

Reload your system-wide PATH from /etc/profile by typing the following command:

. /etc/profile

Test to see if Oracle Java was installed correctly on your system.

java -version

Adding dedicated Hadoop system user.

We will use a dedicated Hadoop user account for running Hadoop. This is not required, but it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine.

a. Adding a group:

sudo addgroup hadoop

b. Creating a user and adding the user to the group:

sudo adduser --ingroup hadoop hduser

Configuring SSH access:

SSH key-based authentication is required so that the master node can log in to the slave nodes (and the secondary node) to start and stop them, and also to the local machine if you want to use Hadoop on it. As a first step, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.

Before this step, make sure that SSH is up and running on your machine and is configured to allow SSH public key authentication.

Generating an SSH key for the hduser user:
a. Log in as hduser using sudo.
b. Run this key generation command:

ssh-keygen -t rsa -P ""

It will ask for the file name in which to save the key. Just press Enter so that it generates the key under /home/hduser/.ssh.

Enable SSH access to your local machine with this newly created key.

cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys
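
A hedged sketch of this step, assuming ssh-keygen wrote the public key to its default file name id_rsa.pub (the file name is not shown in the command above). The helper name authorize_key is hypothetical. It appends the key only when it is not already listed, so running it twice does not duplicate the entry:

```shell
# authorize_key PUBKEY_FILE AUTH_FILE
# Appends the public key to the authorized_keys file unless it is already there.
authorize_key() {
    pubkey_file="$1"
    auth_file="$2"
    [ -r "$pubkey_file" ] || { echo "missing $pubkey_file" >&2; return 1; }
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x whole-line match, -F fixed strings, -f read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # Keep the file private to its owner.
    chmod 600 "$auth_file"
}

# Assumed default key path (not confirmed by this post):
# authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```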

The final step is to test the SSH setup by connecting to your local machine with the hduser user.

ssh hduser@localhost

This will add localhost permanently to the list of known hosts

Disabling IPv6.

We need to disable IPv6 because, on Ubuntu, Hadoop configuration options that use the address 0.0.0.0 can cause Hadoop to bind to IPv6 addresses. You will need to run the following command using a root account:

sudo gedit /etc/sysctl.conf

Add the following lines to the end of the file, then reboot the machine so that the new configuration takes effect.

#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
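
After the reboot, the setting can be checked through the kernel flag file /proc/sys/net/ipv6/conf/all/disable_ipv6, which contains 1 when IPv6 is disabled. The sketch below wraps that check in a small function. The name ipv6_status and its optional file argument are illustrative assumptions, not part of any tool:

```shell
# ipv6_status [FLAG_FILE]
# Prints "disabled", "enabled", or "unknown" based on the kernel flag file.
# The optional argument only exists so the check can be tried on a sample file.
ipv6_status() {
    flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
    elif [ "$(cat "$flag_file")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

ipv6_status
```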

Hadoop installation

Go to Apache Downloads and download Hadoop version 2.2.0 (prefer a stable version).

Run the following command to download Hadoop version 2.2.0:


Unpack the compressed hadoop file by using this command:

tar -xvzf hadoop-2.2.0.tar.gz

Move the Hadoop package to a location of your choice. I picked /opt/hadoop-2.2.0 for convenience:

sudo mv hadoop-2.2.0 /opt/hadoop-2.2.0

Make sure to change the owner of all the files to the hduser user and hadoop group by using this command:

sudo chown -R hduser:hadoop /opt/hadoop-2.2.0

Add the following lines to the .bashrc file:

root@arrakis[~]#cd ~
root@arrakis[~]#vi .bashrc

Copy and paste the following lines at the end of the file:

export HADOOP_HOME=/opt/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Modify the Hadoop environment files

Add JAVA_HOME to libexec/ at the beginning of the file

root@arrakis[~]#vi /opt/hadoop-2.2.0/libexec/
export JAVA_HOME='/usr/local/Java/jdk1.7.0_45'

Add JAVA_HOME to hadoop/ at the beginning of the file

root@arrakis[~]#vi /opt/hadoop-2.2.0/etc/hadoop/
export JAVA_HOME='/usr/local/Java/jdk1.7.0_45'

Check Hadoop installation

root@arrakis[~]#cd /opt/hadoop-2.2.0/bin
root@arrakis[bin]#./hadoop version
Hadoop 2.2.0

At this point, Hadoop is installed on your node.

Create a folder for temporary files

root@arrakis[~]#mkdir -p $HADOOP_HOME/tmp

Configuration : Multi-node setup

Add the IP addresses of the master and all slaves to /etc/hosts – on both the master and all the slave nodes

Add the association between the hostnames and the IP addresses of the master and the slaves to /etc/hosts on all the nodes. Make sure that all the nodes in the cluster are able to ping each other.

hduser@arrakis:/opt/hadoop-2.2.0/bin#vi /etc/hosts
<IP address of master> master
<IP address of slave> slave

In our case we only have one slave. If you have more slaves, name them slave1, slave2, and so on.
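
As an illustration of the /etc/hosts format (IP address first, then hostname), the sketch below prints two entries. The addresses 192.168.1.10 and 192.168.1.11 are hypothetical placeholders and the function name hosts_entries is made up for this example. Substitute the real addresses of your nodes:

```shell
# Hypothetical addresses; replace them with the real IPs of your nodes.
MASTER_IP="192.168.1.10"
SLAVE_IP="192.168.1.11"

# Prints one "IP<TAB>hostname" line per node.
hosts_entries() {
    printf '%s\t%s\n' "$MASTER_IP" "master"
    printf '%s\t%s\n' "$SLAVE_IP" "slave"
}

hosts_entries
# To append them on a node: hosts_entries | sudo tee -a /etc/hosts
```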

Password-less ssh from master to slave

hduser@arrakis:[~]#ssh-keygen -t rsa -P ""
hduser@arrakis:[~]#ssh-copy-id -i /home/hduser/.ssh/ hduser@slave
root@arrakis[bin]#ssh slave

[Note : If you skip this step, you will have to provide passwords for all slaves when the master starts the processes with ./start-*.sh . If you have configured multiple slaves, repeat the process for every node ]

Add the Slave entries in $HADOOP_CONF_DIR/slaves –  only at Master node

Add all the slave hostnames to the slaves file on the master node, one per line.

hduser@arrakis:[~]#vi /opt/hadoop-2.2.0/etc/hadoop/slaves

Note : again, we only have one slave in this example. If you have more slaves, add all the slave hostnames.
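
The slaves file is a plain list with one hostname per line. A small sketch that writes such a file is shown below. The helper name write_slaves is an assumption made for illustration:

```shell
# write_slaves FILE HOST...
# Writes the given hostnames to FILE, one per line, replacing its contents.
write_slaves() {
    slaves_file="$1"
    shift
    [ "$#" -gt 0 ] || { echo "no hostnames given" >&2; return 1; }
    printf '%s\n' "$@" > "$slaves_file"
}

# Example for this post's single-slave cluster:
# write_slaves /opt/hadoop-2.2.0/etc/hadoop/slaves slave
```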

Hadoop Configuration

– both Master and all the slaves

Add the properties to the following Hadoop configuration files, which are available under $HADOOP_CONF_DIR


hduser@arrakis[~]#cd /opt/hadoop-2.2.0/etc/hadoop
hduser@arrakis[hadoop]#vi core-site.xml

#Paste following between <configuration> tag
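
As a minimal sketch, core-site.xml could carry properties like the ones printed below. It assumes the NameNode runs on the host named master and listens on port 9000 (the port is an assumption, not taken from this post), and it reuses the tmp folder created earlier under /opt/hadoop-2.2.0. The function name is made up for this example:

```shell
# Prints candidate <property> elements for core-site.xml.
# Assumptions: NameNode host "master", port 9000, tmp dir /opt/hadoop-2.2.0/tmp.
core_site_properties() {
cat <<'EOF'
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop-2.2.0/tmp</value>
</property>
EOF
}

core_site_properties
```

The printed elements go between the <configuration> tags of core-site.xml.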



hduser@arrakis[hadoop]#vi hdfs-site.xml

#Paste following between <configuration> tag
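
A hedged sketch of the hdfs-site.xml content, using the replication factor of 2 that this setup describes. Only the dfs.replication property is shown. Any further properties the original post may have used are not known, and the function name is illustrative:

```shell
# Prints a candidate <property> element for hdfs-site.xml.
# The value 2 matches this post's one-master, one-slave cluster.
hdfs_site_properties() {
cat <<'EOF'
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
EOF
}

hdfs_site_properties
```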


Note : Our replication value is 2 [one master and one slave]. If you have more slaves, set the replication value accordingly.


hduser@arrakis[hadoop]#vi mapred-site.xml

#Paste following between <configuration> tag
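
For mapred-site.xml, a minimal sketch is to tell MapReduce to run on YARN through the mapreduce.framework.name property. This is an assumption about what the post intended, based on the resourcemanager and nodemanager being started later. The function name is illustrative:

```shell
# Prints a candidate <property> element for mapred-site.xml.
# Assumption: MapReduce jobs should be executed by YARN.
mapred_site_properties() {
cat <<'EOF'
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
EOF
}

mapred_site_properties
```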



hduser@arrakis[hadoop]#vi yarn-site.xml

#Paste following between <configuration> tag

    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <name>yarn.resourcemanager.resource-tracker.address</name>
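
The property names above survive without their values. A minimal sketch of complete yarn-site.xml entries is printed below. It uses the auxiliary service name mapreduce_shuffle (with an underscore), which Hadoop 2.2.0 expects, and assumes the ResourceManager runs on the host master with the default resource tracker port 8031. The host, port, and function name are assumptions, not values from this post:

```shell
# Prints candidate <property> elements for yarn-site.xml.
# Assumptions: ResourceManager on host "master", resource tracker port 8031.
yarn_site_properties() {
cat <<'EOF'
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
EOF
}

yarn_site_properties
```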

Format the namenode – only at Master node

hduser@arrakis:/opt/hadoop-2.2.0/bin#cd /opt/hadoop-2.2.0/bin
hduser@arrakis:/opt/hadoop-2.2.0/bin# ./hadoop namenode -format

Administering Hadoop
– Start & Stop
– Only at Master node

Start the processes on the master node; the slave nodes will start automatically. First, start the namenode and datanode:

hduser@arrakis:[~]# cd /opt/hadoop-2.2.0/sbin
hduser@arrakis:[sbin]# ./

check Master

 17675 Jps
 17578 SecondaryNameNode
 17409 NameNode

check Slave

 9317 Jps
 9250 DataNode

Next, start the resourcemanager and nodemanager:

hduser@arrakis:[sbin]# ./

check Master

 17578 SecondaryNameNode
 17917 ResourceManager
 17409 NameNode
 18153 Jps

check Slave

 9317 Jps
 9250 DataNode
 9357 NodeManager
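
The process listings above look like jps output, so the same check can be scripted. The sketch below compares jps-style output against a list of expected daemon names and prints any that are missing. The function name missing_daemons is hypothetical:

```shell
# missing_daemons "JPS_OUTPUT" DAEMON...
# Prints each expected daemon name that does not appear as the second
# column of the given jps-style output. Prints nothing if all are present.
missing_daemons() {
    jps_output="$1"
    shift
    for daemon in "$@"; do
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# On the master: missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
# On a slave:    missing_daemons "$(jps)" DataNode NodeManager
```

Matching on the exact second column avoids counting SecondaryNameNode as NameNode.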

Working with Hadoop

execute command at master

hduser@arrakis:/opt/hadoop-2.2.0/bin# ./hdfs dfs -mkdir -p /user/hadoop2
hduser@arrakis:/opt/hadoop-2.2.0/bin# ./hdfs dfs -put /root/Desktop/test.html /user/hadoop2
hduser@arrakis:/opt/hadoop-2.2.0/bin# ./hdfs dfs -ls /user/hadoop2
Found 1 items
-rw-r--r-- 2 root supergroup 225 2013-11-11 20:19 /user/hadoop2/test.html

check slave node

hduser@slave:/opt/hadoop-2.2.0/bin# ./hdfs dfs -ls /user/hadoop2/
Found 1 items
-rw-r--r-- 2 root supergroup 225 2013-11-11 20:19 /user/hadoop2/test.html
hduser@slave:/opt/hadoop-2.2.0/bin# ./hdfs dfs -cat /user/hadoop2/test.html
test file. Welcome to Hadoop2.2.0 Installation. !!!!!!!!!!!