Infinite Programming Tips: Running Hadoop in Pseudo Distributed Mode

Sunday, November 27, 2011

Running Hadoop in Pseudo Distributed Mode

This post walks through installing Hadoop on Ubuntu in pseudo-distributed mode (a single-node cluster). It is meant as a quick-start: every command needed for the setup is listed below, together with a short description of what it does.

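COMMAND	DESCRIPTION
sudo apt-get install sun-java6-jdk	Install Java (Sun JDK 6)
	If you don't have the Hadoop bundle yet, download it from the Apache Hadoop releases page
sudo tar xzf file_name.tar.gz	Extract the Hadoop bundle (replace file_name with the name of the downloaded archive)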
Go to your Hadoop installation directory (HADOOP_HOME); all of the following commands are run from there.
vi conf/hadoop-env.sh	Edit the configuration file hadoop-env.sh and set JAVA_HOME to the root of your Java installation, e.g. export JAVA_HOME=/usr/lib/jvm/java-6-sun
vi conf/core-site.xml, then add: <configuration> <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> </configuration>	Edit the configuration file core-site.xml (default file system URI, i.e. the NameNode address)
vi conf/hdfs-site.xml, then add: <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration>	Edit the configuration file hdfs-site.xml (replication factor 1, since there is only one DataNode)
vi conf/mapred-site.xml, then add: <configuration> <property> <name>mapred.job.tracker</name> <value>localhost:9001</value> </property> </configuration>	Edit the configuration file mapred-site.xml (JobTracker address)
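sudo apt-get install openssh-server openssh-client	Install SSH
ssh-keygen -t rsa -P ""	Generate an RSA key pair with an empty passphrase (press Enter to accept the default key file)
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys	Authorize the new public key for logins to this machine
ssh localhost	Check that you can log in to localhost without a password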
bin/hadoop namenode -format	Format the new distributed file system. During this operation the NameNode starts, formats its storage, and stops again.
bin/start-all.sh	Start the Hadoop daemons
jps	List the running Java processes; the output should look like this (process IDs will differ):
	14799 NameNode
	14977 SecondaryNameNode
	15183 DataNode
	15596 JobTracker
	15897 TaskTracker
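The four configuration edits in the table (JAVA_HOME in hadoop-env.sh plus core-site.xml, hdfs-site.xml and mapred-site.xml) can also be written non-interactively instead of through vi. This is a minimal sketch, assuming bash, GNU sed (the Ubuntu default) and the Hadoop 0.20/1.x conf/ layout used in this post; the function names write_single_property and write_pseudo_distributed_conf are hypothetical. Note that it overwrites the three XML files.

```shell
#!/usr/bin/env bash
# Sketch: write the pseudo-distributed settings from the table into a
# Hadoop 0.20/1.x conf/ directory. Assumes bash and GNU sed (Ubuntu default).

# Hypothetical helper: write a *-site.xml file containing a single property.
write_single_property() {
  local file="$1" name="$2" value="$3"
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper: apply all four configuration edits to one conf directory.
write_pseudo_distributed_conf() {
  local conf_dir="$1" java_home="$2"
  if [ ! -d "$conf_dir" ]; then
    echo "not a directory: $conf_dir" >&2
    return 1
  fi

  # hadoop-env.sh: remove any existing (possibly commented-out) JAVA_HOME
  # export, then append the one we want
  local env_file="$conf_dir/hadoop-env.sh"
  touch "$env_file"
  sed -i '/^[#[:space:]]*export JAVA_HOME=/d' "$env_file"
  echo "export JAVA_HOME=$java_home" >> "$env_file"

  # Default file system URI, replication factor 1 (single DataNode), JobTracker address
  write_single_property "$conf_dir/core-site.xml" fs.default.name hdfs://localhost:9000
  write_single_property "$conf_dir/hdfs-site.xml" dfs.replication 1
  write_single_property "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Run it from HADOOP_HOME before formatting the NameNode, e.g. write_pseudo_distributed_conf conf /usr/lib/jvm/java-6-sun.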
Congratulations, the Hadoop setup is complete!
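Instead of comparing the jps listing by eye, a small helper can check that all five daemons from the expected output are present. The function check_hadoop_daemons is hypothetical; it only assumes that jps prints one "<pid> <ClassName>" pair per line, as in the sample in the table.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output on stdin and report which of the five
# pseudo-distributed daemons are missing. Returns 0 only if all are running.
check_hadoop_daemons() {
  local listing daemon missing=0
  # Keep only the class-name column (second field of each jps line)
  listing=$(awk '{print $2}')
  for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # -x: whole-line match, so "NameNode" does not match "SecondaryNameNode"
    if ! grep -qx "$daemon" <<< "$listing"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: jps | check_hadoop_daemons && echo "all daemons are running"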
http://localhost:50070/	Web interface for the NameNode
http://localhost:50030/	Web interface for the JobTracker
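The daemons may need a few seconds after bin/start-all.sh before their web interfaces answer. This is a hedged sketch that polls a URL with curl until it responds; wait_for_url and its retry defaults (30 attempts, 2 seconds apart) are assumptions for illustration, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: poll a web interface until it answers, using curl.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]   (defaults are assumptions)
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP errors as failure, -s: no progress or error output
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Only sleep if another attempt follows
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

Usage: wait_for_url http://localhost:50070/ && wait_for_url http://localhost:50030/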
Now let's run some examples.
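bin/hadoop jar hadoop-*-examples.jar pi 10 100	Run the pi example (10 maps, 100 samples per map)
bin/hadoop dfs -mkdir input	Grep example: create an input directory in HDFS
bin/hadoop dfs -put conf input	Copy the conf directory into it
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'	Run the grep example (counts the matches of the regular expression dfs[a-z.]+)
bin/hadoop dfs -cat output/*	Print the results
bin/hadoop dfs -mkdir inputwords	Wordcount example: create an input directory in HDFS
bin/hadoop dfs -put conf inputwords	Copy the conf directory into it
bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords	Run the wordcount example
bin/hadoop dfs -cat outputwords/*	Print the word counts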
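Re-running an example fails if its output directory already exists in HDFS, so a wrapper that clears old directories first is handy when experimenting. This sketch is hypothetical: run_wordcount_example and the HADOOP_BIN override are made up, it expects to be run from HADOOP_HOME, and it uses the Hadoop 0.20/1.x dfs -rmr command. It deletes and recreates the HDFS input directory, copying the local directory into it with -put (no separate -mkdir), so the files land directly under it.

```shell
#!/usr/bin/env bash
# Sketch: (re)run the wordcount example from HADOOP_HOME, clearing old HDFS dirs first.
# HADOOP_BIN is a hypothetical override; it defaults to bin/hadoop as used in this post.
run_wordcount_example() {
  local local_dir="$1" in_dir="$2" out_dir="$3"
  local hadoop="${HADOOP_BIN:-bin/hadoop}"

  # The examples jar name differs between releases (assumption: it matches this glob)
  local jars=(hadoop-*examples*.jar)
  if [ ! -e "${jars[0]}" ]; then
    echo "no examples jar in $(pwd); run this from HADOOP_HOME" >&2
    return 1
  fi

  # A job refuses to start if its output directory exists; ignore "no such file" errors
  "$hadoop" dfs -rmr "$in_dir" > /dev/null 2>&1 || true
  "$hadoop" dfs -rmr "$out_dir" > /dev/null 2>&1 || true

  # in_dir does not exist now, so -put creates it with the local files directly inside
  "$hadoop" dfs -put "$local_dir" "$in_dir" || return 1
  "$hadoop" jar "${jars[0]}" wordcount "$in_dir" "$out_dir" || return 1

  # Quote the glob so HDFS, not the local shell, expands it
  "$hadoop" dfs -cat "$out_dir/*"
}
```

Usage: run_wordcount_example conf inputwords outputwords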

bin/stop-all.sh	Stop the Hadoop daemons
