Introduction
This documentation should get you up and running quickly with a full pseudo-distributed Hadoop/HBase installation in an Ubuntu VM. I use Ubuntu because Debian package management (apt) is by far the best way to install software on a machine. You can also apply these steps to real hardware.
You will want this because the existing documentation is spread across quite a few different locations. I've already done the work of digging that information out so that you don't have to.
This documentation is intended to be read and used from top to bottom. Before you do an initial install, I suggest reading through it once first.
Reference Manuals
- https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation
- https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+in+Pseudo-Distributed+Mode
- https://ccp.cloudera.com/display/CDHDOC/ZooKeeper+Installation
- https://ccp.cloudera.com/display/CDHDOC/HBase+Installation
- http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html#PseudoDistributed
- http://hbase.apache.org/book.html
- http://hbase.apache.org/pseudo-distributed.html
Create the virtual machine
The first thing you will want to do is download a copy of the Ubuntu Server 10.04 64-bit ISO image. This is the current Long Term Support (LTS) release. These instructions may work with a newer version, but I suggest the LTS because it is what I test with and also what your operations team will most likely want to run in production. Once you have the ISO, create a new virtual machine using your favorite VM manager (I like VMware Fusion on my Mac).
Unix Box Setup
Once you have logged into the box, we need to set up some resources:
    # Add the Cloudera CDH3 repository
    echo "deb http://archive.cloudera.com/debian lucid-cdh3 contrib" >> /etc/apt/sources.list.d/cloudera.list
    echo "deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib" >> /etc/apt/sources.list.d/cloudera.list

    # Pre-accept the Sun Java license so the JDK installs unattended
    echo "sun-java6-bin shared/accepted-sun-dlj-v1-1 boolean true" | debconf-set-selections

    # Raise the open-file and process limits for the hdfs and hbase users
    # ("-" applies the limit to both soft and hard)
    echo "hdfs - nofile 32768" >> /etc/security/limits.conf
    echo "hbase - nofile 32768" >> /etc/security/limits.conf
    echo "hdfs - nproc 32000" >> /etc/security/limits.conf
    echo "hbase - nproc 32000" >> /etc/security/limits.conf
    echo "session required pam_limits.so" >> /etc/pam.d/common-session

    # Install the base packages, import the repository key, and upgrade
    aptitude install curl wget
    curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
    aptitude update
    aptitude install openssh-server ntp
    aptitude install sun-java6-jdk
    aptitude safe-upgrade
    reboot now
You can now use ifconfig -a to find out the IP address of the virtual machine and log into it via ssh. You will want to execute most of the commands below as root.
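For example (192.168.56.101 below is just a placeholder for whatever address ifconfig reports, and since Ubuntu disables direct root SSH logins by default, log in as your normal user and then become root):

    # inside the VM console: find the address that was assigned
    ifconfig -a | grep "inet addr"

    # from your workstation: connect, then switch to root
    ssh youruser@192.168.56.101
    sudo -i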
LZO Compression
This setup provides LZO compression for your data in HBase, which greatly reduces the amount of data stored on disk. Sadly, LZO is under the GPL license, so it can't be distributed with Apache. Therefore, I'm providing a Debian package I got ahold of for you to use. On your VM:
    dpkg -i Cloudera-hadoop-lzo_20110510102012.2bd0d5b-1_amd64.deb
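Once the Hadoop packages from the next section are installed, you can sanity-check that this package dropped the jar and the native library into the locations that hbase-env.sh points at later in this guide:

    ls /usr/lib/hadoop/lib/cloudera-hadoop-lzo-*.jar
    ls /usr/lib/hadoop/lib/native/Linux-amd64-64/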
Hadoop / HDFS
Install some packages:
    apt-get install hadoop-0.20
    apt-get install hadoop-0.20-namenode hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker
    apt-get install hadoop-0.20-conf-pseudo
Edit some files:
/etc/hadoop/conf/hdfs-site.xml
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>
/etc/hadoop/conf/core-site.xml

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
    </property>
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
/etc/hadoop/conf/mapred-site.xml

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
    <property>
      <name>mapred.child.ulimit</name>
      <value>1835008</value>
    </property>
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>
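These edits only take effect after the Hadoop daemons are (re)started. The same init-script loop used in the Start/Stop section at the end of this guide works here with restart:

    for service in /etc/init.d/hadoop-0.20-*; do sudo $service restart; done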
ZooKeeper
    apt-get install hadoop-zookeeper-server
/etc/zookeeper/zoo.cfg

Change localhost to 127.0.0.1 and add:

    maxClientCnxns=0
    service hadoop-zookeeper-server restart
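To confirm that ZooKeeper is answering, you can send it the standard ruok four-letter command on its default client port (2181); it should reply with imok. This assumes netcat is available on the VM:

    echo ruok | nc 127.0.0.1 2181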
HDFS/HBase Setup
Make an /hbase folder in HDFS:
    sudo -u hdfs hadoop fs -mkdir /hbase
    sudo -u hdfs hadoop fs -chown hbase /hbase

    # NOTE: only if you want to delete an existing /hbase folder -- stop HBase first!
    sudo -u hdfs hadoop fs -rmr -skipTrash /hbase
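You can confirm the folder exists and is owned by the hbase user with a plain listing:

    sudo -u hdfs hadoop fs -ls /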
HBase Installation
    apt-get install hadoop-hbase
    apt-get install hadoop-hbase-master
/etc/hbase/conf/hbase-site.xml
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <!-- point this at the /hbase folder created above; it must match fs.default.name,
           which the CDH3 pseudo-distributed config sets to hdfs://localhost:8020 -->
      <value>hdfs://localhost:8020/hbase</value>
    </property>
/etc/hbase/conf/hbase-env.sh
    export HBASE_CLASSPATH=`ls /usr/lib/hadoop/lib/cloudera-hadoop-lzo-*.jar`
    export HBASE_MANAGES_ZK=false
    export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
/etc/hadoop/conf/hadoop-env.sh
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH":`hbase classpath`
Now, restart the master and start the region server:
    service hadoop-hbase-master restart
    apt-get install hadoop-hbase-regionserver
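At this point it's worth checking that all of the daemons are actually running. jps (shipped with the Sun JDK installed earlier) lists Java processes; run as root it should show the daemons owned by the hdfs, hbase and zookeeper users:

    sudo jps
    # expect roughly: NameNode, DataNode, JobTracker, TaskTracker,
    # QuorumPeerMain (ZooKeeper), HMaster and HRegionServer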
Starting/Stopping everything
Start
- service hadoop-zookeeper-server start
- for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
- service hadoop-hbase-master start
- service hadoop-hbase-regionserver start
Stop
- service hadoop-hbase-regionserver stop
- service hadoop-hbase-master stop
- for service in /etc/init.d/hadoop-0.20-*; do sudo $service stop; done
- service hadoop-zookeeper-server stop
HBase Shell
    su - hbase
    hbase shell
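Once you are in the shell, a small smoke test (the table and column family names below are just examples) confirms that the master, region server and HDFS are all talking to each other:

    create 'smoketest', 'cf'
    put 'smoketest', 'row1', 'cf:greeting', 'hello'
    scan 'smoketest'
    disable 'smoketest'
    drop 'smoketest'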
Ports
To confirm that everything is working, point a web browser at your VM's IP address on each of the ports below (a quick command-line check follows the list).
- HDFS: 50070
- JobTracker: 50030
- TaskTracker: 50060
- HBase Master: 60010
- HBase RegionServer: 60030
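If you prefer the command line to a browser, a quick curl loop (again, substitute your VM's address for the placeholder) should return an HTTP status code from each port:

    for port in 50070 50030 50060 60010 60030; do
      curl -s -o /dev/null -w "%{http_code}  http://192.168.56.101:$port\n" "http://192.168.56.101:$port"
    done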