Thursday, March 28, 2013

WebHDFS REST API

The HTTP REST API supports the complete FileSystem interface for HDFS.

Operations
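Each operation is selected with the op query parameter and invoked over plain HTTP, so curl is enough to try them. A quick sketch of a few common operations (the host "namenode" and port 50070 are assumptions for a default Hadoop 1.x setup with dfs.webhdfs.enabled set to true):

# List a directory
curl -i "http://namenode:50070/webhdfs/v1/user/hadoop?op=LISTSTATUS"

# Read a file (-L follows the redirect to the datanode serving the data)
curl -i -L "http://namenode:50070/webhdfs/v1/user/hadoop/file.txt?op=OPEN"

# Create a directory
curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/hadoop/newdir?op=MKDIRS"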

For more detail, please visit the WebHDFS documentation.



Jobtracker API error - Call to localhost/127.0.0.1:50030 failed on local exception: java.io.EOFException

Port 50030 is the JobTracker's web UI (HTTP) port, not its RPC port, so an RPC client pointed at it dies with an EOFException. Try the port number listed in your $HADOOP_HOME/conf/mapred-site.xml under the mapred.job.tracker property instead. Here's the relevant snippet from my mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

If you look at the JobTracker.getAddress(Configuration) method, you can see it falls back to this property when you don't explicitly specify the JobTracker host/port:

public static InetSocketAddress getAddress(Configuration conf) {
  // Defaults to localhost:8012 when mapred.job.tracker is not set
  String jobTrackerStr = conf.get("mapred.job.tracker", "localhost:8012");
  return NetUtils.createSocketAddr(jobTrackerStr);
}
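If you are building the client yourself, here is a minimal sketch (using the old org.apache.hadoop.mapred API; the host/port value is just the one from the config above) that pins the JobTracker address explicitly and checks connectivity:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobTrackerPing {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // Must match the RPC port from mapred-site.xml, not the 50030 web UI port
    conf.set("mapred.job.tracker", "localhost:9001");
    JobClient client = new JobClient(conf);
    // Fails with the EOFException above if the port is wrong
    System.out.println("Task trackers: " + client.getClusterStatus().getTaskTrackers());
    client.close();
  }
}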


Tuesday, March 26, 2013

See content of Tar/tar gz file without extracting it

Have you ever needed to peek inside a tar or tar.gz file in a UNIX/Linux terminal, with no GUI option available? If so, here is the solution you want :)

tar -tf tarfileyouwanttopeekin.tar
or
tar -tf tarfileyouwanttopeekin.tar.gz
or
tar -tf tarfileyouwanttopeekin.tar.bz2


And if you want to look inside a zip file:

unzip -l tarfileyouwanttopeekin.zip
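If you also want permissions, owners, sizes and timestamps, add -v to tar for a long, ls -l style listing:

tar -tvf tarfileyouwanttopeekin.tar.gz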

Thursday, March 14, 2013

Adding Scheduler to Hadoop Cluster

As we know, when we execute tasks or jobs on Hadoop it follows FIFO scheduling by default, but if you are in a multi-user Hadoop environment you will need a better scheduler to share the cluster fairly and predictably.

Hadoop comes with two other schedulers:

Fair Scheduler: this defines pools; over time, each pool gets around the same amount of resources.

Capacity Scheduler: this defines queues, and each queue has a guaranteed capacity. The capacity scheduler shares the compute resources allocated to a queue with other queues while those resources are not in use.

To change the scheduler you need to take your cluster offline and make some configuration changes. First make sure the correct scheduler jar files are present: in older versions of Hadoop you had to drop the jar into the lib directory yourself if it wasn't there, but from Hadoop 1 onwards these jars ship in the lib folder, so if you are using a newer Hadoop, good news for you :)

In outline, the first step is to point the JobTracker at the new scheduler class, as sketched below.
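As a minimal sketch for the Fair Scheduler (assuming Hadoop 1.x; check your version's documentation for the exact property name and class), add this to mapred-site.xml and restart the JobTracker:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>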

Using C++ or C to interact with hadoop

Are you a C or C++ programmer who isn't willing to write Java code to interact with Hadoop/HDFS, huh? OK, you have an option: the libhdfs native library, which lets you write programs in C or C++ that interact with Hadoop.

Current Hadoop distributions contain pre-compiled libhdfs libraries for 32-bit and 64-bit Linux. If your operating system is not compatible with the pre-compiled libraries, you may have to download the Hadoop source distribution and compile libhdfs yourself.

For more information read following:

http://wiki.apache.org/hadoop/MountableHDFS

https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS

http://xmodulo.com/2012/06/how-to-mount-hdfs-using-fuse.html

A sketch of what the C code looks like follows.
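Here is a minimal write-to-HDFS sketch against the libhdfs C API (modelled on the standard example in the Hadoop docs; the "default" connection string picks up fs.default.name from your core-site.xml, and the file path is just an example):

#include "hdfs.h"      /* ships with the Hadoop distribution */
#include <fcntl.h>     /* O_WRONLY, O_CREAT */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* Connect to the HDFS instance configured in core-site.xml */
    hdfsFS fs = hdfsConnect("default", 0);
    const char *path = "/tmp/testfile.txt";

    /* Open a file for writing and write a small buffer */
    hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!f) {
        fprintf(stderr, "Failed to open %s for writing!\n", path);
        exit(1);
    }
    const char *buf = "Hello, HDFS from C!";
    hdfsWrite(fs, f, (void *)buf, strlen(buf) + 1);
    hdfsFlush(fs, f);

    hdfsCloseFile(fs, f);
    hdfsDisconnect(fs);
    return 0;
}

To build it you compile against the libhdfs headers and link with -lhdfs (and the JVM); the exact include and library paths vary by distribution.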

Finding out block location and block size of file on HDFS

Have you ever needed to find out the block locations and block size for a file lying on HDFS? If so, here is the command you can use to find that out.

For that we use the "fsck" command which Hadoop provides.

Here goes the command:

bin/hadoop fsck /filepath/filenameonhdfs -files -blocks -locations

This command lists the blocks that make up the file and, for each block, the datanodes it is lying on.

Just go and play with the command; you will understand more. :)
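If all you need is the block size, hadoop fs -stat can print it directly; as far as I recall, the %o format specifier is the block size in bytes:

bin/hadoop fs -stat %o /filepath/filenameonhdfs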

Benchmarking Hadoop


Have you completed setting up your Hadoop cluster? Now it's time to benchmark it: this will help you confirm that your cluster is configured properly and production ready.
So how do we proceed?

The benchmarking programs live in the hadoop-*-test.jar file, which you can run with hadoop jar.

So let's try TestDFSIO: this tests the read/write performance of HDFS.

How do we run it? A sketch follows.
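A minimal run, as a sketch (the jar file name varies by release, e.g. hadoop-test-1.0.4.jar, and the file counts/sizes are just starting points):

# Write test: 10 files of 1000 MB each
bin/hadoop jar hadoop-test-*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

# Read test over the same files
bin/hadoop jar hadoop-test-*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

# Clean up the generated test data when done
bin/hadoop jar hadoop-test-*.jar TestDFSIO -clean

The throughput and average I/O rate numbers are printed at the end of each run and, if memory serves, also appended to a TestDFSIO_results.log file (override with -resFile).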

Wednesday, March 13, 2013

jps not working: jps: command not found

Either install OpenJDK (jps ships with the JDK, not the JRE)

or

do the following

Create an alias for jps (adjust the JDK path to match your installation):

alias jps='/usr/lib/jvm/jdk1.6.0_33/bin/jps'
Otherwise, if you just want to see the running Java processes, execute one of the following commands:

ps -ef | grep java
or
ps aux | grep java
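A cleaner alternative to the alias (assuming the same JDK path as above) is to put the JDK's bin directory on your PATH, which also picks up the other JDK tools such as jstack and jmap:

export PATH=$PATH:/usr/lib/jvm/jdk1.6.0_33/bin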
