Friday, July 26, 2013

The path "" is not a valid path to the 3.5.0-17-generic kernel headers / VMWare can't find linux headers path

Error/Problem while installing VMware Tools in a virtual machine

If this error falls in front of you (on Ubuntu),

throw this command in front of it :)

sudo apt-get install build-essential linux-headers-`uname -r` psmisc

-or-

sudo apt-get install linux-headers-$(uname -r)


If either of these commands executes successfully, the error will run away from you :)
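
To double-check that the headers actually landed where VMware Tools looks for them, you can list the headers directory (the path below is the usual Ubuntu location; adjust if your distro differs):

# Kernel headers for the running kernel normally live here on Ubuntu
ls /usr/src/linux-headers-$(uname -r)/include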

Thursday, July 11, 2013

Setting Heartbeat Interval for Datanode

Setting the following in your hdfs-site.xml will give you a 1-minute timeout.
<property>
 <name>heartbeat.recheck.interval</name>
 <value>15</value>
 <description>Determines datanode heartbeat recheck interval in seconds</description>
</property>
If the above doesn't work, try the following (the property name seems to be version-dependent):
<property>
 <name>dfs.heartbeat.recheck.interval</name>
 <value>15</value>
 <description>Determines datanode heartbeat recheck interval in seconds.</description>
</property>

The timeout equals 2 * heartbeat.recheck.interval + 10 * heartbeat.interval. The default for heartbeat.interval is 3 seconds, so the value above gives 2 * 15 + 10 * 3 = 60 seconds, i.e. the 1-minute timeout.
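
A quick shell sanity check of that arithmetic:

# timeout = 2 * recheck interval + 10 * heartbeat interval (values from above, in seconds)
recheck=15; heartbeat=3
echo $((2 * recheck + 10 * heartbeat))   # prints 60, i.e. a 1-minute timeout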

Thursday, July 4, 2013

How to copy files from one Hadoop cluster to another?

Suppose you want to copy files from one Hadoop cluster to another. You have three options:

1 : Copy the files down to the local filesystem and then copy them back up to the other cluster, using

 copyToLocal and then copyFromLocal

 (or equivalently -get and then -put)

But this is not a good option, since the data makes a round trip through local disk.

So we have two more options:

2 : -cp
3 : distcp

distcp requires MapReduce to be running; if you don't want to run MapReduce on your cluster, the other option is -cp.

Usage of -cp:
hadoop dfs -cp hdfs://<source> hdfs://<destination>

If you want a faster copy, use distcp; for that, your JobTracker and TaskTrackers must be running.

Usage of distcp:

hadoop distcp hdfs://<source> hdfs://<destination>
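
For example, with hypothetical NameNode hosts nn1 and nn2 (adjust host names, ports, and paths to your clusters):

# Copy /user/data from the cluster at nn1 to the cluster at nn2
hadoop distcp hdfs://nn1:8020/user/data hdfs://nn2:8020/user/data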

Thursday, June 27, 2013

List the top 10/N biggest/smallest files on Hadoop (by size)

List the top 10 biggest files in a directory on Hadoop:

hadoop dfs -du /testfiles/hd | sort -g -r | head -n <N>   {N is the number of files you want to list}

hadoop dfs -du /testfiles/hd | sort -g -r | head -n 10

List the top 10 biggest files on Hadoop (recursively):

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | head -n <N>   {N is the number of files you want to list}

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | head -n 10

List the top 10 smallest files in a directory on Hadoop:

hadoop dfs -du /testfiles/hd | sort -g -r | tail -n <N>   {N is the number of files you want to list}

hadoop dfs -du /testfiles/hd | sort -g -r | tail -n 10

List the top 10 smallest files on Hadoop (recursively):

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | tail -n <N>   {N is the number of files you want to list}

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | tail -n 10
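
A note on the awk fields: in the old "hadoop dfs -lsr" listing, field 5 is the file size in bytes and field 8 is the path. For the smallest files, sorting ascending and taking the head is an equivalent alternative to reverse-sort plus tail:

# Equivalent: sort ascending by size (field 5) and take the first N entries
hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n | head -n 10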

Wednesday, June 26, 2013

Hadoop version support matrix: Hadoop/HBase version compatibility


                     HBase-0.92.x   HBase-0.94.x   HBase-0.95
Hadoop-0.20.205      S              X              X
Hadoop-0.22.x        S              X              X
Hadoop-1.0.0-1.0.2   S              S              X
Hadoop-1.0.3+        S              S              S
Hadoop-1.1.x         NT             S              S
Hadoop-0.23.x        X              S              NT
Hadoop-2.x           X              S              S
Note: HBase requires Hadoop 1.0.3 at a minimum; there is an issue where KerberosUtil cannot be found when compiling against earlier versions of Hadoop.

Where:
S = supported and tested,
X = not supported,
NT = should run, but not tested enough.

Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop on your cluster matches what is under HBase. Replace the Hadoop jar found in the HBase lib directory with the Hadoop jar you are running on your cluster to avoid version mismatch issues, and make sure you replace the jar everywhere on your cluster. Hadoop version mismatch issues have various manifestations, but most often everything simply looks like it has hung up.
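
A minimal sketch of the jar swap, assuming hypothetical install paths (/usr/lib/hadoop and /usr/lib/hbase) and a Hadoop 1.0.x jar name; adjust both to your layout:

# Remove the bundled Hadoop jar, then drop in the one your cluster actually runs
rm /usr/lib/hbase/lib/hadoop-core-*.jar
cp /usr/lib/hadoop/hadoop-core-1.0.4.jar /usr/lib/hbase/lib/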

Saturday, June 22, 2013

Keep your locate database updated in Linux

We may use the find command to look for a file in Linux, but it is slower than locate.
locate helps us find files efficiently and much faster, because instead of scanning the
filesystem it searches a prebuilt database:

"/var/lib/mlocate/mlocate.db"

A companion utility called "updatedb" keeps this database in sync with the files that are
actually on the system. When we run updatedb, it rescans the filesystem and refreshes the
database with any newly added files, so the file list that locate searches stays up to date.
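
A quick usage sketch (updatedb usually needs root):

# Refresh the locate database (rescans the filesystem), then search it instantly
sudo updatedb
locate myfile.txt   # hypothetical file name; replace with what you are looking for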
