Friday, July 26, 2013

The path "" is not a valid path to the 3.5.0-17-generic kernel headers / VMWare can't find linux headers path

Error/Problem while installing VMware Tools in a virtual machine

If this error falls in front of you (on Ubuntu),

throw this command in front of it :)

sudo apt-get install build-essential linux-headers-`uname -r` psmisc

-or-

sudo apt-get install linux-headers-$(uname -r)


If either of these commands executes successfully, the error will run away from you :)
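
To double-check that the headers actually landed where VMware Tools looks for them, you can list the headers directory (the path below is the usual Ubuntu location; adjust if your distro differs):

# Kernel headers for the running kernel normally live here on Ubuntu
ls /usr/src/linux-headers-$(uname -r)/include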

Thursday, July 11, 2013

Setting Heartbeat Interval for Datanode

Setting the following in your hdfs-site.xml will give you a 1-minute timeout.
<property>
 <name>heartbeat.recheck.interval</name>
 <value>15</value>
 <description>Determines datanode heartbeat recheck interval in seconds</description>
</property>
If the above doesn't work, try the following (the property name seems to be version-dependent):
<property>
 <name>dfs.heartbeat.recheck.interval</name>
 <value>15</value>
 <description>Determines datanode heartbeat recheck interval in seconds.</description>
</property>

The timeout equals 2 * heartbeat.recheck.interval + 10 * heartbeat.interval. The default for heartbeat.interval is 3 seconds, so the value above gives 2 * 15 + 10 * 3 = 60 seconds, i.e. the 1-minute timeout.
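
A quick shell sanity check of that arithmetic:

# timeout = 2 * recheck interval + 10 * heartbeat interval (values from above, in seconds)
recheck=15; heartbeat=3
echo $((2 * recheck + 10 * heartbeat))   # prints 60, i.e. a 1-minute timeout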

Thursday, July 4, 2013

How to copy files from one Hadoop cluster to another?

Suppose you want to copy files from one Hadoop cluster to another. You have three options:

1 : Copy the files down to the local filesystem and then copy them back up to the other cluster, using

 copyToLocal and then copyFromLocal

 (or equivalently -get and then -put)

But this is not a good option, since the data makes a round trip through local disk.

So we have two more options:

2 : -cp
3 : distcp

distcp requires MapReduce to be running; if you don't want to run MapReduce on your cluster, the other option is -cp.

Usage of -cp:
hadoop dfs -cp hdfs://<source> hdfs://<destination>

If you want a faster copy, use distcp; for that, your JobTracker and TaskTrackers must be running.

Usage of distcp:

hadoop distcp hdfs://<source> hdfs://<destination>
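
For example, with hypothetical NameNode hosts nn1 and nn2 (adjust host names, ports, and paths to your clusters):

# Copy /user/data from the cluster at nn1 to the cluster at nn2
hadoop distcp hdfs://nn1:8020/user/data hdfs://nn2:8020/user/data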

Thursday, June 27, 2013

List the top 10/N biggest/smallest files on Hadoop (by size)

List the top 10 biggest files in a directory on Hadoop:

hadoop dfs -du /testfiles/hd | sort -g -r | head -n <N>   {N is the number of files you want to list}

hadoop dfs -du /testfiles/hd | sort -g -r | head -n 10

List the top 10 biggest files on Hadoop (recursively):

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | head -n <N>   {N is the number of files you want to list}

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | head -n 10

List the top 10 smallest files in a directory on Hadoop:

hadoop dfs -du /testfiles/hd | sort -g -r | tail -n <N>   {N is the number of files you want to list}

hadoop dfs -du /testfiles/hd | sort -g -r | tail -n 10

List the top 10 smallest files on Hadoop (recursively):

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | tail -n <N>   {N is the number of files you want to list}

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | tail -n 10
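
A note on the awk fields: in the old "hadoop dfs -lsr" listing, field 5 is the file size in bytes and field 8 is the path. For the smallest files, sorting ascending and taking the head is an equivalent alternative to reverse-sort plus tail:

# Equivalent: sort ascending by size (field 5) and take the first N entries
hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n | head -n 10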

Wednesday, June 26, 2013

Hadoop version support matrix: Hadoop/HBase version compatibility


                     HBase-0.92.x   HBase-0.94.x   HBase-0.95
Hadoop-0.20.205      S              X              X
Hadoop-0.22.x        S              X              X
Hadoop-1.0.0-1.0.2   S              S              X
Hadoop-1.0.3+        S              S              S
Hadoop-1.1.x         NT             S              S
Hadoop-0.23.x        X              S              NT
Hadoop-2.x           X              S              S
Note: HBase requires Hadoop 1.0.3 at a minimum; there is an issue where KerberosUtil cannot be found when compiling against earlier versions of Hadoop.

Where:
S = supported and tested,
X = not supported,
NT = should run, but not tested enough.

Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop on your cluster matches what is under HBase. Replace the Hadoop jar found in the HBase lib directory with the Hadoop jar you are running on your cluster to avoid version mismatch issues, and make sure you replace the jar everywhere on your cluster. Hadoop version mismatch issues have various manifestations, but most often everything simply looks like it has hung up.
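
A minimal sketch of the jar swap, assuming hypothetical install paths (/usr/lib/hadoop and /usr/lib/hbase) and a Hadoop 1.0.x jar name; adjust both to your layout:

# Remove the bundled Hadoop jar, then drop in the one your cluster actually runs
rm /usr/lib/hbase/lib/hadoop-core-*.jar
cp /usr/lib/hadoop/hadoop-core-1.0.4.jar /usr/lib/hbase/lib/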

Saturday, June 22, 2013

Keep your locate database updated in Linux

We may use the find command to look for a file in Linux, but it is slower than locate.
locate helps us find files efficiently and much faster, because instead of scanning the
filesystem it searches a prebuilt database:

"/var/lib/mlocate/mlocate.db"

A companion utility called "updatedb" keeps this database in sync with the files that are
actually on the system. When we run updatedb, it rescans the filesystem and refreshes the
database with any newly added files, so the file list that locate searches stays up to date.
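
A quick usage sketch (updatedb usually needs root):

# Refresh the locate database (rescans the filesystem), then search it instantly
sudo updatedb
locate myfile.txt   # hypothetical file name; replace with what you are looking for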
