Thursday, June 27, 2013

List the top 10 (or N) biggest/smallest files on Hadoop, by size

List the 10 biggest files in a directory on Hadoop:
hadoop dfs -du /testfiles/hd | sort -g -r | head -n <N>    (N is the number of files you want to list)

hadoop dfs -du /testfiles/hd | sort -g -r | head -n 10

List the 10 biggest files on Hadoop (recursively):

hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | head -n <N>    (N is the number of files you want to list)
hadoop dfs -lsr / | awk '{print $5 "\t\t" $8}' | sort -n -r | head -n 10


List the 10 smallest files in a directory on Hadoop (the smallest is printed last):
hadoop dfs -du /testfiles/hd | sort -g -r | tail -n <N>    (N is the number of files you want to list)

hadoop dfs -du /testfiles/hd | sort -g -r | tail -n 10

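List the 10 smallest files on Hadoop (recursively):

The recursive listing reports directories with size 0, so lines whose permission string starts with "d" are filtered out; otherwise directories would crowd out the real files.

hadoop dfs -lsr / | awk '$1 !~ /^d/ {print $5 "\t\t" $8}' | sort -n -r | tail -n <N>    (N is the number of files you want to list)
hadoop dfs -lsr / | awk '$1 !~ /^d/ {print $5 "\t\t" $8}' | sort -n -r | tail -n 10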
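
The four pipelines above differ only in where the size column comes from and in whether the head or the tail of the sorted list is kept, so the sorting step can be wrapped in two small shell functions. This is a minimal sketch and not part of the original commands: the names largest_n and smallest_n are hypothetical, and it assumes every input line starts with the size in bytes, as in the -du output and the awk-reformatted -lsr output.

```shell
# Hypothetical helpers: read "SIZE<whitespace>PATH" lines on stdin and
# print the N largest or N smallest entries (N defaults to 10).
largest_n() {
    # Numeric, descending sort on the first field; keep the first N lines.
    sort -k1,1nr | head -n "${1:-10}"
}

smallest_n() {
    # Numeric, ascending sort so the smallest entry comes first.
    sort -k1,1n | head -n "${1:-10}"
}

# Usage:
#   hadoop dfs -du /testfiles/hd | largest_n 5
#   hadoop dfs -lsr / | awk '$1 !~ /^d/ {print $5 "\t\t" $8}' | smallest_n 5
```

Unlike the tail-based variant, smallest_n sorts in ascending order, so the smallest file appears on the first line rather than the last.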

Wednesday, June 26, 2013

Hadoop version support matrix: Hadoop/HBase version compatibility


                      HBase-0.92.x   HBase-0.94.x   HBase-0.95
Hadoop-0.20.205       S              X              X
Hadoop-0.22.x         S              X              X
Hadoop-1.0.0-1.0.2    S              S              X
Hadoop-1.0.3+         S              S              S
Hadoop-1.1.x          NT             S              S
Hadoop-0.23.x         X              S              NT
Hadoop-2.x            X              S              S
 HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.

Where
S = supported and tested,
X = not supported,
NT = it should run, but not tested enough.  

Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop running on your cluster match the version under HBase. Replace the Hadoop jar found in the HBase lib directory with the Hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase on every node of your cluster. Hadoop version mismatch issues have various manifestations, but often everything simply looks like it is hung.
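
The jar swap described above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the original text: the function name replace_hadoop_jar is made up, the bundled jar is assumed to match hadoop-core-*.jar (the naming used by Hadoop 1.x), and the paths in the usage comment are placeholders. It has to be run on every node that has HBase installed, with HBase stopped.

```shell
# Hypothetical helper: replace the Hadoop jar bundled in an HBase lib
# directory with the jar the cluster actually runs.
#   $1 = HBase lib directory, $2 = Hadoop jar taken from the cluster
replace_hadoop_jar() {
    hbase_lib=$1
    cluster_jar=$2
    [ -d "$hbase_lib" ] && [ -f "$cluster_jar" ] || {
        echo "usage: replace_hadoop_jar HBASE_LIB_DIR CLUSTER_HADOOP_JAR" >&2
        return 1
    }
    # Assumption: the bundled jar is named hadoop-core-<version>.jar.
    rm -f "$hbase_lib"/hadoop-core-*.jar
    # Copy the cluster's jar into the HBase lib directory.
    cp "$cluster_jar" "$hbase_lib"/
}

# Example with placeholder paths; repeat on every node in the cluster:
#   replace_hadoop_jar /usr/lib/hbase/lib /usr/lib/hadoop/hadoop-core-1.0.4.jar
```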

Saturday, June 22, 2013

Keep your locate database updated in Linux

We can use the find command to find a file in Linux, but it is slower than locate. locate finds files faster because it does not walk the file system; it searches a prebuilt database called

"/var/lib/mlocate/mlocate.db"

A companion utility called "updatedb" keeps this database in sync with the files on the system. When you run updatedb, it rescans the file system and adds newly created files to the database. locate only knows about files that existed the last time updatedb ran, so run updatedb again after adding files you want to be able to find.
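
One way to act on this is to check how old the database is before trusting locate, and to run updatedb when it is stale. The sketch below is illustrative only: the function name locate_db_is_stale and the one-day default threshold are assumptions, and it relies on GNU stat. The default path is the one quoted above; it may be readable only by root or the mlocate group, so run the check with sufficient privileges.

```shell
# Hypothetical helper: succeed (exit status 0) when the locate database is
# missing, unreadable, or older than max_age seconds, i.e. when updatedb
# should be run again.
#   $1 = database path (default: /var/lib/mlocate/mlocate.db)
#   $2 = maximum age in seconds (default: 86400, one day)
locate_db_is_stale() {
    db=${1:-/var/lib/mlocate/mlocate.db}
    max_age=${2:-86400}
    [ -e "$db" ] || return 0
    now=$(date +%s)
    # GNU stat: %Y prints the last modification time as seconds since epoch.
    mtime=$(stat -c %Y "$db") || return 0
    [ $((now - mtime)) -gt "$max_age" ]
}

# Usage (updatedb normally needs root):
#   locate_db_is_stale && updatedb
#   locate configuration
```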

Wednesday, June 19, 2013

Make a file that nobody, not even the root user, can modify

Suppose you have a file that you never want anyone to delete, modify, or move. Linux offers an option for exactly this.

It is called an immutable file, and it is supported on Linux ext2/ext3 file systems.

For example, to protect a file called configuration from ever being modified, moved, or removed, run the following as root:

chattr +i configuration

After that, the file configuration cannot be deleted, moved, or changed, not even by root, until the attribute is removed.

To turn it back into a normal file, run:

chattr -i configuration
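
To confirm that the attribute is really set, lsattr prints the flags of a file, and a lowercase i among them marks the file as immutable. The helper below is a small sketch around that check; the name is_immutable is hypothetical. Setting or clearing the flag with chattr needs root, and the file system has to support the attribute.

```shell
# Hypothetical helper: succeed if lsattr reports the immutable (i) flag.
is_immutable() {
    # lsattr prints "<flags> <name>"; keep only the flags field so that
    # letters in the file name cannot be mistaken for flags.
    flags=$(lsattr -d "$1" 2>/dev/null | awk '{print $1}')
    case $flags in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example session (as root):
#   chattr +i configuration
#   is_immutable configuration && echo "configuration is immutable"
#   rm configuration        # fails with "Operation not permitted"
#   chattr -i configuration
```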
