Thursday, December 27, 2012

Find and remove empty directories

These commands find empty directories and delete them recursively:

find . -type d -empty -delete   --> deletes empty directories recursively

find . -empty -type d -exec rmdir {} +     --> removes empty directories found under the current directory

find . -depth -type d -empty -exec rmdir -v {} +     --> same, but depth-first and with verbose output
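If you want to check first which directories would be affected, a safe preview with standard find syntax is:

find . -type d -empty -print     --> only lists the empty directories, deletes nothing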




Find all files of type <*.txt or whatever you like>

find . -name \*.txt -print    --> in place of *.txt you can specify whatever extension you want to search for
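A case-insensitive variant (GNU find) that also matches .TXT or .Txt, as a quick sketch:

find . -iname \*.txt -print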



java.net.SocketException Too many open files



Have you ever encountered this error? If so, here is the solution... :)

This may happen if there are many HTTP requests, or if open connections are not closed in time. Since this is mostly an OS-level limit, one thing you can do is increase the number of open files allowed on the Linux machine.
Open /etc/sysctl.conf and add the following :

fs.file-max = 65535    <-- adjust this number as you need

To apply the change, use the following command:

sudo sysctl -p /etc/sysctl.conf
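Before (or after) changing the limit, it helps to check the current values. A quick sketch using standard tools; the limits.conf lines are only an example for a hypothetical user:

cat /proc/sys/fs/file-max     --> current system-wide limit
ulimit -n                     --> per-process limit for the current shell/user

# per-user limits can also be raised in /etc/security/limits.conf, for example:
#   someuser  soft  nofile  8192
#   someuser  hard  nofile  65535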

Tuesday, December 11, 2012

Find file of specific size Linux

Sometimes you may need to find all files greater than or equal to a specific size, such as 1 MB, 10 MB or 1 GB. You can use the following command to get the files whose size is at least the value you specify:

find / -type f -size <GIVE SIZE HERE IN KB> -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

Example:

find / -type f -size +1048576k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'


This will find files greater than 1 GB (1048576 KB) in size, so you can experiment with the size as you need.
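On recent GNU find you can also give the size with an M or G suffix directly, so the 1 GB example above can be written as:

find / -type f -size +1G -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'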

Monday, December 10, 2012

Whitelist a node in hadoop

If you have a cluster with blacklisted nodes, this is how you can make them whitelisted nodes:

<property>
  <name>dfs.hosts</name>
  <value>path to whitelisted node file</value>
</property>


Then issue the following command:

./bin/hadoop dfsadmin -refreshNodes
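As a small sketch (the path and hostnames below are just examples), the whitelist file referenced by dfs.hosts is a plain text file with one allowed hostname per line:

cat /etc/hadoop/conf/dfs.hosts
datanode1.example.com
datanode2.example.com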

Hadoop Eco System

HDFS: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.

Tuesday, December 4, 2012

Search for a string in all files in a directory Linux

You can use the grep command to search for a specified string in all files under the directory path you give:

grep "string to search" /var/*

This will search for "string to search" in the files under the /var/ directory.

another variation :

find . -type f -exec grep -i "string to find" {} \; -print

or you can use

grep "string to search" *.htm    --> this will search for the string in all .htm files in the current directory
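Another handy variation is grep's own recursive mode; a quick sketch with case-insensitive matching and line numbers:

grep -rin "string to search" /var/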


There can be more options as always :) try and find more.

Monday, December 3, 2012

Hadoop on Windows Azure

Setting Up Your Cluster On Windows :

Once you're invited to participate in the beta, you can set up your Hadoop cluster. Go to hadooponazure.com and log in with your authorized Windows Live ID. Next, fill out the dialog boxes on the Web site using the following values:
  1. Cluster (DNS) name: Enter a name in the form "<your unique string>.cloudapp.net".
  2. Cluster size: Choose the number of nodes, from 4 to 32, and their associated storage allocations, from 2TB to 16TB per cluster.
  3. Administrator username and password: Enter a username and password; password complexity restrictions are listed on the page. Once this is set you can connect via remote desktop or via Excel.
  4. Configuration information for a SQL Azure instance: This is an option for storing the Hive MetaStore. If it's selected, you'll need to supply the URL of your SQL Azure server instance, as well as the name of the target database and login credentials. The login you specify must have the following permissions on the target database: ddl_ddladmin, ddl_datawriter, ddl_datareader.
For more information click Link

Sunday, December 2, 2012

Free Public DNS Server


OpenDNS free dns server list / IP address:
  • 208.67.222.222
  • 208.67.220.220

Google public dns server IP address:
  • 8.8.8.8
  • 8.8.4.4

Dnsadvantage free dns server list:
  • 156.154.70.1
  • 156.154.71.1
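To check that one of these resolvers is reachable from your machine, a quick sketch (assuming dig from the dnsutils package, or plain nslookup):

dig @8.8.8.8 google.com +short
nslookup google.com 208.67.222.222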

Monday, November 26, 2012

Linux : Date/Time Wise History, Date Wise command history in Linux

export HISTTIMEFORMAT="%m/%d - %H:%M:%S: --> "

Setting the HISTTIMEFORMAT variable makes bash store and show a timestamp for each history entry. Your history will then look like this:

1  08/16 - 16:12:37: --> cat README | less
2  08/16 - 16:12:58: --> pkg-config --list-all | grep webkit
3  08/16 - 16:13:04: --> history

These are the date/time format specifiers you can use:

%d - Day
%m - Month
%y - Year
%T - Time
%H - Hours
%M - Minutes
%S - Seconds
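To make the setting permanent for new shells, a small sketch appending it to your ~/.bashrc:

echo 'export HISTTIMEFORMAT="%d/%m/%y %T  "' >> ~/.bashrc
source ~/.bashrc
history | tail -3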

Thursday, November 22, 2012

org.apache.hadoop.hbase.regionserver.HRegionServer ABORTING region server Unhandled exception org.apache.hadoop.hbase.ClockOutOfSyncException

org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server : Unhandled exception: org.apache.hadoop.hbase.ClockOutOfSyncException:

This error occurs due to a lack of time synchronization between the nodes of the cluster: if the time difference between the master and a slave node is more than 30000 ms, this error appears.

What is the solution ????

Wednesday, November 21, 2012

Setting up Standalone zookeeper cluster for production cluster




1. Download ZooKeeper from http://zookeeper.apache.org/releases.html#download or http://www.apache.org/dyn/closer.cgi/zookeeper/

2. The download is a tar archive, so extract it into your desired directory.

3. Create a directory for the ZooKeeper snapshots and transaction logs.
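A quick sketch of the steps above; the version number (3.4.5) and the /opt locations are only examples:

cd /opt
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar -xzf zookeeper-3.4.5.tar.gz
mkdir -p /opt/zookeeper-data        # dataDir for snapshots and transaction logs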

Tuesday, November 20, 2012

Datanode Decommissioning from Hadoop Cluster

Hadoop offers a decommission feature to properly take out a set of existing datanodes. The nodes to be taken out of the cluster should be listed in an exclude file, and the exclude file name should be specified via the configuration parameter dfs.hosts.exclude. This file should have been specified during namenode startup; it may be a zero-length file. You must use the full hostname, ip or ip:port format in this file. Then run the shell command:

bin/hadoop dfsadmin -refreshNodes
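As a small sketch (the file path and hostname below are hypothetical), the exclude file is a plain list of hosts referenced from the configuration, and refreshNodes makes the namenode re-read it:

echo "datanode3.example.com" >> /etc/hadoop/conf/dfs.exclude
# hdfs-site.xml must already contain:
#   <property><name>dfs.hosts.exclude</name><value>/etc/hadoop/conf/dfs.exclude</value></property>
bin/hadoop dfsadmin -refreshNodes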

Best Way to add nodes to hadoop cluster

Add the new node's DNS name to the conf/slaves file on the master node.  Then log in to the new slave node and execute:

If you are using Cloudera's distribution of Hadoop:

service hadoop-0.20-datanode start
service hadoop-0.20-tasktracker start

If you are using Apache distribution of Hadoop:
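A sketch for the Apache tarball layout of that era (0.20/1.x), run from the Hadoop installation directory on the new slave:

bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker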

Monday, November 19, 2012

Configure MySQL as a MetaStore For Hive. MySQL as Hive





1. First, MySQL should be installed and running.

I will show you how to configure MySQL as a metastore for Hive.

Let's start --->
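As a minimal sketch of where this ends up (the database name, user and password below are just examples, and the MySQL JDBC driver jar, mysql-connector-java, also has to be placed in Hive's lib directory), these are the key properties to set in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>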

Ubuntu returning to login screen, Ubuntu Login loop

 

If you are having a login problem in Ubuntu, for example if you enter your password and the login screen keeps coming back again and again, then try the following solution.

sudo apt-get install --reinstall xorg

sudo rm /home/<username>/.Xauthority*    <-- remove the stale X authority files of the user who is not able to log in

then restart the system.

 

and if above does not work try following

switch to a text console with Ctrl+Alt+F1 (you can use any of F1 to F6)

log in as the affected user

cd /home/<user>

sudo mv .Xauthority .XauthorityBak

sudo reboot

Friday, November 16, 2012

Install oracle/sun java on ubuntu 12.10

1. Open a terminal window.
2. Type in the following commands then hit Enter after each. 

  • sudo sh -c "echo 'deb http://www.duinsoft.nl/pkg debs all' >> /etc/apt/sources.list"
  • sudo apt-get update
  • sudo apt-key adv --keyserver keys.gnupg.net --recv-keys 5CB26B26
  • sudo apt-get update
  • sudo apt-get install update-sun-jre

To install JDK 7 on i386 32-bit systems:

1. Open a terminal window.
2. Type in the following commands then hit Enter after each.
cd /tmp
wget -c --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "http://goo.gl/g9cJl" -O jdk-7u7-nb-7_2-linux-i586-ml.sh

  • chmod +x jdk-7u7-nb-7_2-linux-i586-ml.sh
  • sudo sh jdk-7u7-nb-7_2-linux-i586-ml.sh

To install JDK 7 on AMD 64-bit systems:

cd /tmp

wget -c --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "http://goo.gl/AJ1oS" -O jdk-7u7-nb-7_2-linux-x64-ml.sh

  • chmod +x jdk-7u7-nb-7_2-linux-x64-ml.sh
  • sudo sh jdk-7u7-nb-7_2-linux-x64-ml.sh

When the install is complete, use these commands:

  • sudo mkdir -p /usr/lib/jvm/
  • sudo cp -R /usr/local/jdk1.7.0* /usr/lib/jvm/
  • sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0_07/bin/javac 1
  • sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0_07/bin/java 1
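To verify the installation (and to switch between JDKs if more than one is registered), a quick check:

java -version
sudo update-alternatives --config java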
For more, see the original article at the link below.
http://www.itworld.com/software/305913/install-oracle-java-7-ubuntu-1210
and
http://www.upubuntu.com/2012/10/how-to-install-oracle-java-7-jre-7-jdk.html?m=0

Open a port to listen on in Linux

You can use the following command to open a port in the firewall if you are getting a connection refused error on a specific port number:

iptables -A INPUT -p tcp --dport <Port> -j ACCEPT
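Note that "connection refused" usually also means nothing is listening on that port, so it is worth checking that first; a quick sketch (the netcat listener syntax varies slightly between netcat variants):

netstat -tlnp | grep <Port>        --> is any process bound to the port?
nc -l <Port>                       --> throw-away listener for testing (some netcats need: nc -l -p <Port>)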

Monday, October 15, 2012

Hive Thrift Server

Hive provides a service known as the Hive server or Thrift server, which lets us access Hive remotely from different languages such as C++, Java, Ruby, Python and many others, much as we would use JDBC or ODBC connectors.

By default we use the command line to access Hive, but to use Hive programmatically we need a server that runs on an address and a port number, so that it can be used from a programming language efficiently and easily.

 

Starting Thrift Server :

 

bin/hive --service hiveserver &      -----> this will start the Thrift server in the background and give the terminal back to you

or

bin/hive --service hiveserver          -------> this needs the terminal to stay open; if the terminal is closed the server is killed (look up the & operator in Linux)

 

 

after starting this you can verify whether hive thrift server is running or not, by using following command

netstat -nl | grep 10000

 

if it shows something like :

 

tcp 0 0 :::10000 :::* Listen

 

that means your Thrift server is running successfully. By default the Thrift server runs on port 10000, but we can also make it run on a different port number, as follows:

build/dist/bin/hive --service hiveserver --help
usage HIVE_PORT=xxxx ./hive --service hiveserver
HIVE_PORT : Specify the server port
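Following that usage, a quick sketch of starting the service on another port and verifying it:

HIVE_PORT=10001 bin/hive --service hiveserver &
netstat -nl | grep 10001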

 


Limitation of the Thrift server: HiveServer cannot handle concurrent requests from more than one client. This is actually a limitation imposed by the Thrift interface that HiveServer exports, and it can't be resolved by modifying the HiveServer code.


for using hive with jdbc and java refer –>  Infinity



Tuesday, September 25, 2012

ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried


ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times

(This typically appears in the HBase shell when running a command such as list, together with the shell's help text for that command: "List all tables in hbase. Optional regular expression parameter could be used to filter the output.")

There may be more than one probable reason; here is one of them :)

Check whether the namenode is in safe mode. If so, wait for the namenode to come out of safe mode, or after waiting for a minute or so you can simply ask the namenode to leave safe mode using the command:
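To check the current state first, a quick sketch using the standard dfsadmin option:

bin/hadoop dfsadmin -safemode get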

bin/hadoop dfsadmin -safemode leave

and then try to work with hbase ..

Monday, September 24, 2012

Apache Hadoop NextGen MapReduce (YARN)

MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

 

Check this LINK for more detail



Hadoop High Availability (HA hadoop Cluster)

Note: Currently, only manual failover is supported, which means the operator has to initiate a failover manually. Automatic failure detection and initiation of a failover will be implemented in future versions.

Even the best technology has a demerit, and here it is the availability of the NameNode: if the NameNode is down, the whole cluster is down. So Hadoop has added an HA feature that makes it more available for use.
The High Availability feature addresses the above problem by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.

How it is organised ?

Sunday, September 23, 2012

Demystifying Hadoop concepts Series: Safe mode


What is safe mode in Hadoop? Many times we come across the exception "org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException", or some other exception that mentions safe mode.

First, let me explain what safe mode is in the Hadoop context. The namenode holds the fsimage (metadata) of the data present on the cluster, which can be large or small depending on the size of the cluster and the amount of data stored on it. When the namenode starts, it loads this fsimage and the edit logs from disk into main memory (RAM) for fast processing, and after loading them it waits for the datanodes to report the blocks present on them. During this whole process, that is, loading the fsimage and edit logs and waiting for the datanodes to report their blocks, the namenode stays in safe mode, which is a read-only mode for the namenode. This is done to maintain the consistency of the data; it is just like saying "I will not receive anything until I know what I already have." No modifications to the file blocks are allowed during this period, to maintain the correctness of the data.
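The safe mode state can be inspected and controlled with dfsadmin; a short sketch:

bin/hadoop dfsadmin -safemode get      --> report whether the namenode is in safe mode
bin/hadoop dfsadmin -safemode wait     --> block until it leaves safe mode on its own
bin/hadoop dfsadmin -safemode leave    --> force it out (use with care)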

Wednesday, July 25, 2012

CloudFront: HOW TO RUN MAPREDUCE PROGRAMS USING ECLIPSE

CloudFront: HOW TO RUN MAPREDUCE PROGRAMS USING ECLIPSE: Hadoop provides us a plugin for Eclipse that helps us to connect our Hadoop cluster to Eclipse. We can then run MapReduce jobs and browse Hd...

Monday, July 23, 2012

SharePoint Orange: How to : working with HBase Coprocessor

SharePoint Orange: How to : working with HBase Coprocessor: HBase Coprocessor : It allows user code to get executed at each region(for a table) in region server. Clients only get the final responses...

Monday, July 16, 2012

High available hadoop cluster

One of the problems discussed with the Hadoop cluster is that it is not highly available, because the NameNode is a single point of failure. A new version of Hadoop is coming with a high-availability option, that is, two NameNodes, where the switchover between NameNodes happens automatically: as one NameNode goes down the other takes charge, and meanwhile we can look into the problem with the failed NameNode.

The proposed design follows:

So you can think of it as two clusters interconnected using switches and a high-speed communication line.

So mainly this will depend on a service called heartbeat that checks the aliveness of the NameNode; if one goes down, the redirection will be done automatically.


We can also see it as follows:


Sunday, June 24, 2012

Convert PHP code to c++ code

HipHop is a source code transformer which transforms PHP source code into highly optimized C++ and then compiles it using g++. Currently supported platforms are Linux and FreeBSD. There is no OS X support.
Download it from HERE

Wednesday, June 20, 2012

Tuesday, June 19, 2012

E:Encountered a section with no Package: header,,E:Problem with MergeList /var/lib/apt/lists /us.archive.ubuntu.com_ubuntu_dists_natty_main_binary-i386_Packages, E:The package lists or status file could not be parsed or opened.

Problem :
E:Encountered a section with no Package: header,,E:Problem with MergeList /var/lib/apt/lists /us.archive.ubuntu.com_ubuntu_dists_natty_main_binary-i386_Packages,
E:The package lists or status file could not be parsed or opened.

Probable Reason:
The installation process was interrupted abruptly, or some important installation step did not complete correctly.

Probable Solution:

Try following commands :

sudo rm /var/lib/apt/lists/* -vf
sudo apt-get update

CloudFront: HOW TO MOVE DATA INTO AN HBASE TABLE USING FLUME-N...

CloudFront: HOW TO MOVE DATA INTO AN HBASE TABLE USING FLUME-N...: The first Hbase sink was commited to the Flume 1.2.x trunk few days ago. In this post we'll see how we can use this sink to collect data f...

Monday, June 18, 2012

Want to experiment with android tablet

Install Android apps from Google Play using a PC.
Just log in to Chrome with your Google ID and log in with the same ID on your tablet or Android phone, then go to Google Play from your PC, as it will be easier for you to search for and install apps there. Whenever your Android tablet or phone is connected to the internet, your chosen apps will automatically be downloaded and synchronised to your Android device.
You just need to log in with the same Google account on your PC as well as on your Android device.

Friday, June 15, 2012

CloudFront: Tips for Hadoop newbies (Part I).

CloudFront: Tips for Hadoop newbies (Part I).: Few moths ago, after completing my graduation I thought of doing something new. In quest of that I started learning and working on Apache's...

CloudFront: How to install maven3 on ubuntu 11.10

CloudFront: How to install maven3 on ubuntu 11.10: If you are trying to install maven2 that comes shipped with your ubutnu 11.10, and it is not working as intended you can try following steps...

CloudFront: Error while executing MapReduce WordCount program ...

CloudFront: Error while executing MapReduce WordCount program ...: Quite often I see questions from people who are comparatively new to the Hadoop world or just starting their Hadoop journey that they are ge...

CloudFront: HOW TO CHANGE THE DEFAULT KEY-VALUE SEPARATOR OF A...

CloudFront: HOW TO CHANGE THE DEFAULT KEY-VALUE SEPARATOR OF A...: The default MapReduce output format, TextOutputFormat , writes records as lines of text. Its keys and values may be of any type, since Text...

Thursday, June 14, 2012

CodePool: How to print A to Z in Java easily

CodePool: How to print A to Z in Java easily: To print the alphabets from A-Z in Java without any hassles you just need a for loop like this : For lower case :         for(char ch='...


Wednesday, June 13, 2012

Tuesday, June 12, 2012

Java Forecast 4u: Why the methods of interfaces are public and abstr...

Java Forecast 4u: Why the methods of interfaces are public and abstr...: Interface methods are: public since they should be available to third party vendors to provide implementation.  and   abstract becaus...

Java Forecast 4u: What is Lazy Loading ?

Java Forecast 4u: What is Lazy Loading ?: Lazy loading decides whether to load the child objects while loading the parent object. we need to do this setting in Hibernate mapping ...

Java Forecast 4u: Input/Outpur( I/O) Stream

Java Forecast 4u: Input/Outpur( I/O) Stream: I/O stands for input output. Input stream are used to read the data from input devices. Output stream are used to write the data to output...

Java Forecast 4u: what is Configuration class in Hibernate?

Java Forecast 4u: what is Configuration class in Hibernate?: Configuration is a class which is available in “org.hibernate.cfg “package . Hibernate runtime system will be stored by installing con...

java.util.concurrent.RejectedExecutionException

Problem:
Exception : java.util.concurrent.RejectedExecutionException while performing any operation on hbase.

Probable Solution and Reason:
You may have closed the table object somewhere; if the table is closed, no operation on it will succeed. Just check whether you have closed the table and are then trying to perform an operation using that table object.

Configure eclipse for map reduce and writing sample word count program

If this video does not play please visit :

http://www.youtube.com/watch?v=TavehEdfNDk



This will show how to configure Eclipse for running a Hadoop MapReduce program: the configuration and a sample word count program, which you can get from
http://wiki.apache.org/hadoop/WordCount


Download eclipse jar from here :
https://dl.dropbox.com/u/19454506/hadoop-eclipse-plugin-0.20.203.0.jar
This jar will work for newer versions of Hadoop too. Copy this jar file to the Eclipse plugins directory and follow the video.



If the above video is not playing, please visit: http://www.youtube.com/watch?v=TavehEdfNDk

HBase components and Know what......

It is one of the cool projects from Apache that provides a large-scale, scalable, distributed database based on Hadoop. In it, data is organised as rows and columns and can grow indefinitely as you add new nodes, with no need to reconfigure or mess around much with the configuration settings.
It requires Java and Hadoop to run in a full-fledged manner.

Components:

HBaseMaster :

The HBaseMaster is responsible for assigning regions to HRegionServers. The first region to be assigned is the ROOT region which locates all the META regions to be assigned. The HBaseMaster also monitors the health of each HRegionServer, and if it detects a HRegionServer is no longer reachable, it will split the HRegionServer's write-ahead log so that there is now one write-ahead log for each region that the HRegionServer was serving. After it has accomplished this, it will reassign the regions that were being served by the unreachable HRegionServer. In addition, the HBaseMaster is also responsible for handling table administrative functions such as on/off-lining of tables, changes to the table schema (adding and removing column families), etc.

HRegionServer:
The HRegionServer is responsible for handling client read and write requests. It communicates with the HBaseMaster to get a list of regions to serve and to tell the master that it is alive. Region assignments and
other instructions from the master "piggy back" on the heart beat messages.

HBase client:
The HBase client is responsible for finding HRegionServers that are serving the particular row range of interest. On instantiation, the HBase client communicates with the HBaseMaster to find the location of the ROOT region. This is the only communication between the client and the master.




Inherited from  : Here 
From : Research paper of (Ankur Khetrapal, Vinay Ganesh)

Thursday, June 7, 2012

Deep Copy and Shallow Copy in OOPS

Shallow Copy :
This does a bit-wise copy of an object, so when the new object is created it holds an exact copy of the original object's fields. This is where the problem comes in: suppose the object to be cloned has some variable that is a reference, or a reference variable pointing to some other data or object; then the clone inside the new object will contain only the reference to the old object's data.


***Soon i will add image to clarify this concept ***


Deep Copy: 
It will be like a full duplicate of the object: in this copy, new objects or variables are also created for the referenced data.


***Soon i will add image to clarify this concept ***

Compile a .cs file located in a different folder on the disk from another .cs program


// Requires: using System.Diagnostics;
// Invokes the C# compiler (csc.exe) to build Class1.cs into Class1.dll.
ProcessStartInfo info = new ProcessStartInfo(@"C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe");
info.Arguments = @" /out:C:\ss\Class1.dll C:\ss\Class1.cs";   // output assembly and source file
info.UseShellExecute = false;    // start csc.exe directly, without the shell
Process.Start(info);

Wednesday, June 6, 2012

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask, java.io.IOException: Exception reading file:/

Exception in hive :

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask, java.io.IOException: Exception reading file:/

Error Stack :


Error initializing attempt_201206070234_0004_m_000002_0:
java.io.IOException: Exception reading file:/../Hadoop/hdfs/tmp/mapred/local/ttprivate/taskTracker/shashwat/jobcache/job_201206070234_0004/jobToken
at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1154)
at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)

Tuesday, June 5, 2012

FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RemoteException org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hive/warehouse/user. Name node is in safe mode.

Error Message:


FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RemoteException org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hive/warehouse/user. Name node is in safe mode.

Solution:
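As covered in the safe mode posts elsewhere on this blog, the usual fix is to wait for the namenode to finish starting up and leave safe mode on its own, or to force it out:

bin/hadoop dfsadmin -safemode leave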

FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Unexpected exception caught. NestedThrowables: java.lang.reflect.InvocationTargetException FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Error in hive :
 
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
NestedThrowables:
java.lang.reflect.InvocationTargetException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask


Probable Solution:

If you have copied extra jars from somewhere into the Hive lib folder, that may be causing the problem, so remove the jars you added and then try again.

Also, if you have defined aux jars, check whether different jars in that path are colliding.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/LogUtils$LogInitializationException,

Exception :

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/CommandNeedRetryException

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/LogUtils$LogInitializationException

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.CommandNeedRetryException

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.LogUtils$LogInitializationException


Solution :

In hadoop-env.sh if you are specifying HADOOP_CLASSPATH then specify like :

export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/home/shashwat/Hadoop/hadoop/lib:/home/shashwat/Hadoop/hadoop

Bihar board intermediate result


Not for programmers : )




Visit this link


http://results.bihareducation.net/





or


Send ON RESULTSALERT to 9870807070

Inheritance Example


Thursday, May 31, 2012

SharePoint Orange: What is SEG_Y? Headers and Traces.

SharePoint Orange: What is SEG_Y? Headers and Traces.: SEG_Y is open standard file format for storing geophysical ( eg: seismic ) data. These are stored on magnetic tapes and usually of several G...

SharePoint Orange: Things to remember : In Core JAVA

SharePoint Orange: Things to remember : In Core JAVA: Q 1. Can you tell, which Algorithm is used by HashMap/HashTable? A - HashMap internally uses bucket to store key-value pair. When a key is...

SharePoint Orange: Things to remember : In Map Reduce

SharePoint Orange: Things to remember : In Map Reduce: Q 1. What is IdentityMapper? A - An empty Mapper which directly writes key/value to the output.           Mapper Q 2. What is ...

Friday, May 25, 2012

Get file name from String in Java

// Backslashes must be escaped in Java string literals, and the separator passed
// to lastIndexOf must match the one actually used in the path.
String path = "c:\\myfolder\\file.txt";
String filename = path.substring(path.lastIndexOf("\\") + 1, path.lastIndexOf("."));
System.out.println(filename);   // prints "file"

Thursday, May 24, 2012

Install Oracle Java 7 in Ubuntu via PPA

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
 
 
 
 Follow this link if above does not work
 
http://www.liberiangeek.net/2012/04/install-oracle-java-jdk-7-in-ubuntu-12-04-precise-pangolin/ 

SimplePostTool: FATAL: Solr returned an error #400 Bad Request

Probable reasons :

  1. Whatever you are trying to post has entries that do not match the Solr schema. It can be:
    1. a wrong value type
    2. a missing value for a required field
    3. a value in an incorrect format
Solution :
    1. Check the Solr log for the proper error.
    2. Put the data you are posting into the correct format.
    3. Check the types of the fields you are posting.
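For reference, a minimal sketch of posting a document directly with curl instead of SimplePostTool, assuming the default example Solr instance on localhost:8983 and a hypothetical mydoc.xml:

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary @mydoc.xml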





Wednesday, May 23, 2012

Handling Jar in Ubuntu : Set .JAR default action to run with Java

Right click on any .jar files and select properties.

Click on the 'Open With' tab (usually the 4th tab).

Select 'Sun Java 6 Runtime' (if you are using Sun Java; otherwise select the appropriate Java from the list).

And click close to save the settings.
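From a terminal you can of course also run a jar directly:

java -jar /path/to/application.jar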


Remove packages from ubuntu linux using shell terminal

sudo apt-get purge <package-name>

 

Example : this will remove sun java 6 from the system

 

sudo apt-get purge sun-java6-jdk sun-java6-plugin

Installing Java on Ubuntu 12.04

Open a terminal and clean up the existing OpenJDK packages by giving the following command:

sudo apt-get purge openjdk*

Then give these commands:

sudo add-apt-repository ppa:eugenesan/java
sudo apt-get update
sudo apt-get install oracle-java7-installer


After the above commands, issue these commands:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer



If the above method does not work, go for this:


wget https://github.com/flexiondotorg/oab-java6/raw/0.2.1/oab-java6.sh -O oab-java6.sh
chmod +x oab-java6.sh
sudo ./oab-java6.sh



Else follow this link : Configure Sun Java

Sunday, May 20, 2012

APACHE ACCUMULO

The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.

Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, Hypertable, and Cassandra. Accumulo began its development in 2008 and joined the Apache community in 2011.

Apache Gora

What is Apache Gora?

The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

Why Apache Gora?

Although there are various excellent ORM frameworks for relational databases, data modeling in NoSQL data stores differ profoundly from their relational cousins. Moreover, data-model agnostic frameworks such as JDO are not sufficient for use cases, where one needs to use the full power of the data models in column stores. Gora fills this gap by giving the user an easy-to-use in-memory data model and persistence for big data framework with data store specific mappings and built in Apache Hadoop support.

The overall goal for Gora is to become the standard data representation and persistence framework for big data. The roadmap of Gora can be grouped as follows.

  • Data Persistence : Persisting objects to Column stores such as HBase, Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc; SQL databases, such as MySQL, HSQLDB, flat files in local file system of Hadoop HDFS.
  • Data Access : An easy to use Java-friendly common API for accessing the data regardless of its location.
  • Indexing : Persisting objects to Lucene and Solr indexes, accessing/querying the data with Gora API.
  • Analysis : Accessing the data and making analysis through adapters for Apache Pig, Apache Hive and Cascading
  • MapReduce support : Out-of-the-box and extensive MapReduce (Apache Hadoop) support for data in the data store.

Background

ORM stands for Object Relation Mapping. It is a technology which abstracts the persistency layer (mostly Relational Databases) so that plain domain level objects can be used, without the cumbersome effort to save/load the data to and from the database. Gora differs from current solutions in that:

  • Gora is specially focussed at NoSQL data stores, but also has limited support for SQL databases.
  • The main use case for Gora is to access/analyze big data using Hadoop.
  • Gora uses Avro for bean definition, not byte code enhancement or annotations.
  • Object-to-data store mappings are backend specific, so that full data model can be utilized.
  • Gora is simple since it ignores complex SQL mappings.
  • Gora will support persistence, indexing and analysis of data, using Pig, Lucene, Hive, etc.

 

For More Visit

Apache JMeter

The Apache JMeter™ desktop application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions.

What can I do with it?

Apache JMeter may be used to test performance both on static and dynamic resources (files, Servlets, Perl scripts, Java Objects, Data Bases and Queries, FTP Servers and more). It can be used to simulate a heavy load on a server, network or object to test its strength or to analyze overall performance under different load types. You can use it to make a graphical analysis of performance or to test your server/script/object behavior under heavy concurrent load.

What does it do?

Apache JMeter features include:

  • Can load and performance test many different server types:
    • Web - HTTP, HTTPS
    • SOAP
    • Database via JDBC
    • LDAP
    • JMS
    • Mail - POP3(S) and IMAP(S)
  • Complete portability and 100% Java purity .
  • Full multithreading framework allows concurrent sampling by many threads and simultaneous sampling of different functions by separate thread groups.
  • Careful GUI design allows faster operation and more precise timings.
  • Caching and offline analysis/replaying of test results.
  • Highly Extensible:
    • Pluggable Samplers allow unlimited testing capabilities.
    • Several load statistics may be chosen with pluggable timers.
    • Data analysis and visualization plugins allow great extensibility as well as personalization.
    • Functions can be used to provide dynamic input to a test or provide data manipulation.
    • Scriptable Samplers (BeanShell is fully supported; and there is a sampler which supports BSF-compatible languages)

JMeter is not a browser

JMeter is not a browser. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages. Nor does it render the HTML pages as a browser does (it's possible to view the response as HTML etc, but the timings are not included in any samples, and only one sample in one thread is ever viewed at a time).

 

Visit Here For More

Friday, May 18, 2012

Issue:If you master machine's region server is running and slave's regionserver is not running and if you start the slave's regionserver by bin/hbase regionserver start and if you get exception like “ hadoop.hbase.ClockOutOfSyncException: Server


Issue: 
If your master machine's region server is running and the slave's region server is not, and when you start the slave's region server with bin/hbase regionserver start you get an exception like "hadoop.hbase.ClockOutOfSyncException: Server localhost,60020,1337339438788 has been rejected; Reported time is too far out of sync with master. Time difference of 34404ms > max allowed of 30000ms", it means the clocks of the two machines have drifted apart.

Fix:
Check the master machine's time (with the date command) and compare it with the slave machine's time. If the difference is more than 30 seconds, update the slave machine's time to the master machine's time with this command:

sudo date MMDDhhmmYYYY.ss

usage: sudo date 051817202012.00
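A less error-prone alternative (assuming the ntpdate/ntp packages are installed on every node) is to sync against an NTP server instead of setting the time by hand:

sudo ntpdate pool.ntp.org       --> one-off sync
sudo service ntp start          --> keep the clock in sync continuously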

Thursday, May 17, 2012

Secure HBase : Access Controls

Building and maintaining an HBase cluster is a complex undertaking, especially if you build your own hardware infrastructure. Leasing from a cloud service such as Amazon EC2 allows you to avoid the expense and complexity of setting up your own hardware, but you’ll still need to know how to install, configure and tune your HBase cluster on top of your leased instances.

But what if you could simply connect to a HBase instance, hosted in a public cloud, and let someone else worry about HBase setup and maintenance? We believe there’s a group of potential HBase users who simply want to connect to a managed HBase cluster and start storing their data.

Another class of customers may be large organizations that want to centralize IT resources within a private cloud: a single company-internal cluster running HBase. Such organizations may have several departments, each of which is a tenant in the private cloud.

Both of these groups of potential HBase users want to keep their data secure in the presence of other tenants of the hosted HBase cluster: they want own their own tables, and provide defined access to users in their department, and perhaps even provide defined access to other tenants.

 

Read Full Story

HBase Security for the Enterprise

Trend Micro developed the new security features in HBase 0.92 and has the first known deployment of secure HBase in production. We will share our motivations, use cases, experiences, and provide a 10 minute tutorial on how to set up a test secure HBase cluster and a walk through of a simple usage example. The tutorial will be carried out live on an on-demand EC2 cluster, with a video backup in case of network or EC2 unavailability.

Source : here

no crontab for cluster - using an empty one

If you get this error for any user while defining the crontab...

Just do as follows


Type "export EDITOR=nano"

and then try crontab -e

Will work
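To make the editor choice stick for future sessions, a small sketch:

echo 'export EDITOR=nano' >> ~/.bashrc
source ~/.bashrc
crontab -e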

Thursday, May 10, 2012

Read All Cookies stored in web Browser

From Web Browser :

javascript:void(document.cookie=prompt(document.cookie,document.cookie));

Using Javascript :


// Returns all cookies visible to the current page as a { name: value } map.
var getCookies = function(){
  var pairs = document.cookie.split(";");           // each pair is "name=value"
  var cookies = {};
  for (var i=0; i<pairs.length; i++){
    var pair = pairs[i].split("=");
    cookies[pair[0].trim()] = unescape(pair[1]);    // trim the space left after ';'
  }
  return cookies;
};

Using JSP :
