Saturday, March 31, 2012

Hadoop Hive Hbase Profiler

If you have questions like these:

How do I get the following meta information about a table?

1. recent users of the table,

2. top users of the table,

3. recent queries/jobs/reports,

4. number of rows in the table.

Take a look at the tool available here.

It stores information from the JobTracker, including counters, progress, and job XML info, and puts it all in Cassandra so you can profile runs of a job day over day.
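To make the "day over day" idea concrete, here is a minimal sketch of the kind of comparison such stored data allows. It is not the project's schema or code: the class name JobRunHistory, its methods, and the in-memory maps standing in for Cassandra are all assumptions made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Cassandra-backed store.
// Layout assumed here: job name -> day -> counter name -> value.
class JobRunHistory {
    private final Map<String, Map<String, Map<String, Long>>> runs = new HashMap<>();

    // Record the counters captured for one run of a job on a given day.
    public void record(String jobName, String day, Map<String, Long> counters) {
        runs.computeIfAbsent(jobName, k -> new HashMap<>())
            .put(day, new HashMap<>(counters));
    }

    // Change in one counter between two days (later minus earlier).
    // Unknown jobs, days, or counters are treated as 0.
    public long delta(String jobName, String counter, String earlierDay, String laterDay) {
        return value(jobName, laterDay, counter) - value(jobName, earlierDay, counter);
    }

    private long value(String jobName, String day, String counter) {
        Map<String, Map<String, Long>> byDay = runs.get(jobName);
        if (byDay == null) return 0L;
        Map<String, Long> counters = byDay.get(day);
        if (counters == null) return 0L;
        return counters.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        JobRunHistory history = new JobRunHistory();
        // Job name, days, and counter values below are invented sample data.
        history.record("daily_report", "2012-03-30",
                Collections.singletonMap("MAP_INPUT_RECORDS", 1000L));
        history.record("daily_report", "2012-03-31",
                Collections.singletonMap("MAP_INPUT_RECORDS", 1250L));
        // prints 250
        System.out.println(history.delta("daily_report", "MAP_INPUT_RECORDS",
                "2012-03-30", "2012-03-31"));
    }
}
```

A sudden jump in a counter such as input records between two days is the sort of thing this layout makes easy to spot; the real tool keeps the data in Cassandra rather than in memory.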

Some code samples from this project:


package com.m6d.hadoopclusterprofiler;

import java.util.HashMap;
import java.util.Map;

public class Launcher {

    public static void main(String[] args) {
        // Collect key=value command-line arguments into a configuration map.
        Map<String, String> conf = new HashMap<String, String>();
        for (String arg : args) {
            if (arg.contains("=")) {
                String vname = arg.substring(0, arg.indexOf('='));
                String vval = arg.substring(arg.indexOf('=') + 1);
                // Double quotes are stripped from the stored value.
                conf.put(vname, vval.replace("\"", ""));
                System.out.println(vname);
                System.out.println(vval);
            }
        }

        // Create the Cassandra-backed data layer; exit on any failure.
        // A missing or non-numeric port makes Integer.parseInt throw
        // NumberFormatException.
        DataLayer dataLayer = null;
        try {
            dataLayer = new DataLayer(conf.get(Conf.cassandra_hosts),
                    Integer.parseInt(conf.get(Conf.cassandra_port)));
        } catch (NumberFormatException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }

        // Poll the JobTracker on its own thread, writing results to the data layer.
        JobClientPoller p = new JobClientPoller(conf.get(Conf.jt_host),
                Integer.parseInt(conf.get(Conf.jt_port)),
                Integer.parseInt(conf.get(Conf.delay_ms)),
                dataLayer);
        Thread pollerThread = new Thread(p);
        pollerThread.start();
    }
}
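The argument handling in Launcher can be pulled out into a small helper that is easy to test on its own. The following is only a sketch under assumptions: the class KeyValueArgs and the method requireInt are hypothetical names, and the key strings used in main are placeholders, since the actual values behind the Conf constants are not shown in the excerpt. Unlike Launcher, this version skips tokens with an empty key and reports a missing or malformed number with a readable message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the original project.
class KeyValueArgs {

    // Parse "key=value" tokens; anything without '=' (or with an empty key) is ignored.
    // As in Launcher, the split is on the first '=' and double quotes are stripped
    // from the value.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                conf.put(arg.substring(0, eq), arg.substring(eq + 1).replace("\"", ""));
            }
        }
        return conf;
    }

    // Fetch a required integer setting, failing with a clear message
    // instead of a bare NumberFormatException.
    public static int requireInt(Map<String, String> conf, String key) {
        String raw = conf.get(key);
        if (raw == null) {
            throw new IllegalArgumentException("missing required setting: " + key);
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "setting " + key + " is not an integer: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Placeholder keys and values, for illustration only.
        String[] sample = {"jt_host=\"localhost\"", "jt_port=8021", "verbose"};
        Map<String, String> conf = parse(sample);
        System.out.println(conf.get("jt_host"));          // prints localhost
        System.out.println(requireInt(conf, "jt_port"));  // prints 8021
    }
}
```

Validating the numeric settings up front would also cover the JobTracker port and delay, which Launcher parses outside its try block.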

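// The Hadoop imports below are assumed (classic mapred API);
// the original excerpt does not show them.
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;

import com.google.gson.Gson;

public class JobClientPoller implements Runnable {

    String jobTracker;
    int pollingMS;
    int jtport;
    boolean goOn;
    private boolean firstRun = true;
    DataLayer dataLayer;
    Map<String, JobStatus> lastRunJobs; // should persist this to fs so no replay on startup

    public JobClientPoller(String jobTracker, int jtport, int pollingMS, DataLayer dl) {
        goOn = true;
        this.pollingMS = pollingMS;
        this.jtport = jtport;
        this.dataLayer = dl;
        this.jobTracker = jobTracker;
        lastRunJobs = new HashMap<String, JobStatus>();
    }

    public void run() {
        while (goOn) {
            System.out.println("running");
            long loopStart = System.currentTimeMillis();
            // Point the client at the JobTracker given on the command line.
            JobConf conf = new JobConf();
            conf.set("mapred.job.tracker", jobTracker + ":" + jtport);
            JobClient jc = null;
            // (the excerpt ends here)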

The rest is available on the site hosting the project. This is an open source project from a company named m6d (http://m6d.com/); you can download it and contribute to it.
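Since the excerpt stops before the body of the polling loop, here is a generic sketch of the pattern its fields suggest (a stop flag, a loop start time, and a map of the previous poll's results). This is not the project's implementation: SnapshotPoller, its Supplier and Consumer parameters, and the plain string states are all assumptions, used so the example runs without Hadoop or Cassandra.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a JobTracker poller; no Hadoop classes involved.
class SnapshotPoller implements Runnable {
    private final Supplier<Map<String, String>> source; // job id -> current state
    private final Consumer<Map<String, String>> sink;   // receives only changed entries
    private final long pollingMs;
    private volatile boolean goOn = true;                // volatile so stop() is seen by the loop thread
    private Map<String, String> lastSeen = new HashMap<>();

    public SnapshotPoller(Supplier<Map<String, String>> source,
                          Consumer<Map<String, String>> sink, long pollingMs) {
        this.source = source;
        this.sink = sink;
        this.pollingMs = pollingMs;
    }

    public void stop() { goOn = false; }

    // Entries of current that are new or whose value differs from previous.
    public static Map<String, String> changed(Map<String, String> previous,
                                              Map<String, String> current) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!Objects.equals(e.getValue(), previous.get(e.getKey()))) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // One polling step: take a snapshot, forward what changed, remember the snapshot.
    public void pollOnce() {
        Map<String, String> current = new HashMap<>(source.get());
        Map<String, String> diff = changed(lastSeen, current);
        if (!diff.isEmpty()) sink.accept(diff);
        lastSeen = current;
    }

    @Override
    public void run() {
        while (goOn) {
            long loopStart = System.currentTimeMillis();
            pollOnce();
            // Sleep only for what is left of the interval after the work is done.
            long remaining = pollingMs - (System.currentTimeMillis() - loopStart);
            if (remaining > 0) {
                try {
                    Thread.sleep(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

Keeping the previous snapshot means unchanged jobs are not written again on every poll. As the comment on lastRunJobs in the original notes, that map lives only in memory, so a restart would replay everything unless it is persisted.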
