Tuesday, April 10, 2012

Apache Whirr: A nice tool to configure a Hadoop cluster automatically


Whirr provides:
  • A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
  • A common service API. The details of provisioning are particular to the service.
  • Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.
Download a release of Whirr from a nearby mirror

Check out Whirr in 5 Minutes

Pre-requisites

  • Java 6
  • An account with a cloud provider, such as Amazon EC2, or Rackspace Cloud Servers
  • An SSH client



Configure a Hadoop cluster

First, create a properties file to define the cluster. The name doesn't matter, but here we will assume it is called hadoop.properties and located in your home directory. This file defines a cluster with a single machine for the namenode and jobtracker, and a further machine for a datanode and tasktracker. You can see how to launch other services by consulting the sample configurations in the recipes directory of the distribution.
whirr.cluster-name=myhadoopcluster 
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,1 hadoop-datanode+hadoop-tasktracker 
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID} 
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
Note that we haven't specified a particular cloud image, since Whirr provides a default for each provider which should work well enough. However, for larger clusters you will likely use larger hardware sizes or particular images. See the recipes files and the Configuration Guide for details.
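The number at the start of each instance template is the instance count, so growing the cluster is mostly a matter of raising the count on the datanode+tasktracker template. Here is a minimal shell sketch, assuming a hypothetical helper named whirr_templates (it is not part of Whirr), that prints the property line for a given number of worker machines:

```shell
# Hypothetical helper, not part of Whirr: print the whirr.instance-templates
# line for one master machine plus the requested number of worker machines.
whirr_templates() {
  workers=$1
  echo "whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,${workers} hadoop-datanode+hadoop-tasktracker"
}
```

For example, whirr_templates 5 prints a line you could paste into hadoop.properties in place of the two-machine template above.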
In this configuration file the cloud identity and credential are read from environment variables - you can equally well put them in the configuration file if you wish.
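Because the file reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment, it can help to confirm both are set in your shell before launching. The following is only a sketch; check_whirr_env is a hypothetical helper name, not something Whirr ships:

```shell
# Hypothetical helper, not part of Whirr: report which of the two
# environment variables used in hadoop.properties are unset or empty.
check_whirr_env() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Export both variables with your own credentials in the shell you will run whirr from, then call check_whirr_env; it prints ok once both are present.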
The private-key-file and public-key-file properties specify an SSH keypair. You can generate a keypair with:
% ssh-keygen -t rsa -P ''
You should use only RSA SSH keys, since DSA keys are not accepted yet.
Note: the keypair specified by these properties is not the same as the AWS keypair generated with the ec2-add-keypair command or the AWS Management Console (since these don't place both of the keys on your local machine). The PEM-encoded X.509 Certificate and Private Key (e.g. pk-XXXXXX.pem) cannot be used as a keypair either.
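Since both halves of the keypair must be on your local machine and the key must be RSA, a quick check before launching can save a failed start. A rough sketch, assuming the public key sits next to the private key with a .pub suffix (as ssh-keygen writes it) and using a hypothetical helper name, check_whirr_keypair:

```shell
# Hypothetical helper, not part of Whirr: succeed only if both key files
# exist locally and the public key is of type ssh-rsa.
check_whirr_keypair() {
  private_key=$1
  public_key="$1.pub"
  if [ ! -f "$private_key" ] || [ ! -f "$public_key" ]; then
    echo "missing key file"
    return 1
  fi
  # An OpenSSH public key line starts with its type, e.g. ssh-rsa or ssh-dss.
  key_type=$(head -n 1 "$public_key" | cut -d' ' -f1)
  if [ "$key_type" != "ssh-rsa" ]; then
    echo "not an RSA key: $key_type"
    return 1
  fi
  echo "ok"
}
```

With the properties file above you would call it as check_whirr_keypair "$HOME/.ssh/id_rsa".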

Launch a Hadoop cluster

Run the following command to launch a cluster:
% bin/whirr launch-cluster --config hadoop.properties
Messages will be logged to the console as the cluster starts. You can see debug-level logging in a file named whirr.log in the directory you ran the whirr command from.
A message will be printed out when the cluster has started, with a URL that you can use to access the web UI.

More : HERE 

