Wednesday, January 29, 2014

Some optimization tricks for Hadoop & MapReduce

Here are some parameters we can use to optimize and utilize Hadoop and MapReduce a bit better.

These parameters and their values are not fixed; you must test different values to tune Hadoop closely to the setup and the type of machines in your cluster.

io.sort.factor --> 64
io.sort.mb --> 254
mapred.reduce.parallel.copies
--> (number of machines * number of mappers) / 2 (generally)
mapred.tasktracker.(map|reduce).tasks.maximum
--> map slots: fewer than the number of cores (if 8 cores, then 5-10) (generally)
--> reduce slots: fewer than map slots (4, 6, or 8) (generally)
--> number of map + reduce slots > number of cores (generally)
mapred.(map|reduce).tasks.speculative.execution --> true
--> the same task is executed on more than one machine in parallel; the first copy to finish wins
tasktracker.http.threads
--> HTTP threads should be enough to support the parallel copies of the sort and shuffle phase.
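
As a rough sketch, assuming the old mapred Java API from the Hadoop 1.x era these parameter names belong to, the per-job settings above could be applied like this (the values are illustrative, not recommendations):

```java
import org.apache.hadoop.mapred.JobConf;

public class TuningExample {
    public static void main(String[] args) {
        // Per-job tunables (illustrative values; benchmark on your own cluster)
        JobConf conf = new JobConf(TuningExample.class);
        conf.setInt("io.sort.factor", 64);                // streams merged at once during sorts
        conf.setInt("io.sort.mb", 254);                   // map-side sort buffer, in MB
        conf.setInt("mapred.reduce.parallel.copies", 20); // parallel fetches in the shuffle
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);

        // Cluster-wide settings: these belong in mapred-site.xml on each node,
        // because the TaskTracker reads them at daemon start-up, not per job:
        //   mapred.tasktracker.map.tasks.maximum    --> e.g. 6 on an 8-core box
        //   mapred.tasktracker.reduce.tasks.maximum --> e.g. 4
        //   tasktracker.http.threads                --> e.g. 40
    }
}
```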
We can use LZO compression, e.g. for the intermediate map output, so less data hits disk and the network during the shuffle.
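
A minimal sketch of enabling LZO for map output, assuming the separate hadoop-lzo library (which provides com.hadoop.compression.lzo.LzoCodec) is installed on every node; it is not part of stock Hadoop:

```java
import org.apache.hadoop.mapred.JobConf;

public class LzoExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(LzoExample.class);
        // Compress intermediate map output so less data crosses the network
        conf.setBoolean("mapred.compress.map.output", true);
        // Codec class from the hadoop-lzo project, installed separately
        conf.set("mapred.map.output.compression.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
    }
}
```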
Use a combiner to cut down the data shuffled from mappers to reducers (valid when the reduce function is associative and commutative).
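
For example, summing counts is associative and commutative, so the same class can run map-side as a combiner and cluster-side as the reducer. A sketch using Hadoop's built-in LongSumReducer:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class CombinerExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(CombinerExample.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        // The combiner pre-aggregates on the map side, so each mapper
        // emits one partial sum per key instead of many records
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
    }
}
```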
Implement a custom partitioner when the default hash partitioner distributes keys unevenly across reducers.
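
A hypothetical partitioner in the old mapred API, routing records by the first character of a Text key (assuming keys are never empty):

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical example: spread keys by their first character
public class FirstCharPartitioner implements Partitioner<Text, LongWritable> {
    public void configure(JobConf job) {
        // no tunables to read from the job configuration
    }

    public int getPartition(Text key, LongWritable value, int numPartitions) {
        // Mask the sign bit so the partition index is never negative
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

It would be registered on the job with conf.setPartitionerClass(FirstCharPartitioner.class).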
Input split size
~64, 128, or 256 MB (the size of each file or block)
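
A sketch of steering split size toward 128 MB, again assuming Hadoop 1.x parameter names; dfs.block.size only affects files written after it is set:

```java
import org.apache.hadoop.mapred.JobConf;

public class SplitSizeExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SplitSizeExample.class);
        // Block size (in bytes) for files this job writes to HDFS
        conf.setLong("dfs.block.size", 128L * 1024 * 1024);
        // Raise the minimum split size so each map task reads at least 128 MB
        conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
    }
}
```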

