Wednesday, January 29, 2014

Some optimization tricks for Hadoop & MapReduce

Here are some parameters we can use to optimize and utilize Hadoop and MapReduce a bit better.

These parameters and their values are not fixed; you must test different values to tune Hadoop closely to the setup and the type of machines in your cluster.

io.sort.factor --> 64
io.sort.mb --> 254
mapred.reduce.parallel.copies
--> (number of machines * number of mappers) / 2 (generally)
mapred.tasktracker.(map|reduce).tasks.maximum
--> map slots: fewer than the number of cores (if 8 cores, then 5-10) (generally)
--> reduce slots: fewer than map slots (4, 6, or 8) (generally)
--> number of map + reduce slots > number of cores (generally)
mapred.(map|reduce).tasks.speculative.execution --> true
--> the same task is executed on more than one machine in parallel; the first copy to finish wins
tasktracker.http.threads
--> HTTP threads should be enough to support the parallel copies of the sort and shuffle phase.
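
As a rough sketch, assuming the old mapred Java API from the Hadoop 1.x era these parameter names belong to, the per-job settings above could be applied like this (the values are illustrative, not recommendations):

```java
import org.apache.hadoop.mapred.JobConf;

public class TuningExample {
    public static void main(String[] args) {
        // Per-job tunables (illustrative values; benchmark on your own cluster)
        JobConf conf = new JobConf(TuningExample.class);
        conf.setInt("io.sort.factor", 64);                // streams merged at once during sorts
        conf.setInt("io.sort.mb", 254);                   // map-side sort buffer, in MB
        conf.setInt("mapred.reduce.parallel.copies", 20); // parallel fetches in the shuffle
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);

        // Cluster-wide settings: these belong in mapred-site.xml on each node,
        // because the TaskTracker reads them at daemon start-up, not per job:
        //   mapred.tasktracker.map.tasks.maximum    --> e.g. 6 on an 8-core box
        //   mapred.tasktracker.reduce.tasks.maximum --> e.g. 4
        //   tasktracker.http.threads                --> e.g. 40
    }
}
```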
We can use LZO compression, e.g. for the intermediate map output, so less data hits disk and the network during the shuffle.
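
A minimal sketch of enabling LZO for map output, assuming the separate hadoop-lzo library (which provides com.hadoop.compression.lzo.LzoCodec) is installed on every node; it is not part of stock Hadoop:

```java
import org.apache.hadoop.mapred.JobConf;

public class LzoExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(LzoExample.class);
        // Compress intermediate map output so less data crosses the network
        conf.setBoolean("mapred.compress.map.output", true);
        // Codec class from the hadoop-lzo project, installed separately
        conf.set("mapred.map.output.compression.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
    }
}
```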
Use a combiner to cut down the data shuffled from mappers to reducers (valid when the reduce function is associative and commutative).
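
For example, summing counts is associative and commutative, so the same class can run map-side as a combiner and cluster-side as the reducer. A sketch using Hadoop's built-in LongSumReducer:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class CombinerExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(CombinerExample.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        // The combiner pre-aggregates on the map side, so each mapper
        // emits one partial sum per key instead of many records
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
    }
}
```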
Implement a custom partitioner when the default hash partitioner distributes keys unevenly across reducers.
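
A hypothetical partitioner in the old mapred API, routing records by the first character of a Text key (assuming keys are never empty):

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical example: spread keys by their first character
public class FirstCharPartitioner implements Partitioner<Text, LongWritable> {
    public void configure(JobConf job) {
        // no tunables to read from the job configuration
    }

    public int getPartition(Text key, LongWritable value, int numPartitions) {
        // Mask the sign bit so the partition index is never negative
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

It would be registered on the job with conf.setPartitionerClass(FirstCharPartitioner.class).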
Input split size
~64, 128, or 256 MB (the size of each file or block)
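
A sketch of steering split size toward 128 MB, again assuming Hadoop 1.x parameter names; dfs.block.size only affects files written after it is set:

```java
import org.apache.hadoop.mapred.JobConf;

public class SplitSizeExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SplitSizeExample.class);
        // Block size (in bytes) for files this job writes to HDFS
        conf.setLong("dfs.block.size", 128L * 1024 * 1024);
        // Raise the minimum split size so each map task reads at least 128 MB
        conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
    }
}
```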

