Here are some parameters we can use to tune Hadoop and MapReduce for better performance.
These parameters and their values are not fixed: test different settings to tune Hadoop for the setup and the type of machines in your cluster.
io.sort.factor-->64
io.sort.mb-->254
mapred.reduce.parallel.copies
-->(number of machines*number of mappers)/2 (generally)
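The rule of thumb above is plain arithmetic. Here is a minimal sketch in plain Java with no Hadoop dependency. The class and method names are hypothetical, and "number of mappers" is assumed to mean map slots per machine. Treat the result only as a starting value for testing:

```java
public class ParallelCopiesEstimate {
    // Rule of thumb from the text: (number of machines * number of mappers) / 2.
    // Assumption: "mappers" means map slots per machine.
    public static int suggestParallelCopies(int machines, int mappersPerMachine) {
        if (machines <= 0 || mappersPerMachine <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        // Never suggest fewer than one copier thread.
        return Math.max(1, (machines * mappersPerMachine) / 2);
    }

    public static void main(String[] args) {
        // Example: 10 machines with 8 map slots each.
        System.out.println("mapred.reduce.parallel.copies = "
                + suggestParallelCopies(10, 8));
    }
}
```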
mapred.tasktracker.(map|reduce).tasks.maximum
-->map: fewer than the number of cores (if 8 cores, then 5-10) (generally)
-->reduce: fewer than the map slots, e.g. 4, 6 or 8 (generally)
-->total map + reduce slots > number of cores (generally)
mapred.(map|reduce).tasks.speculative.execution-->true
-->The same task is executed on more than one machine in parallel, so a slow machine does not hold up the job
tasktracker.http.threads
-->There should be enough HTTP threads to support the parallel copies in the sort and shuffle phase.
We can use LZO compression.
Use a combiner
Implement a custom partitioner
Input split
~64-128-256 MB (size of each file or block)
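For the custom partitioner tip above, here is a rough sketch of the partitioning logic in plain Java, so it runs without Hadoop on the classpath. The class name and the "group:rest" key layout are hypothetical. In a real job this logic would go inside a class extending Hadoop's Partitioner, whose getPartition method also receives the value:

```java
public class PrefixPartitionerSketch {
    // Sends every key with the same prefix (the text before ':') to the same reducer.
    // Assumption: keys look like "<group>:<rest>"; keys without ':' are used whole.
    public static int getPartition(String key, int numReduceTasks) {
        int sep = key.indexOf(':');
        String group = sep >= 0 ? key.substring(0, sep) : key;
        // Mask the sign bit so the partition number is never negative.
        return (group.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"us:alice", "us:bob", "de:carol", "fr:dave"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + getPartition(k, 4));
        }
    }
}
```

Whether grouping by prefix helps depends on the data: if one group is much larger than the others, its reducer becomes the bottleneck.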