Thursday, February 28, 2013

Rebalancing your HDFS / Hadoop cluster

Are you stuck in a scenario where the replication factor is not correct, or not what you expect it to be?

You can fix this by re-replicating HDFS. Here is what you can do:

Suppose your desired replication factor is 2, but some files show a replication factor of 1, 2, or 3, and you want all of them at 2.

Just increase the replication factor and then decrease it again, recursively, for your HDFS root file system.

For example, suppose you need replication factor 2, some files are sitting at replication factor 1, and Hadoop is not automatically re-replicating those blocks. You can increase the replication factor to 3 and then decrease it back to 2.
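Before changing anything, it helps to see which files actually deviate. The second column of `hadoop dfs -ls` output is the per-file replication factor, so a little awk can list the offenders. A sketch using made-up sample output (the paths and sizes below are hypothetical; on a live cluster you would pipe the real `ls -R` output instead):

```shell
# Hypothetical `hadoop dfs -ls` output; on a real cluster run:
#   hadoop dfs -ls -R / | awk '$2 != 2 {print $8, "rep=" $2}'
sample='-rw-r--r--   1 user supergroup  1048576 2013-02-28 10:00 /data/a.txt
-rw-r--r--   2 user supergroup  2097152 2013-02-28 10:01 /data/b.txt
-rw-r--r--   3 user supergroup  4194304 2013-02-28 10:02 /data/c.txt'

# Column 2 is the replication factor; print files not at the desired factor (2):
echo "$sample" | awk '$2 != 2 {print $8, "rep=" $2}'
```

This prints `/data/a.txt rep=1` and `/data/c.txt rep=3`, i.e. only the files that need fixing.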

Use the following commands to increase and then decrease the replication factor.

Increasing:

hadoop dfs -setrep -R 3 /      -----> this will increase the replication factor of all files to 3 and replicate them automatically. Once you have enough replicas, you can decrease the replication factor back to stabilise the cluster as needed. (You can also add the -w flag to make setrep wait until replication is complete.)

Decreasing:

hadoop dfs -setrep -R 2 /     ------>  this will set the replication factor to 2 recursively for your HDFS root.

You can apply this method to a single file or a specific folder too, by giving its path instead of /.

If you have over- and under-utilized nodes in the Hadoop cluster, you can run the balancer (the start-balancer.sh script in the bin directory) to balance the cluster.
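The balancer moves blocks between DataNodes until each node's disk utilization is within a threshold (in percentage points) of the cluster average. A sketch of the invocation plus an illustration of the criterion with hypothetical per-node usage numbers (the percentages below are made up; the real figures come from your cluster):

```shell
# On a live cluster (assumes HADOOP_HOME points at your installation;
# 10 is the default threshold):
#   $HADOOP_HOME/bin/start-balancer.sh -threshold 10
# and to stop a running balancer:
#   $HADOOP_HOME/bin/stop-balancer.sh
#
# Illustration of what "balanced" means, with hypothetical node usage
# percentages and a cluster average of 60%:
cluster_avg=60
threshold=10
for usage in 45 58 72 81; do
  diff=$((usage - cluster_avg))
  if [ "$diff" -lt 0 ]; then diff=$((-diff)); fi
  if [ "$diff" -gt "$threshold" ]; then
    echo "node at ${usage}% is out of balance (|${usage} - ${cluster_avg}| > ${threshold})"
  fi
done
```

With these sample numbers, the nodes at 45%, 72%, and 81% would be candidates for block movement; the node at 58% is already within the threshold.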

NOTE: you should have enough free space in your DFS for replication, because increasing the replication factor consumes space for the extra copies.
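A rough back-of-the-envelope check before bumping the factor (the numbers below are hypothetical; get your real usage and capacity from `hadoop dfsadmin -report`):

```shell
# Hypothetical sizing: temporarily raising replication from 2 to 3
# stores one extra copy of every block.
logical_gb=200        # logical data size, i.e. one copy of everything
current_rep=2
temp_rep=3
current_used=$((logical_gb * current_rep))   # space used now
peak_used=$((logical_gb * temp_rep))         # space used during the bump
echo "extra space needed during the bump: $((peak_used - current_used)) GB"
```

Make sure the reported remaining DFS capacity comfortably exceeds that extra amount before running the increase.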
