Thursday, January 9, 2014

Hadoop small file and block allocation

A common misconception about Hadoop is that a file smaller than the block size (64 MB by default) still consumes a whole block on the filesystem, wasting space on HDFS. In reality this is not true: a small file occupies only as much disk space as it actually needs, so a 1 MB file takes roughly the same space on HDFS as it would on a local disk (multiplied by the replication factor). That does not mean that having many small files uses HDFS efficiently, though. Every file, regardless of its size, occupies the same amount of metadata memory on the NameNode. As a result, a large number of small HDFS files (smaller than the block size) consumes a lot of the NameNode's memory, negatively impacting HDFS scalability and performance.
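
To put rough numbers on the metadata cost, using the commonly cited rule of thumb of about 150 bytes of NameNode heap per file, directory, or block object (an approximation, not an exact figure): storing 10 TB as ten million 1 MB files creates roughly ten million file objects plus ten million block objects, i.e. about 20,000,000 x 150 bytes, or around 3 GB of NameNode heap. The same 10 TB packed into 64 MB files needs only about 156,000 files and 156,000 blocks, well under 50 MB of heap.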

So an HDFS block is not a storage allocation unit, but a replication unit.
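
You can verify this on a live cluster with a minimal sketch against the Hadoop Java FileSystem API (the path and the 1 MB payload below are illustrative; adjust them for your setup). It writes a 1 MB file and prints the actual stored length next to the logical block size:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SmallFileDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Illustrative path -- adjust for your cluster.
            Path path = new Path("/tmp/small-file-demo.bin");

            // Write a 1 MB file, far smaller than the 64 MB default block size.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write(new byte[1024 * 1024]);
            }

            FileStatus status = fs.getFileStatus(path);
            // getLen() is the actual data stored; getBlockSize() is only the
            // logical unit used for splitting and replication, not allocation.
            System.out.println("File length : " + status.getLen() + " bytes");
            System.out.println("Block size  : " + status.getBlockSize() + " bytes");
            System.out.println("Replication : " + status.getReplication());
        }
    }

On a stock Hadoop 1.x install this prints a length of 1,048,576 bytes against a 67,108,864-byte block size; the disk actually used on the datanodes is the file length times the replication factor, not the block size.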
