One of the misconceptions about Hadoop is that files smaller than the block
size (64 MB by default) still consume a whole block on the filesystem, wasting
space on HDFS. In reality this is not true: small files occupy only as much
disk space as they actually require, so a 1 MB file takes roughly the same
space on HDFS as it does on a local disk (multiplied by the replication
factor).
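A quick way to see this is to write a small file and compare its logical
length, its block size, and the space it actually consumes. The sketch below
assumes a reachable HDFS cluster with default configuration on the classpath;
the path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileSpace {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a 1 MB file, far smaller than the 64 MB default block size.
        Path p = new Path("/tmp/one-mb-file");   // hypothetical path
        try (FSDataOutputStream out = fs.create(p)) {
            out.write(new byte[1024 * 1024]);
        }

        FileStatus st = fs.getFileStatus(p);
        ContentSummary cs = fs.getContentSummary(p);

        // getLen()           -> 1 MB: the logical file size
        // getBlockSize()     -> 64 MB default: an upper bound, not an allocation
        // getSpaceConsumed() -> ~1 MB x replication factor, NOT 64 MB x replication
        System.out.println("length         = " + st.getLen());
        System.out.println("block size     = " + st.getBlockSize());
        System.out.println("space consumed = " + cs.getSpaceConsumed());
    }
}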
But this does not mean that many small files use HDFS efficiently. Regardless
of how much data a file holds, its metadata occupies roughly the same amount
of NameNode memory. As a result, a large number of small HDFS files (smaller
than the block size) consumes a lot of the NameNode's heap, negatively
impacting HDFS scalability and performance.
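To get a feel for the scale, here is a back-of-envelope estimate using the
commonly quoted rule of thumb that each file and each block costs roughly
150 bytes of NameNode heap (an approximation, not an exact figure):

public class NameNodeMemoryEstimate {
    public static void main(String[] args) {
        // Rule of thumb: ~150 bytes of NameNode heap per file object and
        // per block object (an approximation, not an exact figure).
        final long BYTES_PER_OBJECT = 150;

        long files = 10_000_000L;   // ten million small files...
        long blocksPerFile = 1;     // ...each fitting in a single block
        long objects = files * (1 + blocksPerFile);  // one inode + one block each

        long heapBytes = objects * BYTES_PER_OBJECT;
        // ~3 GB of heap spent on metadata alone, before any data is read
        System.out.printf("~%.1f GB of NameNode heap for %,d small files%n",
                heapBytes / 1e9, files);
    }
}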
So an HDFS block is not a storage allocation unit, but a replication unit.