Infinite Programming Tips: Different components of the Hadoop system

Sunday, April 15, 2012

· Hadoop Common – The common utilities that support the other Hadoop subprojects.

· HDFS – A distributed file system that provides high-throughput access to application data.

· MapReduce – A software framework for distributed processing of large data sets on compute clusters.

· Avro – A data serialization system.

· Chukwa – A data collection system for managing large distributed systems.

· HBase – A scalable, distributed database that supports structured data storage for large tables.

· Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying.

· Mahout – A scalable machine learning and data mining library.

· Pig – A high-level data-flow language and execution framework for parallel computation.

· ZooKeeper – A high-performance coordination service for distributed applications.
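
To make the MapReduce entry above a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It is plain Python running in a single process, not the Hadoop API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework would run many map and reduce tasks in parallel across machines and read the input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (a stand-in for the framework's shuffle step)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
    print(word_count(sample))
```

The point of the split is that map_phase only ever sees one line and reduce_phase only ever sees one key, which is what would let a framework such as Hadoop MapReduce spread both steps over many machines.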