Apache Hadoop is an open-source software framework for the distributed storage and distributed processing of Big Data. It is the most widely used Big Data solution for data-intensive distributed applications.
Apache Hadoop was created in 2005 by Doug Cutting and Mike Cafarella and is now a top-level project of the Apache Software Foundation. It was originally developed to support Nutch, an open-source web search engine. The Hadoop framework is written in Java and runs on most computing platforms.
Apache Hadoop is composed of four core modules: Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and YARN.
• Hadoop Common: This is a collection of libraries and utilities that support the other Hadoop modules.
• HDFS: This is a distributed file system designed to store extremely large datasets reliably across clusters of commodity hardware.
• Hadoop MapReduce: This is a software framework for processing large datasets in parallel across a cluster, using the map and reduce programming model.
• YARN (Yet Another Resource Negotiator): This is a resource management system that coordinates the scheduling and execution of applications on a Hadoop cluster.
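To illustrate the MapReduce model named above, here is a minimal plain-Java sketch of the canonical word-count job. The class and method names (`WordCountSketch`, `wordCount`) are hypothetical, and the whole thing runs in a single process purely for illustration; a real Hadoop job would instead extend the `Mapper` and `Reducer` classes from the `org.apache.hadoop.mapreduce` API and run distributed across the cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the MapReduce word-count job.
// Map phase: each input line is split into (word, 1) pairs.
// Shuffle phase: pairs with the same word are grouped together.
// Reduce phase: the counts for each word are summed.
public class WordCountSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                      // "map" over input splits
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum); // "shuffle" + "reduce" by key
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("big data", "big hadoop data", "data");
        System.out.println(wordCount(input)); // {big=2, data=3, hadoop=1}
    }
}
```

The value of the real framework is that the map and reduce steps are pure, independent functions, so Hadoop can run them on thousands of machines against data stored in HDFS, with YARN scheduling the work.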
Apache Hadoop is used by companies of all sizes for various types of data-intensive tasks, such as recommendation engines and analytics. It is the de facto standard for high-volume and computationally intensive data processing tasks.
Apache Hadoop has spawned several related projects such as Apache Hive, Apache Pig, Apache HBase, and Apache Spark, which add even more capabilities to the Hadoop framework. These related projects, combined with the increasing popularity of cloud computing, have made Apache Hadoop one of the most popular and widely used Big Data solutions available.