The Next Generation of Apache Hadoop MapReduce

Arun has a great article on Yahoo covering the upcoming changes to the Hadoop architecture. The post provides some background on Hadoop, and discusses the changes to resource management and job scheduling/monitoring at a high level.

The Apache Hadoop MapReduce framework has hit a scalability limit of around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce, which factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application execution. Since downtime is more expensive at scale, high availability is built in from the beginning, as are security and multi-tenancy to support many users on the larger clusters. The new architecture will also increase innovation, agility and hardware utilization.
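
To make the split concrete, here is a minimal toy sketch of the idea in Python. It is not Hadoop code: the names `ResourceScheduler`, `JobManager`, `allocate`, `release`, and the notion of integer "slots" are all hypothetical and chosen only for illustration. The point it tries to show is that the scheduler knows nothing about jobs beyond how much capacity they ask for, while each job carries its own manager that decides what to run with whatever it is granted.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceScheduler:
    """Generic, cluster-wide component (hypothetical): only tracks and grants capacity."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are currently free, up to the request.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, slots: int) -> None:
        # Return slots to the shared pool once the job is done with them.
        self.free_slots += slots


@dataclass
class JobManager:
    """Per-job component (hypothetical): owns the execution of one application."""
    name: str
    tasks: list
    results: list = field(default_factory=list)

    def run(self, scheduler: ResourceScheduler) -> list:
        pending = list(self.tasks)
        while pending:
            granted = scheduler.allocate(len(pending))
            if granted == 0:
                raise RuntimeError("no capacity available")
            # Run one batch of tasks, sized by what the scheduler granted.
            batch, pending = pending[:granted], pending[granted:]
            self.results.extend(task() for task in batch)
            scheduler.release(granted)
        return self.results


# A cluster with two slots runs a five-task job in three batches.
scheduler = ResourceScheduler(free_slots=2)
job = JobManager("squares", [lambda i=i: i * i for i in range(5)])
print(job.run(scheduler))  # [0, 1, 4, 9, 16]
```

Because the job logic lives entirely in `JobManager`, a different kind of application could bring its own manager without any change to the scheduler, which is the flexibility the quoted passage is pointing at.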