Institute of Electrical and Electronics Engineers
Apache Hadoop is one of the most prominent and early technologies for handling big data. Different scheduling algorithms within the framework of Apache Hadoop were developed in the last decade. In this paper, we attempt to provide a comprehensive overview over the different paradigms for scheduling in Apache Hadoop. The surveyed approaches fall under different categories, namely, Deadline prioritization, Resource prioritization, Job size prioritization, Hybrid approaches and recent trends for improvements upon default schedulers.
Permanent URL (for citation purposes)