Updated May 2, 2023
Difference Between MapReduce vs Yarn
Yarn stands for Yet Another Resource Negotiator, the new framework for managing resources (Memory and CPU). It helps us develop distributed applications of any kind; it provides us with necessary daemons and APIs. Another essential feature of YARN is that it handles and schedules resource requests from the application and helps the process to execute the request. YARN is a generic platform to run any distributed application, MapReduce version 2 is the distributed application that runs on top of YARN, Whereas MapReduce is the processing unit of the Hadoop component; it processes data in parallel in the distributed environment. So basically, map-reduce work on massive data component; it processes the data and stores it in HDFS to make retrieval easier than traditional storage.
Head-to-Head Comparison Between MapReduce vs Yarn (Infographics)
Below are the top 10 comparisons between MapReduce vs Yarn:
Key Differences Between MapReduce vs Yarn
Mentioned below are the key differences between MapReduce vs Yarn:
- Hadoop 1 has two components first one is HDFS (Hadoop Distributed File System), and the second is Map Reduce. Whereas Hadoop 2 it also has two components, HDFS and YARN/MRv2 (we usually call Yarn as MapReduce version 2).
- In Map Reduce, when Map-reduce stops working, then automatically, all his slave nodes will stop working. This is the one scenario where job execution can interrupt, called a single point of failure. YARN overcomes this issue because of its architecture; Yarn has the concept of an Active name node and a standby name node. When an active node stop working for some time, a passive node starts working as an active node and continues the execution.
- Map reduce has a single master and multiple slave architecture; if master-slave goes down, the entire slave will stop working. This is the single point of failure in HADOOP1, whereas HADOOP2, based on YARN architecture, has the concept of multiple masters and slaves; if one master goes down, another master will resume its process and continue the execution.
- As shown in the diagram below, the difference in both Ecosystems is HADOOP1 and HADOOP2. Component-wise, Yarn Resource Management interacts with Map-reduce and HDFS.
- So basically, Yarn is responsible for resource management, which means which job will be executed and which system gets decided by Yarn. In contrast, map-reduce is a programming framework responsible for executing a particular job, so basically, Map-reduce has two components, a mapper and a reducer, for the execution of a program.
- In Map, each data node runs individually, whereas, in Yarn, each node runs by a node manager.
- Map reduce uses a Job tracker to create and assign a task to a task tracker. Due to data, the resource management is not impressive, resulting in some of the data nodes being idle and of no use, whereas YARN has a Resource Manager for each cluster, and each data node runs a Node Manager. For each job, one slave node will act as the Application Master, monitoring resources/tasks.
MapReduce vs Yarn Comparison Table
Below is the comparison between MapReduce vs Yarn:
Basis for Comparison | Yarn | MapReduce |
Meaning | Yarn Stands for Yet Another Resource Negotiator. | Map Reduce is self-defined. |
Version | Introduce in Hadoop 2.0. | Introduce in Hadoop 1.0. |
Responsibility | Now Yarn is responsible for the Resource management part. | Earlier, MapReduce was responsible for Resource Management as well as data processing. |
Execution Model | The yarn execution model is more generic as compared to MapReduce. | Less Generic as compared to Yarn. |
Application Execution | Yarn can also execute those applications that don’t follow the MapReduce model. | MapReduce can execute its own model-based application. |
Architecture | Yarn is introduced in MR2 on top of the job and task tracks. In the place of job tracker and task tracker Application, the master comes into the picture. | In the earlier version of MR1, YARN was not there. In the place of YARN, a job tracker and task tracker was present, which help in the execution of application or jobs. |
Flexibility | Yarn is more isolated and scalable. | Less scalable as compared to Yarn. |
Daemons | The Yarn has a Name node, Data node, Secondary name node, Resource manager, and Node manager. | Map Reduce has a Name node, Data node, Secondary Name node, Job tracker, and Task tracker. |
Limitation | There is no concept of a single point of failure in YARN because it has multiple Masters, so if one got failed, another master will pick it up and resume the execution. | Single point of failure, low resource utilization(Max of 4200 clusters by YAHOO), and less scalability when compared to Yarn. |
Size | By default, the size of a data node in Yarn is 128MB. | By default, the size of a data node in MapReduce is 64MB. |
Conclusion
Hadoop 1, based on MapReduce, has several issues overcome in Hadoop 2 with Yarn. Like in Hadoop 1 job tracker is responsible for resource management, but Yarn has the concept of a resource manager and a node manager, which will take of resource management. Map reduce has a single point of failure, i.e., Job tracker; if the job tracker stops working, we have to restart our entire cluster and execute our job again from the Initial. In a real scenario, none of the organization don’t want to take this kind of risk, especially in a bank defense sector. Such an organization that works on streamlining data will not be ready to take this kind of risk. For the sake of a few minutes, they are going to lose their data and may have some critical business impact. So Yarn has a better result than MapReduce.
Recommended Articles
This has been a guide to MapReduce vs Yarn. Here we have discussed MapReduce vs Yarn head-to-head comparison, key differences, infographics, and a comparison table. You may also look at the following articles to learn more –