
MapReduce Combiner

By Priya Pedamkar


Introduction to MapReduce Combiner

A MapReduce combiner, also called a semi-reducer, is an optional class that takes the output of the Mapper (Map class) as its input and passes key-value pairs on to the Reducer (Reduce class). The predominant function of a combiner is to summarize the map output records that share the same key: for example, five ("the", 1) pairs emitted by a mapper can be combined into a single ("the", 5) pair. The combiner's key-value output is then dispatched over the network to the reducer as its input. The combiner class is placed between the map class and the reduce class to decrease the volume of data transferred between them; this matters because the map output is usually large, so the amount of data transferred to the reduce tasks is high.

How Does a MapReduce Combiner Work?

Here is a brief summary of how a MapReduce combiner works:


[Figure: working of a MapReduce combiner]

The MapReduce combiner has no predefined interface of its own, so it must implement the Reducer interface's reduce() method. The combiner operates on the output of each map task, so it must process key-value pairs of the same types that the Reducer class expects as input. Because the combiner replaces the map task's original output with a summarized version, it can produce compact summaries even from a huge dataset. This matters because, when a MapReduce job runs on a large dataset, the map class creates a huge chunk of intermediate data, and shipping all of it to the reducer for later processing leads to heavy network congestion.
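
Hadoop provides no separate combiner interface, so in practice a combiner is written as an ordinary Reducer subclass. Below is a minimal sketch for the word count example (the class name SumCombiner is this sketch's own choice, not from the original article):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A combiner is just a Reducer: it sums the counts for one key
// locally on the map side, before the data crosses the network.
public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        // e.g. emits ("the", 5) in place of five ("the", 1) pairs
        context.write(key, result);
    }
}

It is registered on the job with job.setCombinerClass(SumCombiner.class); its input and output key-value types must match what the reducer expects as input.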

Without a combiner, the outline of a MapReduce program looks like this:

[Diagram: MapReduce job without a combiner]

No combiner is used in the diagram above. The input is split between two map classes (mappers), and nine keys are generated by the mappers. The mappers send these nine key-value pairs of intermediate data directly to the reduce class, and dispatching the data to the reducer consumes network bandwidth and transfer time (the time taken to move data from one machine to another). Transfer time increases significantly when the data is large.

When a Hadoop combiner sits between the mapper and the reducer, the intermediate data is combined before it is shuffled and dispatched to the reducer: the two combiners (one per mapper) reduce the nine key-value pairs down to just four. The illustration below shows how.
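
As a hypothetical illustration (the keys a, b, and c are made up for this sketch), nine key-value pairs leave the two mappers, but only four cross the network to the reducer:

Mapper 1 output: (a,1) (b,1) (a,1) (b,1) (a,1)  ->  Combiner 1: (a,3) (b,2)
Mapper 2 output: (a,1) (c,1) (c,1) (a,1)        ->  Combiner 2: (a,2) (c,2)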


With a combiner, the outline of the MapReduce program looks like this:

[Diagram: MapReduce job with a combiner]

The reducer now processes only the four key-value pairs supplied as input by the two combiners, so the reduce function executes only four times to produce the final output, which boosts overall performance.

Let's understand this better with an example:

A classic program for understanding the virtues of the MapReduce programming paradigm is word count. It has map, combine, and reduce methods that count the number of occurrences of each word in a data file.

Data File
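
The original article shows the input file as a screenshot; as a hypothetical stand-in, assume a small text file with contents like:

the cat sat on the mat
the dog sat

For this input, word count should produce the=3, sat=2, and cat, on, mat, and dog each 1.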

Set up a new project in Eclipse and add the Hadoop dependency to pom.xml. This gives the project the required access to the Hadoop core library.
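
For example, the hadoop-client artifact can be declared as below (the version number is only an assumption; use the one matching your cluster):

<!-- Hadoop client libraries; version 2.7.3 is only an example -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>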

Word Count

– The block of code first checks that the required number of arguments has been provided.
– It then creates a new job and sets the job name and the main class.
– The input and output paths are set from the arguments.
– The key and value type classes are set, preceding the output format class; these classes must be the same types used for the map and reduce output.
– The map, combiner, and reduce classes are set on the job.
– The job is executed and waited on until completion (a sketch of such a driver follows this list).
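
A minimal driver sketch matching these steps (the class names WordCount, TokenizerMapper, SumCombiner, and SumReducer are assumptions of this sketch; the latter three are shown elsewhere in this article):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Check that input and output paths were supplied.
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }

        // Create the job and set its name and main class.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);

        // Input and output paths come from the arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Output key/value types, shared by map and reduce.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Map, combiner, and reduce classes.
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumCombiner.class);
        job.setReducerClass(SumReducer.class);

        // Run the job and wait for completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}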

Implementation of MapReduce Components and MapReduce Combiner

Below are the implementations of the MapReduce components:

1. Map Phase

The map phase splits the input data into two parts: keys and values. The key must be writable and comparable throughout the processing stage, while the value only needs to be writable. Say a client submits input data to the Hadoop system: the job tracker assigns tasks to the task trackers. The combiner, commonly called a mini-reducer, runs the reducer code on the map output it takes as input. Network bandwidth consumption is high when a huge amount of data has to be transferred. Hash is the default partitioner used, and the partition module plays a key role in Hadoop: the partitioner improves performance by reducing the pressure on the reducer.
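
A minimal mapper sketch for the word count example (the class name TokenizerMapper is this sketch's own). Note the types: the key type (Text) is writable and comparable, while the value type (IntWritable) only needs to be writable:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every token in every input line.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}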

2. Intermediate Processing

In the intermediate phase, the map output goes through the sort and shuffle phase. Hadoop does not replicate this intermediate data; it is all stored on the local file system, and Hadoop writes it to the local disks in a round-robin fashion. Other shuffle and sort factors determine the exact conditions under which the data is written to the local disks.

3. Reducer Phase

The reducer takes the sorted and shuffled data as input. All of the input data is combined, and the resulting key-value pairs with similar keys are written to the HDFS system. A reducer is not always necessary, for example for pure searching and mapping purposes. Properties can be set to control the number of reducers for each job, and speculative execution plays a prominent role during job processing.
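
A matching reducer sketch (the class name SumReducer is this sketch's own); the number of reduce tasks can be set on the job with job.setNumReduceTasks(n):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives each key with all of its (already combined) counts,
// sums them, and writes the final (word, total) pair to HDFS.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}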

Conclusion

The above example elaborates the workings of the MapReduce and MapReduce combiner paradigm with Hadoop, walking through all the steps in MapReduce with the help of the word count example. We also covered using Eclipse for testing purposes and executing the job on a Hadoop cluster, with HDFS holding all the input files.

Recommended Articles

This is a guide to the MapReduce Combiner. Here we discuss the introduction to the MapReduce combiner, how it works, and the implementation of its components. You can also go through our other related articles to learn more –

  1. How MapReduce Works
  2. MapReduce Interview Questions
  3. What is MapReduce?
  4. MapReduce Algorithms
  5. Guide to MapReduce Word Count
