EDUCBA

EDUCBA

MENUMENU
  • Blog
  • Free Courses
  • All Courses
  • All in One Bundle
  • Login
Home Data Science Data Science Tutorials Kafka Tutorial Kafka MirrorMaker

Kafka MirrorMaker

Priya Pedamkar
Article byPriya Pedamkar

Updated May 30, 2023

Kafka MirrorMaker

Introduction to Kafka MirrorMaker

Apache Kafka has a simple tool to replicate data from two data centers. It is called  MirrorMaker, and at its base, it is a series of consumers (named streams) who are all part of the same consumer group and read data from the range of topics you have selected to replicate. It is more than getting tied together by a Kafka consumer and producer. Information will be interpreted from topics in the origin cluster and written in the destination cluster to a topic with the same name. Let us explore more about Kafka MirrorMaker by understanding its architecture.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

Architecture of Kafka MirrorMaker

  • The data duplication mechanism between Kafka clusters is named “mirroring.” The mirroring function is commonly used for keeping a separate copy of a Kafka cluster in another data center. Kafka’s MirrorMaker module reads the data from topics in one or more Kafka clusters source and writes the relevant topics to a Kafka cluster destination (using the same topic names).
  • The source and target clusters are independent of specific partition numbers and different offsets.
  • The below figure shows an example of the architecture of Kafka MirrorMaker, which aggregates messages from two different clusters into an aggregate cluster and then copies that aggregated cluster to another datacentre. 

Kafka MirrorMaker Architecture

  • While operating with multiple data centers, copying messages between them is almost always essential. Therefore online applications can have access to user activity in both domains. For instance, if a user changes the personal information in their account, the update will need to be noticeable irrespective of the data center that shows the search results.
  • Replicating Kafka clusters is limited to within a single cluster and cannot be performed between different clusters. Kafka does not allow replication within multiple clusters. Every MirrorMaker operation has one producer. The process is pretty easy. MirrorMaker runs a thread for each consumer.
  • Each user collects events from the topics and partitions allocated to him on the source cluster and uses the mutual producer to send events to the target cluster.
  • The consumers must tell the producer every 60 seconds (by default) to submit all the events to Kafka and wait before Kafka accepts them. Consumers then contact the Kafka cluster source to assign the offsets for all these events. It means no data loss (Kafka acknowledges messages until offsets are committed to the source). If the MirrorMaker mechanism fails, there will be no more than 60 seconds of duplicates.
  • It is essentially a consumer-producer that the consumer goes to cluster A and connects to the topic you decide, and it gets the data from there and generates all the messages in cluster B; all the messages received by a topic in cluster A will be available in cluster B in the same topic.
  • Figure 1 shows an instance of architecture using MirrorMaker, which aggregates messages from two different clusters into an aggregate cluster and then copies that aggregated cluster to another data center.

Benefits of Kafka MirrorMaker

Below are the benefits of Kafka MirrorMaker:

1. Global and Central Clusters

The organization sometimes has one or more data centers in different geographic areas, cities, or continents. Many systems can only operate by connecting with the local cluster. Still, some applications need data from multiple data centers (otherwise, you would not be looking at approaches for cross-data center replication). There are many situations where this is a prerequisite, but the classic example is a business that adjusts pricing based on supply and demand. That organization can have a data center in each area with a location, gather local supply and demand statistics, then adjust prices appropriately. After replicating this information to a central cluster, business analysts will report on their sales across the group.

2. Redundancy (DR)

Even though programs operate on a single Kafka cluster and do not require data from other sites, there is a concern about the whole cluster’s capacity becoming unavailable for some reason. Then you would like to have a second Kafka cluster with all the data in the first cluster to direct your applications in case of an emergency.

3. Cloud Migrations

Many companies run their business in an on-site data center and a cloud provider. Sometimes, cloud platform programs choose to operate on multiple regions for flexibility and may utilize different cloud providers. In these situations, at least one Kafka cluster is often present in each on-premise data center and each cloud area. Applications in each data center and region use those Kafka clusters to transfer data efficiently between the data centers.

For instance, if a new application is introduced in the cloud but needs specific data that is modified by applications running in the on-site data center and stored in an on-site database, you can use Kafka Connect to catch changes in the database in the local Kafka cluster and then replicate those changes in the Kafka cluster where the new application is located. It helps to control the effects of cross-data traffic and increases the governance and security of the traffic.

4. Support Data and Schema Replication

Kafka MirrorMaker does support data replication by real-time streaming data between Kafka clusters and data centers. It integrates with Confluent Schema Registry for multi-dc data quality and governance. It supports connection replication by managing data integration across multiple data centers.

5. Ease of Topic Selection

It offers the advantage of flexible topic selection by selecting topics with white lists, black lists, and regular expression

Conclusion

So far, we have seen what Kafka mirror maker is and its architecture. We have also seen its use cases or benefits of why we should use Kafka MirrorMaker. In short, it aggregates messages from two or more local clusters into an aggregate cluster. It then copies that cluster to other datacentres for redundancy, increasing throughput and fault tolerance.

Recommended Articles

This is a guide to Kafka MirrorMaker. Here we discuss the introduction, architecture of Kafka MirrorMaker, and benefits. You can also go through our other suggested articles to learn more –

  1. Kafka Alternatives | Top 5
  2. Introduction to Kafka Tools
  3. 10 Best Kafka Interview Questions
  4. Kafka Replication | How to Work?
All in One Excel VBA Bundle
500+ Hours of HD Videos
15 Learning Paths
120+ Courses
Verifiable Certificate of Completion
Lifetime Access
Financial Analyst Masters Training Program
2000+ Hours of HD Videos
43 Learning Paths
550+ Courses
Verifiable Certificate of Completion
Lifetime Access
All in One Data Science Bundle
2000+ Hour of HD Videos
80 Learning Paths
400+ Courses
Verifiable Certificate of Completion
Lifetime Access
All in One Software Development Bundle
5000+ Hours of HD Videos
149 Learning Paths
1050+ Courses
Verifiable Certificate of Completion
Lifetime Access
Primary Sidebar
All in One Data Science Bundle2000+ Hour of HD Videos | 80 Learning Paths | 400+ Courses | Verifiable Certificate of Completion | Lifetime Access
Financial Analyst Masters Training Program2000+ Hours of HD Videos | 43 Learning Paths | 550+ Courses | Verifiable Certificate of Completion | Lifetime Access
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2023 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more