Introduction to Kafka MirrorMaker
Apache Kafka ships with a simple tool for replicating data between two data centers. It is called MirrorMaker, and at its core it is a collection of consumers (called streams) that all belong to the same consumer group and read data from the set of topics you have chosen to replicate. It is little more than a Kafka consumer and a producer joined together: messages are read from topics in the source cluster and written to a topic with the same name in the destination cluster. Let us explore Kafka MirrorMaker further by understanding its architecture.
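As a quick illustration, the classic MirrorMaker tool is launched from the command line with a consumer configuration pointing at the source cluster and a producer configuration pointing at the destination. A minimal sketch (the file names, stream count, and topic pattern are illustrative):

```shell
# Start MirrorMaker with two consumer threads (streams), mirroring every
# topic whose name matches the whitelist pattern from source to destination.
bin/kafka-mirror-maker.sh \
  --consumer.config source-cluster.properties \
  --producer.config destination-cluster.properties \
  --num.streams 2 \
  --whitelist "orders.*"
```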
Architecture of Kafka MirrorMaker
- The mechanism for replicating data between Kafka clusters is called “mirroring”. Mirroring is commonly used to maintain a copy of a Kafka cluster in another datacenter. Kafka’s MirrorMaker tool reads data from topics in one or more source Kafka clusters and writes the corresponding topics to a destination Kafka cluster (using the same topic names).
- The source and destination clusters are entirely independent, so they can have different partition counts and different offsets.
- The figure below shows an example MirrorMaker architecture that aggregates messages from two local clusters into an aggregate cluster and then copies that aggregate cluster to another datacenter.
- Particularly when operating multiple datacenters, it is almost always necessary to copy messages between them so that online applications can access user activity from both. For instance, if a user updates the personal information in their account, the change must be visible regardless of which datacenter serves the search results.
- Kafka’s built-in replication works only within a single cluster, not between clusters; copying data across clusters is what MirrorMaker is for. Every MirrorMaker process has one shared producer, and the process is pretty simple: MirrorMaker runs a thread for each consumer.
- Each consumer fetches events from the topics and partitions assigned to it on the source cluster and uses the shared producer to send those events to the target cluster.
- Every 60 seconds (by default), the consumers tell the producer to send all the events it has to Kafka and to wait until Kafka acknowledges them. The consumers then contact the source Kafka cluster to commit the offsets for those events. This guarantees no data loss (messages are acknowledged by the destination before their offsets are committed to the source), and if the MirrorMaker process crashes, there will be no more than 60 seconds’ worth of duplicates.
- In essence, MirrorMaker is a consumer-producer pair: the consumer connects to cluster A, subscribes to the topics you choose, and reads the data, while the producer writes each message to cluster B. Every message received by a topic in cluster A thus becomes available in the same topic in cluster B.
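The consumer and producer configurations passed to MirrorMaker are ordinary Kafka properties files. A minimal sketch, assuming illustrative hostnames (the 60-second offset-commit interval mentioned above corresponds to the consumer’s `auto.commit.interval.ms` setting):

```properties
# source-cluster.properties -- consumer settings (hostnames are illustrative)
bootstrap.servers=source-kafka-1:9092,source-kafka-2:9092
group.id=mirrormaker-group
# Commit offsets to the source cluster every 60 seconds (the default);
# a shorter interval reduces the window of possible duplicates after a crash.
auto.commit.interval.ms=60000
```

```properties
# destination-cluster.properties -- producer settings (hostnames are illustrative)
bootstrap.servers=dest-kafka-1:9092,dest-kafka-2:9092
# Wait for acknowledgement from all in-sync replicas to avoid data loss.
acks=all
```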
Benefits of Kafka MirrorMaker
Below are the benefits of Kafka MirrorMaker:
1. Global and Central Clusters
In some cases, an organization has one or more datacenters in different geographic areas, cities, or continents. Many systems can operate by connecting only to the local cluster, but some applications need data from multiple datacenters (otherwise you would not be looking at approaches for cross-datacenter replication). There are many situations where this is a requirement, but the classic example is a business that adjusts pricing based on supply and demand. Such an organization can have a datacenter in each area where it operates, gather local supply and demand statistics, and adjust prices accordingly. All of this information is then replicated to a central cluster where business analysts can report on sales across the whole company.
2. Redundancy (DR)
Your applications may run on a single Kafka cluster and not need data from other sites, but you may be worried about the entire cluster becoming unavailable for some reason. In that case, you would want a second Kafka cluster holding all the data from the first, so that you can redirect your applications to it in an emergency.
3. Cloud Migrations
Many companies these days run their business in both an on-premise datacenter and a cloud provider. For flexibility, applications sometimes run in multiple regions of a cloud platform, and sometimes multiple cloud providers are used. In these situations, there is often at least one Kafka cluster in each on-premise datacenter and in each cloud region. Applications in each datacenter and region use those Kafka clusters to transfer data efficiently between the sites. For instance, if a new application is deployed in the cloud but needs data that is modified by applications running in the on-premise datacenter and stored in an on-premise database, you can use Kafka Connect to capture the database changes into the local Kafka cluster and then replicate those changes to the Kafka cluster where the new application runs. This helps control the cost of cross-datacenter traffic and improves its governance and security.
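The database-capture step described above can be sketched as a Kafka Connect source connector configuration. This sketch assumes the Confluent JDBC source connector; the hostnames, database, table, and column names are illustrative, and exact property names may vary between connector versions:

```json
{
  "name": "onprem-db-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://onprem-db:5432/inventory",
    "mode": "timestamp",
    "timestamp.column.name": "updated_at",
    "table.whitelist": "products",
    "topic.prefix": "onprem."
  }
}
```

The resulting `onprem.products` topic in the local cluster can then be mirrored to the cloud cluster by including a pattern such as `onprem.*` in MirrorMaker’s whitelist.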
4. Support Data and Schema Replication
Kafka MirrorMaker supports data replication by streaming data between Kafka clusters and datacenters in real time. It integrates with Confluent Schema Registry for multi-datacenter data quality and governance, and it helps manage data integration across multiple datacenters.
5. Ease of Topic Selection
It offers the advantage of flexible topic selection: topics can be chosen with whitelists, blacklists, and regular expressions.
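The whitelist is treated as a regular expression that must match the whole topic name (a comma in the pattern acts as alternation). The effect of such matching can be sketched with `grep`; the topic names here are made up:

```shell
# Simulate whitelist matching: -E enables extended regexes, -x requires
# the pattern to match the entire topic name.
printf '%s\n' orders.eu orders.us payments internal.metrics \
  | grep -Ex 'orders.*|payments'
# Matches: orders.eu, orders.us, payments
```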
So far we have seen what Kafka MirrorMaker is, what its architecture looks like, and how Kafka mirroring works. We have also seen its use cases and the benefits of using it. In short, it aggregates messages from two or more local clusters into an aggregate cluster and then copies that cluster to other datacenters for redundancy, higher throughput, and fault tolerance.
This is a guide to Kafka MirrorMaker. Here we discussed its architecture, how Kafka mirroring works, and its benefits. You can also go through our other suggested articles to learn more.