EDUCBA

EDUCBA

MENUMENU
  • Blog
  • Free Courses
  • All Courses
  • All in One Bundle
  • Login
Home Data Science Data Science Tutorials Kafka Tutorial Kafka Architecture

Kafka Architecture

Priya Pedamkar
Article byPriya Pedamkar

Updated May 29, 2023

Kafka-Architecture

Introduction to Kafka Architecture

The following article provides an outline for Kafka Architecture. LinkedIn developed Kafka and donated it to the Apache Software Foundation. It acts as a publish-subscribe messaging system. But it is not like a normal messaging system; it helps build real-time data pipelines, and streaming apps having the capability to deal with huge volumes of data.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

The message’s primary goal is to send a message from the sender to the receiver and vice versa. The producer is the term used in Kafka to refer to the sender, while the consumer is the term used to refer to the receiver. But it is important to note that the producer does not send a message directly to the consumer. The producer pushes the message to Kafka Server or Broker on a Kafka Topic. The interested consumer subscribes to the required topic and starts consuming messages from Kafka Server.

Kafka Architecture

Kafka Architect

Kafka has a straightforward but powerful architecture. The producer pushes the message to Kafka Broker on a given topic in Kafka. The Kafka cluster contains one or more brokers that store the message from Kafka Producer to a Kafka topic. After that, consumers or groups subscribe to the Kafka topic and receive a message from the Kafka broker. As Kafka is a distributed system with multiple components, the zookeeper helps manage and coordinate.

Kafka Components

Below is the list of components available.

Let us go through the functionality of all the components one by one:

Kafka Components

 

1. Kafka Producer

The producer acts as a sender. It is responsible for sending a message or data. It does not send messages directly to consumers. It pushes messages to Kafka Server or Broker. The messages or data are stored in the Kafka Server or Broker. Multiple producers can send a message to the same Kafka topic or different Kafka topics.

2. Kafka Consumer

The consumer acts as the receiver. It is responsible for receiving or consuming a message. But it does not consume or receive a message directly from Kafka Producer. Kafka Producer pushes messages to the Kafka server or broker. The consumer can request a message from the Kafka broker. If the Kafka Consumer has enough permissions, it gets a message from the Kafka Broker.

3. Kafka Broker

The Kafka Broker is nothing but just a server. Simply put, A broker is an intermediate entity exchanging messages between a producer and a consumer. For Kafka Producer, it acts as a receiver; for Kafka Consumer, it acts as a sender. In the Kafka cluster, there can be one or more Kafka brokers.

4. Kafka Cluster

Now first understand, what is a cluster? A cluster is a common terminology in the distributed computing system. It is nothing but just a group of computers that are working for a common purpose. Kafka is also a distributed system, so it has a cluster with a group of servers called brokers.

There can be one or more brokers in the Kafka cluster.

  • Single Broker Cluster: The Kafka cluster having only one broker is called Single Broker Cluster.
  • Multi-Broker Cluster: The Kafka cluster having two or more brokers is called Multi-Broker Cluster.

 5. Kafka Topic

It is one of the most important components of Kafka. Kafka Topic is a unique name given to a data stream or message stream.

Now let us understand the need for this. As you know, Kafka Producer sends a message stream to the broker, and Kafka Consumer receives a message stream from that broker.

Suppose a consumer wants to consume a message from broker, but the question is, from which message stream? There can be multiple different message streams on the same broker, coming from different Kafka producers. Here comes the concept of topic, which is a unique identity of the message stream. The producer sends a message to a unique name called the topic for that message stream. Multiple producers can also send to the same topic. If any consumer wants to consume the message, it subscribes to the topic in Kafka Broker. The consumer will receive all the messages coming to that topic.

6. Kafka Partitions

Now we know that the producer sends data to the broker with a unique identity called a topic, and the broker stores the message with that topic. Now, you have a huge volume of data, and it is very challenging for the broker to store data on a single machine. As we already know, Kafka is a distributed system. In such a scenario, we can break the Kafka topic into partitions and distribute the partitions on a different machine to store. Based on the use case and data volume, we can decide the number of partitions for a topic during Kafka topic creation.

7. Offsets

In Kafka, a sequence number is assigned to each message in each partition of a Kafka topic. This sequence number is called Offset. As soon as any message arrives in a partition, a number is assigned to that message. For a given topic, different partitions have different offsets. The offset number is always local to the topic partition. There is not any offset that is global to the topic or each partition of the topic. Initially, the offset pointer points to the first message. As soon as the consumer reads that message, the offset pointer moves to the next message and so on in the sequence.

So, combining the topic name, partition number, and offset number is a unique identity for any message. In other words, you can find any message based on the below three components.

[Topic Name] -> [Partition Number] -> [Offset Number]

8. Kafka Consumer Group

As the name suggests, the Kafka Consumer group is a group of consumers. Multiple consumers combined to share the workload. It is just like dividing a piece of a large task among multiple individuals. There can be multiple consumer groups subscribing to the same or different topics. Two or more consumers of the same consumer group do not receive the common message. They always receive a different message because the offset pointer moves to the next number once the message is consumed by any of the consumers in that group.

9. Zookeeper

Zookeeper is a prerequisite for Kafka. Kafka is a distributed system that uses Zookeeper for coordination and to track the status of Kafka cluster nodes. It also keeps track of Kafka topics, partitions, offsets, etc.

Conclusion

As we have seen that Kafka is a compelling distributed streaming platform. It acts as a publish-subscribe messaging system. But it is not like normal messaging systems. It helps industries to build their big data streaming pipelines and streaming applications. We have also seen that the producer does not send a message directly to the consumer. The broker is a centralized component that helps exchange messages between a producer and a consumer.

Recommended Articles

This has been a guide to Kafka Architecture. Here we discuss the introduction, architecture, and components of Kafka. You may also have a look at the following articles to learn more –

  1. Kafka Applications
  2. Kafka vs Spark
  3. Big Data Architecture
  4. How to Install Kafka?
All in One Excel VBA Bundle
500+ Hours of HD Videos
15 Learning Paths
120+ Courses
Verifiable Certificate of Completion
Lifetime Access
Financial Analyst Masters Training Program
2000+ Hours of HD Videos
43 Learning Paths
550+ Courses
Verifiable Certificate of Completion
Lifetime Access
All in One Data Science Bundle
2000+ Hour of HD Videos
80 Learning Paths
400+ Courses
Verifiable Certificate of Completion
Lifetime Access
All in One Software Development Bundle
5000+ Hours of HD Videos
149 Learning Paths
1050+ Courses
Verifiable Certificate of Completion
Lifetime Access
Primary Sidebar
All in One Data Science Bundle2000+ Hour of HD Videos | 80 Learning Paths | 400+ Courses | Verifiable Certificate of Completion | Lifetime Access
Financial Analyst Masters Training Program2000+ Hours of HD Videos | 43 Learning Paths | 550+ Courses | Verifiable Certificate of Completion | Lifetime Access
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2023 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more