Introduction to Kafka Architecture
The following article provides an outline for Kafka Architecture. Kafka is an open-source distributed streaming platform. It was developed by LinkedIn and donated to the Apache Software Foundation. It acts as a publish-subscribe messaging system. But it is not like a normal messaging system it helps in building real-time data pipelines and streaming apps having the capability to deal with huge volumes of data.
The primary goal of any messaging system is to send a message from the sender to the receiver and vice versa. In Kafka sender is called the Producer and the receiver is called Consumer. But It is important to note that the producer does not send a message directly to Consumer. The Producer pushes the message to Kafka Server or Broker on a Kafka Topic. The interested consumer subscribes to the required topic and starts consuming messages from Kafka Server.
Kafka has a very simple but powerful architecture. In Kafka, the producer pushes the message to Kafka Broker on a given topic. The Kafka cluster contains one or more brokers which store the message received from Kafka Producer to a Kafka topic. After that, a consumer or group of consumers subscribe to the Kafka topic and start receiving a message from Kafka broker. As Kafka is a distributed system and having multiple components, the zookeeper helps in its management and coordination. We will go through each of the components of the Kafka one by one in the below section.
Below is the list of components available. Let us go through the functionality of all the components one by one:
1. Kafka Producer
The producer acts as a sender. It is responsible for sending a message or data. It does not send messages directly to consumers. It pushes messages to Kafka Server or Broker. The messages or data are stored in the Kafka Server or Broker. There can be multiple producers which can send a message to the same Kafka topic or different Kafka topics.
2. Kafka Consumer
The consumer acts as Receiver. It is responsible for receiving or consuming a message. But It does not consume or receive a message directly from Kafka Producer. Kafka Producer pushes messages to Kafka Server or broker. The consumer can request a message from the Kafka broker. If the Kafka Consumer will have enough permissions, then it gets a message from the Kafka Broker.
3. Kafka Broker
The Kafka Broker is nothing but just a server. In simple word, A broker is just an intermediate entity who exchange message between a producer and a consumer. For Kafka Producer, it acts as a receiver and for Kafka Consumer, it acts as a sender. In the Kafka cluster, there can be one or more Kafka brokers.
4. Kafka Cluster
Now first understand, what is a cluster? A cluster is a common terminology in the distributed computing system. It is nothing but just a group of computers which are working for a common purpose. Kafka is also a distributed system, so it also has a cluster having a group of servers called brokers.
There can be one or more brokers in the Kafka cluster.
- Single Broker Cluster: The Kafka cluster having only one broker is called Single Broker Cluster.
- Multi-Broker Cluster: The Kafka cluster having two or more brokers are called Multi-Broker Cluster.
5. Kafka Topic
It is one of the most important components of Kafka. Kafka Topic is a unique name given to a data stream or message stream.
Now let us understand the need for this. As you know Kafka Producer sends a message stream to the Broker and Kafka Consumer receives a message stream from that Broker.
Suppose a consumer wants to consume a message from Broker, but the question is, from which message stream? There can be multiple different message streams on the same Broker, coming from different Kafka producers. Here comes the concept of Topic which is a unique identity of the message stream. The Producer sends a message to a unique name which is called the topic for that message stream. Multiple producers can also send to the same topic. If any consumer wants to consume the message, then it subscribes to the topic present in Kafka Broker. Now all the messages coming to that topic will be delivered to the consumer.
6. Kafka Partitions
Now we know that the producer sends data to the broker with a unique identity called topic and the broker stores the message with that topic. Now consider you have a huge volume of data and it is very challenging for the broker to store data on a single machine. As we already know, Kafka is a distributed system. In such a scenario we can break the Kafka topic in partitions and distribute the partitions on a different machine to store. Based on the use case and data volume, we can decide the number of partitions for a topic during Kafka topic creation.
In Kafka, a sequence number is assigned to each message in each partition of a Kafka topic. This sequence number is called Offset. As soon as any message arrives in a partition a number is assigned to that message. For a given topic, different partitions have different offsets. The offset number is always local to the topic partition. There is not any offset that is global to the topic or each partition of the topic. Initially, the offset pointer points to the first message. As soon as the consumer reads that message, the offset pointer moves to the next message and so on in the sequence.
So, for any message, the combination of topic name, partition number and offset number is a unique identity. In other words, you can find any message based on the below three components.[Topic Name] -> [Partition Number] -> [Offset Number]
8. Kafka Consumer Group
As the name suggests, the Kafka Consumer group is a group of consumers. Multiple consumers combined to share the workload. It is just like dividing a piece of large task among multiple individuals. There can be multiple consumer groups subscribing to the same or different topics. Two or more consumers belonging to the same consumer group do not receive the common message. They always receive a different message because the offset pointer moves to the next number once the message consumed by any of the consumers in that consumer group.
Zookeeper is a prerequisite for Kafka. Kafka is a distributed system and it uses Zookeeper for coordination and to track the status of Kafka cluster nodes. It also keeps track of Kafka topics, partitions, offsets, etc.
As we have seen that Kafka is a very powerful distributed streaming platform. It acts as a publish-subscribe messaging system. But It is not like normal messaging systems. It helps industries to build their big data streaming pipelines and streaming applications. We have also seen that the producer does not send a message directly to the consumer. The broker acts as a centralized component which helps in exchanging messages between a producer and a consumer.
This has been a guide to Kafka Architecture. Here we also discuss the introduction, architecture, and components of Kafka. You may also have a look at the following articles to learn more –