[Paper Notes] Kafka: a Distributed Messaging System for Log Processing
A log-based event streaming platform.
- Log data includes:
- User activity events.
- Operational metrics.
- System metrics.
- Messaging systems are not fit for log processing because:
- The overly strong delivery guarantees increase the complexity.
- Throughput is not a primary design constraint.
- Distributed support is weak, such as data partitioning.
- Immediate data consumption is assumed (e.g. small queues).
- Drawbacks of existing log aggregators:
- Most of them are built for consuming the log data offline.
- Implementation details are unnecessarily exposed.
- The “push” model may flood consumers with messages.
- Kafka combines the benefits of traditional log aggregators and messaging systems, and supports both (high-throughput) offline analytics and (low-latency) real-time applications.
- Concepts:
- A topic is a stream of messages of a particular type.
- A topic is divided into multiple partitions.
- A partition is a logical log that is implemented as a set of segment files (~1GB), and is the unit of load balancing and in-order delivery guarantees.
- A broker is a server that stores partitions of different topics.
- A consumer group is one or more consumers jointly consuming a set of subscribed topics.
- Kafka supports both messaging models:
- Point-to-point (i.e. load balancing within a consumer group).
- Publish-subscribe (i.e. fan-out across consumer groups).
- Each message is addressed only by its byte offset within the partition, not by a consecutive id.
- Each broker keeps in memory the starting offset of each segment file.
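A minimal sketch of how such a lookup might work (names and numbers are illustrative, not Kafka's actual code): with the starting offset of each segment kept in memory, resolving a message offset to a (segment, position) pair is a binary search.

```python
import bisect

class PartitionLog:
    """Toy model of a partition: a sorted list of segment start offsets."""

    def __init__(self, segment_start_offsets):
        # e.g. one entry per ~1GB segment file
        self.starts = sorted(segment_start_offsets)

    def locate(self, offset):
        """Return (segment index, byte position within that segment)."""
        i = bisect.bisect_right(self.starts, offset) - 1
        if i < 0:
            raise KeyError("offset below the retained range")
        return i, offset - self.starts[i]

log = PartitionLog([0, 1000, 2000])
assert log.locate(1500) == (1, 500)   # second segment, 500 bytes in
assert log.locate(0) == (0, 0)
```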
- The benefits of the pull-based consumption model:
- Applications can consume data at their own rate.
- Consumption can be rewound when needed.
- The consumption progress is maintained by the consumer itself, and each pull request contains the beginning offset.
- Published messages are flushed to disk in batches, and each pull request can retrieve multiple messages up to a certain size (100s of KBs).
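The pull loop described above can be sketched as follows; `fetch` is a stand-in for the broker RPC, and the toy partition is a list of (byte offset, payload) pairs. All names here are illustrative assumptions, not Kafka's API.

```python
def fetch(partition, offset, max_bytes):
    """Toy broker: return messages from `offset` up to `max_bytes` total."""
    out, size = [], 0
    for off, msg in partition:
        if off >= offset and size + len(msg) <= max_bytes:
            out.append((off, msg))
            size += len(msg)
    return out

def consume(partition, start=0, max_bytes=64):
    """The consumer tracks its own offset; each request names it explicitly."""
    offset = start
    while True:
        batch = fetch(partition, offset, max_bytes)
        if not batch:
            break
        for off, msg in batch:
            yield msg
            offset = off + len(msg)  # next byte offset, as in the paper

p = [(0, b"aaaa"), (4, b"bb"), (6, b"ccc")]
assert list(consume(p)) == [b"aaaa", b"bb", b"ccc"]
```

Rewinding is just restarting with an earlier `start` offset, which is why the consumer-held offset makes replay trivial.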
- Kafka does not explicitly cache messages in the broker process, so that:
- Overhead of garbage collection is avoided.
- The file system page cache is fully exploited.
- The Linux sendfile API saves two copies and one system call each time data is sent from broker storage to a remote consumer.
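A runnable sketch of this zero-copy path, assuming Linux semantics: `os.sendfile` moves file bytes to a socket inside the kernel, without copying them through a user-space buffer. The local socket pair stands in for a consumer connection.

```python
import os
import socket
import tempfile

def serve_segment(path, sock, offset, count):
    """Send `count` bytes of a segment file to a socket via sendfile."""
    with open(path, "rb") as f:
        sent = 0
        while sent < count:
            n = os.sendfile(sock.fileno(), f.fileno(),
                            offset + sent, count - sent)
            if n == 0:
                break
            sent += n
    return sent

# Demo: write a fake segment, then serve a byte range of it.
a, b = socket.socketpair()
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"kafka log segment bytes")
    path = f.name

serve_segment(path, a, offset=6, count=11)
a.close()
data = b.recv(64)       # the 11 bytes starting at offset 6
os.unlink(path)
assert data == b"log segment"
```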
- The retention policy is made as a simple time-based SLA (typically 7 days).
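A sketch of what such a time-based policy reduces to (a hypothetical helper, not Kafka's code): segments older than the retention window are simply deleted, with no per-consumer bookkeeping.

```python
import time

RETENTION_SECONDS = 7 * 24 * 3600  # the typical 7-day SLA from the paper

def expired_segments(segments, now=None):
    """segments: list of (path, mtime); return paths past retention."""
    now = time.time() if now is None else now
    return [p for p, mtime in segments if now - mtime > RETENTION_SECONDS]

segs = [("00000000.log", 0), ("00001000.log", 1_000_000)]
assert expired_segments(segs, now=1_000_100) == ["00000000.log"]
```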
- A topic typically has many more partitions than there are consumers in a group, so load is truly balanced.
- Instead of having a central “master” node, Kafka uses Zookeeper to coordinate consumers of the same group in a decentralized fashion.
- Zookeeper paths:
- Broker registry (ephemeral): for each broker, the endpoint and the set of partitions it stores.
- Consumer registry (ephemeral): for each consumer, the consumer group to which it belongs and the set of topics it subscribes to.
- Ownership registry (ephemeral): for each consumer group, the consumer that currently consumes a specific partition.
- Offset registry (persistent): for each consumer group, the offset of the last consumed message in a specific partition.
- A rebalance process is triggered in each consumer when brokers or consumers are added or removed.
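The decentralized rebalance works because every consumer can compute the same assignment independently: sort the partitions and the group members deterministically, then take your own slice. A sketch of such a range-style assignment (illustrative names, not the paper's exact algorithm):

```python
def assign(partitions, consumers, me):
    """Deterministic slice of partitions for consumer `me` in its group."""
    parts = sorted(partitions)
    members = sorted(consumers)
    n, k = len(parts), len(members)
    i = members.index(me)
    per, extra = divmod(n, k)
    start = i * per + min(i, extra)
    end = start + per + (1 if i < extra else 0)
    return parts[start:end]

parts = [f"topic-{p}" for p in range(5)]
group = ["c1", "c2"]
# Every consumer runs the same computation and claims only its own slice
# in the ownership registry, so no central master is needed.
assert assign(parts, group, "c1") == ["topic-0", "topic-1", "topic-2"]
assert assign(parts, group, "c2") == ["topic-3", "topic-4"]
```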
- Kafka only guarantees at-least-once delivery, but applications can implement deduplication logic that is more cost-effective than two-phase commit.
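Such application-level deduplication can be as simple as remembering processed message ids and skipping redeliveries; a hypothetical sketch (in practice the "seen" state would be persisted, e.g. alongside offsets):

```python
def process_exactly_once(messages, handler):
    """Consume at-least-once input, invoking handler at most once per id."""
    seen = set()  # in practice: a durable id or offset store
    for msg_id, payload in messages:
        if msg_id in seen:
            continue  # duplicate caused by a retry or a rebalance
        handler(payload)
        seen.add(msg_id)

out = []
process_exactly_once([(1, "a"), (2, "b"), (1, "a")], out.append)
assert out == ["a", "b"]  # the redelivered message is dropped
```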
- Messages from a single partition are delivered in order, but messages from different partitions may come out of order.
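A consequence of per-partition ordering is that producers wanting ordered delivery for related messages route them to the same partition, e.g. by hashing a key (a common pattern; the key-hash function here is an illustrative assumption):

```python
import zlib

def partition_for(key, num_partitions):
    """Route a key to a stable partition so its messages stay ordered."""
    return zlib.crc32(key.encode()) % num_partitions

# All events for one user land in the same partition, hence in order.
p = partition_for("user-42", 8)
assert 0 <= p < 8
assert all(partition_for("user-42", 8) == p for _ in range(3))
```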
- CRC is used for message-level data corruption detection.
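A minimal sketch of CRC framing, assuming a 4-byte CRC32 header per message (the exact wire layout is not reproduced here):

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its CRC32 for later verification."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def unframe(record: bytes) -> bytes:
    """Verify the stored CRC; reject the message on mismatch."""
    crc, payload = int.from_bytes(record[:4], "big"), record[4:]
    if zlib.crc32(payload) != crc:
        raise ValueError("corrupted message")
    return payload

assert unframe(frame(b"hello")) == b"hello"
```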
- Future work (at the time of writing):
- Add built-in replication of messages across multiple brokers.
- Support both synchronous and asynchronous replication to balance durability and performance.
- Enhance stream processing capability.