[Paper Notes] Kafka: a Distributed Messaging System for Log Processing

Devin Z
Oct 9, 2022


A log-based event streaming platform.

[Photo: Boston Public Garden, July 9, 2022]
  • Log data includes:
    - User activity events.
    - Operational metrics.
    - System metrics.
  • Messaging systems are not fit for log processing because:
    - Their overly strong delivery guarantees add unnecessary complexity.
    - Throughput is not a primary design constraint.
    - Distributed support, such as partitioning data across machines, is weak.
    - Immediate data consumption is assumed (e.g. small queues).
  • Drawbacks of existing log aggregators:
    - Most of them are built for offline consumption of log data.
    - Implementation details are unnecessarily exposed.
    - The “push” model may flood consumers with messages.
  • Kafka combines the benefits of traditional log aggregators and messaging systems, and supports both (high-throughput) offline analytics and (low-latency) real-time applications.
  • Concepts:
    - A topic is a stream of messages of a particular type.
    - A topic is divided into multiple partitions.
    - A partition is a logical log that is implemented as a set of segment files (~1GB), and is the unit of load balancing and in-order delivery guarantees.
    - A broker is a server that stores partitions of different topics.
    - A consumer group is one or more consumers jointly consuming a set of subscribed topics.
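
The paper illustrates these concepts with a small producer/consumer API. The following is only a minimal Java sketch of that shape; the interface and method names are illustrative, not the paper's exact API or any modern Kafka client.

```java
import java.util.List;

// Hypothetical interfaces mirroring the shape of the API sketched in the paper.
interface Producer {
    // Append a batch of messages to (a partition of) the given topic.
    void send(String topic, List<byte[]> messages);
}

interface MessageStream extends Iterable<byte[]> {
    // Blocking iterator over messages; it does not terminate unless the consumer shuts down.
}

interface Consumer {
    // Subscribe to a topic and obtain the requested number of streams;
    // partitions of the topic are load-balanced across the streams of the group.
    List<MessageStream> createMessageStreams(String topic, int numStreams);
}

class UsageSketch {
    static void produce(Producer producer) {
        producer.send("topic1", List.of("hello".getBytes(), "world".getBytes()));
    }

    static void consume(Consumer consumer) {
        // Point-to-point within a group: each message goes to exactly one stream.
        for (MessageStream stream : consumer.createMessageStreams("topic1", 1)) {
            for (byte[] payload : stream) {
                System.out.println(new String(payload)); // application logic goes here
            }
        }
    }
}
```
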
  • Kafka supports both models:
    - Point-to-point (i.e. load balancing within a consumer group).
    - Publish-subscribe (i.e. fan-out across consumer groups).
  • Each message is addressed only by its offset in the partition log, not by a consecutive message id.
  • Each broker keeps in memory the starting offset of each segment file.
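
Given that in-memory index, serving a pull request starts with a floor lookup: find the segment whose starting offset is the largest one not exceeding the requested offset. A minimal sketch, with hypothetical class and file names:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of how a broker can map a requested offset to a segment file.
// One entry per segment: starting offset -> file name.
class SegmentIndex {
    private final NavigableMap<Long, String> startOffsetToFile = new TreeMap<>();

    void addSegment(long startOffset, String fileName) {
        startOffsetToFile.put(startOffset, fileName);
    }

    // The segment containing `offset` is the one with the greatest
    // starting offset that is <= the requested offset.
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = startOffsetToFile.floorEntry(offset);
        if (entry == null) {
            throw new IllegalArgumentException("offset " + offset + " precedes the earliest retained segment");
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        SegmentIndex index = new SegmentIndex();
        index.addSegment(0L, "00000000000.kafka");
        index.addSegment(1_073_741_824L, "01073741824.kafka"); // ~1 GB segments
        System.out.println(index.segmentFor(5_000_000L));      // 00000000000.kafka
        System.out.println(index.segmentFor(2_000_000_000L));  // 01073741824.kafka
    }
}
```
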
  • The benefits of the pull-based consumption model:
    - Applications can consume data at their own rate.
    - Data consumption can be rewound when needed.
  • The consumption progress is maintained by the consumer itself, and each pull request contains the beginning offset.
  • Published messages are flushed to disk in batch, and each pull request can retrieve multiple messages up to a certain size (100s of KBs).
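
Putting the previous points together, a pull-based consumer is essentially a loop that fetches a batch at its saved offset, processes it, and advances the offset. The sketch below assumes a hypothetical fetch(topic, partition, offset, maxBytes) client call; in the paper the consumer derives the next offset by adding the current message's length to its offset, while here the fetch response carries it for simplicity.

```java
import java.util.List;

// Minimal sketch of a pull-based consumption loop against a hypothetical broker client.
class PullLoopSketch {
    interface BrokerClient {
        // Returns a batch of messages starting at `offset`, up to `maxBytes` in total size.
        List<Message> fetch(String topic, int partition, long offset, int maxBytes);
    }

    record Message(long nextOffset, byte[] payload) {}

    static void run(BrokerClient client) throws InterruptedException {
        long offset = 0L;                 // progress is kept by the consumer itself
        final int maxBytes = 300 * 1024;  // pull a few hundred KB per request

        while (true) {
            List<Message> batch = client.fetch("topic1", 0, offset, maxBytes);
            if (batch.isEmpty()) {
                Thread.sleep(100);        // nothing new yet; poll again at our own pace
                continue;
            }
            for (Message m : batch) {
                process(m.payload());
                offset = m.nextOffset();  // advance to the offset of the next message
            }
            // To replay data, simply reset `offset` to an earlier value.
        }
    }

    static void process(byte[] payload) { /* application logic */ }
}
```
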
  • Kafka does not explicitly cache messages in the broker process, so that:
    - Overhead of garbage collection is avoided.
    - The file system page caching is fully exploited.
  • The Linux sendfile API is used when data is transferred from broker storage to a remote consumer, saving two data copies and one system call per transfer.
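
On the JVM, this zero-copy path is exposed through FileChannel.transferTo, which delegates to sendfile on Linux. A sketch of sending a slice of a segment file to a consumer socket; the file name, host, port, and size cap are illustrative:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: serve a byte range of a segment file over a socket using
// FileChannel.transferTo, avoiding copies through user space.
class ZeroCopySendSketch {
    public static void main(String[] args) throws IOException {
        Path segment = Path.of("00000000000.kafka");                // illustrative file name
        try (FileChannel file = FileChannel.open(segment, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("consumer-host", 9092))) {
            long position = 0;
            long remaining = Math.min(file.size(), 300 * 1024);     // cap the response size
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```
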
  • The retention policy is a simple time-based SLA (typically 7 days).
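
A time-based policy can be enforced with a periodic sweep that deletes segment files older than the retention window. A simplified sketch; the directory layout and use of file modification time are assumptions for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.stream.Stream;

// Sketch of a time-based retention sweep over one partition directory.
class RetentionSweepSketch {
    static void sweep(Path partitionDir, Duration retention) throws IOException {
        Instant cutoff = Instant.now().minus(retention);
        try (Stream<Path> segments = Files.list(partitionDir)) {
            for (Path segment : segments.toList()) {
                FileTime modified = Files.getLastModifiedTime(segment);
                if (modified.toInstant().isBefore(cutoff)) {
                    Files.delete(segment);  // expired segment; consumers this far behind lose it
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        sweep(Path.of("/data/kafka/topic1-0"), Duration.ofDays(7));
    }
}
```
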
  • A topic typically has many more partitions than there are consumers in each group, so that load can be evenly balanced.
  • Instead of having a central “master” node, Kafka uses Zookeeper to coordinate consumers of the same group in a decentralized fashion.
  • Zookeeper paths:
    - Broker registry (ephemeral): for each broker, the endpoint and the set of partitions it stores.
    - Consumer registry (ephemeral): for each consumer, the consumer group to which it belongs and the set of topics it subscribes to.
    - Ownership registry (ephemeral): for each consumer group, the consumer that currently consumes a specific partition.
    - Offset registry (persistent): for each consumer group, the offset of the last consumed message in a specific partition.
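
A sketch of how these registries map onto znodes, using the plain ZooKeeper Java client (assumed to be on the classpath); the paths and payload formats below are illustrative, not the exact layout Kafka used:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the four registries as ephemeral/persistent znodes.
class ZkRegistrySketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30_000, event -> {});

        // Broker registry (ephemeral): endpoint + partitions stored on this broker.
        create(zk, "/brokers/ids/1", "broker-1:9092|topic1:0,1".getBytes(), CreateMode.EPHEMERAL);

        // Consumer registry (ephemeral): group membership + subscribed topics.
        create(zk, "/consumers/group1/ids/consumer-1", "topic1".getBytes(), CreateMode.EPHEMERAL);

        // Ownership registry (ephemeral): which consumer currently owns a partition.
        create(zk, "/consumers/group1/owners/topic1/0", "consumer-1".getBytes(), CreateMode.EPHEMERAL);

        // Offset registry (persistent): last consumed offset per partition, survives restarts.
        create(zk, "/consumers/group1/offsets/topic1/0", "42".getBytes(), CreateMode.PERSISTENT);
    }

    // Helper that assumes the parent znodes already exist.
    static void create(ZooKeeper zk, String path, byte[] data, CreateMode mode)
            throws KeeperException, InterruptedException {
        zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, mode);
    }
}
```
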
  • A rebalance process is triggered in each consumer on addition or removal of brokers or consumers.
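
Because every consumer reads the same registries, each one can compute its own share deterministically without a master. A sketch of the range-style assignment described in the paper; identifiers and helper names are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the deterministic assignment each consumer computes during a rebalance:
// sort partitions and consumers, then take the contiguous range matching this
// consumer's position in the sorted consumer list.
class RebalanceSketch {
    static List<String> partitionsFor(String consumerId, List<String> consumers, List<String> partitions) {
        List<String> sortedConsumers = new ArrayList<>(consumers);
        List<String> sortedPartitions = new ArrayList<>(partitions);
        Collections.sort(sortedConsumers);
        Collections.sort(sortedPartitions);

        int index = sortedConsumers.indexOf(consumerId);
        int perConsumer = (int) Math.ceil((double) sortedPartitions.size() / sortedConsumers.size());
        int from = index * perConsumer;
        if (from >= sortedPartitions.size()) {
            return List.of();  // more consumers than partitions; this one gets nothing
        }
        int to = Math.min(from + perConsumer, sortedPartitions.size());
        return sortedPartitions.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("c1", "c2");
        List<String> partitions = List.of("topic1-0", "topic1-1", "topic1-2", "topic1-3", "topic1-4");
        System.out.println(partitionsFor("c1", consumers, partitions)); // [topic1-0, topic1-1, topic1-2]
        System.out.println(partitionsFor("c2", consumers, partitions)); // [topic1-3, topic1-4]
    }
}
```
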
  • Kafka only guarantees at-least-once delivery, but applications can implement deduplication logic that is more cost-effective than two-phase commit.
  • Messages from a single partition are delivered in order, but messages from different partitions may come out of order.
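
One simple deduplication strategy, sketched below with illustrative names, is to track the highest offset already processed per partition and skip redelivered messages at or below it:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of consumer-side deduplication under at-least-once delivery:
// remember the highest offset processed per partition and skip anything
// at or below it, which covers redelivery after a restart from an older
// checkpointed offset.
class DedupSketch {
    private final Map<String, Long> lastProcessed = new HashMap<>();

    // Returns true if the message at (partition, offset) should be processed.
    boolean shouldProcess(String partition, long offset) {
        long last = lastProcessed.getOrDefault(partition, -1L);
        if (offset <= last) {
            return false;            // duplicate delivery; already handled
        }
        lastProcessed.put(partition, offset);
        return true;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.shouldProcess("topic1-0", 10)); // true
        System.out.println(dedup.shouldProcess("topic1-0", 10)); // false (redelivered)
        System.out.println(dedup.shouldProcess("topic1-0", 11)); // true
    }
}
```
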
  • CRC is used for message-level data corruption detection.
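
For example, with java.util.zip.CRC32 a checksum stored next to each payload can be recomputed and compared on the consumer side; a minimal sketch:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: a per-message CRC32 lets the consumer detect corruption before processing.
class CrcSketch {
    public static void main(String[] args) {
        byte[] payload = "log line".getBytes(StandardCharsets.UTF_8);

        CRC32 crc = new CRC32();
        crc.update(payload);
        long storedChecksum = crc.getValue();  // written next to the payload by the producer

        CRC32 verify = new CRC32();            // recomputed by the consumer on read
        verify.update(payload);
        System.out.println(verify.getValue() == storedChecksum ? "ok" : "corrupted");
    }
}
```
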
  • Future work (at the time):
    - Add built-in replication of messages across multiple brokers.
    - Support both synchronous and asynchronous replication to balance durability and performance.
    - Enhance stream processing capability.
