I sat down with some of my co-workers at Confluent to ask what the most pertinent Apache Kafka interview questions were for this year. A wide variety of questions surfaced, from the open-ended basics to advanced challenges.
Then, I took notes and organized the answers.
While the questions asked in a real-life interview would vary widely, depending on whether the role is on a data team or an engineering team and whether the candidate is junior or senior, I did my best to emulate them by providing answers for different levels.
This post focuses on the questions that surfaced for junior-level applicants. If you’re a junior-level developer who’s looking to level up their Kafka knowledge for an interview, I hope you find this helpful!
What Is Apache Kafka?
Apache Kafka® is an open-source distributed event store and stream processing platform.
What Is an Event?
An event is a key/value pair representing a thing that happened – it could be a webpage click, a rideshare app request, or a thermostat adjustment.
What Is a Topic?
A topic stores events. Its underlying data structure is a log: an ordered, append-only sequence of immutable events. Unlike a queue, a log doesn’t delete events once they’re consumed; they remain readable until the topic’s retention policy removes them.
What Is a Partition?
Topics are broken up into partitions. Each partition can live on a different broker in the cluster, which enables low latency and high throughput.
Partitions are also replicated. With a replication factor of 3, a common production setting, each partition has copies on three different brokers, which makes it highly resilient.
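As a sketch, here’s how you might create such a topic with the Java AdminClient. The topic name rideshare-requests, the partition count, and the localhost:9092 broker address are all illustrative assumptions:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions spread across the cluster, each copied to 3 brokers
            NewTopic topic = new NewTopic("rideshare-requests", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get(); // block until created
        }
    }
}
```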
What Is a Broker?
A broker is a server responsible for storing messages as well as metadata. For example, brokers store consumers’ committed offsets, so a consumer can look up the position it should resume reading from.
What Is a Producer and What Is a Consumer?
A producer writes messages to a Kafka topic, and a consumer reads those messages. Producers are also responsible for assigning events to partitions and for compressing data.
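Here’s a minimal producer sketch using the Java client, assuming a local broker and a hypothetical thermostat-events topic (a matching consumer appears under the offset and consumer group questions below):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.util.Properties;

public class HelloProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // An event is a key/value pair: here, a thermostat adjustment
            ProducerRecord<String, String> record =
                new ProducerRecord<>("thermostat-events", "device-42", "{\"temp\": 21}");
            RecordMetadata meta = producer.send(record).get(); // wait for the broker's ack
            System.out.printf("stored at partition=%d offset=%d%n",
                meta.partition(), meta.offset());
        }
    }
}
```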
What Is an Offset?
An offset is the logical position of a record within a topic’s partition, assigned by the broker when it stores the record. Consumer clients keep track of offsets so that, when resuming work after being offline, they can pick up reading from the last successfully processed record.
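As a sketch, here’s how a Java consumer might commit offsets manually so it can resume where it left off; the topic and group names are made up for illustration:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OffsetExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "thermostat-readers");
        props.put("enable.auto.commit", "false"); // we'll commit offsets ourselves
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("thermostat-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
                }
                consumer.commitSync(); // store progress; on restart we resume from here
            }
        }
    }
}
```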
What Is a Consumer Group?
When consumers share the same group ID, they belong to a consumer group, and the group splits the work of consuming messages. Consequently, two consumers in the same group cannot read from the same partition; each partition is assigned to exactly one consumer in the group. If you want two consumers to each receive every message from a topic, they must be in separate groups.
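Here’s a small sketch of what joining a group looks like with the Java client. Run two copies of it against a multi-partition topic and each instance should print a different subset of the partitions; the orders topic and billing-service group name are illustrative:

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing-service"); // same value in every instance
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                consumer.poll(Duration.ofSeconds(1)); // triggers partition assignment
                System.out.println("My partitions: " + consumer.assignment());
            }
        }
    }
}
```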
What Is Message Acknowledgment?
When a broker successfully stores a message, it sends an acknowledgment back to the producer. You can configure the producer’s acks setting depending on whether you want it to wait for this acknowledgment, and from how many replicas, before considering a write complete.
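For example, here’s where that setting lives on a Java producer; the topic and key are made up, and acks=all is just one possible choice:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class AcksDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // acks=0   -> don't wait for any acknowledgment (fastest, least safe)
        // acks=1   -> wait for the partition leader to store the record
        // acks=all -> wait for all in-sync replicas (slowest, most durable)
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-7", "captured")).get();
        }
    }
}
```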
What Do People Use Apache Kafka For? What Is It Good At?
Kafka is used for building real-time data pipelines. At LinkedIn, where it was invented, it was used to track site user activity. It’s also used for metrics, stream processing, event sourcing, and as a commit log.
How Does Apache Kafka Provide Fault Tolerance/Durability?
With a replication factor of 3, each partition is stored on three different brokers. That means if a broker goes down, there are still two replicas available, and one of them can take over serving the partition.
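If you want to see the replicas for yourself, a sketch like this one prints each partition’s leader, replicas, and in-sync replicas. It assumes a Kafka 3.1+ Java client (for allTopicNames) and the hypothetical rideshare-requests topic from earlier:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.List;
import java.util.Properties;

public class ShowReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("rideshare-requests"))
                .allTopicNames().get().get("rideshare-requests");
            for (TopicPartitionInfo p : desc.partitions()) {
                // isr = the replicas currently caught up with the leader
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                    p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```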
What Is a Connector?
Kafka Connect itself is a framework that makes it easier to stream data between Kafka and external systems; it provides the plumbing you’d otherwise have to build to talk to data sources and sinks.
A Kafka connector, on the other hand, is a ready-made component that plugs into that framework for a specific system, removing the hassle and boilerplate code you’d have to write to connect to that source or sink yourself.
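As a rough sketch, here’s one way to register the FileStreamSource connector that ships with Kafka by POSTing its config to a Connect worker’s REST API. This assumes a worker running locally on the default port 8083, uses Java 15+ text blocks, and the file path and names are made up:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // FileStreamSource tails a file and produces each new line as an event
        String json = """
            {
              "name": "file-source-demo",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "file": "/tmp/demo.txt",
                "topic": "file-lines"
              }
            }
            """;
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors")) // Connect worker REST API
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```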
What Are Different Partitioning Strategies, and How Do They Affect Ordering?
We know topics are split into multiple partitions and that events are sent to topics. How are events assigned to partitions? That depends on the partitioning strategy.
Partitioning strategies depend on the keys of the events being produced.
If there’s no key, events are distributed among the partitions round-robin style (newer clients use a “sticky” variant that fills batches per partition), so events that sit next to each other in a partition weren’t necessarily produced one after the other, and there’s no overall ordering to rely on.
Kafka handles events with keys by computing their destination from a hash of the key. This ensures that events with the same key end up in the same partition.
So if your underlying logic is sound, say, you’re producing events that all come from the same user and using that user’s id as the key, it’s safe to assume those events arrive in the partition in the order they were produced.
If you’re producing events that come from two different users, their keys can hash to different partitions, so the relative order of the two users’ events is not guaranteed.
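Here’s a short sketch that demonstrates the keyed case with the Java client: every record shares the key user-123, so they should all print the same partition number and increasing offsets (the topic and key are illustrative):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.util.Properties;

public class KeyedOrdering {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 1; i <= 3; i++) {
                // Same key on every record -> same hash -> same partition,
                // so these three events keep their relative order.
                ProducerRecord<String, String> record =
                    new ProducerRecord<>("clicks", "user-123", "click-" + i);
                RecordMetadata meta = producer.send(record).get();
                System.out.printf("key=user-123 partition=%d offset=%d%n",
                    meta.partition(), meta.offset());
            }
        }
    }
}
```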
Conclusion… For Now!
Did you find this helpful? Were there questions that I missed? Comment below, and let me know! I’ll be continuing this as part of a series of interview questions, so please let me know if there are other questions you’d like to see featured.
If you’d like to solidify your understanding of Kafka, you can take this introductory course: Kafka101