Thursday, October 23, 2025

Exactly Once Semantics in Iceberg/Kafka

Iceberg has a Kafka Connect component that ensures exactly-once semantics despite there being two transactions per write, one for each of the two systems (Kafka and Iceberg).

So, how is this done? The answer is simple: idempotency. The code that prepares the data to be committed delegates to MergingSnapshotProducer, and that ensures there are no dupes.
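To make that concrete, here is a minimal sketch of the idea, not Iceberg's actual code: before a data file is added to the pending snapshot, its path is checked against what the table already references, so a replayed file becomes a no-op. All names below are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy stand-in for the dedupe that MergingSnapshotProducer performs.
// Class and method names here are hypothetical.
public class IdempotentAppend {

  /** Paths already referenced by the table's current metadata. */
  private final Set<String> committedPaths;

  /** Files staged for the next snapshot, keyed by path to drop dupes. */
  private final Map<String, DataFile> newFiles = new LinkedHashMap<>();

  public IdempotentAppend(Set<String> committedPaths) {
    this.committedPaths = committedPaths;
  }

  /** Adding a file the table already knows about is a no-op. */
  public void add(DataFile file) {
    if (committedPaths.contains(file.path())) {
      return; // replayed file: already committed, ignore it
    }
    newFiles.putIfAbsent(file.path(), file); // also dedupe within the batch
  }

  record DataFile(String path, long recordCount) {}
}
```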

Note that it is the Coordinator that commits the metadata to Iceberg.

The Coordinator and the Worker dance a tango. The steps go like this: 

The Worker writes Iceberg data to storage. Having saved its SinkRecords to its RecordWriter, it then goes on to processControlEvents.
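Roughly, and assuming made-up names rather than the connector's real API, the worker's write path looks like this:

```java
import java.util.Collection;
import org.apache.kafka.connect.sink.SinkRecord;

// Illustrative worker skeleton: data is written first, then the
// control topic is serviced.
abstract class WorkerSketch {

  interface RecordWriter {
    void write(SinkRecord record);
  }

  private final RecordWriter recordWriter;

  WorkerSketch(RecordWriter recordWriter) {
    this.recordWriter = recordWriter;
  }

  void save(Collection<SinkRecord> records) {
    for (SinkRecord record : records) {
      recordWriter.write(record); // buffers rows into Iceberg data files
    }
    processControlEvents(); // poll the control topic for coordinator events
  }

  abstract void processControlEvents();
}
```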

This involves polling the internal topic for Kafka ConsumerRecords. If the event is addressed to this worker's consumer group and is a START_COMMIT, the worker sends to its internal topic all the ContentFiles that it was responsible for writing, wrapped in a DataWritten object. It also sends a DataComplete object, with all of these Events published as part of a single Kafka transaction.
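Sketched in Java, with simplified stand-ins for the connector's real event classes, the loop might look like this:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch of processControlEvents. Event decoding and the PayloadType
// enum are simplified stand-ins for the connector's real event classes.
abstract class ControlEventLoop {

  enum PayloadType { START_COMMIT, DATA_WRITTEN, DATA_COMPLETE }

  record Event(String groupId, PayloadType type) {}

  private final KafkaConsumer<String, byte[]> controlConsumer;
  private final String groupId;

  ControlEventLoop(KafkaConsumer<String, byte[]> controlConsumer, String groupId) {
    this.controlConsumer = controlConsumer;
    this.groupId = groupId;
  }

  void processControlEvents() {
    ConsumerRecords<String, byte[]> records = controlConsumer.poll(Duration.ZERO);
    for (ConsumerRecord<String, byte[]> record : records) {
      Event event = decode(record.value());
      // only react to events addressed to this worker's group
      if (groupId.equals(event.groupId()) && event.type() == PayloadType.START_COMMIT) {
        sendDataWrittenAndDataComplete(event); // one Kafka transaction, see below
      }
    }
  }

  abstract Event decode(byte[] payload);

  abstract void sendDataWrittenAndDataComplete(Event startCommit);
}
```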

In turn, if the Coordinator receives a DataComplete object, it calls the idempotent Iceberg-writing code mentioned above within an Iceberg transaction. That is, if the ContentFiles wrapped in the DataWritten objects are already in the metadata, they are essentially ignored.
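Using the real Iceberg Table API, the Coordinator's commit step looks roughly like this; gathering the DataFiles out of the DataWritten events is elided, and per the above, the dedupe happens inside Iceberg when the append is committed.

```java
import java.util.List;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.Transaction;

// Sketch of the Coordinator's commit step. The class is mine; the
// Table/Transaction/AppendFiles calls are the real Iceberg API.
class CoordinatorCommit {

  void commit(Table table, List<DataFile> filesFromWorkers) {
    Transaction txn = table.newTransaction();
    AppendFiles append = txn.newAppend(); // backed by MergingSnapshotProducer
    for (DataFile file : filesFromWorkers) {
      append.appendFile(file); // files already in the metadata are dropped here
    }
    append.commit();         // stages the new snapshot
    txn.commitTransaction(); // atomically swaps the table metadata
  }
}
```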

The Coordinator can also trigger a commit if it deems the commit to have timed out.
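Presumably something like the following check, though the field names and the timeout value here are assumptions on my part, not the connector's actual configuration:

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative timeout check only; the names and the default duration
// are guesses, not the connector's actual configuration.
class CommitTimeout {

  private final Duration commitTimeout = Duration.ofMinutes(5); // assumed
  private Instant commitStarted; // set when StartCommit is sent

  boolean timedOut(Instant now) {
    return commitStarted != null
        && Duration.between(commitStarted, now).compareTo(commitTimeout) > 0;
  }
}
```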

The key thing is that the worker only acknowledges the SinkRecords it read from the external Kafka topic as part of the same transaction it uses to send the Events to its internal Kafka topic. That way, if the worker crashes after writing to Iceberg storage, those SinkRecords will be read again from Kafka and written again to Iceberg storage. However, the Iceberg metadata will be updated exactly once, at the potential cost of some orphan files, it seems.
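That coupling is Kafka's standard consume-transform-produce pattern, and sendOffsetsToTransaction is the real API that does it. A sketch, with placeholder topic names and payloads:

```java
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerGroupMetadata;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

// The source-topic offsets are committed in the SAME transaction that
// publishes the worker's events, so the ack and the events are atomic.
class TransactionalAck {

  void sendEventsAndAck(
      KafkaProducer<String, byte[]> producer, // configured with a transactional.id
      String controlTopic,
      byte[] event,
      Map<TopicPartition, OffsetAndMetadata> sourceOffsets,
      ConsumerGroupMetadata sourceGroup) {
    producer.beginTransaction();
    try {
      producer.send(new ProducerRecord<>(controlTopic, event));
      // acknowledge the SinkRecords read from the source topic
      producer.sendOffsetsToTransaction(sourceOffsets, sourceGroup);
      producer.commitTransaction();
    } catch (RuntimeException e) {
      producer.abortTransaction(); // neither events nor ack become visible
      throw e;
    }
  }
}
```

If the transaction aborts, the offsets are never committed, which is exactly why the SinkRecords get re-read and re-written after a crash.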

[In addition, the Coordinator sends CommitToTable and CommitComplete objects to its internal Kafka topic in an unrelated transaction. This appears to be for completeness, as I can't see what purpose it serves.]

It's the Coordinator that, in a loop, sends a StartCommit object onto the internal Kafka topic in the first place. It only does this having deemed the commit ready (see previous blog post).
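A sketch of that loop; the readiness test is whatever the previous post described, so it is left abstract here:

```java
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch of the Coordinator kicking off a commit round. The shape is
// mine; only the Kafka producer calls are the real API.
abstract class CoordinatorLoop {

  private final KafkaProducer<String, byte[]> producer; // transactional
  private final String controlTopic;

  CoordinatorLoop(KafkaProducer<String, byte[]> producer, String controlTopic) {
    this.producer = producer;
    this.controlTopic = controlTopic;
  }

  void tick() {
    if (isCommitReady()) {
      UUID commitId = UUID.randomUUID(); // names this commit round
      producer.beginTransaction();
      producer.send(new ProducerRecord<>(
          controlTopic, commitId.toString(), encodeStartCommit(commitId)));
      producer.commitTransaction();
    }
  }

  abstract boolean isCommitReady();

  abstract byte[] encodeStartCommit(UUID commitId);
}
```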

And so the cycle is complete.
