
how to ensure exactly once delivery? #580

Open
kimnami opened this issue Jul 30, 2021 · 1 comment
Comments


kimnami commented Jul 30, 2021

https://docs.confluent.io/kafka-connect-hdfs3-sink/current/overview.html#exactly-once-delivery

The connector uses a write-ahead log to ensure each record is written to HDFS exactly once. Also, the connector manages offsets by encoding the Kafka offset information into the HDFS file so that it can start from the last committed offsets in case of failures and task restarts.

Those mechanisms address failures and task restarts.
I wonder how this connector ensures exactly-once delivery during normal operation.

Is HdfsSinkConnector idempotent and transactional?
Where can I find this out?
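
To make sure I understand the quoted mechanism: it sounds like a write-ahead-log-then-atomic-rename protocol, roughly like the sketch below (these are illustrative names I made up, not the connector's actual classes):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hedged sketch of a WAL-then-rename commit; illustrative API, not the real one.
interface WriteAheadLog {
    void append(String tempFile, String committedFile) throws IOException; // log the intended rename
    void markApplied() throws IOException;                                 // log that the rename happened
}

class TempFileCommitter {
    /**
     * Log the intended rename to the WAL first, then rename the temp file to its
     * final, offset-encoded name. If the task dies between the two steps, replaying
     * the WAL on restart re-applies the rename; because the target name is
     * deterministic, the replay is idempotent and no record lands twice.
     */
    static void commit(WriteAheadLog wal, Path temp, Path committed) throws IOException {
        wal.append(temp.toString(), committed.toString());
        Files.move(temp, committed, StandardCopyOption.ATOMIC_MOVE); // atomic where the FS supports it
        wal.markApplied();
    }
}
```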


My question is about how duplicates are avoided while records are still being written to the temp file.

For example, let's assume the last committed offset in the HDFS file is 10 and the flush size is 10. The connector would then consume offsets 11 through 20 before committing.

In this situation, while offsets 11 through 20 are being written to the temp file, how does it avoid duplicates? I think there is no offset information to read in the middle of writing to the temp file, is there?
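
As far as I can tell, committed files encode their offset range in the filename (the documented convention is `<topic>+<partition>+<startOffset>+<endOffset>.<ext>`, e.g. `my-topic+0+0000000000+0000000010.avro`), so I picture the restart recovery roughly like this sketch (illustrative code, not the real implementation):

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: recover the last committed offset by scanning committed filenames,
// which follow <topic>+<partition>+<startOffset>+<endOffset>.<ext>.
public class CommittedOffsetRecovery {

    private static final Pattern COMMITTED_FILE =
            Pattern.compile(".+\\+(\\d+)\\+(\\d+)\\+(\\d+)\\..+");

    /** Highest end offset among committed filenames, or -1 if none exist yet. */
    static long lastCommittedOffset(Iterable<String> committedFileNames) {
        long last = -1L;
        for (String name : committedFileNames) {
            Matcher m = COMMITTED_FILE.matcher(name);
            if (m.matches()) {
                last = Math.max(last, Long.parseLong(m.group(3)));
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Offsets 0-10 are committed; a restarted task would seek to 11 and
        // rewrite 11-20 into a fresh temp file, so nothing is duplicated.
        List<String> files = List.of("my-topic+0+0000000000+0000000010.avro");
        System.out.println(lastCommittedOffset(files)); // prints 10
    }
}
```

If that is right, then a crash while 11~20 sit in an uncommitted temp file would be handled not by deduplication but by discarding the temp file, seeking back to offset 11, and rewriting the same records, so a committed file never contains duplicates. Is that correct?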

@OneCricketeer

You seem to be asking about the HDFS3 connector.

This repo is for the HDFS2 one.
