
how to ensure exactly once delivery? #580

Open
kimnami opened this issue Jul 30, 2021 · 1 comment
Comments


kimnami commented Jul 30, 2021

https://docs.confluent.io/kafka-connect-hdfs3-sink/current/overview.html#exactly-once-delivery

The connector uses a write-ahead log to ensure each record is written to HDFS exactly once. Also, the connector manages offsets by encoding the Kafka offset information into the HDFS file so that it can start from the last committed offsets in case of failures and task restarts.

Those mechanisms address failures and task restarts.
I wonder how this connector ensures exactly-once delivery during normal operation.

Is HdfsSinkConnector idempotent and transactional?
Where can I find this out?
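
To make sure I understand the quoted mechanism: it sounds like a write-ahead-log-then-atomic-rename protocol, roughly like the sketch below (these are illustrative names I made up, not the connector's actual classes):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hedged sketch of a WAL-then-rename commit; illustrative API, not the real one.
interface WriteAheadLog {
    void append(String tempFile, String committedFile) throws IOException; // log the intended rename
    void markApplied() throws IOException;                                 // log that the rename happened
}

class TempFileCommitter {
    /**
     * Log the intended rename to the WAL first, then rename the temp file to its
     * final, offset-encoded name. If the task dies between the two steps, replaying
     * the WAL on restart re-applies the rename; because the target name is
     * deterministic, the replay is idempotent and no record lands twice.
     */
    static void commit(WriteAheadLog wal, Path temp, Path committed) throws IOException {
        wal.append(temp.toString(), committed.toString());
        Files.move(temp, committed, StandardCopyOption.ATOMIC_MOVE); // atomic where the FS supports it
        wal.markApplied();
    }
}
```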


My question is about how duplicates are avoided while records are still being written to the temp file.

For example, let's assume the last committed offset in the HDFS file is 10 and the flush size is 10. The connector would then consume offsets 11 through 20 before committing.

In this situation, while offsets 11 through 20 are being written to the temp file, how does it avoid duplicates? I think there is no offset information to read in the middle of writing to the temp file, is there?
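
As far as I can tell, committed files encode their offset range in the filename (the documented convention is `<topic>+<partition>+<startOffset>+<endOffset>.<ext>`, e.g. `my-topic+0+0000000000+0000000010.avro`), so I picture the restart recovery roughly like this sketch (illustrative code, not the real implementation):

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: recover the last committed offset by scanning committed filenames,
// which follow <topic>+<partition>+<startOffset>+<endOffset>.<ext>.
public class CommittedOffsetRecovery {

    private static final Pattern COMMITTED_FILE =
            Pattern.compile(".+\\+(\\d+)\\+(\\d+)\\+(\\d+)\\..+");

    /** Highest end offset among committed filenames, or -1 if none exist yet. */
    static long lastCommittedOffset(Iterable<String> committedFileNames) {
        long last = -1L;
        for (String name : committedFileNames) {
            Matcher m = COMMITTED_FILE.matcher(name);
            if (m.matches()) {
                last = Math.max(last, Long.parseLong(m.group(3)));
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Offsets 0-10 are committed; a restarted task would seek to 11 and
        // rewrite 11-20 into a fresh temp file, so nothing is duplicated.
        List<String> files = List.of("my-topic+0+0000000000+0000000010.avro");
        System.out.println(lastCommittedOffset(files)); // prints 10
    }
}
```

If that is right, then a crash while 11~20 sit in an uncommitted temp file would be handled not by deduplication but by discarding the temp file, seeking back to offset 11, and rewriting the same records, so a committed file never contains duplicates. Is that correct?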

@OneCricketeer

You seem to be asking about the HDFS3 connector.

This repo is for the HDFS2 one.
