The connector uses a write-ahead log to ensure each record is written to HDFS exactly once. Also, the connector manages offsets by encoding the Kafka offset information into the HDFS file so that it can start from the last committed offsets in case of failures and task restarts.
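The docs quoted above say the Kafka offsets are encoded into the HDFS file itself, so the last committed offset can be recovered by scanning file names. Here is a minimal sketch of that idea; the `topic+partition+start+end` naming pattern and the helper below are illustrative assumptions, not the connector's exact format:

```python
import re

# Assumed naming convention: topic+partition+startOffset+endOffset.avro
COMMITTED_RE = re.compile(
    r"^(?P<topic>[^+]+)\+(?P<partition>\d+)\+(?P<start>\d+)\+(?P<end>\d+)\.avro$"
)

def last_committed_offset(filenames, topic, partition):
    """Scan committed data files and return the highest end offset for a
    topic-partition, or None if nothing has been committed yet."""
    best = None
    for name in filenames:
        m = COMMITTED_RE.match(name)
        if not m:
            continue  # skip temp files and unrelated files
        if m.group("topic") == topic and int(m.group("partition")) == partition:
            end = int(m.group("end"))
            if best is None or end > best:
                best = end
    return best

files = ["logs+0+0+10.avro", "logs+0+11+20.avro", "metrics+0+0+5.avro"]
print(last_committed_offset(files, "logs", 0))  # -> 20
```

On restart, a task can derive its resume position from the file names alone, with no separate offset store to keep consistent.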
Those mechanisms cover failure cases. I wonder how this connector ensures exactly-once delivery during normal operation. Is HdfsSinkConnector idempotent and transactional? Where can I find this out?
My question is about how duplicates are avoided while writing to the temp file.
For example, assume the last committed offset in the HDFS file is 10 and the flush size is 10. The connector would then consume offsets 11 to 20 before committing.
In this situation, while offsets 11 to 20 are being consumed into the temp file, how does it avoid duplicates? There is no offset information to read in the middle of writing to the temp file, is there?
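Based only on the docs quoted above, a plausible reading of the scenario in the question is that the temp file needs no mid-write offset bookkeeping: it only becomes visible when it is promoted to a final, offset-named file, and an uncommitted temp file is discarded on restart so re-consumed records are rewritten but never double-committed. The following is a hedged sketch of that flow; the class, names, and flush logic are illustrative assumptions, not the connector's code:

```python
class PartitionWriter:
    """Toy model of one topic-partition writer with a flush size."""

    def __init__(self, flush_size=10):
        self.flush_size = flush_size
        self.temp = []            # stand-in for the HDFS temp file
        self.committed = []       # committed (offset-named) files
        self.start_offset = None  # first offset in the current temp file

    def write(self, offset, record):
        if self.start_offset is None:
            self.start_offset = offset
        self.temp.append(record)
        if len(self.temp) >= self.flush_size:
            self._commit(offset)

    def _commit(self, end_offset):
        # Promotion (an atomic rename in HDFS): the file becomes visible
        # with its offset range in the name, which is the only durable
        # offset bookkeeping.
        name = f"topic+0+{self.start_offset}+{end_offset}.avro"
        self.committed.append(name)
        self.temp, self.start_offset = [], None

    def restart(self):
        # On failure/restart: drop the temp file and resume from the
        # offset after the last committed file.
        self.temp, self.start_offset = [], None

w = PartitionWriter()
for off in range(11, 18):   # consume 11..17, then crash mid-batch
    w.write(off, b"r")
w.restart()                 # uncommitted temp file is discarded
for off in range(11, 21):   # re-consume 11..20 from the last commit
    w.write(off, b"r")
print(w.committed)          # -> ['topic+0+11+20.avro']
```

Records 11 to 17 are written twice to a temp file, but only one committed file covering 11 to 20 ever appears, which is what exactly-once means at the output.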
https://docs.confluent.io/kafka-connect-hdfs3-sink/current/overview.html#exactly-once-delivery