Replication key query produces duplicate values across runs #557

Closed
cnatsis opened this issue Dec 17, 2024 · 2 comments

cnatsis commented Dec 17, 2024

When running a flow in INCREMENTAL mode with a DATE column as the replication key, subsequent runs produce duplicate values, since the comparison against the saved state is performed using the >= operator.

https://github.com/MeltanoLabs/tap-postgres/blob/main/tap_postgres/client.py#L242

The correct operator to use is >, regardless of the replication key type.

Example steps

State column: key_col

1. First run

   No state existing, full table load, output data is:

   | col_a | col_b | key_col |
   | --- | --- | --- |
   | a1 | b1 | 2024-01-01 |
   | a2 | b2 | 2024-01-02 |

2. Insert new row in db

   | col_a | col_b | key_col |
   | --- | --- | --- |
   | a3 | b3 | 2024-12-01 |

3. Second run

   State value: 2024-01-02
   ⚠ Duplicate row a2 in output

   | col_a | col_b | key_col |
   | --- | --- | --- |
   | a2 | b2 | 2024-01-02 |
   | a3 | b3 | 2024-12-01 |
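
The steps above can be reproduced with a minimal sketch that uses an in-memory SQLite database in place of Postgres. The table name `t`, the helper `fetch_incremental`, and the `inclusive` flag are hypothetical names chosen for illustration; this is not the tap's actual query code, only a model of the two comparison operators.

```python
import sqlite3


def fetch_incremental(conn, bookmark, inclusive):
    """Select rows relative to the bookmark (hypothetical helper, not tap code)."""
    if bookmark is None:
        # No state yet: full table load.
        sql = "SELECT col_a, col_b, key_col FROM t ORDER BY key_col"
        return conn.execute(sql).fetchall()
    op = ">=" if inclusive else ">"
    sql = (
        "SELECT col_a, col_b, key_col FROM t "
        f"WHERE key_col {op} ? ORDER BY key_col"
    )
    return conn.execute(sql, (bookmark,)).fetchall()


conn = sqlite3.connect(":memory:")
# Dates are stored as ISO-8601 text, which sorts in chronological order.
conn.execute("CREATE TABLE t (col_a TEXT, col_b TEXT, key_col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a1", "b1", "2024-01-01"), ("a2", "b2", "2024-01-02")],
)

# First run: no state, so every row is emitted and the max key becomes the state.
first_run = fetch_incremental(conn, None, inclusive=True)
bookmark = max(row[2] for row in first_run)

# A new row arrives between runs.
conn.execute("INSERT INTO t VALUES ('a3', 'b3', '2024-12-01')")

# Second run with each operator.
second_run_inclusive = fetch_incremental(conn, bookmark, inclusive=True)
second_run_exclusive = fetch_incremental(conn, bookmark, inclusive=False)

print(second_run_inclusive)  # includes row a2 again
print(second_run_exclusive)  # only row a3
```

In this sketch the inclusive comparison re-emits row a2, matching the duplicate reported above. A strict `>` would also skip any row inserted later whose `key_col` equals the saved state value, which may matter for coarse keys such as DATE.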

edgarrmondragon (Member) commented Dec 17, 2024

Thanks for logging @cnatsis!

This is arguably not a bug, but a missing feature at worst. See the comment #558 (comment).

Also,

cnatsis (Author) commented Dec 18, 2024

Thank you @edgarrmondragon for pointing this out! For now, the solution would be to configure an idempotent sink on the other side to handle the duplicate rows.

Closing the issue and the PR since it is tracked elsewhere.
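
As a rough illustration of the idempotent sink mentioned above, the sketch below upserts records by key so that a re-emitted row overwrites itself instead of being stored twice. It assumes `col_a` is a unique key for the table, which the example data suggests but the issue does not state; `upsert` is a hypothetical helper, not part of any Meltano target.

```python
def upsert(target, records, key="col_a"):
    """Write records into a dict keyed by `key`; repeated keys overwrite in place."""
    for record in records:
        target[record[key]] = record
    return target


first_run = [
    {"col_a": "a1", "col_b": "b1", "key_col": "2024-01-01"},
    {"col_a": "a2", "col_b": "b2", "key_col": "2024-01-02"},
]
# Second run re-emits a2 because of the inclusive comparison.
second_run = [
    {"col_a": "a2", "col_b": "b2", "key_col": "2024-01-02"},
    {"col_a": "a3", "col_b": "b3", "key_col": "2024-12-01"},
]

sink = {}
upsert(sink, first_run)
upsert(sink, second_run)

print(sorted(sink))  # keys of the stored rows, each present once
```

In a SQL target, the same effect is usually achieved with a primary key on the table and an upsert-style load.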
