dedupe script should remove the dataset identifier #4163

Open
FuhuXia opened this issue Jan 19, 2023 · 0 comments
Labels
bug Software defect or bug harvest-duplicates Issues related to Duplicated Datasets

Comments

Member

FuhuXia commented Jan 19, 2023

For data.json harvest sources, the harvester should be able to uniquely identify a dataset, regardless of whether the dataset's state is deleted. However, the dedupe process simply marks one dataset as deleted, so we still have two datasets sharing the same identifier. If we cannot purge the duplicate from the system, the next best thing is to remove the identifier from the package.

The same applies to non-datajson harvest types.
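
A rough sketch of the proposed behavior, not the actual dedupe script: it assumes the package is a plain dict whose identifier lives in an `extras` entry with the key `identifier`. The function name `retire_duplicate`, the `IDENTIFIER_KEY` constant, and the sample package are all hypothetical, and the real key or storage location may differ per harvest type.

```python
from copy import deepcopy

# Assumption: the harvester stores the dataset identifier as an extra
# under this key. The real key may differ.
IDENTIFIER_KEY = "identifier"


def retire_duplicate(package, identifier_key=IDENTIFIER_KEY):
    """Return a copy of `package` marked deleted and without its identifier."""
    retired = deepcopy(package)
    # What the dedupe process does today: mark the duplicate as deleted.
    retired["state"] = "deleted"
    # The proposed extra step: drop the identifier so it is no longer shared.
    retired["extras"] = [
        extra
        for extra in retired.get("extras", [])
        if extra.get("key") != identifier_key
    ]
    return retired


# Hypothetical duplicate package, for illustration only.
duplicate = {
    "name": "example-dataset-1",
    "state": "active",
    "extras": [
        {"key": "identifier", "value": "abc-123"},
        {"key": "source_hash", "value": "deadbeef"},
    ],
}

retired = retire_duplicate(duplicate)
```

Returning a modified copy keeps the sketch independent of how the package is written back; the dedupe script would still need to persist the result through whatever update call it already uses.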

Related to #2989.

@FuhuXia FuhuXia added the bug Software defect or bug label Jan 19, 2023
@hkdctol hkdctol moved this to 🧊 Icebox in data.gov team board May 30, 2023
@btylerburton btylerburton added the harvest-duplicates Issues related to Duplicated Datasets label Dec 21, 2023
Projects
Status: 🧊 Icebox