dedupe script should remove the dataset identifier #4163

FuhuXia · 2023-01-19T18:53:14Z

For data.json harvest sources, the harvester should be able to uniquely identify a dataset, regardless of whether the dataset's state is deleted. However, the dedupe process simply marks one dataset as deleted, so we still have two datasets sharing the same identifier. If we cannot purge the duplicates from the system, the next best thing is to remove the identifier from the duplicate package.

The same applies to non-data.json harvest types.
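
A minimal sketch of the proposed behavior, not the actual dedupe script. It assumes the package is a CKAN-style dict whose extras are a list of `{"key": ..., "value": ...}` entries and that the identifier lives under an extras key named `identifier`; the function name `retire_duplicate` and the sample values are hypothetical.

```python
from copy import deepcopy

# Assumed extras key; the real key may differ per harvest type.
IDENTIFIER_KEY = "identifier"


def retire_duplicate(package, identifier_key=IDENTIFIER_KEY):
    """Return a copy of `package` marked deleted and stripped of its identifier."""
    retired = deepcopy(package)
    # Same as the current dedupe behavior: mark the duplicate as deleted.
    retired["state"] = "deleted"
    # Proposed addition: drop the identifier so the deleted duplicate
    # no longer collides with the dataset that is kept.
    retired["extras"] = [
        extra for extra in retired.get("extras", [])
        if extra.get("key") != identifier_key
    ]
    return retired


# Hypothetical duplicate package used only for illustration.
duplicate = {
    "name": "example-dataset-1",
    "state": "active",
    "extras": [
        {"key": "identifier", "value": "abc-123"},
        {"key": "other-field", "value": "kept"},
    ],
}
retired = retire_duplicate(duplicate)
```

Returning a modified copy keeps the sketch free of side effects; a real script would still need to write the updated package back through whatever update call the dedupe process already uses.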

Related to #2989.

FuhuXia added the bug label (Software defect or bug) Jan 19, 2023

nickumia-reisys added this to data.gov team board Jan 19, 2023

hkdctol moved this to 🧊 Icebox in data.gov team board May 30, 2023

btylerburton added the harvest-duplicates label (Issues related to Duplicated Datasets) Dec 21, 2023
