CDC data duplicated on U.S. Department of Health & Human Services #4073
Comments
Reaching out to HHS to track down some current contacts to work on this.
Pinged HHS again; moving to blocked for now.
Still blocked, but HHS confirms they are investigating the issue.
Pinging HHS again to confirm that we can delete CDC because the HHS data.json covers it.
HHS confirms that the CDC harvest source and organization can be deleted, but there's a harvest job running right now, so we will delete next week.
The error is from a web server timeout. The clearing process finished regardless. I see CDC has 0 datasets.
CDC harvest source is cleared and deleted.
Organization deleted.
Full list of site-wide duplicates: 804 records, all but 10 of which are CDC records.
CDC has their own organization: https://catalog.data.gov/organization/centers-for-disease-control-and-prevention
However, it looks like all the CDC data is replicated in the U.S. Department of Health & Human Services organization: https://catalog.data.gov/organization/hhs-gov
In fact, it looks like the HHS organization has more records: https://catalog.data.gov/dataset/?q=%22https%3A%2F%2Fdata.cdc.gov%2Fapi%2Fviews%2F%22&sort=views_recent+desc&ext_location=&ext_bbox=&ext_prev_extent=-150.46875%2C-80.17871349622823%2C151.875%2C80.17871349622823
Expected behavior
Department harvest sources are cleaned up so that they don't produce duplicates.
Actual behavior
Multiple harvest sources (one from the agency, one from the department) are causing duplicates.
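For reference, a minimal sketch of how duplicates like these could be spotted offline, assuming we have an export of records that each carry a dataset identifier and the name of the harvest source they came from. The record shape, the field names (`identifier`, `harvest_source`), and the sample values are hypothetical and only for illustration; they are not the catalog's actual schema.

```python
from collections import defaultdict


def find_cross_source_duplicates(records):
    """Map each identifier seen under more than one harvest source
    to the sorted list of sources it appears in."""
    sources_by_id = defaultdict(set)
    for rec in records:
        # "identifier" and "harvest_source" are assumed field names.
        sources_by_id[rec["identifier"]].add(rec["harvest_source"])
    return {
        ident: sorted(sources)
        for ident, sources in sources_by_id.items()
        if len(sources) > 1
    }


# Made-up sample records, for illustration only.
sample = [
    {"identifier": "abcd-1234", "harvest_source": "cdc"},
    {"identifier": "abcd-1234", "harvest_source": "hhs"},
    {"identifier": "wxyz-9876", "harvest_source": "hhs"},
]
duplicates = find_cross_source_duplicates(sample)
print(duplicates)
```

Only identifiers that show up under two or more sources are reported, so a record harvested once (like `wxyz-9876` in the sample) is ignored.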
Sketch
Validate with CDC that the Department of Health is receiving their updated data, then remove the CDC harvest source.
https://catalog.data.gov/harvest/https-data-cdc-gov-data-json
Both the harvest sources are being harvested daily...
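Before removing the CDC source, the coverage check could be sketched roughly as below. This assumes both catalogs are data.json files with a top-level `dataset` list whose entries carry an `identifier`; the function names and the sample catalogs are made up for illustration, and the real files would need to be downloaded and parsed with `json.load` first.

```python
def identifiers(catalog):
    """Collect dataset identifiers from a parsed data.json catalog."""
    return {
        ds["identifier"]
        for ds in catalog.get("dataset", [])
        if "identifier" in ds
    }


def missing_from_department(agency_catalog, department_catalog):
    """Return agency identifiers that the department catalog does not list."""
    return sorted(identifiers(agency_catalog) - identifiers(department_catalog))


# Made-up catalogs, for illustration only.
agency = {"dataset": [{"identifier": "a-1"}, {"identifier": "a-2"}]}
department = {
    "dataset": [{"identifier": "a-1"}, {"identifier": "a-2"}, {"identifier": "h-9"}]
}
missing = missing_from_department(agency, department)
print(missing)
```

An empty result would suggest the department catalog covers everything in the agency catalog; any identifiers returned are worth checking with the agency before the source is deleted.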