Figure out what is the best way to update the CSV without the need to reconcile the entire CSV? #144
Comments
_Some draft by Ich:_
Case 1: New data in the database
How do we update CSV_X? Replace (with venue_wiki columns) all rows that are not the same. Use OpenRefine?
How do we check whether new rows can now be reconciled with Wikidata (because there are new entries in Wikidata)? |
Problem: since almost all important columns (such as IDs) are transformed into URLs, the old and new CSVs cannot be compared directly. We would have to inspect the individual values, which makes the process hard to automate. |
I don't understand. Can you explain further with examples? |
In our previous discussion, the approach was to compare & merge the newly updated raw CSV with the old reconciled CSV, then reconcile only the updated part of the merged CSV and leave the old data untouched. However, since my reconciliation process rewrites the values of the raw CSV, a merge based on pandas.concat() would treat the same row from the raw CSV and from the reconciled CSV as two different rows. For example, in the reconciled CSV:
in the raw CSV:
and if we update the raw CSV:
and then compare & merge, we get:
|
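One possible way around the mismatch described above, as a minimal sketch rather than what the project actually does: undo the ID-to-URL transformation on the reconciled side, then select only the raw rows whose ID is not yet known. The column names (`venue_id`, `name`), the `URL_PREFIX` value, the sample data, and the helper `rows_to_reconcile` are all hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical prefix that the reconciliation step prepends to raw IDs.
URL_PREFIX = "https://example.org/venue/"


def rows_to_reconcile(raw_new, reconciled_old, key="venue_id"):
    """Return the rows of raw_new whose key is absent from reconciled_old."""
    # Strip the URL prefix so both sides share a comparable key.
    known_ids = reconciled_old[key].str.replace(URL_PREFIX, "", regex=False)
    return raw_new[~raw_new[key].astype(str).isin(known_ids)]


# Made-up sample data: the old reconciled CSV and the updated raw CSV.
reconciled_old = pd.DataFrame({
    "venue_id": [URL_PREFIX + "1", URL_PREFIX + "2"],
    "name": ["Hall A", "Hall B"],
})
raw_new = pd.DataFrame({
    "venue_id": ["1", "2", "3"],
    "name": ["Hall A", "Hall B", "Hall C"],
})

# Naive merge: no row is dropped, because "1" differs from URL_PREFIX + "1".
naive = pd.concat([reconciled_old, raw_new]).drop_duplicates()

# Key-based selection: only the genuinely new row remains to be reconciled.
todo = rows_to_reconcile(raw_new, reconciled_old)
```

This only detects rows with new IDs. Detecting rows whose other values changed would additionally need something to compare against, for example a kept copy of the previous raw CSV.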