Sort out licensing for works in ACT #94

Open
2 tasks
baskaufs opened this issue Sep 29, 2022 · 1 comment

baskaufs commented Sep 29, 2022

Charlotte has a dump of the copyright field, with the ACT ID, copyright statement (i.e. image source info) and copyright permission (CC, or other statement). What needs to happen is:

  • use the available Flickr URLs to query the Flickr API for each image's license, especially in cases where the permissions statement says to check the source URL.
  • check the works that have Commons URLs to verify that we have already processed them and linked their Wikidata items with an ACT ID claim.
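As a rough illustration of the Flickr task, here is a minimal sketch using only the Python standard library. It assumes the Flickr URLs in the dump are ordinary photo page URLs of the form `https://www.flickr.com/photos/<user>/<photo_id>/` and that an API key is available; the function names are hypothetical and not taken from the project's scripts. The `flickr.photos.getInfo` method reports a numeric license ID, which can be mapped to a license name with `flickr.photos.licenses.getInfo`.

```python
import json
import re
import urllib.parse
import urllib.request

FLICKR_REST = "https://api.flickr.com/services/rest/"

# Assumption: photo page URLs look like
# https://www.flickr.com/photos/<user>/<photo_id>/
PHOTO_ID_PATTERN = re.compile(r"flickr\.com/photos/[^/]+/(\d+)")


def extract_photo_id(url):
    """Return the numeric photo ID from a Flickr photo page URL, or None."""
    match = PHOTO_ID_PATTERN.search(url)
    return match.group(1) if match else None


def build_request_url(photo_id, api_key):
    """Build the REST URL for a flickr.photos.getInfo call returning plain JSON."""
    params = {
        "method": "flickr.photos.getInfo",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": "1",
    }
    return FLICKR_REST + "?" + urllib.parse.urlencode(params)


def fetch_license_id(photo_id, api_key):
    """Return the license ID reported for a photo (needs network access and a valid key)."""
    with urllib.request.urlopen(build_request_url(photo_id, api_key)) as response:
        data = json.load(response)
    # Assumption: a successful response carries the license ID under photo -> license.
    return data["photo"]["license"]
```

URLs that do not match the pattern (short links, album pages) return `None` from `extract_photo_id` and would need to be checked by hand.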
@baskaufs baskaufs self-assigned this Sep 29, 2022
@baskaufs baskaufs added the ACT WikiProject ACT label Sep 29, 2022
baskaufs (Author) commented

Did some preliminary work on checking Commons URLs.

  1. Cleaned the output from Jodie, now in processed_lists/clean_metadata_2022-09-29.csv
  2. Used the script processed_lists/clean_raw_export_data.ipynb to grab all Wikidata items with ACT IDs and Commons images. Result in processed_lists/act_items_by_query.csv
  3. There were 2578 ACT works in the dump that had a Commons URL as their copyright value and could be matched to the ACT Wikidata items from the query. Another 940 works from the dump had Commons URLs but couldn't be matched to ACT Wikidata items. Some of these are probably among the 183 Commons images that were black and white, or crops of artworks, for which we didn't create items. But that still leaves at least 757 ACT items from the dump that aren't associated with Wikidata items for some reason and may need to be created. There were also 348 ACT items from the query that didn't match any ACT items in the dump, but that's OK: they probably just aren't listed with the Commons URL as their copyright source.
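
The three counts in step 3 are the result of comparing two sets of records. A minimal sketch of that comparison follows, assuming both CSV files can be keyed on an ACT ID column; the column name `act_id`, the function names, and the sample rows are made up for illustration and are not the actual headers or data in `clean_metadata_2022-09-29.csv` or `act_items_by_query.csv`.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def partition_by_act_id(dump_rows, query_rows, key="act_id"):
    """Split ACT IDs into those in both sources, only in the dump, and only in the query."""
    dump_ids = {row[key] for row in dump_rows}
    query_ids = {row[key] for row in query_rows}
    return {
        "matched": dump_ids & query_ids,
        "dump_only": dump_ids - query_ids,
        "query_only": query_ids - dump_ids,
    }


# Tiny made-up sample standing in for the two CSV files.
dump = read_rows(
    "act_id,copyright\n"
    "1,https://commons.wikimedia.org/wiki/File:A.jpg\n"
    "2,https://commons.wikimedia.org/wiki/File:B.jpg\n"
)
query = read_rows("act_id,qid\n1,Q100\n3,Q300\n")
result = partition_by_act_id(dump, query)

# Lower bound on unexplained dump items, using the figures from step 3:
# 940 unmatched works minus at most 183 known black-and-white or cropped images.
unexplained = 940 - 183
```

With the real files, the sizes of `matched`, `dump_only`, and `query_only` would correspond to the 2578, 940, and 348 figures, provided the matching really is done on ACT ID.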

The next step here is to run the script that looks for the tiny Wikidata links on the Commons pages and see how many of the pages don't link to any Wikidata items. Then the task would be to add ACT links to the items that do have links and potentially create Wikidata items for the images that don't.
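
For orientation only, the link check could look roughly like the sketch below. It is not the existing script: it assumes the HTML of each Commons page has already been downloaded, and it simply scans for `wikidata.org` links with a regular expression. The function names and sample pages are hypothetical.

```python
import re

# Matches links such as https://www.wikidata.org/wiki/Q42 or .../entity/Q42
QID_LINK = re.compile(r"wikidata\.org/(?:wiki|entity)/(Q\d+)")


def find_wikidata_qids(html):
    """Return the distinct Q IDs linked from a page, in order of first appearance."""
    seen = []
    for qid in QID_LINK.findall(html):
        if qid not in seen:
            seen.append(qid)
    return seen


def split_by_link(pages):
    """Separate pages that link to at least one Wikidata item from those that link to none."""
    linked, unlinked = {}, []
    for title, html in pages.items():
        qids = find_wikidata_qids(html)
        if qids:
            linked[title] = qids
        else:
            unlinked.append(title)
    return linked, unlinked


# Made-up page snippets for illustration.
sample_pages = {
    "File:A.jpg": '<a href="https://www.wikidata.org/wiki/Q100">item</a>',
    "File:B.jpg": "<p>No structured links here.</p>",
}
linked, unlinked = split_by_link(sample_pages)
```

A Commons page may also link to Wikidata items for the creator or the holding institution, so any Q IDs found this way would still need checking before an ACT ID claim is added.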
