Changing access rights for Crawl Objects #27

Open
edsu opened this issue Jun 9, 2022 · 3 comments
Labels
needs analysis (cannot proceed with this issue without analysis), web archiving (2022 web archiving work cycle)

Comments

@edsu
Contributor

edsu commented Jun 9, 2022

When the access rights for a Crawl Object are changed in Argo, we would like pywb to respect those changes so that the content is World, Stanford only, or Dark (unavailable). In #10 we address similar rights changes to Seed Objects. However, making similar changes to sets of WARC files will involve modifying the CDXJ indexes themselves (to add or remove entries). It may prove difficult to make the contents of a WARC file available only on campus, since these controls operate at the URL level, and a given set of WARC files could contain many URLs at different sites.
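To make the "remove entries" part concrete, here is a rough sketch of dropping the index lines that point into a set of WARC files. It assumes the usual CDXJ line layout (SURT key, timestamp, then a JSON block with a `filename` field); the function name `drop_warc_entries` and the sample filenames are hypothetical, and this is not how pywb or our indexing code currently does it.

```python
import json


def drop_warc_entries(cdxj_lines, dark_filenames):
    """Yield the CDXJ lines whose record is not stored in a dark WARC file."""
    for line in cdxj_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: "<SURT key> <timestamp> <JSON block>"
        _key, _timestamp, payload = line.split(" ", 2)
        fields = json.loads(payload)
        if fields.get("filename") not in dark_filenames:
            yield line


# Hypothetical two-line index covering two WARC files
index = [
    'com,example)/ 20220609120000 {"url": "https://example.com/", "filename": "crawl-a.warc.gz"}',
    'org,example)/ 20220609120500 {"url": "https://example.org/", "filename": "crawl-b.warc.gz"}',
]
kept = list(drop_warc_entries(index, {"crawl-a.warc.gz"}))
```

Filtering by `filename` works for Dark because the whole WARC disappears from the index. It does not help with Stanford only, which still has to be expressed per URL.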

edsu added the web archiving (2022 web archiving work cycle) and needs analysis (cannot proceed with this issue without analysis) labels Jun 9, 2022
@jcoyne
Contributor

jcoyne commented Jun 9, 2022

Would it be possible to create conflicting access, where one crawl has a URL like https://example.com that is "world" and another crawl includes the same URL but is "dark"?

@edsu
Contributor Author

edsu commented Jun 9, 2022

Good point, that is definitely possible. pywb's ACLJ file can also include the timestamp associated with the URL to block, so in theory that could be factored in if we decide we really need pywb to respect access rights changes to Crawl Objects. At the moment no use cases have been given for access rights changes to Crawl Objects. This issue is mostly here to note that they aren't currently being handled.
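As a quick illustration of why the timestamp matters, the sketch below groups captures by URL alone and then by URL plus timestamp. The `(url, timestamp, access)` tuples, the `find_conflicts` name, and the sample values are all made up for the example; nothing here reads or writes pywb's actual ACLJ format.

```python
from collections import defaultdict


def find_conflicts(captures, by_timestamp=False):
    """Return the keys that appear under more than one access level.

    captures: iterable of (url, timestamp, access) tuples (hypothetical shape).
    """
    seen = defaultdict(set)
    for url, timestamp, access in captures:
        key = (url, timestamp) if by_timestamp else url
        seen[key].add(access)
    return {key: sorted(levels) for key, levels in seen.items() if len(levels) > 1}


# The same URL captured by two crawls with different rights
captures = [
    ("https://example.com", "20220601000000", "world"),
    ("https://example.com", "20220609000000", "dark"),
]
url_conflicts = find_conflicts(captures)
capture_conflicts = find_conflicts(captures, by_timestamp=True)
```

Keyed by URL alone, the two crawls conflict; keyed by URL and timestamp they don't, as long as the captures were taken at different times.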

@lwrubel
Contributor

lwrubel commented Jul 25, 2022

Iceboxing along with #10.
