pywb record inserting domain and collection name into recorded URL on specific sites #886
Comments
ikreymer added a commit to webrecorder/wombat that referenced this issue Feb 23, 2024
…ewritten, to be more flexible with rewriting issues caused elsewhere (fixes webrecorder/pywb#886) bump to 3.7.2
There are a few issues with this site:
The history fix needs to be done in wombat, while the other fixes need to be done in pywb / wabac.js
ikreymer added a commit to webrecorder/wombat that referenced this issue Feb 24, 2024
…ewritten, to be more flexible with rewriting issues caused elsewhere (#138) (fixes most significant issue in webrecorder/pywb#886) bump to 3.7.2
Thanks Ilya, really appreciate you looking at this!
Describe the bug
When recording on specific sites, pywb record appears to duplicate content in the recorded URL. The same thing seems to happen in playback. The original target page seems to be captured OK, but if you navigate away from it and then try to return to the page, you get a 404. I've also tried this using ArchiveWeb.page and am getting similar behaviour.
Steps to reproduce the bug
[pywb instance URL]/[collection]/record/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
[pywb instance URL]/[collection]/record/https://teesvalley-ca.gov.uk/[collection]/record/mp_/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
[pywb instance URL]/[collection]/20240222092319/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
The page renders but the URL changes to
[pywb instance URL]/[collection]/20240222092319/https://teesvalley-ca.gov.uk/[collection]/20240222092319mp_/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
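For anyone triaging similar captures, the doubled prefix in the URLs above can be spotted mechanically. The following is only a rough sketch, not part of pywb or wombat: the helper name `split_replay_url`, the instance URL `http://localhost:8080` and the collection name `my-coll` are all made up for illustration, and the pattern only covers the `record` and 14-digit timestamp forms seen in this report.

```python
import re


def split_replay_url(url: str, collection: str) -> tuple[int, str]:
    """Count replay/record prefixes in a URL and return the innermost target.

    Hypothetical helper for illustration only. It recognises
    "/<collection>/record/", "/<collection>/record/mp_/",
    "/<collection>/<14-digit timestamp>/" and
    "/<collection>/<14-digit timestamp>mp_/".
    """
    prefix = re.compile(
        r"/" + re.escape(collection) + r"/(?:record|\d{14})(?:/?[a-z]{2}_)?/"
    )
    matches = list(prefix.finditer(url))
    if not matches:
        # No prefix found: treat the whole string as the target URL.
        return 0, url
    # Everything after the last prefix is the page that was really requested.
    return len(matches), url[matches[-1].end():]


# Assumed values standing in for "[pywb instance URL]" and "[collection]".
broken = (
    "http://localhost:8080/my-coll/20240222092319/https://teesvalley-ca.gov.uk"
    "/my-coll/20240222092319mp_/https://teesvalley-ca.gov.uk/about/leadership/"
    "cabinet-boards-committees/meetings/local-enterprise-partnership/"
)
count, target = split_replay_url(broken, "my-coll")
print(count, target)
```

A count greater than 1 means the collection prefix has been inserted into the target URL a second time, as in the record and playback URLs reported here; a normal URL gives a count of 1.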
Expected behavior
I would expect it not to insert the additional information in the URLs and to play back normally.
Screenshots
How the page looks after the URL has changed
Similar issue with ArchiveWeb.page playback
Environment
We have just updated to the latest version of pywb. I can try to find more specific version info if required.
I am using v0.11.3 of ArchiveWeb.page
Additional context
This only seems to have occurred on this site; other sites seem to be capturing as normal. The specific pages I have tried are:
https://teesvalley-ca.gov.uk/business/tees-valley-business-board/
https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership
https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/local-enterprise-partnership-archive/
Not sure if this is related, but there also appear to be some minor layout differences between the captured versions and the live web (e.g. the title text is left-aligned instead of centred in the captured version).