dumpgenerator.py --xmlrevisions fails with "Error: list index out of range" on pokewiki.de #430
Unfortunately this wiki intentionally returns HTTP 403 for api.php in many cases. So it's an arms race: if we implement a workaround, they will just block different user agents or whatever. I suggest contacting the sysadmin so that they create regular dumps themselves and make them available on the Internet Archive, so people won't be tempted to export manually so often.
I don't recommend using your other method with Special:Export, because it will increase their load and therefore invite more blocks.
Oh okay, but understandable, considering that the whole dump ended up at over 30 GB. I would actually have considered asking them for a dump if I hadn't found the alternative. EDIT: I found https://archive.org/download/wiki-pokewikide, so is there a way to add the dump that I already have (after I compress it)?
On 12/04/22 20:24, Gernot Zacharias wrote:
Any recommendations on how to put it on the Internet Archive, given that it is huge with all the images?
If the wiki admins made the dump, it would be on their server, so the upload to the Internet Archive would probably be quite fast.
(Compressing the folder would still exceed the maximum file size of a FAT32-formatted drive (sadly still a common standard), so I don't know how viable that is.)
You can start with the history 7z that launcher.py would produce; it's going to be much smaller. It's fine to upload a 30 GB file to the Internet Archive. If you have a FAT-formatted drive, you can create 4 GB volumes.
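Splitting into volumes is normally done with the archiver's own multi-volume option. Purely as an illustration, here is a minimal Python sketch that splits an already-compressed file into numbered volumes (`<name>.001`, `<name>.002`, ...) that each stay below the FAT32 limit of 4 GiB minus one byte. The helpers `split_file` and `join_files` and the naming scheme are hypothetical; they are not part of dumpgenerator.py or launcher.py.

```python
import shutil
from pathlib import Path

# FAT32 cannot hold a file of 4 GiB or more: the largest allowed size is 4 GiB - 1 byte.
FAT32_MAX_FILE_SIZE = 4 * 1024**3 - 1


def split_file(path, volume_size=FAT32_MAX_FILE_SIZE, buffer_size=16 * 1024**2):
    """Split `path` into path.001, path.002, ... of at most `volume_size` bytes each.

    Returns the volume paths in order. Data is copied in `buffer_size` chunks,
    so a 30 GB file never has to fit in memory.
    """
    src = Path(path)
    volumes = []
    with src.open("rb") as f:
        chunk = f.read(min(buffer_size, volume_size))
        while chunk:
            vol = src.with_name(f"{src.name}.{len(volumes) + 1:03d}")
            with vol.open("wb") as out:
                written = 0
                while chunk:
                    out.write(chunk)
                    written += len(chunk)
                    remaining = volume_size - written
                    if remaining == 0:
                        break  # this volume is full
                    chunk = f.read(min(buffer_size, remaining))
            volumes.append(vol)
            # Start another volume only if the source still has data.
            chunk = f.read(min(buffer_size, volume_size))
    return volumes


def join_files(volumes, dest):
    """Concatenate the volumes in order to restore the original file."""
    with open(dest, "wb") as out:
        for vol in volumes:
            with open(vol, "rb") as f:
                shutil.copyfileobj(f, out)
```

The volumes are plain byte slices, so restoring the original before extraction is just concatenation in order, whether with `join_files` or with any other tool that joins files byte for byte.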
If your connection is not reliable or fast enough to finish a 30 GB upload, you can create a torrent file containing the dump and upload the torrent file instead; the Internet Archive will then download the data from your torrent client.
Full command used:
./dumpgenerator.py --xmlrevisions --images --xml --curonly https://pokewiki.de --namespace 0
I had previously run the command without '--namespace 0' and got the same result; I only added it to reproduce the error without putting too much stress on the wiki itself.
Expected behaviour:
creating a dump of https://pokewiki.de
Actual behaviour after a few minutes:
Full log:
dumgenerator.py_xmlrevisions.log
Tail of the output file:
Quick 'integrity' check on the output file
Number of page titles inside *-titles.txt: 86796
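The exact check used here isn't shown. One way to do such a quick check is sketched below in Python, assuming that the titles file lists one title per line and ends with a `--END--` marker line (how dumpgenerator.py writes it, as far as I know), and that each `<page>` in the XML dump carries exactly one `<title>` element. The helper names are hypothetical.

```python
# Assumption: dumpgenerator.py ends a complete *-titles.txt with this marker line.
END_MARKER = "--END--"


def count_titles(titles_path):
    """Count page titles in a *-titles.txt file (one title per line)."""
    with open(titles_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip() not in ("", END_MARKER))


def scan_xml_dump(xml_path):
    """Return (number of <title> elements, whether </mediawiki> was seen).

    Scans line by line instead of parsing, so a dump that was cut short
    by an error can still be checked.
    """
    pages = 0
    closed = False
    with open(xml_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            pages += line.count("<title>")
            if "</mediawiki>" in line:
                closed = True
    return pages, closed


def integrity_report(titles_path, xml_path):
    """Compare the title list with the XML dump and summarise the result."""
    expected = count_titles(titles_path)
    found, closed = scan_xml_dump(xml_path)
    return {
        "titles": expected,
        "pages_in_xml": found,
        "missing": expected - found,
        "xml_closed": closed,
    }
```

A plain line scan is used instead of an XML parser because a dump aborted mid-run is not well-formed and a strict parser would reject it. Counting the literal `<title>` is safe because page text inside the dump is XML-escaped, so the tag can only appear as markup.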
Test without '--xmlrevisions'
By taking pull request #280 from 2016 and integrating it into the current version (pull request #429), I managed to get a full dump of the wiki.