Refusing to download from readcomiconline.li #299
I was previously getting the same error on a Mac (Monterey 12.2.1), with an identical stack trace. I then updated the Python cloudscraper package from 1.2.34 (iirc) to 1.2.60 and now get a shorter error message: Fooling CloudFlare...Please Wait...
That shorter message states the issue properly. There's nothing I can fix/change there, since it depends on another library. Time to look for another alternative then :(
Maybe allowing cookie injection (particularly a valid "cf_clearance" copied from a browser) could help with this problem. Or is it more complex than that?
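A minimal stdlib sketch of what cookie injection could look like. The cookie value and User-Agent below are placeholders, and this is an assumption about the approach, not comic-dl's actual code; note that Cloudflare typically only honors `cf_clearance` when the User-Agent matches the browser that solved the challenge.

```python
import http.cookiejar
import urllib.request

# Hypothetical sketch: inject a cf_clearance cookie copied from a browser
# into a urllib opener. 'PLACEHOLDER' stands for the real cookie value.
jar = http.cookiejar.CookieJar()
cookie = http.cookiejar.Cookie(
    version=0, name='cf_clearance', value='PLACEHOLDER',
    port=None, port_specified=False,
    domain='.readcomiconline.li', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=True, expires=None, discard=False,
    comment=None, comment_url=None, rest={},
)
jar.set_cookie(cookie)

opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
# Must match the browser the cookie was copied from; placeholder here.
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (placeholder)')]
# opener.open('https://readcomiconline.li/') would now send the cookie.
```

Whether this actually defeats the bot check is exactly the open question in this thread; the sketch only shows the plumbing.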
That all depends on the external library in question. I didn't get time this weekend, but I'm hopeful I'll look at this over the next weekend and probably look for some alternatives as well.
BTW, they've added additional security to their downloads, using JS obfuscation on the blogspot URLs. I've been messing around with it for a few hours.
I think this has been the case for quite some time. Have to take a look. Got busy with fixing the CI/CD pipeline last weekend. I'm hopeful I'll work on this thing soon... hopefully :(
Not sure about the Cloudflare issues, but obfuscation of the image URLs is a recent addition; they've also changed the algorithm once already. Here's how you descramble the current iteration:

```js
function beau(lstImages) {
  return lstImages.map(url => {
    if (url.startsWith('https')) {
      return url;
    }
    const containsS0 = url.includes('=s0');
    url = url.slice(0, containsS0 ? -3 : -6);
    url = url.slice(4, 22) + url.slice(25);
    url = url.slice(0, -6) + url.slice(-2);
    url = atob(url);
    url = url.slice(0, 13) + url.slice(17);
    url = url.slice(0, -2) + (containsS0 ? '=s0' : '=s1600');
    return 'https://2.bp.blogspot.com/' + url;
  });
}
```
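Since comic-dl is a Python project, here is an untested Python port of the JS descrambler above, kept slice-for-slice identical. The function name `descramble` is my own; it is a sketch, not the project's actual implementation.

```python
import base64

def descramble(lst_images):
    """Python port (sketch) of the JS `beau` descrambler above."""
    result = []
    for url in lst_images:
        if url.startswith('https'):
            # Already a plain URL; pass it through untouched.
            result.append(url)
            continue
        contains_s0 = '=s0' in url
        url = url[:-3] if contains_s0 else url[:-6]
        url = url[4:22] + url[25:]
        url = url[:-6] + url[-2:]
        # atob equivalent; may raise on malformed input, like atob would.
        url = base64.b64decode(url).decode('utf-8')
        url = url[:13] + url[17:]
        url = url[:-2] + ('=s0' if contains_s0 else '=s1600')
        result.append('https://2.bp.blogspot.com/' + url)
    return result
```

The passthrough branch means already-descrambled lists can be fed through safely.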
That's pretty amazing. I found the rguard.min.js they use but I have never used js before, so trying to figure it out was a joke. Thanks for this. |
The logic for getting those images is already there. Check the readcomicOnlineli file.
Was looking for alternatives to readcomiconline.li and cfscrape/cloudflarescrape and stumbled upon this. Well, it'll take a while to fix this unless I introduce the good old selenium again (ugh). :|
Thankfully I didn't have to bring back selenium for now. Had to add a
It should work. Command will look like this:
I tried adding selenium, but that had some problems with Cloudflare redirects. Not sure what the cause was; I'll take a look at that alternative in the future. The primary reason I don't want to add selenium back into this project is that it's heavy, wonky, and hard to maintain and set up. This script is "difficult" enough for people to run, and adding another step of setting up selenium isn't something I want to do. P.S.: Thanks a ton @vetleledaal for finding that method. It saved a lot of time. Appreciate the help and efforts. The latest binaries should be available in the latest release section now.
I don't know if I am doing something wrong, but I have downloaded the latest binary, and even when using the --cookie parameter and passing the cookie value (it looks like this from Chrome: "_ga=GA1.2.1772146063.1640340838; rco_quality=hq; fpestid=-noHBOo1SsNkRnaWXOYBEjYORP_dyvfg4VuEfzKXv2C4Q6AEbQRvj_3pF1FAN70Fl_xUmg; list-view=grid; b_token=fW1mHplYLTxwpLDwsSCugCvY3kKUCMvdssRwEADfNrKoHdQEL/lJ1iYqd0Y2OcJHEN3Pfa3jm8GlLZ52DAQ9qoBMCel4yxS5kJ5M1RqTrAp/t1m0J+RoTodjHS1EA+FvokfAgcHJrhVNjZP0FSAgAhxCv5RWztm4xe2zjDwtEwRbQrVpau26cq1dltKR1YOc1x5AKiB1RBc7djfQgL9aog==; _gid=GA1.2.2087406613.1650173529; rco_readType=0; cf_chl_2=69cd54f40c3af75; cf_chl_prog=x22; cf_clearance=a5PT0HyClwbTb5BttA4dxk2ugymv35.w2rq6jwkegZ0-1650263245-0-150; _gat=1; __cf_bm=OFRJAi1yOVoqMSJrmJRq8XGBwyCDfRgsJquWzeG47jI-1650264318-0-AW4sjBGc+2LdcXPidj1O8QOb8RHv5tm7yZY1AnDqXhf/d/wNw3KG0pp4CE28+LwQ0eMREhDPMq3AWdJrcnCNryJf5ecPNcCI3jlsyaVZGPBvHi90CZ9xaYqn6TTrllV5Tw=="), I am still getting the exact same error. I have updated Python, cloudscraper, and Node.js to the latest available versions.
I haven't tried it again, but if somebody else is facing this same issue, please feel free to post an update here; we can re-open this one and continue looking into it.
Hi there, |
I'm not totally sure of what is happening, but by monitoring the headers of the http requests made by comic-dl (with the command
Given all of this, I see two possible solutions: either add a "user-agent" option to the script, or add a full "headers" option. Note: I know the website uses https instead of http, but since I just wanted to sniff the headers generated by the script, that is irrelevant.
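The first proposal can be sketched with a few lines of argparse. This is an illustrative assumption about how such a flag could be wired, not comic-dl's actual CLI code; the option name and header dict are mine.

```python
import argparse

# Hypothetical sketch of a --user-agent override for the downloader's CLI.
parser = argparse.ArgumentParser(prog='comic-dl')
parser.add_argument('-i', '--input', help='comic URL to download')
parser.add_argument('--user-agent', default=None,
                    help='override the User-Agent header sent with every request')

# Simulated invocation for illustration:
args = parser.parse_args(
    ['-i', 'https://readcomiconline.li/Comic/Sha/Issue-1',
     '--user-agent', 'Mozilla/5.0'])

# Merge the override into whatever headers the request layer uses.
headers = {}
if args.user_agent:
    headers['User-Agent'] = args.user_agent
```

A full "headers" option would work the same way, just accepting repeated `key: value` pairs instead of a single string.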
Wondering why this is closed. I'm having this issue. Every time I attempt a download from RCO, specifically from https://readcomiconline.li/Comic/Spider-Man-1990, |
My issue (#326) was closed as a duplicate of this issue, but this issue is also closed and it still doesn't work properly.... |
All RCO-related issues will be redirected to this one because they're all the same. Also, reiterating why there's an issue with RCO and why there is currently no solution for RCO in comic-dl: I can't find a way to get past their bot checks. I've tried everything I could think of, but it didn't work out. Read in detail: #299 (comment) @topotech, thanks for investigating it. The code is available to everyone; please feel free to experiment, make changes, and share with everyone if it works out. I'll be more than thankful :) Edit: Even though this issue is more or less limited to RCO, I am not going to lock the conversation. Folks, share if you have something I can try to get RCO working again :)
I understand they all redirect here, but what I'm commenting on is the fact that issue #299 is marked closed, but it is still an open problem...
Regarding the code, I dug into it myself and found that call. I was going to write a scraper in nodejs/cheerio until I found this and thought it would get us farther, but apparently we're stuck. Now I'm considering an AutoIt script that literally mimics the user's actions.
Maybe an alternate approach is to inject JS into the site that performs the download? e.g. using this extension: https://chrome.google.com/webstore/detail/custom-javascript-for-web/ddbjnfjiigjmcpcpkmhogomapikjbjdk?hl=en |
Although cloudflare has been turned off, they have this really annoying system that makes it a pain to download anything (see Xonshiz/comic-dl#299)
There's an implementation of … but it's licensed under GPLv2-only, making it incompatible with this project. Maybe you can still find it helpful.
This worked for me! However, it seems like they use multiple approaches for scrambling the image URLs (I guess it just depends on when the page was published, and whatever approach they were using at the time?):
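Since more than one scrambling scheme is in the wild, a downloader may need to try several descramblers in turn and keep the first result that looks like a valid URL. A small sketch of that idea (the function name and validity check are my own assumptions):

```python
def descramble_any(url, descramblers):
    """Try each descrambler in order; return the first plausible URL, else None.

    `descramblers` is a list of callables taking a scrambled string and
    returning a candidate URL (they may raise on input from a different
    scheme, which we treat as "try the next one").
    """
    for fn in descramblers:
        try:
            out = fn(url)
        except Exception:
            continue  # wrong scheme for this input; try the next descrambler
        if out.startswith('https://'):
            return out
    return None
```

This keeps per-scheme logic isolated, so when the site changes the algorithm again, only a new entry needs to be appended to the list.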
Nice, thanks for all the inputs everyone. I'll try to go back and take a look at RCO once again with these new suggestions. Now that there are some new methods mentioned, I'm re-opening this issue. Let's hope this gets fixed this time :D |
I can't get the cookies to work
this is my config
I had the same issue; I fixed it by adding a "cookie": "" entry to my config.json:
@Xonshiz is that working as intended or is the config generator forgetting about this? |
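For reference, a hypothetical shape for a config.json with the "cookie" key added. Every key name here other than "cookie" is a placeholder; your generated config may use different fields, and the cookie string itself comes from your browser.

```json
{
  "comic": "https://readcomiconline.li/Comic/Spider-Man-1990",
  "cookie": "cf_clearance=<value from browser>; __cf_bm=<value from browser>"
}
```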
Hey everyone. I'm new here, but I am passionate! I've tried so many suggestions and I still have this issue. |
Same issue (`0image(s) [00:00, ?image(s)/s]`) on macOS 13.5: `python3 ./cli.py -i https://readcomiconline.li/Comic/Sha/Issue-1 --cookie "[COOKIE STRING]"`
Take a look: #344 |
Unfortunately, your solution is a bit over my head - Are you able to dumb it down for us? |
Of course. I did some analysis and noticed that some additional headers were impacting access to the site. Another point was that the regex performing the search used single quotes and was therefore unable to generate the list of comic book images. Last but not least, I created a function called
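The single-vs-double-quote regex point above can be illustrated in miniature. The `lstImages` name comes from the site's own script (see the descrambler earlier in the thread); the exact patterns below are my guesses at the shape of the downloader's regex, not its real code.

```python
import re

# A page snippet using double quotes, as the site now emits:
html = 'lstImages.push("abc123");'

# A pattern hard-coded to single quotes finds nothing on such a page:
old_style = re.findall(r"lstImages\.push\('(.*?)'\)", html)

# Accepting either quote style recovers the image list:
new_style = re.findall(r"""lstImages\.push\(["'](.*?)["']\)""", html)
```

An empty match list here is exactly what surfaces to users as "0 image(s)".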
The |
Cheers so much for that! I see! Alright, and is this function something that we, as users of this file, will have access to? Because after installing the requirements using the Windows binary, the 0-images issue persists (please see screenshot). I've noticed that the folders seem to be downloading, just not the files (please see screenshot). These are folders that belong to the comics I have attempted to download (as a test). I've also been able to successfully download files, but that was back in 2020, and I could have sworn using the Readcomiconline file helped with that.
There is a process before it reaches end users. I submitted the pull request with the fixes I identified in my analysis, but they need to be evaluated and go through an approval cycle by the maintainers of this repository. Only after a merge (i.e., once the fix is approved and incorporated) will you have access to it.
That makes sense! When it is ready, how would we gain access to it? Would we have to download something? I think it was you who kindly assisted me back in 2020, and I remember you helped me out a lot. |
Try out the only working downloader for readcomiconline.li in 2024: https://github.com/tabletseeker/readcomic_dl
Error_Log.log
I have been using this for many months without issue, but today, whenever I try to download something from readcomiconline.li, it gives an error.
I have included the error log, and a screenshot of the error