Transfer-Encoding: chunked with proxy enabled #112

Open
migliori opened this issue Oct 5, 2018 · 9 comments

migliori commented Oct 5, 2018

Hi,

I've been using search-engine-google for a while and it worked perfectly until now, but I recently ran into an issue with proxies.

The Google scraper works fine on localhost, but on the production server it throws a 500 error: "Unable to check javascript status".

The scraped results come back with `dom` => `textContent` starting with:

```
ncoding
Transfer-Encoding: chunked
```
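One plausible mechanism for this symptom, not confirmed anywhere in this thread, is a header/body split that ignores the proxy's reply to the HTTPS `CONNECT` request. Some older libcurl versions reportedly left those bytes out of the reported header size. A client that cuts the raw response at that size would therefore start the "body" inside the last header lines. The bash sketch below simulates this under stated assumptions: the headers, the body and the proxy reply wording are all made up, the body is shown already de-chunked, and it assumes CRLF line endings. Under those assumptions, a 39-byte `HTTP/1.1 200 Connection established` reply shifts the split by exactly the length of the `ncoding ... chunked` prefix reported here.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: splitting headers from body at a "header size"
# that does not include the proxy's CONNECT reply. All values are made up.
set -euo pipefail

# Reply from the proxy when the HTTPS tunnel is opened (assumed wording).
connect_reply=$'HTTP/1.1 200 Connection established\r\n\r\n'
# Origin headers as they might arrive through the tunnel (assumed).
headers=$'HTTP/1.1 200 OK\r\nVary: Accept-Encoding\r\nTransfer-Encoding: chunked\r\n\r\n'
# Body shown already de-chunked, for simplicity.
body='simpsons - Recherche Google'

# Everything the client reads: proxy reply, then origin headers, then body.
raw="${connect_reply}${headers}${body}"

# Buggy split: header size counts only the origin headers, so the body
# starts ${#connect_reply} bytes too early, inside the last header lines.
bad_header_size=${#headers}
bad_body="${raw:bad_header_size}"

# Correct split: the proxy reply belongs to the header block too.
good_header_size=$(( ${#connect_reply} + ${#headers} ))
good_body="${raw:good_header_size}"

printf 'proxy reply length: %d\n' "${#connect_reply}"
printf 'bad body:  %q\n' "$bad_body"
printf 'good body: %q\n' "$good_body"
```

Only the split that counts the proxy reply recovers the body cleanly. If production behaves like this, that would fit the old curl version being the cause rather than Apache or nginx.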

I put a simple test online here: https://www.hack-hunt.com/scraping-simple-test.php
The code is the example from your documentation: http://serp-spider.github.io/documentation/search-engine/google/#installation

I just added:
```php
$proxy = Proxy::createFromString('https://xxx:proxy@ip');
$browser->setProxy($proxy);
```

It works fine on localhost, and on the production server if I remove the proxy, but it fails on production with the proxy.

I'm not sure whether the issue comes from my server or from search-engine-google.

Any help is much appreciated, thanks!

LunarDevelopment commented Oct 5, 2018 via email

migliori (Author) commented Oct 5, 2018

composer.zip

Same on local and server.

LunarDevelopment commented Oct 5, 2018 via email

gsouf (Member) commented Oct 5, 2018

@migliori Please check your curl version. If the curl version on the server is not the same as on your local machine, please try upgrading it and let us know how it goes.

migliori (Author) commented Oct 5, 2018

No, it isn't. If you open https://www.hack-hunt.com/scraping-simple-test.php, you'll see the string added before the Google content:

```
public 'textContent' => string 'ncoding
Transfer-Encoding: chunked

simpsons - Recherche Google(function(){window.google=...
```

I suspected the headers might be added by Apache's PageSpeed module, but disabling it made no difference.

I can't change my PHP cURL version; it's built into Plesk's PHP:

```
version => 7.26.0
ssl_version => OpenSSL/1.0.1t
libz_version => 1.2.7
```

I just tested with nginx instead of Apache: same result.

gsouf (Member) commented Oct 5, 2018

@migliori Not php-curl, just curl itself. Run `curl --version`.

migliori (Author) commented Oct 5, 2018

I already did it:

```
apt-get update && apt-get install curl libcurl
curl --version
curl 7.26.0 (x86_64-pc-linux-gnu) libcurl/7.26.0 OpenSSL/1.0.1t zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: Debug GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP
```

gsouf (Member) commented Oct 5, 2018

Your version of curl is very old. Try upgrading to version 7.61 and see if that fixes it.

Additionally, curl versions below 7.48 have an issue with cookies that prevents SERPS from handling cookies correctly.
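A small shell sketch could check a machine against the two versions mentioned above: 7.48 for the cookie issue and 7.61 as the suggested upgrade. It assumes GNU `sort -V` for version ordering and parses the `curl --version` line posted earlier in this thread. The helper names `version_lt` and `curl_version_from_line` are hypothetical. On a live server, swap the sample line for the real first line of `curl --version`.

```shell
#!/usr/bin/env bash
# Sketch: compare a curl version against the thresholds from this thread.
# Assumes GNU sort with -V (version sort); helper names are hypothetical.
set -euo pipefail

# version_lt A B: succeeds if version A sorts strictly before version B.
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Extract the version number (second field) from the first line of `curl --version`.
curl_version_from_line() {
  printf '%s\n' "$1" | awk '{print $2}'
}

# Sample line taken from the output above; on a live system use:
#   line="$(curl --version | head -n1)"
line='curl 7.26.0 (x86_64-pc-linux-gnu) libcurl/7.26.0 OpenSSL/1.0.1t zlib/1.2.7 libidn/1.25 libssh2/1.4.2 librtmp/2.3'
ver="$(curl_version_from_line "$line")"

if version_lt "$ver" 7.48; then
  echo "curl $ver is older than 7.48 (cookie issue)"
fi
if version_lt "$ver" 7.61; then
  echo "curl $ver is older than 7.61 (suggested upgrade target)"
fi
```

With the 7.26.0 output from this thread, both checks fire.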

migliori (Author) commented Oct 5, 2018

I'm in touch with my server provider and will let you know whether it works as soon as the upgrade is done; it may take a day or two.

Thanks so much for your quick response and help.
