phantomjs alternative #124

Open
wioux opened this issue Sep 22, 2017 · 20 comments

Comments

@wioux
Member

wioux commented Sep 22, 2017

Should we find an alternative to phantomjs? The maintainer has stepped down.

@mfb
Collaborator

mfb commented Sep 22, 2017

There is now headless Firefox (https://mykzilla.org/2017/08/30/headless-firefox-in-node-js-with-selenium-webdriver/) or, I guess more popularly, headless Chrome.

@j-ro
Collaborator

j-ro commented Sep 22, 2017 via email

@wioux
Member Author

wioux commented Sep 22, 2017

Do we still need to support webkit/Watir? REQUIRES_WAITIR is empty, and all the bioguide IDs in REQUIRES_WEBKIT belong to House members, so we can clear that out. But I'm not sure what the original need for the alternative drivers was, or whether it might come up again. We could really simplify parts of the app if we removed support for those drivers.

@j-ro
Collaborator

j-ro commented Sep 22, 2017

I think that's probably fine over here, yeah...

@ghost

ghost commented Sep 30, 2017

@j-ro
Collaborator

j-ro commented Jan 13, 2018

Has anyone started work on this?

@wioux
Member Author

wioux commented Jan 16, 2018

Not yet @j-ro.

@j-ro
Collaborator

j-ro commented Jan 17, 2018

Thanks @wioux. Neither have we, though it's starting to become more important for us. I'll let you know if it lands on my roadmap. Can you do the same, so we don't duplicate work?

@wioux
Member Author

wioux commented Jan 17, 2018

Definitely, I'll let you know.

@j-ro
Collaborator

j-ro commented Jan 17, 2018

We're actually doing a bit of initial investigation work on this today, maybe tomorrow too. We'll let you know how it goes. There may be a drop-in replacement that works with Capybara; if so, it will be fairly easy.

@j-ro
Collaborator

j-ro commented Jan 23, 2018

Update here -- we have chromedriver running, but it's probably not quite ready for prime time. It works, but we're seeing some hard-to-debug timeout errors, and it's missing some features like blacklists. We're going to run it as an optional switch for certain YAMLs since it helps in some cases, but we're not going to switch entirely. If there's a large appetite for the code we can put together a PR, but it's very much a WIP.
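
For anyone wondering what an "optional switch for certain YAMLs" could look like, here is a minimal sketch. It is not the code described above. It assumes a hypothetical optional top-level `driver` key in a form's YAML, plus made-up driver names (`:phantomjs`, `:headless_chrome`), and it falls back to the default whenever the key is missing or unrecognized.

```ruby
require "yaml"

# Assumed names for illustration only; neither the key nor the driver
# symbols come from the real project.
DEFAULT_DRIVER = :phantomjs
KNOWN_DRIVERS = %i[phantomjs headless_chrome].freeze

# Pick the browser driver for one form definition.
# `yaml_text` is the raw YAML of a single form file.
def driver_for(yaml_text)
  form = YAML.safe_load(yaml_text) || {}
  # Guard against YAML documents that are not mappings (scalars, lists).
  form = {} unless form.is_a?(Hash)

  requested = form["driver"]
  return DEFAULT_DRIVER if requested.nil?

  name = requested.to_s.to_sym
  # Unknown values fall back to the default rather than raising.
  KNOWN_DRIVERS.include?(name) ? name : DEFAULT_DRIVER
end
```

With something like this, a form opts in by adding `driver: headless_chrome` to its YAML, and every other form keeps using the default driver, so the switch stays easy to reverse if the timeout errors persist.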

@k-stewart
Contributor

Hey @j-ro, this is becoming more important for us. Have you found a solution you like?

@j-ro
Collaborator

j-ro commented May 10, 2018

No, we're still with phantom. Chromedriver works but not as consistently, and it doesn't have many hooks and options to debug and tune. We haven't looked at it since January, maybe that's changed, but we're not planning a switch.

@k-stewart
Contributor

Ok, thanks for the insight. I'll see if anything's changed since then.

@j-ro
Collaborator

j-ro commented May 13, 2018

Worth a shot -- it didn't really take us very long at all to drop in Chromedriver -- the hard part was getting it to work reliably.

@ghost

ghost commented Jul 19, 2019

I'll chime in with my experience, as I have worked with Puppeteer, PhantomJS, and various Selenium WebDriver implementations like chromedriver and geckodriver.
Puppeteer provides a high-level API that is quite easy to work with for basic scraping, and it has extensive published documentation. If you need to get something done quickly, I think it is a strong contender. It is a JavaScript-only API as far as I know.
Selenium WebDriver implementations give you more flexibility in which browser you run the automation in, but require more programming and configuration to get working. The API is also available in several programming languages. Firefox's headless documentation also recommends using Selenium WebDriver for test automation.

@ghost

ghost commented Jul 19, 2019

Just discovered @k-stewart's work in #141 as well.

@wioux
Member Author

wioux commented Jul 19, 2019

Hi @efx. Our contact-congress work has moved over to EFForg/congress_forms_api to fix this and other issues. Sorry we didn't properly archive this repo -- I'm going to do that now.

@ghost

ghost commented Jul 22, 2019

Thanks @wioux. I had found this repository from EFF's homepage, so we should probably update those link(s) as well.

@danielmroberts

> Hi @efx. Our contact-congress work has moved over to EFForg/congress_forms_api to fix this and other issues. Sorry we didn't properly archive this repo -- I'm going to do that now.

This repo is still not archived. We were about to roll out a system based on phantom of the capitol, which we have been working on for a while, before noticing your comment :(
