Project Status? #409
Comments
Same feeling here. Should I invest my time in using it? The latest code contains bug fixes, but no version including them has been released.
Also wondering the same.
Any updates on this?
I think the lack of updates makes its status clear.
On Sat, May 22, 2021 at 04:05, Aryan Iyappan wrote:
> Any updates on this?
Thanks for the reply! I am considering moving to another library or implementing my own solution.
Try scrapy-cluster... I moved away from Frontera to it.
On Sat, May 22, 2021 at 22:48, Aryan Iyappan wrote:
> Thanks for the reply! I am considering moving to another library or implementing my own solution.
I ended up implementing my own distributed crawler based on this paper. It describes a URL frontier that enqueues and manages URLs. When adapting it to Scrapy, the whole concept of "back queues" mentioned in the paper can be discarded. That leaves the "front queues". Say you have N front queues: push each incoming request into one of them. When getting the next request, use weighted random selection to pick one of the front queues and pop its first request. The next part is the dupefilter, which keeps already-seen requests out of the queues. This gives a more scalable frontier.
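For anyone who wants to see the shape of this, here is a minimal in-memory sketch of the front-queue idea described above. It is not the code from my crawler or from the paper. The class name `FrontQueueFrontier`, the `priority` argument, and the default weights are hypothetical choices for illustration, and I am assuming that requests are assigned to a queue by priority (the description above only says "one of the queues"). A distributed version would presumably keep the queues and the seen-set in a shared store rather than in process memory.

```python
import random
from collections import deque


class FrontQueueFrontier:
    """Toy in-memory URL frontier with N front queues and a dupefilter."""

    def __init__(self, num_queues=3, weights=None, seed=None):
        self.queues = [deque() for _ in range(num_queues)]
        # Assumption: queue 0 is the highest priority and gets the largest weight.
        self.weights = list(weights) if weights else list(range(num_queues, 0, -1))
        if len(self.weights) != num_queues:
            raise ValueError("need exactly one weight per front queue")
        self.seen = set()  # dupefilter: URLs that were already enqueued
        self.rng = random.Random(seed)

    def push(self, url, priority=0):
        """Enqueue url unless it was seen before; return True if enqueued."""
        if url in self.seen:
            return False
        self.seen.add(url)
        # Clamp the priority so it always maps to an existing queue.
        index = min(max(priority, 0), len(self.queues) - 1)
        self.queues[index].append(url)
        return True

    def pop(self):
        """Pick a non-empty front queue by weighted random choice, pop its head."""
        candidates = [i for i, q in enumerate(self.queues) if q]
        if not candidates:
            return None
        weights = [self.weights[i] for i in candidates]
        chosen = self.rng.choices(candidates, weights=weights, k=1)[0]
        return self.queues[chosen].popleft()

    def __len__(self):
        return sum(len(q) for q in self.queues)
```

Only non-empty queues take part in the weighted draw, so `pop()` never wastes a pick on an empty queue, and low-weight queues still get served occasionally instead of starving.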
@aryaniyaps great insight, thanks for sharing!
It's been a year since the last commit on the master branch. Do you have any plans to maintain this? I noticed that a lot of issues don't get resolved, and lots of PRs are still pending.