KeyError [b'frontier'] on Request Creation from Spider #401
Comments
For reference, the yielding in the spider's
And yielding in the
A custom crawling strategy was not really necessary until now, as with this approach the link filtering already happens via XPaths...
Following the distributed quickstart in the documentation (except the seed injection step), I am monitoring the prints from the ZeroMQ broker:
... which means that the spider is never registered by Frontera? Could that be what is breaking it? (And what could cause that? The configuration also mostly follows the general-spider example.)
After some debugging I can say that at least start_requests seems to be working properly; the issue arises from the requests yielded from the parse function.
Issue might be related to #337
Hi,
I have already read in discussions here that the scheduling of requests should be done by Frontera, and that apparently even the creation of requests should be done by the frontier and not by the spider.
However, the documentation of both Scrapy and Frontera says that requests should be yielded in the spider's `parse` function.
What should the process look like if requests are to be created by the crawling strategy and not yielded by the spider? How does the spider trigger that?
In my use case, I am using scrapy-selenium with Scrapy and Frontera (I use SeleniumRequests to be able to wait for JS-loaded elements).
I have to generate the URLs I want to scrape in two phases: I first yield them in the `start_requests()` method of the spider instead of using a seeds file, and then yield requests for extracted links in the first of two `parse` functions.
Yielding SeleniumRequests from `start_requests` works, but yielding SeleniumRequests from the `parse` function afterwards results in the following error (only an extract is pasted, as the iterable error prints the same errors over and over):
Very thankful for all hints and examples!