I am working on Frontera these days, and Frontera is a great tool for cluster crawling!
However, some things are still hard to figure out because the documentation is thin. After reading and trying the settings in the Cluster setup guide (Frontera 0.7.1 documentation), I noticed that the BACKEND setting means different things for the spider and for the workers:
- in the spider, it refers to the message bus, which is normally Kafka;
- in the workers (DB worker and strategy worker), it refers to the distributed database, which is normally HBase, or SQLAlchemy in distributed mode.
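A minimal sketch of the two settings modules being compared, assuming the Kafka + HBase layout of the cluster setup guide. The dotted paths and the `MESSAGE_BUS` and `KAFKA_LOCATION` names are given as I recall them from that guide and may differ between Frontera versions; the classes stand in for separate settings modules, and `backend_role` is a hypothetical helper for illustration only.

```python
"""Sketch of spider vs. worker settings (assumed values, not copied from a release).

In a real deployment these would be two separate Python settings modules.
"""


class SpiderSettings:
    # Spider side: BACKEND is a proxy backend that forwards frontier events
    # to whatever bus MESSAGE_BUS selects (Kafka here).
    BACKEND = "frontera.contrib.backends.remote.messagebus.MessageBusBackend"
    MESSAGE_BUS = "frontera.contrib.messagebus.kafkabus.MessageBus"
    KAFKA_LOCATION = "localhost:9092"


class WorkerSettings:
    # Worker side (DB and strategy workers): BACKEND is the real storage.
    BACKEND = "frontera.contrib.backends.hbase.HBaseBackend"
    MESSAGE_BUS = "frontera.contrib.messagebus.kafkabus.MessageBus"
    KAFKA_LOCATION = "localhost:9092"


def backend_role(settings) -> str:
    """Hypothetical helper: say what BACKEND stands for in a settings object."""
    if settings.BACKEND.endswith("MessageBusBackend"):
        return "message bus proxy"
    return "storage"


if __name__ == "__main__":
    for name, s in (("spider", SpiderSettings), ("worker", WorkerSettings)):
        print(f"{name}: {s.BACKEND} -> {backend_role(s)}")
```

Note that in this layout the Kafka bus itself is named by `MESSAGE_BUS` on both sides; only `BACKEND` differs.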
I do not understand the purpose of this design: the inconsistent meaning could mislead users when they set this keyword in both the spider and the worker settings.
Could anyone explain the reason for this design? Or is it just a mistake?
Hi @grammy-jiang, it's quite an interesting finding. The thing is, Frontera tries to be both a distributed and a non-distributed crawl frontier framework. The backend became the place in the internal architecture that makes this possible: MessageBusBackend effectively moves the storage backend into another process.
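A toy model of that idea, as a sketch only: every class and function name below is hypothetical and simplified, not Frontera's actual API, and an in-process `queue.Queue` stands in for Kafka. The spider talks to one backend interface; in non-distributed mode that interface is the storage itself, while in distributed mode it is a proxy that ships events over the bus to a worker holding the storage.

```python
"""Toy model: one backend interface, storage either local or behind a bus.

Hypothetical names; queue.Queue stands in for the Kafka topics.
"""
import queue
from abc import ABC, abstractmethod


class Backend(ABC):
    @abstractmethod
    def add_seeds(self, urls): ...

    @abstractmethod
    def get_next_requests(self, max_n): ...


class MemoryStorageBackend(Backend):
    """Plays the role of a storage backend (HBase or SQLAlchemy in practice)."""

    def __init__(self):
        self._pending = []

    def add_seeds(self, urls):
        self._pending.extend(urls)

    def get_next_requests(self, max_n):
        batch, self._pending = self._pending[:max_n], self._pending[max_n:]
        return batch


class MessageBusProxyBackend(Backend):
    """Same interface, but forwards events to a bus instead of storing them."""

    def __init__(self, spider_log, spider_feed):
        self._log = spider_log    # spider -> workers
        self._feed = spider_feed  # workers -> spider

    def add_seeds(self, urls):
        self._log.put(("add_seeds", list(urls)))

    def get_next_requests(self, max_n):
        batch = []
        while len(batch) < max_n:
            try:
                batch.append(self._feed.get_nowait())
            except queue.Empty:
                break
        return batch


def db_worker_step(spider_log, spider_feed, storage, batch_size):
    """Worker side: apply spider events to storage, then publish a batch."""
    while True:
        try:
            event, payload = spider_log.get_nowait()
        except queue.Empty:
            break
        if event == "add_seeds":
            storage.add_seeds(payload)
    for url in storage.get_next_requests(batch_size):
        spider_feed.put(url)


if __name__ == "__main__":
    log, feed = queue.Queue(), queue.Queue()
    spider_backend = MessageBusProxyBackend(log, feed)
    spider_backend.add_seeds(["https://example.com/a", "https://example.com/b"])
    db_worker_step(log, feed, MemoryStorageBackend(), batch_size=10)
    print(spider_backend.get_next_requests(max_n=5))
```

Because both classes implement the same interface, the spider code does not change between modes; only the object behind `BACKEND` does, which is why the same setting name ends up meaning "proxy" on the spider and "storage" on the workers.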
Emmm, I only use Frontera in cluster mode and did not read the other parts of the documentation carefully. Frontera is a fantastic framework for cluster crawling, but its documentation is not as clear as Scrapy's.
I am a heavy Scrapy user and have written some useful middlewares (both spider and downloader middlewares, with unit tests), most of which are published on my GitHub page. I would like to contribute this code back to the community, but I do not know how. Would you please review my code and mentor me on how to contribute?