ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds' #371

ghost opened this issue Jun 7, 2019 · 12 comments
ghost commented Jun 7, 2019

@sibiryakov Hi, thanks for your suggestion about Kafka, but I have already installed it on my machine. I intend to build a Kafka + HBase crawler.

I have a few questions. First, when I run these commands:
python -m frontera.utils.add_seeds --config tutorial.config.dbw --seeds-file seeds.txt

scrapy crawl tutorial -L INFO -s SPIDER_PARTITION_ID=0
I get this error:
ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds'

After I removed the seed-loader middleware, I can run Scrapy, but 0 pages are crawled:

    SPIDER_MIDDLEWARES = {
        'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 999,
        # removed: 'frontera.contrib.scrapy.middlewares.seeds.file.FileSeedLoader': 1,
    }

Besides that, my Kafka didn't consume any messages.

All my configuration follows the cluster setup guide in the documentation.

Regarding the Kafka problems: after I added the line

    MESSAGE_BUS = 'frontera.contrib.messagebus.kafkabus.MessageBus'

and removed 'frontera.contrib.scrapy.middlewares.seeds.file.FileSeedLoader': 1, from SPIDER_MIDDLEWARES, I got an error when starting the DB worker, the strategy worker, and the crawler (the traceback is quoted in my comment below).

My configuration files:
common.py

from __future__ import absolute_import
from frontera.settings.default_settings import MIDDLEWARES

MAX_NEXT_REQUESTS = 512
SPIDER_FEED_PARTITIONS = 2 # number of spider processes
SPIDER_LOG_PARTITIONS = 2 # worker instances
MIDDLEWARES.extend([
    'frontera.contrib.middlewares.domain.DomainMiddleware',
    'frontera.contrib.middlewares.fingerprint.DomainFingerprintMiddleware'
])

QUEUE_HOSTNAME_PARTITIONING = True
KAFKA_LOCATION = 'localhost:9092'
URL_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.hostname_local_fingerprint'
MESSAGE_BUS = 'frontera.contrib.messagebus.kafkabus.MessageBus'
SPIDER_LOG_TOPIC = 'frontier-done'
SPIDER_FEED_TOPIC = 'frontier-todo'
SCORING_TOPIC = 'frontier-score'

dbw.py

from __future__ import absolute_import
from .worker import *
LOGGING_CONFIG = 'logging-db.conf'

spider.py

from __future__ import absolute_import
from .common import *
BACKEND = 'frontera.contrib.backends.remote.messagebus.MessageBusBackend'
KAFKA_GET_TIMEOUT = 0.5
LOCAL_MODE = False  # by default Frontera is prepared for single process mode

sw.py

from __future__ import absolute_import
from .worker import *
CRAWLING_STRATEGY = 'frontera.strategy.basic.BasicCrawlingStrategy' # path to the crawling strategy class
LOGGING_CONFIG = 'logging-sw.conf'  # if needed

worker.py

from __future__ import absolute_import
from .common import *
BACKEND = 'frontera.contrib.backends.hbase.HBaseBackend'
HBASE_DROP_ALL_TABLES = True
MAX_NEXT_REQUESTS = 2048
NEW_BATCH_DELAY = 3.0
HBASE_THRIFT_HOST = 'localhost' # HBase Thrift server host and port
HBASE_THRIFT_PORT = 9090

How I create the Kafka topics:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic frontier-done
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic frontier-todo
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic frontier-score
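
As a sanity check, the same CLI can confirm the partition count of each topic with --describe:

    kafka-topics.sh --describe --zookeeper localhost:2181 --topic frontier-todo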

I set the partition count to 2 in common.py:

SPIDER_FEED_PARTITIONS = 2 # number of spider processes
SPIDER_LOG_PARTITIONS = 2 # worker instances

How I watch the topics (console consumers):
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic frontier-done --from-beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic frontier-todo --from-beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic frontier-score --from-beginning

Versions of tools:
frontera 0.8.1
Scrapy 1.6.0
Python 3.7.3
Kafka 2.2.1

I think maybe the docs weren't updated for v0.8.1 and still describe v0.8.0.1. Should I downgrade Frontera to the stable version, v0.8? I would rather use the latest version, though.

Thanks in advance!

@Gallaecio (Member) commented:

Please use Stack Overflow to ask this type of question. See also https://stackoverflow.com/help/mcve and https://stackoverflow.com/help/how-to-ask

ghost (Author) commented Jun 7, 2019

@Gallaecio I thought about asking this question on Stack Overflow, but it is less responsive than here, and I believe there are bugs involved. Have you read my entire problem?

Also, I checked all the previous questions; @sibiryakov is very responsive in solving problems, which is why I am asking here.

I will try asking on Stack Overflow...

I have posted the question on Stack Overflow: https://stackoverflow.com/questions/56493245/modulenotfounderror-no-module-named-frontera-contrib-scrapy-middlewares-seeds

Sorry, I don't have enough reputation to post images on Stack Overflow, so I used imgur.com instead. I hope I can get an answer soon.

ghost (Author) commented Jun 9, 2019

@sibiryakov I found a solution for this error:

  File "/home/liho/anaconda3/lib/python3.7/site-packages/frontera/contrib/messagebus/kafkabus.py", line 60, in __init__
    self._partitions = [TopicPartition(self._topic, pid) for pid in self._consumer.partitions_for_topic(self._topic)]
TypeError: 'NoneType' object is not iterable

You should add this line

            self._consumer.topics()

before

            self._partitions = [TopicPartition(self._topic, pid) for pid in self._consumer.partitions_for_topic(self._topic)]

It seems partitions_for_topic does not request a metadata refresh, whereas topics does. No clue why this worked in kafka-python 1.4.4, as it seems neither function has changed. Maybe in 1.4.4 the metadata was always refreshed right away when the consumer was created?

Making partitions_for_topic call the same code as topics before returning the partitions would obviously also solve the problem.
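
Putting the fix together, the affected lines in Consumer.__init__ in kafkabus.py would look roughly like this (a sketch reconstructed from the traceback above, not necessarily the exact upstream code):

    from kafka import TopicPartition  # already imported at the top of kafkabus.py

    # Force a metadata refresh first: on kafka-python >= 1.4.5,
    # partitions_for_topic() alone can return None because it does
    # not trigger a metadata fetch by itself.
    self._consumer.topics()
    self._partitions = [TopicPartition(self._topic, pid)
                        for pid in self._consumer.partitions_for_topic(self._topic)]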

Have a look, they have been fixing this problem recently:
dpkp/kafka-python#1789
dpkp/kafka-python#1781
dpkp/kafka-python#1774
Yelp/kafka-utils@607a577

ghost (Author) commented Jun 9, 2019

@sibiryakov After I successfully start the cluster:

python -m frontera.worker.db --config tutorial.config.dbw --no-incoming --partitions 0 1
python -m frontera.worker.strategy --config tutorial.config.sw --partition-id 0

When I inject the seeds file with the command below,

python -m frontera.utils.add_seeds --config tutorial.config.sw --seeds-file seeds.txt
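
For reference, seeds.txt is assumed here to be a plain-text list of start URLs, one per line, which is what the crawling strategy reads, e.g.:

    https://example.com/
    https://en.wikipedia.org/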

I get this error in the DB worker terminal in the meantime (screenshot omitted), but after the seeds are injected it goes away...

scrapy crawl tutorial -L INFO -s SPIDER_PARTITION_ID=1

But I still get 0 pages crawled...

Please help me when you are free. Thanks in advance!

@sibiryakov (Member) commented:

Hi @liho00, your seeds weren't injected because the strategy worker was unable to create the table crawler:queue. Check that it can connect to the HBase Thrift server and that the namespace crawler exists.
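
For example, from the HBase shell (standard HBase shell commands; adjust to your deployment):

    $ hbase shell
    hbase> list_namespace
    hbase> create_namespace 'crawler'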

ghost (Author) commented Jun 10, 2019

@sibiryakov Hi, I am sure I created the namespace crawler before, and I am also sure the queue table was created... I need to clarify that I'm using Frontera v0.8.1, in which 'frontera.contrib.scrapy.middlewares.seeds' has been removed.

After I tried again, the error still shows up after entering this command:

python -m frontera.utils.add_seeds --config tutorial.config.sw --seeds-file seeds.txt

DB worker terminal: (screenshot omitted)

But after a few seconds it shows the seeds were injected?

Seeds terminal: (screenshot omitted)

I am still getting 0 pages crawled.

Besides that, can you tell me how to inject the seeds? If the module from this error is no longer needed,

ModuleNotFoundError: No module named 'frontera.contrib.scrapy.middlewares.seeds'

should I inject the seeds into my strategy worker?

Lastly, I cannot force-close my crawler; it is trapped in an endless loop.

My Kafka, ZooKeeper, HBase, and Hadoop have all started.

ghost (Author) commented Jun 13, 2019

Solved by downgrading kafka-python to v1.4.4.
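
For anyone else hitting this, the downgrade is just:

    pip install kafka-python==1.4.4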

ghost closed this as completed Jun 13, 2019
Gallaecio reopened this Jun 14, 2019
@Gallaecio (Member) commented:

If that’s the only fix, then we need to either update setup.py accordingly or add support for later versions of kafka-python.
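
A minimal sketch of the first option, assuming kafka-python is declared via extras_require (the exact layout of Frontera's setup.py may differ):

    # setup.py (sketch): cap kafka-python below the release where the
    # metadata-refresh behaviour changed
    extras_require={
        'kafka': ['kafka-python>=1.4.0,<1.4.5'],
    },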

Gallaecio added the bug label and removed the question label on Jun 14, 2019
@sibiryakov (Member) commented:

@Gallaecio it should be a tiny PR #371 (comment)

ghost (Author) commented Jun 14, 2019

Besides that, I cannot force-close the spiders; they get trapped in an endless loop of [kafka client] warnings, "Unable to send to wakeup socket!", when using kafka-python v1.4.5 and v1.4.6 (latest).

kafka/client_async.py

            except socket.error:
                log.warning('Unable to send to wakeup socket!')

dpkp/kafka-python#1837
dpkp/kafka-python#1842


psdon commented Dec 7, 2020

I also get the same problem. How can we solve this?

@yenicelik commented:

Getting the same issue here
