Seems to Fail on one node/file and stop #12

Open

GaFort opened this issue Apr 8, 2020 · 4 comments

Comments

GaFort commented Apr 8, 2020

It marks the specific file below as "retry later" and eventually ends up in the status shown below. All other crawls seem to complete fine, but because it stops here, it never even gets around to downloading. I did take the recommendation of restricting workers, but that didn't help either.

2020-04-08 19:10:26,601 - 3 - [Failed] priority=C, ttl=3. crawl_supplement: course=decision-making, supplement=types-of-analytics
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dl_coursera/lib/TaskScheduler.py", line 167, in _func_work
    task.run()
  File "/usr/local/lib/python3.7/site-packages/dl_coursera/lib/TaskScheduler.py", line 44, in run
    self.go()
  File "/usr/local/lib/python3.7/site-packages/dl_coursera/lib/TaskScheduler.py", line 75, in go
    self._func(**self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/dl_coursera/Crawler.py", line 264, in crawl_supplement
    assets += crawl_assets(assetIDs)
  File "/usr/local/lib/python3.7/site-packages/dl_coursera/Crawler.py", line 287, in crawl_assets
    assert len(assets) == len(ids)
AssertionError

Traceback (most recent call last):
  File "/usr/local/bin/dl_coursera", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/dl_coursera_run.py", line 178, in main
    soc = crawl(args['cookies'], args['slug'], args['isSpec'], args['outdir'], args['n_worker'])
  File "/usr/local/lib/python3.7/site-packages/dl_coursera_run.py", line 71, in crawl
    soc = crawler.crawl(slug=slug, isSpec=isSpec)
  File "/usr/local/lib/python3.7/site-packages/dl_coursera/Crawler.py", line 313, in crawl
    assert len(failures) == 0
AssertionError


GaFort commented Apr 8, 2020

It works on other courses, so this seems to be a course-specific issue?


FLZ101 commented Apr 8, 2020

I will fix the issue in the next release.

As a workaround, delete the following line in Crawler.py so that crawling and downloading can continue:

assert len(failures) == 0
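
If you would rather keep some guard in place than delete the check outright, below is a minimal sketch of a more tolerant variant. The logging call and the exact contents of failures are assumptions based on the traceback above, not the actual implementation:

import logging

# Sketch: instead of aborting the whole crawl when some tasks fail,
# log the failures and carry on. `failures` is assumed to be an
# iterable of failed-task descriptions, as the traceback suggests.
if len(failures) > 0:
    logging.warning('%d crawl task(s) failed and will be skipped', len(failures))
    for failure in failures:
        logging.warning('failed task: %s', failure)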


GaFort commented Apr 9, 2020

Thank you for the quick-fix suggestion. I feel really dumb now!


FLZ101 commented Apr 10, 2020

I mean: open /usr/local/lib/python3.7/site-packages/dl_coursera/Crawler.py with a text editor, then delete the following line in that file:

assert len(failures) == 0
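
If dl_coursera is installed somewhere other than the path above, one way to locate the file is to ask the interpreter for it (this assumes you run the snippet with the same Python environment that dl_coursera is installed into):

# Print the location of the installed Crawler.py; the printed path
# is the file to open and edit.
import dl_coursera.Crawler
print(dl_coursera.Crawler.__file__)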
