SQLAlchemy: AttributeError: _Connection__connection #377

yevgenpapernyk opened this issue Aug 26, 2019 · 1 comment

yevgenpapernyk commented Aug 26, 2019

I'm using PostgreSQL as the backend for the workers.

The worker config is:

from __future__ import absolute_import
from .common import *

BACKEND = 'frontera.contrib.backends.sqlalchemy.Distributed'
SQLALCHEMYBACKEND_ENGINE = 'postgresql://postgres:example@localhost'
MAX_NEXT_REQUESTS = 2048
NEW_BATCH_DELAY = 3.0

The requirements.txt is:

Scrapy>=0.24.4
psycopg2
SQLAlchemy>=0.9.8
msgpack
frontera[sql,hbase,logging,tldextract,kafka,distributed,strategies]

After a while I'm getting this error periodically:

ERROR:sqlalchemy.queue:_Connection__connection
Traceback (most recent call last):
  File "/home/yp/FronteraFromScratch/venv/lib/python3.7/site-packages/frontera/contrib/backends/sqlalchemy/components.py", line 188, in get_next_requests
    self.session.commit()
  File "/home/yp/FronteraFromScratch/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1027, in commit
    self.transaction.commit()
  File "/home/yp/FronteraFromScratch/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 506, in commit
    self.close()
  File "/home/yp/FronteraFromScratch/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 579, in close
    connection.close()
  File "/home/yp/FronteraFromScratch/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 910, in close
    del self.__connection
AttributeError: _Connection__connection

@dorellang

This stems from the fact that SQLAlchemy sessions are not thread safe, and the DB worker that ships with Frontera generates batches off the main thread (the one the frontier and all the backend models were initialized on).

A possible fix would be to only read from the message bus in that other thread and then schedule the actual batch generation to run on the main thread. This is what I do in a hacked version of the worker I use myself, but I am hesitant to release it now because the way I do it is quite inelegant and it also adds some custom logic for other things. Anyway, that's the general idea if you or anybody else is willing to hack on it.
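
For anyone who wants to try this, here is a minimal sketch of the general idea using only the standard library. It is not Frontera's worker code: `consume_messages`, `generate_batch`, and `run_batch_loop` are hypothetical names, and the list of partition ids stands in for the message bus. The reader thread only enqueues work; the thread that owns the session is the only one that runs batch generation.

```python
import queue
import threading

STOP = object()  # sentinel telling the batch loop to exit


def consume_messages(messages, tasks):
    """Runs in a separate thread: reads messages, never touches the session."""
    for partition_id in messages:
        tasks.put(partition_id)
    tasks.put(STOP)


def generate_batch(partition_id, log):
    """Stand-in for the code that would use the SQLAlchemy session.

    It only records which thread executed it for which partition.
    """
    log.append((partition_id, threading.get_ident()))


def run_batch_loop(tasks, log):
    """Runs in the thread that created the session (normally the main thread)."""
    while True:
        item = tasks.get()  # blocks until the reader thread hands over work
        if item is STOP:
            break
        generate_batch(item, log)


def demo():
    tasks = queue.Queue()  # thread-safe hand-off between the two threads
    log = []
    reader = threading.Thread(target=consume_messages, args=([0, 1, 2], tasks))
    reader.start()
    run_batch_loop(tasks, log)  # all batch generation happens in this thread
    reader.join()
    return log
```

In a real worker the batch loop would have to be interleaved with whatever else the main thread does (for example by polling the queue with a timeout), which is probably where it gets inelegant.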
