db: check DB reconnections #342

Open
tiborsimko opened this issue Nov 12, 2021 · 0 comments

(stems from #339 (comment))

Current behaviour

Now that we connect to and disconnect from the DB during a workflow run, a workflow may fail to establish a new connection later at runtime. When this happens, the workflow dies with:

reana_commons.errors.REANAJobControllerSubmissionError: Job submission error: Job submission failed because of DB connection issues.
(psycopg2.OperationalError) FATAL:  sorry, too many clients already

See the comment linked above for a detailed description.

Expected behaviour

If possible, the system should attempt to reconnect. If the attempt fails, it should sleep for 15 seconds and retry, repeating this perhaps five times before giving up and terminating. There is a chance that some DB connection will be freed during that time.
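
The retry policy described above could be sketched roughly as follows. This is only an illustration, not existing REANA code: `connect_with_retry`, `TransientDBError`, and the `connect` callable are hypothetical names, and "five times" is interpreted here as five attempts in total. In practice `retry_on` would presumably be set to the driver's error class (the traceback above shows `psycopg2.OperationalError`).

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for the driver's connection error class."""


def connect_with_retry(connect, max_attempts=5, delay=15,
                       retry_on=(TransientDBError,), sleep=time.sleep):
    """Call connect(); on a retryable error, sleep and try again.

    Gives up after max_attempts attempts and re-raises the last error,
    so the caller can still terminate the workflow as it does today.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: propagate the original error
            sleep(delay)  # wait for a DB connection slot to be freed
```

The `sleep` parameter is injectable only so that the waiting can be replaced in tests. A fixed 15-second delay follows the suggestion above; an increasing delay would be an alternative design.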

We should check our DB connection parameters with respect to reconnecting, whether from workflow pods, job pods, or infrastructure pods in general, and handle "DB-maxed-out" situations as gracefully as we can.
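
As a starting point for that review, here is a sketch of the kind of parameters that might be worth checking. It assumes the connections go through SQLAlchemy with its default queue-based pool, which the error format above suggests but this issue does not confirm. The values are hypothetical and are not REANA's actual settings; `ENGINE_OPTIONS` and `max_connections_per_process` are illustrative names.

```python
# Hypothetical values for illustration only; not REANA's actual settings.
ENGINE_OPTIONS = {
    "pool_size": 5,         # persistent connections kept per process
    "max_overflow": 0,      # no extra connections beyond pool_size
    "pool_timeout": 30,     # seconds to wait for a free pooled connection
    "pool_recycle": 1800,   # replace connections older than this many seconds
    "pool_pre_ping": True,  # test a connection on checkout, reconnect if stale
    "connect_args": {"connect_timeout": 10},  # passed through to psycopg2
}

# Usage, assuming SQLAlchemy is installed and DB_URI is defined elsewhere:
# engine = sqlalchemy.create_engine(DB_URI, **ENGINE_OPTIONS)


def max_connections_per_process(options):
    """Upper bound on connections one process can hold with these options."""
    return options["pool_size"] + options["max_overflow"]
```

Multiplying this per-process bound by the number of pods that connect at once gives a rough figure to compare against the server's connection limit.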

audrium self-assigned this on Nov 16, 2021