Current behaviour
Now that we connect to and disconnect from the DB during a workflow run, a workflow may fail to establish a connection later at runtime. When this happens, the workflow dies with:
reana_commons.errors.REANAJobControllerSubmissionError: Job submission error: Job submission failed because of DB connection issues.
(psycopg2.OperationalError) FATAL: sorry, too many clients already
See the linked comment (#339) for a detailed description.
Expected behaviour
If possible, the system should attempt to reconnect. If that fails, it should sleep for 15 seconds and retry the connection, repeating this perhaps five times before giving up and terminating. There is a chance that a DB connection will be freed during that time.
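The retry behaviour described above could be sketched roughly as follows. This is only an illustration, not existing REANA code: the helper name `connect_with_retry` and its parameters are hypothetical, and `connect` stands for whatever callable actually opens the DB connection or session.

```python
import time


def connect_with_retry(connect, max_attempts=5, delay_seconds=15,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call ``connect()`` until it succeeds or ``max_attempts`` is reached.

    Hypothetical helper: ``connect`` is any zero-argument callable that
    opens a DB connection and raises one of ``retry_on`` on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: give up and let the caller terminate.
                raise
            # Wait in the hope that another client frees a connection.
            sleep(delay_seconds)
```

With the error shown above, one would presumably pass `retry_on=(psycopg2.OperationalError,)`, or the SQLAlchemy exception that wraps it, so that only connection problems are retried and other errors still fail immediately.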
We should review our DB reconnection parameters for workflow pods, job pods, and perhaps infrastructure pods in general, and handle "DB-maxed-out" situations as gracefully as we can.
(stems from #339 (comment))