Workers sometimes get stuck in uninterruptible waits (due to btrfs bugs). When that happens, the TCP connection stays alive but the worker never does any work. The scheduler should detect this (e.g. with a timeout when collecting metrics) and mark the worker as paused (or restarting), so that jobs queued for that worker get reassigned.
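
A rough sketch of what the detection could look like, assuming an asyncio-based scheduler. The `Worker` class, `collect_metrics` callback, `check_worker`, `reassign_jobs`, and `poll_once` are all hypothetical names made up for illustration, not the scheduler's actual API:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class Worker:
    # Hypothetical stand-in for the scheduler's view of a worker.
    name: str
    collect_metrics: Callable[[], Awaitable[dict]]
    paused: bool = False
    queue: list = field(default_factory=list)


async def check_worker(worker: Worker, timeout: float) -> bool:
    """Return True if the worker answered in time, else mark it paused."""
    try:
        await asyncio.wait_for(worker.collect_metrics(), timeout)
    except asyncio.TimeoutError:
        # The connection is still open, but the worker is not responding.
        worker.paused = True
        return False
    return True


def reassign_jobs(stuck: Worker, workers: list) -> None:
    """Move queued jobs from a stuck worker to the least-loaded healthy ones."""
    healthy = [w for w in workers if not w.paused]
    if not healthy:
        return  # nowhere to move the jobs; leave them queued
    while stuck.queue:
        job = stuck.queue.pop(0)
        target = min(healthy, key=lambda w: len(w.queue))
        target.queue.append(job)


async def poll_once(workers: list, timeout: float = 5.0) -> None:
    """One metrics round: probe every worker, then drain the stuck ones."""
    results = await asyncio.gather(*(check_worker(w, timeout) for w in workers))
    for worker, ok in zip(workers, results):
        if not ok:
            reassign_jobs(worker, workers)
```

The 5 second default is an arbitrary placeholder. A real implementation would probably also need a way to unpause a worker once it starts answering again, and to handle a job that was already running on the stuck worker.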