-
-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Handle terminate signals in init script #1390
Conversation
docker/init_scripts/init
Outdated
# check for died processes every 5 seconds | ||
sleep 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved sleep to the end of the loop because it does not make sense to wait for 5 seconds when script is started instead of starting of all child processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change overall, thanks for looking into it!
I can test it locally after the typo for scheduler
is fixed.
docker/init_scripts/init
Outdated
PROCESS=$1 | ||
if [[ -f "/tmp/${PROCESS}.pid" ]]; then | ||
PID=$(cat "/tmp/${PROCESS}.pid") || true | ||
if [[ -d "/proc/${PID}" ]]; then | ||
info_log "stopping ${PROCESS}" | ||
kill "${PID}" || true | ||
# wait for process exit | ||
while [ -e "/proc/${PID}" ]; do sleep 0.1; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Seems indentation is broken within this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, converted to tabs
docker/init_scripts/init
Outdated
|
||
shutdown() { | ||
# shutdown in reverse order | ||
stop_process_pid sheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: sheduler
instead of scheduler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, thanks, this is why last one python3
process been killed, now termination exit codes looks like
python3 163192 162602 163192 15.15 0
python3 163184 162602 163184 15.39 0
python3 163176 162602 163176 15.45 signal 15 (TERM)
nginx 163193 163163 163193 15.58 0
nginx 163190 163163 163190 15.58 0
nginx 163173 163163 163173 15.58 0
nginx 163189 163163 163189 15.58 0
nginx 163169 163163 163169 15.58 0
nginx 163178 163163 163178 15.58 0
nginx 163180 163163 163180 15.58 0
nginx 163172 163163 163172 15.58 0
nginx 163165 163163 163165 15.59 0
nginx 163191 163163 163191 15.58 0
nginx 163168 163163 163168 15.59 0
nginx 163186 163163 163186 15.59 0
nginx 163174 163163 163174 15.59 0
nginx 163177 163163 163177 15.59 0
nginx 163182 163163 163182 15.59 0
nginx 163166 163163 163166 15.59 0
nginx 163164 163163 163164 15.59 0
nginx 163183 163163 163183 15.59 0
nginx 163167 163163 163167 15.59 0
nginx 163170 163163 163170 15.59 0
nginx 163181 163163 163181 15.59 0
nginx 163187 163163 163187 15.59 0
nginx 163175 163163 163175 15.59 0
nginx 163185 163163 163185 15.59 0
nginx 163163 162602 163163 15.60 0
gunicorn 163154 163151 163154 16.65 signal 15 (TERM)
gunicorn 163155 163151 163155 16.63 signal 15 (TERM)
gunicorn 163151 162602 163151 16.90 0
valkey-server 162649 162602 162649 21.35 0
python3 163862 163735 163862 0.08 0
docker/init_scripts/init
Outdated
# check for died processes every 5 seconds | ||
sleep 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
Great improvement, and it works as expected based on my local testing. RomM requires signing all commits before we can merge this PR. |
`tini` does not wait for child processes to close, so all processes will be killed immediately. This is why the container stops so fast. This fix makes the `init` script listen and handle terminate signals. It also ensures that child processes are shut down in reverse order with proper waiting for completion.
448071c
to
f5bd75c
Compare
Done, I squashed two commits into one and signed it |
tini
(added in PR) does not wait for child processes to exit gracefully, so all processes will be killed immediately. This is why the container stops so fast.Possible
tini
issue: krallin/tini#180This fix makes the
init
script listen and handle terminate signals. It also ensures that child processes are shut down in reverse order with proper waiting for completion.How to reproduce:
To see exit codes of processes I used
exitsnoop-bpfcc
tool from this toolset. Not sure maybe it is possible in easier way.This command started in terminal:
sudo exitsnoop-bpfcc | egrep '(gunicorn|valkey|nginx|python3)'
Before the fix
As you may see, in the logs there are only
exited with code 0
andStarting up
, no any word about shutting downOutput of
exitsnoop-bpfcc
:Almost all processes killed and this is bad because some of these processes may write some data to the disk and operation will be interrupted, data can be written incomplete.
After the fix
You may notice shutting down logs appeared in logs and shutdown handlers executed.
Output of
exitsnoop-bpfcc
:You may see almost all processes exited with
0
code orTERM
which is also good I think. Only only process is finishing withKILL
every time, not sure why.