tini Exits Too Early Leading to Graceful Termination Failure #180
Comments
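My friend @Zheaoli is working on a PR to address the issue.
What scares me even more is that the container cannot be stopped.
@jschwinger233 @vsxen @antonmos @zimbatm When will this problem be fixed? If tini exits early without waiting for its child processes, then having tini forward signals is pointless.
I ran into this problem too. tini runs a bash script; after the tini process is killed, the bash script stops immediately, and the child processes it spawned never receive the signal. As a result tini exits quickly and the child processes are forcibly terminated.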
TL;DR
When managing multiple subprocesses, tini is likely to return without waiting for all children to exit. That causes the kernel to send SIGKILL to the remaining processes in the same PID namespace, which goes against tini's design principle of allowing signal forwarding and graceful termination.
Steps to Reproduce
Get a tini binary.
Prepare a Python script as follows:
Note that I use time.sleep(1) to simulate a time-consuming operation during graceful termination.
Make sure ppp.py and the tini binary (tini) are in $(pwd), then run a docker container as follows:
docker run -d --rm -v $(pwd):/src -w /src python /src/tini -g -s -- python /src/ppp.py
Trace process 2841 with strace:
strace -fTp 2841
Then send SIGTERM to tini:
kill -15 2747
What We Expected
Since there are not multiple process groups, and we ran tini with -s -g, tini should have done the following:
What We Got
However, we observed that the second (2840) and third (2841) Python processes received SIGKILL as soon as SIGTERM arrived, demonstrating that tini did not wait for them to exit, so their graceful termination failed.
Root Cause and Suggestions
Kernel
tini/src/tini.c
Lines 546 to 560 in b9f42a0
Look at the second branch, case 0, whose semantics in waitpid(2) are:
So waitpid(-1, WNOHANG) = 0 means there are no "waitable" children, but there ARE children, and that is when tini exits: with children still alive.
Is this expected?
You might argue that tini is flawless as long as the direct child process handles SIGTERM properly, for example if the first Python process (2810) waits for the second Python process (2840) before quitting.
That is quite true. But if the direct child is capable of handling graceful termination on its own, what is the point of installing tini as PID 1 in the container? In that case, we should run the application process as PID 1, without tini.
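The three outcomes of a non-blocking waitpid(2) discussed under Root Cause can be observed directly. The following is a minimal sketch in Python, not code from tini: the helper names classify_wait and observe_outcomes are hypothetical, and it assumes a Unix-like system where os.fork is available. The checks run inside a freshly forked process so that it is guaranteed to start with no children of its own.

```python
import os
import time


def classify_wait():
    """Run one non-blocking wait and name the outcome."""
    try:
        pid, _status = os.waitpid(-1, os.WNOHANG)
    except ChildProcessError:
        # errno ECHILD: the caller has no children at all.
        return "no-children"
    # pid == 0: children exist, but none has changed state yet.
    return "alive" if pid == 0 else "reaped"


def observe_outcomes():
    """Collect outcomes inside a fresh process that starts with no children."""
    read_fd, write_fd = os.pipe()
    probe = os.fork()
    if probe == 0:
        try:
            os.close(read_fd)
            seen = [classify_wait()]      # before any fork
            child = os.fork()
            if child == 0:
                time.sleep(0.5)           # stand-in for a slow graceful shutdown
                os._exit(0)
            seen.append(classify_wait())  # child is still sleeping
            os.waitpid(child, 0)          # block until the child exits
            seen.append(classify_wait())  # nothing left to wait for
            os.write(write_fd, ",".join(seen).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        report = pipe.read()
    os.waitpid(probe, 0)
    return report.split(",")


if __name__ == "__main__":
    print(observe_outcomes())
```

The middle observation is the case the issue is about: the call returns 0 while a child is still alive, which is different from the ECHILD error raised when no children exist.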
Improvement Suggestions
There are many ways to tackle the issue, and the principle is simple: do NOT exit until all children are gone.
The waitpid(2) syscall already lets us distinguish between "there are no children" and "there are no waitable children", so we just follow the documentation and change tini's exit condition.
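The suggested exit condition can be sketched as a loop that keeps reaping until waitpid(2) reports ECHILD. This is an illustration in Python under stated assumptions, not a patch for tini.c: the function name reap_all and the poll_interval parameter are hypothetical, and the short sleep is a simplification standing in for whatever waiting mechanism the real init process uses between reaping attempts.

```python
import os
import time


def reap_all(poll_interval=0.05):
    """Reap children until none remain; return the reaped pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: no children are left, so exiting is now safe.
            return reaped
        if pid == 0:
            # Children are still alive; returning here is the early exit
            # described in this issue, so wait and try again instead.
            time.sleep(poll_interval)
        else:
            reaped.append(pid)
```

With this condition, a return value of 0 never ends the loop; only the "no children" error does.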