Change PID namespace during execveat #105

skissane · 2024-06-22T10:05:05Z


A process can't change its own PID namespace. Why? The reason usually given is that doing so would make the return value of getpid() change partway through the process's execution. Lots of existing code doesn't expect that, and the consequences would be too awful.

But why can't a process change its PID namespace during exec? It won't observe its own PID changing, because any memory in which it might have recorded its old PID is discarded by the exec. Other processes won't notice the change either, because they are in the parent PID namespace and so still see the original PID. (The new program could notice if the original PID had been passed in argv or envp, but why would anyone do such a thing?)

Use case: I want to start a child process from my shell (e.g. bash), and have that child process be PID 1 of its own PID namespace. Currently, I end up with another process in between my shell and the PID 1, so the shell doesn't consider the PID 1 to be its direct child. (I can make my PID 1 a direct child of the shell using CLONE_PARENT, but then the shell doesn't know what to do with that child process, since it didn't start it.)
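To make the extra process concrete, here is a minimal sketch of how this is done today, assuming Linux with glibc and a caller holding CAP_SYS_ADMIN. The function name run_as_pid1_today is made up for illustration. The process that calls it is the in-between process: it stays in its original PID namespace, and only the child it forks becomes PID 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv as PID 1 of a fresh PID namespace using
 * only the existing API. Returns the child's wait status, or -1 if
 * unshare(), fork() or waitpid() fails (errno is set by the failing call). */
int run_as_pid1_today(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    /* Changes pid_ns_for_children only; the caller keeps its own PID
     * and stays in its current PID namespace. */
    if (unshare(CLONE_NEWPID) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* First process in the new namespace: getpid() returns 1 here. */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    /* The caller has to stay around as the parent of the new PID 1. */
    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return status;
}
```

A shell that runs a wrapper built around this helper sees the wrapper as its child, not the PID 1, which is exactly the situation described above.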

Possible API: Add a new flag AT_NEWPID to the execveat system call. (It could use 0x200, following the precedent of unlinkat and faccessat, which use that value for their syscall-specific AT_ flags.) Fail with EINVAL if AT_NEWPID is passed when pid_ns_for_children is the same as your own PID namespace, i.e. when you haven't first selected a different namespace for your children with unshare(CLONE_NEWPID) or setns. Once execveat reaches the "point of no return", if AT_NEWPID is set, move the process into its pid_ns_for_children.
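From user space, the proposed call might look like the sketch below. AT_NEWPID and the wrapper name execveat_newpid are hypothetical: no released kernel defines the flag, so on current kernels this call is expected to fail (execveat rejects unknown flags with EINVAL). The raw syscall is used because older glibc versions have no execveat wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_execveat */
#include <unistd.h>       /* syscall() */

/* Hypothetical flag value taken from the proposal; not a real kernel flag. */
#ifndef AT_NEWPID
#define AT_NEWPID 0x200
#endif

/* Hypothetical wrapper: exec `path` and, under the proposal, move the
 * caller into its pid_ns_for_children at the point of no return.
 * Only returns on failure, with -1 and errno set. */
int execveat_newpid(const char *path, char *const argv[], char *const envp[])
{
    assert(path != NULL && argv != NULL);
    return (int)syscall(SYS_execveat, AT_FDCWD, path, argv, envp, AT_NEWPID);
}
```

The intended sequence would be: the shell forks as usual, the child calls unshare(CLONE_NEWPID), and then calls execveat_newpid. The child keeps its PID as seen by the shell and becomes PID 1 in the new namespace, with no process in between.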

Restrictions: (a) fail with EINVAL if you are PID 1 of your PID namespace; (b) fail with EINVAL if your pid_ns_for_children is not an empty PID namespace, i.e. one that has never had a PID 1.
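The EINVAL conditions above can be summarised as a small user-space model. This is not kernel code: struct task_model and check_at_newpid are invented for illustration, and the three fields stand in for state the kernel would read from the task and its namespaces. The order of the checks is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value from the proposal. */
#define AT_NEWPID 0x200

/* User-space stand-in for the kernel state the checks would consult. */
struct task_model {
    int  pid_in_own_ns;         /* PID as seen in the task's own namespace */
    bool children_ns_differs;   /* pid_ns_for_children != own PID namespace */
    bool children_ns_had_init;  /* target namespace has ever had a PID 1 */
};

/* Returns 0 if the exec may proceed, -EINVAL otherwise. */
int check_at_newpid(const struct task_model *t, int flags)
{
    assert(t != NULL);

    if (!(flags & AT_NEWPID))
        return 0;               /* flag absent: nothing to validate */
    if (!t->children_ns_differs)
        return -EINVAL;         /* no separate namespace to move into */
    if (t->pid_in_own_ns == 1)
        return -EINVAL;         /* restriction (a) */
    if (t->children_ns_had_init)
        return -EINVAL;         /* restriction (b) */
    return 0;
}
```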

Possible implementation: Change create_pid_cachep to allocate storage for one extra struct upid level, so that an existing struct pid has room to enter an empty child PID namespace. We only need a single extra level because of the above restrictions. At exec time, we allocate PID 1 in the child PID namespace's IDR, then increment the struct pid's level. We'd also need a flag in struct pid to record that this has happened, so that put_pid knows to call kmem_cache_free with the pid_cachep of the parent namespace (pid->numbers[pid->level-1].ns), not that of the namespace the pid now belongs to.
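The bookkeeping can be sketched as a user-space model. Everything here is a simplified stand-in, not the kernel's definitions: ns_model, upid_model and pid_model mimic only the level and numbers[] fields, entered_child_ns is the hypothetical new flag, and calloc/free replace the per-namespace slab cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ns_model   { unsigned level; const struct ns_model *parent; };
struct upid_model { int nr; const struct ns_model *ns; };

struct pid_model {
    unsigned level;
    bool entered_child_ns;          /* hypothetical flag from the proposal */
    struct upid_model numbers[];    /* one entry per namespace level */
};

/* Allocate level + 2 slots: the usual level + 1, plus one spare.
 * nrs[i] is the PID number at namespace level i. */
struct pid_model *pid_model_alloc(const struct ns_model *ns, const int *nrs)
{
    size_t slots = (size_t)ns->level + 2;
    struct pid_model *pid =
        calloc(1, sizeof(*pid) + slots * sizeof(pid->numbers[0]));
    if (!pid)
        return NULL;
    pid->level = ns->level;
    const struct ns_model *cur = ns;
    for (unsigned i = ns->level + 1; i-- > 0; cur = cur->parent) {
        pid->numbers[i].nr = nrs[i];
        pid->numbers[i].ns = cur;
    }
    return pid;
}

/* Use the spare slot: become PID 1 one level down, then bump the level. */
void pid_model_enter_child_ns(struct pid_model *pid,
                              const struct ns_model *child)
{
    assert(!pid->entered_child_ns && child->level == pid->level + 1);
    pid->numbers[pid->level + 1].nr = 1;
    pid->numbers[pid->level + 1].ns = child;
    pid->level++;
    pid->entered_child_ns = true;
}

/* Namespace whose cache the object came from, and must be freed to. */
const struct ns_model *pid_model_cache_owner(const struct pid_model *pid)
{
    unsigned idx = pid->entered_child_ns ? pid->level - 1 : pid->level;
    return pid->numbers[idx].ns;
}
```

After pid_model_enter_child_ns, the level has grown by one, but pid_model_cache_owner still reports the original namespace, which is the reason the flag is needed when freeing.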

The Linux Userspace API Group
