Avoid lazy umount MNT_DETACH for NFS mounts as this causes system hang #47
In src/oci-umount.c
Lazy unmount of NFS mounts causes serious problems, including holding up the reboot process. It leaves tasks that are doing NFS IO hung; we see them in the vmcore in uninterruptible (UN) state. In some cases the blocked NFS IO tasks do not even allow the reboot/shutdown task to progress: the shutdown itself enters blocked UN state waiting for the NFS superblock to sync its inodes, while the NFS tasks cannot progress because they are blocked waiting to complete IO. Although the lazy umount removes the mount from the mount table, we have seen the superblock's s_count and s_active still showing it in use, holding references from several tasks doing NFS IO.
MNT_DETACH does not actually unmount a file-system which is in-use; it just detaches the mount from the visible filesystem tree, and makes it impossible to see what processes are still using the mount. This prevents normal shutdown of systems, due to continued access to the mount.
The issue is confirmed to happen only when a docker/container environment is in use. The two notable places that perform a lazy umount are in oci-umount.c.
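As a rough illustration of the idea (not the actual change in this PR), the choice of umount flags could be keyed on the filesystem type. The helper names `is_nfs_fstype` and `umount_flags_for` are hypothetical, and the sketch assumes the caller already has the filesystem type string for the mount (for example as reported in /proc/self/mountinfo, where NFS mounts appear as "nfs" or "nfs4").

```c
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: returns 1 if fstype names an NFS filesystem.
 * Assumes the type strings "nfs" and "nfs4" as seen in the mount table. */
static int is_nfs_fstype(const char *fstype)
{
    return fstype != NULL &&
           (strcmp(fstype, "nfs") == 0 || strcmp(fstype, "nfs4") == 0);
}

/* Hypothetical helper: picks the flags to pass to umount2().
 * NFS mounts get a plain umount (flags 0), so a busy mount fails
 * instead of being silently detached; everything else keeps the
 * lazy MNT_DETACH behaviour. */
static int umount_flags_for(const char *fstype)
{
    return is_nfs_fstype(fstype) ? 0 : MNT_DETACH;
}
```

A caller would then use something like `umount2(target, umount_flags_for(fstype))` and handle the error itself when the NFS mount is still in use.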
== Details, snippet from vmcore analysis:
The below shows the nfsv4 superblock still holding a reference count although it is not in the mount table.
crash> mount | grep ffff9a0018ae2000
crash> << although mount is removed from filesystem tree due to lazy umount
the superblock fields have references and tasks wait for NFS IO onto this superblock
crash> p ((struct super_block*)0xffff9a0018ae2000)->s_op
$13 = (const struct super_operations *) 0xffffffffc0901b60 <nfs4_sops>
crash> p ((struct super_block*)0xffff9a0018ae2000)->s_count
$12 = 2 << usage count is still positive
crash> p ((struct super_block*)0xffff9a0018ae2000)->s_active
$14 = {
counter = 4 << 4 blocked tasks holding reference to this nfs share
}
The 4 blocked tasks on this lazily unmounted NFS superblock were:
crash> ps -m | grep UN
[0 00:10:44.049] [UN] PID: 2136 TASK: ffff99f3fef64f10 CPU: 17 COMMAND: "poweroff" => blocked performing sync_inodes_sb( )
[0 00:10:55.262] [UN] PID: 31589 TASK: ffff99ff78ba0000 CPU: 7 COMMAND: "java" => blocked for nfs_file_write( )
[0 00:10:55.345] [UN] PID: 62574 TASK: ffff99f44baa0000 CPU: 17 COMMAND: "java" => blocked for nfs_file_write( )
[0 00:11:02.028] [UN] PID: 63909 TASK: ffff99ed7cffaf70 CPU: 10 COMMAND: "prometheus" => blocked for nfs_file_write( )
Signed-off-by: Ronald Monthero [email protected]