Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attemp to resolve deadlock in blocking fifo open #346

Merged
merged 1 commit into from
Dec 13, 2024

Conversation

jokemanfire
Copy link
Contributor

@jokemanfire jokemanfire commented Nov 28, 2024

avoid fifo open deadlock
refs:https://users.rust-lang.org/t/fifos-problem-in-rust-if-it-should-be-kernels-problem/121779
If read side is not open , the write side will be blocking forever, though the fifo file has been deleted.

@github-actions github-actions bot added the C-runc-shim Runc shim label Nov 28, 2024
@jokemanfire jokemanfire changed the title Attemp to resolve deadlock Do not merge Attemp to resolve deadlock Nov 28, 2024
@analytically
Copy link

The pipe-specific version is generally better when you know you're dealing with a Unix pipe, as it's more semantically correct and handles pipe-specific behaviors more robustly. You'd typically use this when building applications that need to interact with other processes through pipes or when dealing with stdin/stdout in an async context. Nicely done!

@jokemanfire
Copy link
Contributor Author

The pipe-specific version is generally better when you know you're dealing with a Unix pipe, as it's more semantically correct and handles pipe-specific behaviors more robustly. You'd typically use this when building applications that need to interact with other processes through pipes or when dealing with stdin/stdout in an async context. Nicely done!

I have discovered this issue, but there is still no good way to avoid this deadlock. The submission is non blocking and cannot meet the requirements. If you have any ideas to avoid it from the kernel layer or user mode, thank you very much.

@analytically
Copy link

Your OpenOptions::new().open_receiver change can also be made to the copy_console method

@jokemanfire
Copy link
Contributor Author

jokemanfire commented Nov 29, 2024

Your OpenOptions::new().open_receiver change can also be made to the copy_console method

Thanks , but this resolution still in progress, please continue to follow https://users.rust-lang.org/t/fifos-problem-in-rust-if-it-should-be-kernels-problem/121779

btw , I also create a discussion about containerd 's fifo containerd/containerd#11074

@analytically
Copy link

Use O_NONBLOCK when opening for writing:

screentshot-2024-11-29 at 09 36 28@2x

@jokemanfire
Copy link
Contributor Author

Use O_NONBLOCK when opening for writing:

screentshot-2024-11-29 at 09 36 28@2x

To synchronize messages, it is necessary to ensure consistency with the FIFO read and write of containerd. Non_blocking cannot guarantee . I think non-blocking will also lead to other issues. Then this may be a problem for the maintainer of containerd. <_<

@analytically
Copy link

True, this is a deeper architectural issue with containerd's FIFO handling

@jokemanfire
Copy link
Contributor Author

True, this is a deeper architectural issue with containerd's FIFO handling

Now I want to solve the problem of concurrency caused by Fifo in Blocking mode. So if you have any ideas, thank you very much.
Notice that , I found that it doesn't always cause a FIFO read or write deadlock when blocking the card owner.

@analytically
Copy link

The sequence that can cause deadlock:

  • Writer opens FIFO (blocks waiting for reader)
  • FIFO file gets deleted before reader opens
  • Reader tries to open but fails because path is gone
  • Writer remains blocked forever

But this doesn't always happen because the timing is critical. It depends on the exact order of:

  • When write open() blocks
  • When delete() happens
  • When read open() attempts

Possible solutions are:

  • Use file locks to protect FIFO lifecycle
  • Order the operations with synchronization
  • Use atomic file operations for deletion
let (tx, rx) = mpsc::channel();
// Start reader first
thread::spawn(move || {
    let reader = open_fifo_read();
    tx.send(()).unwrap();
    // Reader operations
});
// Wait for reader before starting writer
rx.recv().unwrap();
open_fifo_write();

@jokemanfire
Copy link
Contributor Author

jokemanfire commented Nov 29, 2024

Possible solutions are:

  • Use file locks to protect FIFO lifecycle
  • Order the operations with synchronization
  • Use atomic file operations for deletion

That's my problems

  1. 'Use file locks to protect FIFO lifecycle' This deadlock model is not actually aimed at a single process, but at the problem of asynchronous processes. Using file locks can only attempt to use kernel mechanisms. In fact, I believe that to truly solve this problem, the kernel is needed
    2)‘Order the operations with synchronization’ If we use a synchronization mechanism for FIFO blocking again, there will be no need to use it. I think it would be a waste of resources
    3)' Use atomic file operations for deletion' Some with answer 1 ,using atomic operations in multiprocessing requires kernel involvement

btw,how about sharing viewpoints in the Rust community so that more people can participate?

@jokemanfire jokemanfire changed the title Do not merge Attemp to resolve deadlock Attemp to resolve deadlock in blocking fifo open Dec 4, 2024
crates/runc-shim/src/common.rs Outdated Show resolved Hide resolved
crates/runc-shim/src/common.rs Outdated Show resolved Hide resolved
If long time no read side , may cause shim lock.
Avoid fifo open deadlock,use a easy timeout

Signed-off-by: jokemanfire <[email protected]>
@mxpv mxpv added this pull request to the merge queue Dec 13, 2024
Merged via the queue into containerd:main with commit a0f3bf1 Dec 13, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-runc-shim Runc shim
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants