fix: logout discover dead lock #1085
Conversation
I'm confused. The code as it is right now is cancelable. Do check this small example in the Go playground that simulates the behavior of the current logic:

package main

import (
    "context"
    "fmt"
    "time"
)

type PeerData = string

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    channel := make(chan PeerData)

    // This simulates a blocking read
    go func() {
        time.Sleep(10 * time.Second)
        thePeerData := <-channel
        fmt.Println("received peerData " + thePeerData)
    }()

    // This simulates logging out
    go func() {
        time.Sleep(2 * time.Second)
        cancel()
    }()

    somePeerData := PeerData("alice")
    select {
    case channel <- somePeerData:
        fmt.Println("Pushed peerData into channel successfully")
    case <-ctx.Done():
        fmt.Println("Cancel was executed")
    }
}

It should print "Cancel was executed", since cancel fires after 2 seconds, well before the 10-second read. |
@qfrank @richard-ramos one thing I noticed is that the channel is unbuffered and it's used outside the struct. |
ah, never mind, I noticed that's addressed in @richard-ramos, but one other thing I noticed is that we export mutexes from |
Sorry for the confusion (I thought this was explicit); actually, the deadlock is not related to
we can see that, and then another goroutine:
|
can we use sync/atomic with |
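A minimal sketch of that kind of atomic quick fix, assuming a hypothetical running flag (field and type contents here are illustrative, not the actual struct):

package discovery

import "sync/atomic"

type PeerData = string

// Hypothetical sketch of the quick fix being discussed: guard PushToChan
// with an atomic flag. `running` and `channel` are illustrative names.
type CommonDiscoveryService struct {
    running atomic.Bool
    channel chan PeerData
}

func (sp *CommonDiscoveryService) PushToChan(data PeerData) bool {
    if !sp.running.Load() {
        return false
    }
    sp.channel <- data // this send still blocks: the atomic only protects the flag
    return true
}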
WaitGroup is exported in other places:
and it's combined with the lock. As you pointed out, we should definitely try to avoid this pattern: keep wait group/locks/etc. coordination for internal usage, or top-down with a single agent; otherwise it's likely going to be a game of whack-a-mole |
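To illustrate the suggested pattern, a sketch (hypothetical names) of keeping the WaitGroup unexported and coordinated top-down by the service itself:

package discovery

import "sync"

type CommonDiscoveryService struct {
    wg sync.WaitGroup // unexported: internal coordination only
}

// runAsync is the only place that touches the WaitGroup, so external
// callers can never unbalance Add/Done from outside the struct.
func (sp *CommonDiscoveryService) runAsync(fn func()) {
    sp.wg.Add(1)
    go func() {
        defer sp.wg.Done()
        fn()
    }()
}

// Stop waits top-down: the owner of the WaitGroup is its only waiter.
func (sp *CommonDiscoveryService) Stop() {
    sp.wg.Wait()
}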
agreed!
do you agree with this as a quick fix? @cammellos |
instinctively, I don't think that would work: if there's truly a race condition, sync/atomic will only protect that value and not the writes to channels. |
why do we need to protect the writes to channels? could you explain a little bit? thanks in advance! I see no reason, so that was my quick fix @cammellos |
…ervice.PushToChan: force-pushed from e66538e to ea59f76
FYI, the test result shows the latest commit also fixed the logout stuck issue. @cammellos @richard-ramos |
Writing to a closed channel would panic, that'd be my guess |
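To make that guess concrete, a runnable sketch of the check-then-send race that an atomic flag alone can't close (all names hypothetical):

package main

import (
    "sync/atomic"
    "time"
)

func main() {
    var running atomic.Bool
    running.Store(true)
    ch := make(chan string)

    if running.Load() { // the flag says the service is running...
        go func() {
            running.Store(false) // ...but logout flips it...
            close(ch)            // ...and closes the channel right after the check
        }()
        time.Sleep(10 * time.Millisecond) // widen the race window for the demo
        ch <- "peer"                      // panics: send on closed channel
    }
}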
if this is the case, I saw we already have |
LGTM
I guess it should be okay to use this atomic... Just wondering if this operation inside PushToChan is needed at all, and if we shouldn't instead just pass a context, i.e., remove this:
if err := sp.ErrOnNotRunning(); err != nil {
    return false
}
and change the function signature to:
func (sp *CommonDiscoveryService) PushToChan(ctx context.Context, data PeerData) bool {
    select {
    case sp.channel <- data:
        return true
    case <-ctx.Done():
        return false
    case <-sp.Context().Done():
        return false
    }
}
Perhaps something to be done in future PRs.
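For illustration, a caller of that proposed signature could then bail out on either cancellation; a minimal sketch with an assumed forwarding loop (name hypothetical):

// Hypothetical caller of the context-aware PushToChan sketched above.
func (sp *CommonDiscoveryService) forwardPeers(ctx context.Context, peers <-chan PeerData) {
    for peer := range peers {
        if !sp.PushToChan(ctx, peer) {
            return // the caller's ctx or the service's own context was canceled
        }
    }
}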
I had the same feeling :), thank you for the review |
Please see the related issue, including its comments, for reference. This PR will fix the issue.
NOTE: creating a new goroutine for every call to PushToChan might introduce overhead, especially under high load. I'm not sure if we can use sync/atomic for simple state updates, or whether we should redesign the service to minimize shared state and reduce the need for locks. But I think this PR is a good starting point for discussion 🙂
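For reference, the goroutine-per-call shape described in the note above might look roughly like this; a sketch only, not the exact diff, reusing ErrOnNotRunning and Context() from the snippets earlier in the thread:

func (sp *CommonDiscoveryService) PushToChan(data PeerData) bool {
    if err := sp.ErrOnNotRunning(); err != nil {
        return false
    }
    // Hand the potentially blocking send to a goroutine so that logout,
    // which cancels sp.Context(), can never deadlock behind it.
    go func() {
        select {
        case sp.channel <- data:
        case <-sp.Context().Done():
            // service stopped; drop the data instead of blocking forever
        }
    }()
    return true
}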