-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make fsm ingoing channel finite #2862
base: master
Are you sure you want to change the base?
Conversation
this prevents memory consumption from explosing in the case of a route update storm
disclaimer: I'm not a project maintainer, so this comments are JFYI. tl;dr: I think that this PR can't be merged because it introduces a breaking change for some usage scenarios. It would be better to make this behaviour optional with default to an infinite channel, and give gobgp user a config option to switch to a buffered channel of user-specified length. This change can be a breaking one for some gobgp setups, specifically the one with lots of peers. All parsed peers' messages go to the single fsm goroutine. Buffered channels without explicit scheduling of messages (based on peer class, for example) can lead to a potential problem: some peers might push all their messages to the fsm loop, while the others may block on sending to the finite buffer channel. This can lead to filling tcp socket buffers and loosing bgp keepalive messages, resulting in resetting peer connection and resending full state. This is a metastable failure state (pdf) that won't heal itself. The situation might get worse in case of a peer that went crazy and got into a restart loop, sending full state: without scheduling policy for messages it can become fatal. My colleagues got into gobgp OOM in exactly this scenario, so the problem with infinite channel is real. But I think that the solution is more complex than a buffered channel. My suggestion is to leave the default behavior with unbuffered channels. Make a buffered channel enabled by an option from the config, giving the user the opportunity to select the buffer size. |
I would disagree with the previous comment. Indeed, gobgp would start dropping packets and keep alive, but pushing messages in a queue quicker than you can process them is not a great idea. The gobgp memory consumption would just grow forever, until it runs out of memory. |
@maxime-peim I do agree that unbuffered channels should go away. My point is that this exact change can replace OOM with cyclic session re-establishment, and both of these problems are equally bad. My proposals:
There can also be a mid-term solution with a buffered channel plus moving adj-rib-in processing in the same goroutine as the peer input. It would allow to aggregate "flapping" updates and to not starve the FSM goroutine. |
I think that it's difficult to choose the right size of the channel. If it's too small, it would be the bottleneck on a system with lots of memory. My fundamental question is that how you can handle out-of-memory in Go. I prefer to remove the incoming channel. The rx go routine reads from a socket, creates a Path object, and directly inserts it to the table with its lock. In C, we can detect the failure to allocate a Path object and do something. How can we handle in Go? |
Currently we use an infinite channel (using a queue) for incoming messages, this results in explosion in memory consumption in the case of a route update storm. This change prevents this behavior by using a fixed size buffered channel.