make fsm ingoing channel finite #2862

hedibouattour · 2024-12-16T08:23:04Z

Currently we use an infinite channel (using a queue) for incoming messages, this results in explosion in memory consumption in the case of a route update storm. This change prevents this behavior by using a fixed size buffered channel.

this prevents memory consumption from explosing in the case of a route update storm

serejkus · 2024-12-17T05:31:17Z

disclaimer: I'm not a project maintainer, so this comments are JFYI.

tl;dr: I think that this PR can't be merged because it introduces a breaking change for some usage scenarios. It would be better to make this behaviour optional with default to an infinite channel, and give gobgp user a config option to switch to a buffered channel of user-specified length.

This change can be a breaking one for some gobgp setups, specifically the one with lots of peers. All parsed peers' messages go to the single fsm goroutine. Buffered channels without explicit scheduling of messages (based on peer class, for example) can lead to a potential problem: some peers might push all their messages to the fsm loop, while the others may block on sending to the finite buffer channel. This can lead to filling tcp socket buffers and loosing bgp keepalive messages, resulting in resetting peer connection and resending full state. This is a metastable failure state (pdf) that won't heal itself.

The situation might get worse in case of a peer that went crazy and got into a restart loop, sending full state: without scheduling policy for messages it can become fatal. My colleagues got into gobgp OOM in exactly this scenario, so the problem with infinite channel is real. But I think that the solution is more complex than a buffered channel.

My suggestion is to leave the default behavior with unbuffered channels. Make a buffered channel enabled by an option from the config, giving the user the opportunity to select the buffer size.

maxime-peim · 2024-12-20T11:13:42Z

I would disagree with the previous comment. Indeed, gobgp would start dropping packets and keep alive, but pushing messages in a queue quicker than you can process them is not a great idea. The gobgp memory consumption would just grow forever, until it runs out of memory.
Putting back the pressure to the peers is actually better in my opinion.
GoBGP should handle more peers than it can, and surely the model of having one single goroutibe to threat all incoming message is not the best choice there.

serejkus · 2024-12-20T12:20:07Z

@maxime-peim I do agree that unbuffered channels should go away. My point is that this exact change can replace OOM with cyclic session re-establishment, and both of these problems are equally bad. My proposals:

Short-term: make this buffer optional, with default to unbuffered behavior. Please, note, that you still can have buffered channels if you enable it in config.
Long-term: make buffered channel with scheduler for messages between the peer input and FSM goroutine. And, likely between FSM goroutine and output. It can mitigate the risk of working with a peer that started flooding updates.

There can also be a mid-term solution with a buffered channel plus moving adj-rib-in processing in the same goroutine as the peer input. It would allow to aggregate "flapping" updates and to not starve the FSM goroutine.

fujita · 2024-12-29T03:21:27Z

I think that it's difficult to choose the right size of the channel. If it's too small, it would be the bottleneck on a system with lots of memory.

My fundamental question is that how you can handle out-of-memory in Go. I prefer to remove the incoming channel. The rx go routine reads from a socket, creates a Path object, and directly inserts it to the table with its lock. In C, we can detect the failure to allocate a Path object and do something. How can we handle in Go?

make fsm ingoing channel finite

85d8f6a

this prevents memory consumption from explosing in the case of a route update storm

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make fsm ingoing channel finite #2862

make fsm ingoing channel finite #2862

hedibouattour commented Dec 16, 2024

serejkus commented Dec 17, 2024

maxime-peim commented Dec 20, 2024

serejkus commented Dec 20, 2024

fujita commented Dec 29, 2024 •

edited

Loading

make fsm ingoing channel finite #2862

Are you sure you want to change the base?

make fsm ingoing channel finite #2862

Conversation

hedibouattour commented Dec 16, 2024

serejkus commented Dec 17, 2024

maxime-peim commented Dec 20, 2024

serejkus commented Dec 20, 2024

fujita commented Dec 29, 2024 • edited Loading

fujita commented Dec 29, 2024 •

edited

Loading