Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpPathACache leaks memory #472

Open
4xoc opened this issue Jul 3, 2024 · 3 comments
Open

bgpPathACache leaks memory #472

4xoc opened this issue Jul 3, 2024 · 3 comments

Comments

@4xoc
Copy link

4xoc commented Jul 3, 2024

Describe the bug
bgpPathACache causes large memory leaks and is itself not caching anything that can be retrieved. The result is an ever increasing map which consumes all memory until depletion. This is a combination of issues described in detail below.

Issue 1 - BGPPathA duplication

p (type BGPPathA) is copied and added to the map index, resulting in a duplication of the contents.

bgpc.cache[*p] = p

Issue 2 - impossible cache hit

BGPPathA contains ptrs that always reference new memory (line 131 & 138). The map index therefore becomes unique for each new BGPPathA created. No dedup happens here.

bio-rd/route/bgp_path.go

Lines 129 to 145 in 63eef2b

p := &BGPPath{
BGPPathA: &BGPPathA{
NextHop: bnet.IPFromProtoIP(pb.NextHop).Ptr(),
LocalPref: pb.LocalPref,
OriginatorID: pb.OriginatorId,
Origin: uint8(pb.Origin),
MED: pb.Med,
EBGP: pb.Ebgp,
BGPIdentifier: pb.BgpIdentifier,
Source: bnet.IPFromProtoIP(pb.Source).Ptr(),
OnlyToCustomer: pb.OnlyToCustomer,
},
PathIdentifier: pb.PathIdentifier,
ASPath: asPath,
ASPathLen: asPath.Length(),
BMPPostPolicy: pb.BmpPostPolicy,
}

Issue 3 - missing cache eviction

Even if this cache would work, there is no mechanism for cache eviction. For very long running instances and many changes, this would always grow until no memory is available anymore.

Steps to Reproduce
run ris-mirror, have route updates, ram go brrr and poof

Expected behavior
ris-mirros shouldn't leak memory and can run for long times

Configuration used
this is config independent as the code path is always used in ris-mirror

Additional context
I've done a lot of profiling on this issue und very confident this is the major source of leaks for ris-mirror. Simply running without cache works very nicely and shows generally better peformance.
Happy to provide a PR to fix this, depening on what solution is preferred. Removing cache seems like a better tradeoff than redesigning the entire cache strategy for probably marginal benefits.

@taktv6
Copy link
Member

taktv6 commented Jul 7, 2024

Hi, thanks for reaching out.

The general idea is to deduplicate parts of BGP path attributes when they are the same.
Not freeing unused resources is a known issue and is very welcome to get addresses.
Not having any deduplication happening makes bio-rd consume multiple GB of RAM for just one IPv4 full table. That's why that
things was introduced in the first place.

Let me get over the issues one by one:
On 1: I think that's not 100% correct. The value of what p points to is being used as the index on the map. When two PathA objects with identical content come up,

will return a pointer to the existing object. With the old pointer gone the old object will be garbage collected.

On 2: Yes, that's indeed a bug (that affects ris-mirror, but not ris if I'm not mistaken). The IPs need to be deduplicated themselves first before the PathA is being deduplicated.

On 3: As written above. The missing cache eviction is a known issue and a fix welcome.

@4xoc
Copy link
Author

4xoc commented Jul 8, 2024

As far as I understand the code, your argument about a full table doesn't hold because the cache, as it is at the moment, doesn't do anything usefull. In combination with the second issue it can never match any entry because the ptrs are unique and thus is the hash of the map index. Unfortunately I don't have the possibility to test with a full table so I can't confirm this with data.

Let me clarify on the first issue, the duplication of data happens in the map itself, by design of maps. Go just doesn't save the hash of the struct but the entire index & the value, effectively copying th full structure in this particular instance. This happens even when the cache would be working correctly. Adding to this, any identical BGPPathA would have to exists at least two times just to be as efficient as not using this cache at all (and that's not even counting the overhead of the map itself although perhaps negligible). It indeed appears to be a problem only in the ris-mirror using the BGPPathA cache, not any other cache instance anywhere else. The cache mechanism itself works "as designed" for other instances because no ptrs exist and therefore could very well help on full-tables (including possible inefficiencies when <= 2 identical entries exist).

Pretty much the same applies to IPs being deduplicated in terms of efficiency.

What would you like to see happen here? I believe a cache for BGPPathA is not giving any real world benefits that justify its use or the time to invest in a proper caching strategy. With ptrs in the mix I think the options are quite limited:

  1. not using ptrs at all
  2. Using a different cache map index like a hash of the struct and the resolved ptr values. This could lead to collisions though or with larger hash types cause the index to be even larger than the raw data.

@taktv6
Copy link
Member

taktv6 commented Jul 8, 2024

it can never match any entry because the ptrs are unique

No. That's only true when IPs (Nexthop, Source) are not deduplicated down to the same pointer, or when Aggregator is not nil. When IP address 192.0.2.0 is reliably resolved into the same pointer by the IP deduplication the ptrs are not unique, and thus the cache working. (That the cache in generals work can be seen here: https://go.dev/play/p/A0--_Gyh7vF)

True is, that the Nexthop and Source attributes are not deduplicated in ris-mirror. That's a bug.

We've intentionally split the BGP attributes into BGPPath and BGPPathA. All attributes that are commonly the same have been moved into BGPPathA. In case there are not enough paths that share common values for attributes in BGPPathA, the solution can indeed be less efficient. But that's okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants