feat(ksymbols): reimplement ksymbols #4464

oshaked1 · 2024-12-25T16:41:54Z

1. Explain what the PR does

The previous ksymbols implementation used a lazy lookup method, where only symbols marked as required ahead of time were stored. Trying to lookup a symbol that was not stored resulted in /proc/kallsyms being read and parsed in its entirety.
While most symbols being looked up were registered as required ahead of time, some weren't (in particular symbols needed for kprobe attachment) which incurred significant overhead when tracee is being initialized.

This new implementation stores all symbols, or if a requiredDataSymbolsOnly flag is used when creating the symbol table (used by default), only non-data symbols are stored (and required data symbols must be registered before updating). Some additional memory usage optimizations are included, for example encoding symbol owners as an index into a list of owner names, and also lazy symbol name lookups where the map of symbol name to symbol is populated only for symbols that were looked up once.

From measurements I performed, the extra memory consumption is around 25MB (from ~132MB to ~157MB when running tracee with no arguments on my machine).

Under the hood, this ksymbols implementation uses a generic symbol table implementation that can be used by future code for managing executable file symbols.

A significant advantage gained by storing all non-data symbols is the ability to lookup a function symbol that contains a given code address, a feature that I plan to use in the future.

This PR closes #4463 and renders #4325 irrelevant (because /proc/kallsyms reads no-longer happen "spontaneously").

NDStrahilevitz

Nice work overall, though I do have some comments in mind.

pkg/utils/symbol_table.go

NDStrahilevitz · 2024-12-26T14:15:52Z

pkg/utils/symbol_table.go

+	mu sync.RWMutex
+	// All symbols sorted by their address in descending order,
+	// for quick binary searches by address.
+	sortedSymbols []*T


Why *T? Symbol is already a pointer by definition.

Are you sure it is? From my understanding if I use T then sortedSymbols contains all of the symbols inline, which I tried but it made it harder to manage symbol references in the name to symbol map.

IIRC say you have a struct S which implements some interface I. Then you can pass s S as I only with &s.

So what you're doing is requiring concretely a **KernelSymbol instead of a *KernelSymbol as Symbol.

IIRC say you have a struct S which implements some interface I. Then you can pass s S as I only with &s.

If I understand correctly, this applies only if the struct methods are implemented using a pointer receiver, notice how I implemented them without it and given a non-pointer type when creating the underlying symbol table.

Pointer receivers tend to be more efficient but that's besides the point. Yes you're correct but then why add the indirection at this point? Just stick to the (hopefully) stack allocated objects. Now you're forcing heap escape. If we're going through pointer eventually might as well just author the struct as a pointer receiver and make the code simpler.

NDStrahilevitz · 2024-12-26T14:23:12Z

pkg/utils/symbol_table.go

+
+	// Add the new symbols to the sorted slice (which now becomes unsorted).
+	// Allocate the slice with the needed capacity to avoid overallocation.
+	oldSymbols := st.sortedSymbols


Pretty sure this "copy" is unsafe as its just a pointer reassignment.
I think you should rather allocate the new slice, use on copy the old and parameter slice into the new allocation, then assign that one to the object field (while the prior slice will be given to GC).

It's not a copy, I'm just saving a reference to the old slice in a temporary variable before assigning an empty slice to the struct variable.

The overall operation you're doing is copying the old into a new extended. I was pinning to the particular line here as its:

The first of the operation

Seemed problematic to me

I still don't understand why it's unsafe

You assign the pointer of st.sortedSymbols to oldSymbols. This isn't a copy.
Then to the pointer of st.sortedSymbols you assign the make. This may be reassigning a pointer which is safe or assigning to the content of the pointer. It's probably the former, but if the latter that is unsafe as you are essentially overriding the content of oldSymbols as well.
In any case since there is no sorted insertion here, i'm not sure why you do the whole reallocation, just append the items, simpler and quicker, and sometimes you will avoid reallocation if the slice has preallocated space.

pkg/utils/symbol_table.go

NDStrahilevitz · 2024-12-26T14:27:19Z

pkg/utils/symbol_table.go

+			symbols = append(symbols, symbol)
+		}
+	}
+	st.mu.RUnlock()


You should for safety's sake push one of these into the start of the following if block, and then duplicate this call before the last return.

Not sure I understand what you mean, why is that safer?

I assume you're using a mutex because different goroutines can call this data structure's methods from multiple locations. By unlocking here you're leaving a small gap where another goroutine can "sneak in", do changes and then unlock you back inside the if post changes.
That's why the gap between locks and unlocks should be as small as possible - so this unlock should be done near the inner Lock of the if or right before the return.

Understood 👍

NDStrahilevitz · 2024-12-26T14:32:17Z

pkg/utils/environment/kernel_symbols.go

-		Owner:   symbolOwner,
-	}
+func (kst *KernelSymbolTable) ForEachSymbol(callback func(*KernelSymbol)) {
+	kst.symbols.ForEachSymbol(callback)
 }

 // copySliceOfPointersToSliceOfStructs converts a slice of pointers to a slice of structs.


As this is left unused inside the package, and I see you've copied it elsewhere, this can be removed, no reason to leave dangling code.

It's used in the test, should I remove it from the package and call the inner version in the test?

I think you've copied it into the test file no? Then you're using that copy anyway and this is dangling.

No, getTheOnlySymbol() in the test file uses KernelSymbolTable.ForEachSymbol()

I'm referring to copySliceOfPointersToSliceOfStructs , I see now it might not be clear from the snippet.

NDStrahilevitz · 2024-12-26T14:37:04Z

pkg/utils/environment/kernel_symbols.go

+// limit to prevent unrelated symbols from being returned for an address lookup
+const maxSymbolSize = 0x100000
+
+var ownersMu sync.RWMutex


The following owner tracking data structures should be localized to the struct I think. Since this package doesn't expose a singleton (which it certainly could given that a kernel symbol table is a single system file) I don't think we should mix global and struct variables.
That is unless you want to make this a singleton.

The reason I made it global is so that I could access it in the Owner() function, without needing to save a back-reference to the symbol table in each symbol (which would incur a significant memory overhead). Is there another way you think I could localize it? I don't want to use a singleton because of the configurable aspect (requiredDataSymbolsOnly, lazyNameLookup).

What purpose does the index:owner mapping serve? I see that when you parse the file you do string->int and when you call Owner() you do int->string (via array), but I don't see where the int plays a role in itself. Why not just use strings overall?
Further, if you expect that an array of owner string will be rather small (up to 5 lets say), then the map duplication is redundant (consider that in event args we prefer arrays to a map), and you could also drop the mutex.

The purpose is to reduce memory usage, storing a uint16 takes less memory than an owner string. On my machine there are 103 possible owners so I'm not sure it would be smart to drop the map.

Maybe it wouldn't be that bad storing the name, on my machine 30622/230472 symbols are owned by modules, and the system symbols could have a memory efficient representation (like a nil)

The purpose is to reduce memory usage, storing a uint16 takes less memory than an owner string. On my machine there are 103 possible owners so I'm not sure it would be smart to drop the map.

That's why I said it depends on the expected count.

Maybe it wouldn't be that bad storing the name, on my machine 30622/230472 symbols are owned by modules, and the system symbols could have a memory efficient representation (like a nil)

That does sound like a lot and storing the string directly would be costly... So going with the int was a good call.

I still don't like the mixed approach but maybe that means the kernelsymbol table should be a singleton. Tentative on this for now.

NDStrahilevitz · 2024-12-26T14:39:50Z

pkg/ebpf/tracee.go

-				environment.WithRequiredSymbols(t.requiredKsyms),
-			)
+			t.kernelSymbols = environment.NewKernelSymbolTable(true, true)
+			// t.requiredKsyms may contain non-data symbols, but it doesn't affect the validity of this call


Since plenty of the symbols i've attempted to include here belong to the T symbols, I think trimming that logic will save both RSS, loading time, and code complexity.
If you do it, please do so in a separate commit.

I could trim the symbols included for attachment purposes, but anything included as a ksymbol event dependency needs to stay becuase we don't know if it's a data symbol or not. I think the effect on RSS and loading time would be negligible, but maybe it's worth it for the sake of code complexity.

I think that would still leave us with plenty of symbols down. Aren't the mem_dump symbols also mostly T symbols?

I think that would still leave us with plenty of symbols down. Aren't the mem_dump symbols also mostly T symbols?

We have no way to distinguish between T and D symbols requested by print_mem_dump (until they are found in the kallsyms file).

This implementation stores all symbols, or if a `requiredDataSymbolsOnly` flag is used when creating the symbol table, only non-data symbols are saved (and required data symbols must be registered before updating). This new implementation uses a generic symbol table implementation that is responsible for managing symbol lookups, and can be used by future code for managing exeutable file symbols.

github-actions bot assigned oshaked1 Dec 25, 2024

github-actions bot added area/ebpf area/testing area/events labels Dec 25, 2024

oshaked1 force-pushed the kallsyms branch 10 times, most recently from 165e3d5 to 8eabeb9 Compare December 26, 2024 14:36

NDStrahilevitz reviewed Dec 26, 2024

View reviewed changes

oshaked1 force-pushed the kallsyms branch from 8eabeb9 to 60ddf20 Compare December 26, 2024 15:19

oshaked1 force-pushed the kallsyms branch from 60ddf20 to 421eb9b Compare December 26, 2024 15:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ksymbols): reimplement ksymbols #4464

feat(ksymbols): reimplement ksymbols #4464

oshaked1 commented Dec 25, 2024 •

edited

Loading

NDStrahilevitz left a comment

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 27, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 27, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 27, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 27, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

NDStrahilevitz Dec 26, 2024

oshaked1 Dec 26, 2024

feat(ksymbols): reimplement ksymbols #4464

Are you sure you want to change the base?

feat(ksymbols): reimplement ksymbols #4464

Conversation

oshaked1 commented Dec 25, 2024 • edited Loading

1. Explain what the PR does

NDStrahilevitz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oshaked1 commented Dec 25, 2024 •

edited

Loading