Learning ebpf_exporter by adding examples/cgroup-rstat-flushing #457

netoptimizer · 2024-08-01T15:30:44Z

This PR implements the example cgroup-rstat-flushing

It is for analyzing cgroup rstat flushing behavior and a lock contention issue observed in production

Thanks to @bobrik for reviewing, as I need to learn how to use ebpf_exporter

For others to learn from, I've kept some commits that show mistakes and the follow-up commits that fix them

For now this implements basic counting of cgroup rstat flushes per cgroup depth "level". Further development depends on tracepoints added in kernel v6.10.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

This implementation for counting locks with different "type" (normal vs yield) and state for contended isn't compatible with the way ebpf_exporter can extract data. The BPF code works, but the ebpf_exporter decoder isn't built for having a complex type (e.g. struct) as map value. The 'counters:' metrics aren't built for this. The 'spans:' type can handle and decode map values that are complex structs, but seems specific to ringbuf.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Example output via command: curl -s localhost:9435/metrics

# HELP ebpf_exporter_cgroup_rstat_locked_state Number of times rstat lock was obtainted and contended state
# TYPE ebpf_exporter_cgroup_rstat_locked_state counter
ebpf_exporter_cgroup_rstat_locked_state{contended="0"} 174290
ebpf_exporter_cgroup_rstat_locked_state{contended="1"} 269

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

The 'counters:' entry should only be there once. After this I see all the counters, like this example:

# HELP ebpf_exporter_cgroup_rstat_flush_total Total number of times cgroup rstat were flushed (recorded per level)
# TYPE ebpf_exporter_cgroup_rstat_flush_total counter
ebpf_exporter_cgroup_rstat_flush_total{level="0"} 9480
ebpf_exporter_cgroup_rstat_flush_total{level="1"} 0
ebpf_exporter_cgroup_rstat_flush_total{level="2"} 396127
ebpf_exporter_cgroup_rstat_flush_total{level="3"} 0
ebpf_exporter_cgroup_rstat_flush_total{level="4"} 0
ebpf_exporter_cgroup_rstat_flush_total{level="5"} 0
# HELP ebpf_exporter_cgroup_rstat_locked_state Total number of times rstat lock was obtainted and contended state
# TYPE ebpf_exporter_cgroup_rstat_locked_state counter
ebpf_exporter_cgroup_rstat_locked_state{contended="0"} 405550
ebpf_exporter_cgroup_rstat_locked_state{contended="1"} 1063
# HELP ebpf_exporter_cgroup_rstat_locked_yield Number of times rstat lock was obtainted again after yield and contended state
# TYPE ebpf_exporter_cgroup_rstat_locked_yield counter
ebpf_exporter_cgroup_rstat_locked_yield{contended="0"} 1
ebpf_exporter_cgroup_rstat_locked_yield{contended="1"} 0

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

# HELP ebpf_exporter_cgroup_rstat_lock_contended Lock contention counters per cgroup level
# TYPE ebpf_exporter_cgroup_rstat_lock_contended counter
ebpf_exporter_cgroup_rstat_lock_contended{level="0"} 36
ebpf_exporter_cgroup_rstat_lock_contended{level="1"} 847
ebpf_exporter_cgroup_rstat_lock_contended{level="2"} 415
ebpf_exporter_cgroup_rstat_lock_contended{level="3"} 0
ebpf_exporter_cgroup_rstat_lock_contended{level="4"} 0
ebpf_exporter_cgroup_rstat_lock_contended{level="5"} 0

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

I think it will be more obvious in Prometheus queries if the counter cgroup_rstat_locked_state is renamed to cgroup_rstat_locked_total, highlighting 'total' in the name. Example of the new naming:

# HELP ebpf_exporter_cgroup_rstat_locked_total Total number of times rstat lock was obtainted and contended state
# TYPE ebpf_exporter_cgroup_rstat_locked_total counter
ebpf_exporter_cgroup_rstat_locked_total{contended="0"} 340710
ebpf_exporter_cgroup_rstat_locked_total{contended="1"} 1234

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

examples/cgroup-rstat-flushing.bpf.c

examples/cgroup-rstat-flushing.yaml

bobrik · 2024-08-01T23:21:14Z

examples/cgroup-rstat-flushing.yaml

+          size: 4
+          decoders: # contended boolean converted to 0 and 1
+            - name: uint
+    - name: cgroup_rstat_locked_yield


cgroup_rstat_locked_yields_total, see https://prometheus.io/docs/practices/naming/

Thanks for the link. As a Prometheus newbie I didn't realize that the _suffix had a meaning.
I now realize that the general structure is library_name_unit_suffix.

Jakub recommended: Prometheus: Up & Running, 2nd Edition

examples/cgroup-rstat-flushing.yaml

bobrik · 2024-08-01T23:23:20Z

examples/cgroup-rstat-flushing.bpf.c

+#define MAX_CGRP_LEVELS	5
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);


The tabs are making clang-format unhappy in CI.

IMHO BPF restricted-C code should be written in the same tab style as kernel code.
Let's change the formatting requirements, please!

Let's leave it for another issue. For now let's just make clang-format happy.

This project's .clang-format file states:
# Minimal format loosely matching the kernel style.
IMHO it doesn't match kernel style at all...

As it contains:

IndentWidth: 4
TabWidth: 4
UseTab: Never

Looking at the .clang-format in the kernel source, this should have been:

IndentWidth: 8
TabWidth: 8
UseTab: Always

I will respect that this project requires its own special clang-format rules and obey them, but I still think they are wrong.

examples/cgroup-rstat-flushing.bpf.c

bobrik · 2024-08-01T23:27:48Z

examples/cgroup-rstat-flushing.bpf.c

+		(*cnt)++;
+	}
+
+	/* What cgrp level is interesting, but I didn't manage to encode it in


nit: cgrp -> cgroup

Sure, I'll fix the nit.
I was hoping to get your input on the contents of the comment:

/* What cgrp level is interesting, but I didn't manage to encode it in
 * above counters. As contended case is the most interesting, have
 * level counter for contended.
 */

When coding BPF, I usually do one map lookup to get a stats struct, which I then update with multiple stats.
Here I end up doing 3 map lookups for single counters.

When I started coding I thought I could represent something like:

ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="0"} 6558
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="0"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="1"} 34
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="1"} 123
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="2"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="2"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="3"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="3"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="4"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="4"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="5"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="5"} 0

Question: But ...

I cannot figure out how to get ebpf_exporter to generate this?

I don't know if these counters make sense for Prometheus?

> I don't know if these counters make sense for Prometheus?

They do.

> I cannot figure out how to get ebpf_exporter to generate this?

Each line here is a separate key in a single map called ebpf_exporter_cgroup_rstat_locked_total:

ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="0"} 6558
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="0"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="1"} 34
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="1"} 123
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="2"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="2"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="3"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="3"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="4"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="4"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="0", level="5"} 0
ebpf_exporter_cgroup_rstat_locked_total{contended="1", level="5"} 0

The mapping is effectively:

{contended="0", level="0"} -> 6558

{contended="1", level="0"} -> 0

{contended="0", level="1"} -> 34

...

To produce the labels for each key, you turn the raw bytes of the key into label values with the help of decoders.

Working backwards from what you expect on the output, you want something like this in the config:

- name: cgroup_rstat_locked_total
  help: Total number of times rstat lock was obtainted and contended state
  labels:
    - name: contended
      size: 1
      decoders:
        - name: uint
    - name: level
      size: 1
      decoders:
        - name: uint

On the kernel side you'd want something like this to represent the key:

struct key_t {
    u8 contended;
    u8 level;
};

The names here do not matter, only sizes do (u8 corresponds to size: 1 in the config).

Given that your keys are complex structs and not small numbers, you cannot store these in an array, so you need a hashmap:

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 128);
    __type(key, struct key_t);
    __type(value, u64);
} cgroup_rstat_locked_total SEC(".maps");

To actually increment a value, you need to construct the key and do the increment in the map:

// construct however you need to
struct key_t key = {};
key.contended = 1;
key.level = 2;
increment_map_nosync(&cgroup_rstat_locked_total, &key, 1);

Here's an example with a complex key:

https://github.com/cloudflare/ebpf_exporter/blob/master/examples/pci.yaml

https://github.com/cloudflare/ebpf_exporter/blob/master/examples/pci.bpf.c

Hope this makes sense.
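To make the size-based mapping concrete, here is a minimal userspace C sketch (not ebpf_exporter code) of how a two-byte key could be split into the contended and level labels. `decode_key` and `struct labels` are hypothetical names introduced for illustration; the real decoding happens inside ebpf_exporter's decoders. The sketch only mirrors the idea that each label consumes `size` bytes of the key, in the order the labels are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the suggested key_t: two 1-byte fields, no padding. */
struct key_t {
    uint8_t contended;
    uint8_t level;
};

/* Hypothetical container for the decoded label values. */
struct labels {
    unsigned int contended;
    unsigned int level;
};

/*
 * Hypothetical stand-in for two "uint" decoders with size: 1 each.
 * Labels consume the raw key bytes in the order they are listed.
 */
static struct labels decode_key(const uint8_t *raw, size_t len)
{
    struct labels out;

    assert(len >= sizeof(struct key_t));
    out.contended = raw[0]; /* first label, size: 1 */
    out.level = raw[1];     /* second label, size: 1 */
    return out;
}
```

Swapping the two fields in the struct without swapping the label order in the config would silently swap the label values, since only sizes and order matter.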

These suggestions worked for encoding more information ("labels" in Prometheus terms).
I've coded it up and pushed as part of this patchset/PR.
(p.s. I've also encoded "yield" state)

The current Ubuntu CI system doesn't have kernel v6.10 which contains the tracepoints cgroup-rstat-flushing uses. Thus, disable CI checks via CONFIGS_TO_IGNORE_IN_CHECK. Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Spell out CGRP to CGROUP. Replace MAX_CGRP_LEVELS -> MAX_CGROUP_LEVELS Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

This allows us to remove the other maps. What a nice cleanup :-)
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

bobrik · 2024-08-06T20:27:25Z

examples/cgroup-rstat-flushing.bpf.c

+#define MAX_CGRP_LEVELS	5
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);


Let's leave it for another issue. For now let's just make clang-format happy.

examples/cgroup-rstat-flushing.bpf.c

bobrik · 2024-08-06T20:32:26Z

examples/cgroup-rstat-flushing.bpf.c

+		lock_key.yield = 1;
+
+	lock_key.contended = contended;
+	lock_key.level = (level & 0xFF);


What does this do?

Variable level is u32 and lock_key.level is u16, so this is essentially a cast, and I'm also telling the BPF verifier that the variable is bounded.
It shouldn't be needed, but my previous experience with clang/llvm and the BPF verifier tells me that sometimes it is needed anyhow.
For verifier concerns, I think I can remove it here, because code is putting another bound on level via MAX_CGROUP_LEVELS.

I would normally just cast it explicitly: lock_key.level = (u16) level, which should have the same effect.
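One nuance: the two spellings agree only while the value fits in 8 bits, because `level & 0xFF` keeps the low byte while a `(u16)` cast keeps the low 16 bits. The minimal userspace sketch below shows the difference. The function names are hypothetical, `MAX_CGROUP_LEVELS` is taken as 5 from the define in this PR, and the clamp is only an assumption about what the bounding code might look like.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CGROUP_LEVELS 5

/* Mask variant: keeps only the low 8 bits. */
static uint16_t level_masked(uint32_t level)
{
    return (uint16_t)(level & 0xFF);
}

/* Cast variant: keeps the low 16 bits. */
static uint16_t level_cast(uint32_t level)
{
    return (uint16_t)level;
}

/* Assumed shape of the bounding code: clamp first, then narrow. */
static uint16_t level_bounded(uint32_t level)
{
    if (level > MAX_CGROUP_LEVELS)
        level = MAX_CGROUP_LEVELS;
    assert(level <= MAX_CGROUP_LEVELS); /* invariant after the clamp */
    return (uint16_t)level;
}
```

Once the level is clamped to a small bound, both the mask and the cast leave the value unchanged, so the choice is mostly about readability and what the verifier accepts.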

bobrik · 2024-08-06T20:33:17Z

examples/cgroup-rstat-flushing.bpf.c

+	u64 now = bpf_ktime_get_ns();
+	u64 pid = bpf_get_current_pid_tgid();
+	struct lock_key_t lock_key = { 0 };
+	u32 level = cgrp->level;


You can operate on lock_key.level directly, no need for a separate variable.

Below I'm using level and bounding it to MAX_CGROUP_LEVELS.
I prefer to keep level type u32 in this bounding code, and do the assignment later to lock_key.level as it is type u16.

examples/cgroup-rstat-flushing.bpf.c

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Adjusted to clang-format rules
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Too many time measurements are in the 1 usec bucket. Start buckets at 0.1 usec resolution to get better resolution.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

as it is no longer in usec.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

This project has special clang-format rules that need to be obeyed. The .clang-format file states that it is "loosely matching the kernel style". IMHO it doesn't match kernel style, as it contains:

IndentWidth: 4
TabWidth: 4
UseTab: Never

If it wants to better match kernel style, it should use:

IndentWidth: 8
TabWidth: 8
UseTab: Always

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Looking at other examples the name usually contains "latency". Thus name it: cgroup_rstat_flush_latency_seconds
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Hopefully this doesn't create too many records; the maximum is the number of cgroups on the system. Per cgroup, record average latency via two counters: flush_seconds_count and flush_seconds_sum. Also add a level label, as this allows us to query "without" the cgroup string and get the per-level average flush latency cost.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

bobrik · 2024-08-07T17:16:50Z

Let me know when it's ready to go. It would be helpful if you attached the exact metrics that you see locally with this running and capturing some events.

netoptimizer · 2024-08-08T12:41:56Z

Running this locally overnight, I see an issue with cgroup decoding, e.g. cgroup="unknown_cgroup_id:2196814"
Longer output:

ebpf_exporter_cgroup_rstat_flush_seconds_sum{cgroup="unknown_cgroup_id:2196562",level="2"} 224
ebpf_exporter_cgroup_rstat_flush_seconds_sum{cgroup="unknown_cgroup_id:2196598",level="2"} 305
ebpf_exporter_cgroup_rstat_flush_seconds_sum{cgroup="unknown_cgroup_id:2196670",level="2"} 172
ebpf_exporter_cgroup_rstat_flush_seconds_sum{cgroup="unknown_cgroup_id:2196706",level="2"} 167
ebpf_exporter_cgroup_rstat_flush_seconds_sum{cgroup="unknown_cgroup_id:2196742",level="2"} 210
[...]
curl -s localhost:9435/metrics | grep unknown_cgroup_id | grep _sum | wc -l
6043

Something on my system seems to be creating cgroups for a short period, and ebpf_exporter is picking these up

I assume what happens is that:
when these are decoded for (Prometheus) output, the cgroups are already gone and we get these "unknown_cgroup_id" entries
(for completeness: I think pmcd / pmlogger is causing this, but it is irrelevant)

I want to change the code so that these per-cgroup stats cannot explode like this

I'm considering using an LRU map with a limit (e.g. 1024),
as we are primarily interested in the most active cgroups,
but as cadvisor walks all cgroups, we have to be careful to choose a limit that is always above the number of cgroups in the system, to avoid LRU thrashing corner cases

netoptimizer · 2024-08-08T12:46:24Z

Regarding above "unknown_cgroup_id:nnn" issue (Cc @bobrik )

Can the cgroup decoder be instructed to exclude unknown cgroups in the output?

or only make a single "unknown_cgroup" output record?

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

bobrik · 2024-08-12T16:18:14Z

> Something on my system seems to be creating cgroups for a short period and ebpf_exporter is picking these up

Does ebpf_exporter say anything about fanotify (#244) on startup? Even short lived cgroups should be resolved when fanotify is available.

> Can the cgroup decoder be instructed to exclude unknown cgroups in the output?
> or only make a single "unknown_cgroup" output record?

Not in the current state, but it seems like a good idea: #460.

Another option, which might not be possible: check for cgroup age and if it's less than some threshold (say 10s), set cgroup_id to zero to do the collapsing manually.

netoptimizer · 2024-08-12T18:21:12Z

> > Something on my system seems to be creating cgroups for a short period and ebpf_exporter is picking these up
>
> Does ebpf_exporter say anything about fanotify (#244) on startup? Even short lived cgroups should be resolved when fanotify is available.

I don't think fanotify is available/configured on my testlab.

> > Can the cgroup decoder be instructed to exclude unknown cgroups in the output?
> > or only make a single "unknown_cgroup" output record?
>
> Not in the current state, but it seems like a good idea: #460.

Awesome that you created a ticket for this 😃

bobrik · 2024-08-13T16:37:37Z

> I don't think fanotify is avail/configured on my testlab.

It's a matter of having v6.6 or newer: torvalds/linux@0ce7c12e88cf. There isn't anything to configure. As long as ebpf_exporter doesn't say anything about fanotify at startup, it is using it.

Resolution of cgroup_rstat_flush_seconds_{sum,count} was wrong in code, this was in 0.1 usec. Fix by changing this to nanosec resolution. Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Noticed the histogram field _sum was always zero. The function (define) _increment_histogram uses max_bucket+1 for storing the sum counter. Because we are dealing with maps of array type (BPF_MAP_TYPE_PERCPU_ARRAY), we need an additional +1 (now +2) when creating the map.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
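The off-by-one is easy to see with a plain array. The sketch below is userspace C, not the ebpf_exporter macro itself: `MAX_BUCKET`, `HIST_ENTRIES`, `log2_floor` and `hist_observe` are hypothetical names, and the layout (buckets 0..MAX_BUCKET followed by one sum slot at index MAX_BUCKET+1) follows the commit message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUCKET 4                   /* hypothetical: buckets 0..4 */
#define HIST_ENTRIES (MAX_BUCKET + 2)  /* MAX_BUCKET+1 buckets plus one sum slot */

/* floor(log2(v)) for v > 0; returns 0 for v == 0. */
static uint32_t log2_floor(uint64_t v)
{
    uint32_t r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Count the value in its exp2 bucket and add it to the trailing sum slot. */
static void hist_observe(uint64_t hist[HIST_ENTRIES], uint64_t value)
{
    uint32_t bucket = log2_floor(value);

    assert(hist != NULL);
    if (bucket > MAX_BUCKET)
        bucket = MAX_BUCKET;           /* clamp large values into the last bucket */
    hist[bucket] += 1;
    hist[MAX_BUCKET + 1] += value;     /* sum lives one past the last bucket */
}
```

With only MAX_BUCKET + 1 entries, index MAX_BUCKET + 1 would be out of range. A BPF array map lookup at an out-of-range index fails, which would be consistent with the _sum field staying at zero.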

Using read_array_ptr() isn't correct for LRU maps, as the lookup can fail. Always using bpf_map_update_elem() on an LRU map is expensive, because the kernel will always create a new element (and insert it before the old element in the list) and copy the key and value data into the new element. Doing a lookup first activates less complex kernel code (that just sets a ref bit on the element, which keeps it on the active list).
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
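The lookup-first pattern can be sketched in plain userspace C. `toy_map`, `toy_lookup`, `toy_update` and `add_to_counter` are hypothetical stand-ins for an LRU map and the bpf_map_lookup_elem() / bpf_map_update_elem() helpers. The sketch only counts how often the update path runs and does not model eviction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

/* Toy stand-in for a small keyed map; not a real LRU. */
struct toy_map {
    uint64_t keys[TOY_SLOTS];
    uint64_t vals[TOY_SLOTS];
    int used[TOY_SLOTS];
    unsigned int updates; /* how often the costly update path ran */
};

/* Cheap path: return a pointer to the value, or NULL on a miss. */
static uint64_t *toy_lookup(struct toy_map *m, uint64_t key)
{
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (m->used[i] && m->keys[i] == key)
            return &m->vals[i];
    return NULL;
}

/* Costly path: overwrite an existing entry or claim a free slot. */
static int toy_update(struct toy_map *m, uint64_t key, uint64_t val)
{
    uint64_t *existing = toy_lookup(m, key);

    m->updates++;
    if (existing) {
        *existing = val;
        return 0;
    }
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!m->used[i]) {
            m->used[i] = 1;
            m->keys[i] = key;
            m->vals[i] = val;
            return 0;
        }
    }
    return -1; /* full; a real LRU map would evict an entry instead */
}

/* Lookup first, and fall back to update only when the key is missing. */
static int add_to_counter(struct toy_map *m, uint64_t key, uint64_t inc)
{
    uint64_t *val;

    assert(m != NULL);
    val = toy_lookup(m, key);
    if (val) {
        *val += inc;
        return 0;
    }
    return toy_update(m, key, inc);
}
```

Incrementing the same key three times runs the update path once; the other two increments stay on the lookup path.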

The timestamp helpers had a use case with a more complex key.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Observed this on a live system:

ebpf_exporter_cgroup_rstat_map_errors_total{type="no_elem"} 239

The issue seems to be the order in which the tracepoints get activated. E.g. tracepoints using get_timestamp() get activated before those creating timestamps via record_timestamp. The fix is to check that the timestamp from get_timestamp() isn't zero.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
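A minimal userspace sketch of the zero check, assuming (as the commit message implies) that a missing start timestamp shows up as zero. `flush_latency_ns` is a hypothetical helper, not a function from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the elapsed time only when a start timestamp was recorded.
 * Returns false (and leaves *delta_ns untouched) when start_ns is zero,
 * i.e. the "end" tracepoint fired before any "start" tracepoint had run.
 */
static bool flush_latency_ns(uint64_t start_ns, uint64_t now_ns, uint64_t *delta_ns)
{
    assert(delta_ns != NULL);
    if (start_ns == 0)
        return false; /* no recorded start: skip instead of counting an error */
    if (now_ns < start_ns)
        return false; /* defensive: avoid unsigned wrap-around */
    *delta_ns = now_ns - start_ns;
    return true;
}
```

A caller would update the latency histogram only when this returns true.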

Recording two counters per cgroup generates too much data for Prometheus. For troubleshooting this will be a practical feature, but don't enable it on all servers by default.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

The different cgroup levels have different flush completion times. Thus add per-level latency tracking to the ebpf_exporter_cgroup_rstat_flush_latency histogram via a level label. This generates less data than CONFIG_TRACK_PER_CGROUP_FLUSH, and gives us an indicator if some levels are showing an abnormal latency pattern. For troubleshooting individual cgroups, CONFIG_TRACK_PER_CGROUP_FLUSH can be enabled.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

netoptimizer · 2024-08-19T13:34:54Z

Some example output running this in "localhost" mode

e.g. on port 9435
extracted via command: curl -s localhost:9435/metrics
kernel 6.6.30-cloudflare-2024.5.4

# HELP ebpf_exporter_cgroup_rstat_locked_total Times rstat lock was obtainted with state for cgroup level, contended and yield
# TYPE ebpf_exporter_cgroup_rstat_locked_total counter
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="0",yield="0"} 24281
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="0",yield="1"} 7351
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="1",yield="0"} 14533
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="1",yield="1"} 39
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="2",yield="0"} 175181
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="2",yield="1"} 89
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="3",yield="0"} 272801
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="3",yield="1"} 119
ebpf_exporter_cgroup_rstat_locked_total{contended="0",level="4",yield="0"} 1805
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="0",yield="0"} 86951
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="0",yield="1"} 1.2791901e+07
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="1",yield="0"} 73
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="1",yield="1"} 3864
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="2",yield="0"} 722
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="2",yield="1"} 22764
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="3",yield="0"} 1077
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="3",yield="1"} 34344
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="4",yield="0"} 10
ebpf_exporter_cgroup_rstat_locked_total{contended="1",level="4",yield="1"} 399

The build check bots cannot handle include <errno.h>; it fails with:

In file included from /usr/include/errno.h:25:
In file included from /usr/include/features.h:526:
/usr/include/x86_64-linux-gnu/gnu/stubs.h:7:11: fatal error: 'gnu/stubs-32.h' file not found
# include <gnu/stubs-32.h>

The fix is to include <linux/errno.h>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>

netoptimizer and others added 5 commits July 29, 2024 12:55

Add examples/cgroup-rstat-flushing basic flush counting

20e65de

cgroup-rstat-flushing: Add seperate counter for yield case

c8bbfaa

netoptimizer force-pushed the rstat_lock_tp02 branch from 3b26bbe to 727a241 Compare August 1, 2024 15:59

netoptimizer force-pushed the rstat_lock_tp02 branch from 72b7a2f to c6ff257 Compare August 1, 2024 16:48

bobrik reviewed Aug 1, 2024

netoptimizer added 8 commits August 2, 2024 10:41

cgroup-rstat-flushing: CI system missing tracepoint

99c6b9c

cgroup-rstat-flushing: trivial whitespace fix in yaml

f5dabd2

cgroup-rstat-flushing: Fix type for level_key used in map lookup

623c9ad

cgroup-rstat-flushing: use longer define name

bd96abc

cgroup-rstat-flushing: Fix comment spell out cgrp to cgroup

e30576c

cgroup-rstat-flushing: use increment_map_nosync

7168fd3

cgroup-rstat-flushing: Attempt with complex lock_key type

9959e13

cgroup-rstat-flushing: also encode yield in lock_key_t

ee3de80

netoptimizer force-pushed the rstat_lock_tp02 branch from 328bcba to ee3de80 Compare August 6, 2024 13:08

cgroup-rstat-flushing: cleanups

eb44da9

bobrik reviewed Aug 6, 2024

netoptimizer added 5 commits August 7, 2024 11:22

cgroup-rstat-flushing: measure time waiting for lock

c32df50

cgroup-rstat-flushing: add lock hold time histogram

b02f687

cgroup-rstat-flushing: adjust histogram resolution to 0.1 microsec

dc4ba3e

cgroup-rstat-flushing: rename variable delta_usec

eaba651

netoptimizer force-pushed the rstat_lock_tp02 branch from e5c4870 to 824cb1d Compare August 7, 2024 09:38

netoptimizer added 2 commits August 7, 2024 13:56

cgroup-rstat-flushing: Add histogram for flush time latency

d9a98b3

cgroup-rstat-flushing: Adjustments to obey clang-format rules

62bdba0

netoptimizer added 7 commits August 15, 2024 20:52

cgroup-rstat-flushing: cgroup_rstat_flush_nanoseconds_{sum,count}

b9cf32c

cgroup-rstat-flushing: extend timestampe helpers with key

b80cdd8

cgroup-rstat-flushing: set MAX_ERROR_TYPES correctly

7a12be4

netoptimizer force-pushed the rstat_lock_tp02 branch from 6fed241 to 3397837 Compare August 19, 2024 09:10

netoptimizer force-pushed the rstat_lock_tp02 branch from 1f2bf1e to c5e15de Compare August 19, 2024 18:19

bobrik approved these changes Aug 19, 2024

bobrik merged commit 93a204b into cloudflare:master Aug 19, 2024
21 checks passed
