
Redesign metrics system with thread-safety in mind #2899

Open
alindima opened this issue Feb 14, 2022 · 7 comments
Labels: Status: Parked, Type: Enhancement

@alindima (Contributor) commented Feb 14, 2022

The metrics system is not fully thread-safe at the moment, due to several issues (a minimal sketch of the counter pattern in question follows this list):

  1. IncMetrics' inner state is mutated on serialisation. This causes race conditions when the write() function is called from multiple threads. See: Fix signal handler race condition on metrics.write() #2893
  2. While SharedIncMetrics use atomics, they always use Relaxed ordering. On x86, memory accesses already have Acquire-Release semantics, but on Arm this is not the case. Hence, the process of writing metrics to a file may use outdated values.
  3. Metrics are written from the signal handler, which may cause a deadlock if a thread is preempted by a signal while it is holding the metrics file lock.
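
A minimal sketch of the kind of counter being discussed, assuming hypothetical names (this is not the actual Firecracker implementation): the counter keeps a "last flushed" value that serialisation mutates, and all accesses use Relaxed ordering.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Illustrative sketch (not the actual Firecracker type): a shared counter
/// that also remembers the value reported by the previous flush, so each
/// flush can emit a delta.
pub struct SketchIncMetric {
    current: AtomicUsize,
    // State that is mutated during serialisation; this is what makes a
    // concurrent write() racy (problem 1).
    last_flushed: AtomicUsize,
}

impl SketchIncMetric {
    pub const fn new() -> Self {
        Self {
            current: AtomicUsize::new(0),
            last_flushed: AtomicUsize::new(0),
        }
    }

    // Hot-path increment. Relaxed is cheap but gives no ordering guarantees
    // relative to other memory operations (problem 2).
    pub fn inc(&self) {
        self.current.fetch_add(1, Ordering::Relaxed);
    }

    // Called on flush: reads the counter and updates `last_flushed`.
    // Two threads flushing concurrently can report overlapping or lost
    // deltas, because the load and the swap are not atomic as a pair.
    pub fn flush_delta(&self) -> usize {
        let now = self.current.load(Ordering::Relaxed);
        let prev = self.last_flushed.swap(now, Ordering::Relaxed);
        now.saturating_sub(prev)
    }
}
```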

Problems 1 and 3 can be fully solved by removing metrics usage from the signal handler. One option here is to have a special file used for logging the exit reason and the latest metrics values, similar to a coredump. We should also enforce that METRICS.write() is called from a single thread (and therefore remove the lazy_static declaration).

Problem 2 could be solved in two ways: by using tighter ordering constraints (this needs a deeper dive and may incur some overhead, since stronger orderings restrict CPU reordering and prevent certain compiler optimisations), or by redesigning the metrics system to use per-thread values (which would also solve problem 1).
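
A minimal sketch of what the tighter-ordering option could look like, with illustrative names rather than a definitive implementation: increments publish with Release and the flush-side read uses Acquire, so the pair synchronises on Arm as well as x86.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Purely illustrative shape of the "tighter ordering" option: Release on the
// writer side pairs with Acquire on the reader side, so a flush that observes
// the counter value also observes the writes that happened before the
// increment. On x86 this typically compiles to the same instructions as
// Relaxed; on Arm it emits stronger ordering and may cost a bit more.
static TX_BYTES_COUNT: AtomicUsize = AtomicUsize::new(0);

fn record_tx(bytes: usize) {
    TX_BYTES_COUNT.fetch_add(bytes, Ordering::Release);
}

fn flush_tx() -> usize {
    TX_BYTES_COUNT.load(Ordering::Acquire)
}
```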

Another thing to keep in mind is the potential need for device-specific instances of a metric. For example, it may make sense to report METRICS.net.tx_bytes_count per device instance instead of as a single aggregate.
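
A rough sketch of what per-device metric instances might look like (all names are hypothetical): each device owns its own counters, and an aggregate can still be derived at flush time if needed.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch of per-device metric instances: instead of a single
// aggregated METRICS.net.tx_bytes_count, each net device owns its own
// counters and an aggregate can still be computed at flush time.
#[derive(Default)]
struct NetDeviceMetrics {
    tx_bytes_count: AtomicU64,
    rx_bytes_count: AtomicU64,
}

struct NetDevice {
    id: String,
    metrics: NetDeviceMetrics,
}

// Aggregate view over all devices, if a global number is still wanted.
fn total_tx_bytes(devices: &[NetDevice]) -> u64 {
    devices
        .iter()
        .map(|d| d.metrics.tx_bytes_count.load(Ordering::Relaxed))
        .sum()
}
```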

@alindima (Contributor, Author)

Related to #1759. In order to remove the lazy_static usage we'd need to use per-component metrics. One option for the design is the one linked in the issue.

@acatangiu (Contributor)

+1 on each instance of each component having its own metrics (so one could see, for example, how many packets a particular net device, say net device 2, has sent).

Unfortunately, on the signal handler topic, this is not a fix, and will actually make things even worse (more constrained).

Currently, there is an issue of potential deadlock on the global metrics mutex, which a signal handler can try to acquire even if it is already held by the same thread in normal (non-signal-handler) context.
Moving the metrics from a single global object to multiple individual components ultimately means more individual mutexes that can also deadlock if there is a path to them from the signal handler.

E.g.: something like vmm.flush_all_metrics() involves iteratively locking each device to flush its metrics. Calling that from a signal handler means there is a high chance the signal arrives while some emulation code is running under a device's lock, resulting in a guaranteed deadlock.

The safest thing to do is just not flush/print metrics in signal handlers 😛

@acatangiu (Contributor)

Metrics themselves don't need locks since they're atomics, so a solution would also be to decouple the writer from the metric implementation.

Then you can use different writers for different contexts with no locking required between them. Or use a single global writer wrapped in a reentrant mutex. Or some other custom approach, the point being to move the challenge outside of the metrics implementation.
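
A minimal sketch of that decoupling, under assumed names rather than an actual API: the metric is a lock-free atomic, and a separate writer type owns the output destination, so different contexts can use different writers.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch (hypothetical names) of decoupling the writer from the metric
// implementation: the metric is a plain atomic with no lock, and any number
// of writers can snapshot it independently, each with its own destination
// and its own locking policy.
pub struct SeccompMetrics {
    pub num_faults: AtomicU64,
}

impl SeccompMetrics {
    // Lock-free snapshot; serialisation never mutates the metric state.
    pub fn snapshot(&self) -> u64 {
        self.num_faults.load(Ordering::Relaxed)
    }
}

// A writer owns its destination; different contexts (event loop, signal
// handler) can use different writers and never contend on a shared lock.
pub struct MetricsWriter<W: Write> {
    dest: W,
}

impl<W: Write> MetricsWriter<W> {
    pub fn write(&mut self, metrics: &SeccompMetrics) -> io::Result<()> {
        writeln!(self.dest, "seccomp.num_faults = {}", metrics.snapshot())
    }
}
```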

@raduiliescu (Contributor)

You definitely need to flush metrics, or at least one metric, in a signal handler. It is important not to lose critical events such as seccomp failures.
But one can have multiple write/dump functions: one to be called in a normal context where everything is fine, and one for signal handlers where the code is in an "emergency" situation.
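
A hedged sketch of what such a split could look like, assuming the libc crate and hypothetical function names: the normal path takes the lock and serialises everything, while the emergency path only issues an async-signal-safe write(2) of a pre-formatted message to a descriptor opened at startup.

```rust
use std::fs::File;
use std::io::Write;
use std::os::unix::io::RawFd;
use std::sync::Mutex;

// Hypothetical split between a normal flush and an "emergency" flush that is
// safe to call from a signal handler. The emergency path avoids the metrics
// mutex entirely: it writes a pre-formatted, fixed message about the critical
// event using only the async-signal-safe write(2) syscall (via the libc crate).
static METRICS_FILE: Mutex<Option<File>> = Mutex::new(None);

// Normal context: take the lock and serialise the full snapshot.
fn flush_all_metrics() -> std::io::Result<()> {
    if let Some(file) = METRICS_FILE.lock().unwrap().as_mut() {
        writeln!(file, "{{ /* full metrics snapshot */ }}")?;
    }
    Ok(())
}

// Signal-handler context: no locks, no allocation, no formatting.
fn flush_emergency(fd: RawFd, msg: &[u8]) {
    // SAFETY: write(2) is async-signal-safe; `msg` is a valid buffer.
    unsafe {
        let _ = libc::write(fd, msg.as_ptr().cast(), msg.len());
    }
}
```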

Also, I am a bit worried about the performance impact of atomic metrics. As discussed in the issue mentioned above, atomics for metrics on the hot paths might cause a degradation. Depending on architecture specifics, we may need to think about alternatives like per-thread metrics.
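
A rough sketch of the per-thread idea, with hypothetical names: the hot path updates a non-atomic thread-local counter, and each thread periodically publishes its pending count into a shared total.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

// Rough sketch of the per-thread alternative (hypothetical names): each
// thread bumps a plain, non-atomic thread-local counter on the hot path and
// periodically publishes it into a shared total, so the hot path pays no
// atomic read-modify-write cost.
static TX_BYTES_TOTAL: AtomicU64 = AtomicU64::new(0);

thread_local! {
    static TX_BYTES_LOCAL: Cell<u64> = Cell::new(0);
}

// Hot path: no atomic operation at all.
fn record_tx(bytes: u64) {
    TX_BYTES_LOCAL.with(|c| c.set(c.get() + bytes));
}

// Called periodically by each thread (e.g. at the end of an event-loop
// iteration) to publish its local count into the shared total.
fn publish_local() {
    TX_BYTES_LOCAL.with(|c| {
        let pending = c.replace(0);
        if pending != 0 {
            TX_BYTES_TOTAL.fetch_add(pending, Ordering::Relaxed);
        }
    });
}
```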

@alindima (Contributor, Author)

> Metrics themselves don't need locks since they're atomics, so a solution would also be to decouple the writer from the metric implementation.

This is similar to what I was proposing here (actually an idea from @alsrdn):

> Problems 1 and 3 can be fully solved by removing metrics usage from the signal handler. One option here is to have a special file used for logging the exit reason and the latest metrics values, similar to a coredump.

The metrics lock is only for writing to the file. If we have another file that is used from the signal handler, then this problem is solved.

It is then a challenge to make the metrics system available to the signal handler without using a global variable (like with lazy_static). If we're keen on doing any metrics flushing from the signal handler, I think we're stuck with using some globally accessible object, which may be fine if we solve the thread-safety issues.

In a nutshell, there are two big problems to be solved here:

  1. Making the metrics thread-safe or making them flushable only from one thread.
  2. Fixing the deadlock potential (which I believe we may only ever solve by writing a coredump-like file from the signal handler).

And indeed the per-component metrics add trouble, unless we have a sophisticated design where each component propagates the metric update to its parent whenever it becomes available (instead of triggering it on a specific flush event).

But I don't really see how we can use per-component metrics in the signal handler anyway.

@alindima (Contributor, Author)

> Also, I am a bit worried about the performance impact of atomic metrics. As discussed in the issue mentioned above, atomics for metrics on the hot paths might cause a degradation. Depending on architecture specifics, we may need to think about alternatives like per-thread metrics.

This is indeed a nice approach 👍🏻
It would also remove the race condition potential from the metrics serialisation, since at no point would two threads operate on the same metric value.

@xmarcalx added the "Roadmap: Tracked" label Mar 31, 2022
@pb8o pinned this issue Aug 16, 2022
@pb8o unpinned this issue Aug 16, 2022
@JonathanWoollett-Light added "Type: Fix" and "Type: Enhancement" and removed "Codebase: Refactoring" and "Type: Fix" labels Mar 23, 2023
@bchalios assigned bchalios and wearyzen and unassigned bchalios Oct 16, 2023
@xmarcalx added "Status: Parked" and removed "Roadmap: Tracked" labels Jul 29, 2024
@xmarcalx (Contributor) commented Jul 29, 2024

We removed the task from the roadmap because we are currently not planning to work on it due to higher-priority tasks.
However, we split #4709 out of this task, which will help define the first stepping stone toward the broader refactor proposed in this issue.
