
Significant jitter increase from 3.6 to 3.17 #1806

Open
PhoenixV2 opened this issue Dec 10, 2024 · 5 comments · May be fixed by #1814

@PhoenixV2

PhoenixV2 commented Dec 10, 2024

Context

Performing network stress testing on 24 ports simultaneously with iperf 3.6 (what was available from sudo apt install iperf3) yielded consistent jitter of approximately 50 µs and packet loss under 0.0006 %.
I upgraded iperf3 on the test equipment to 3.17 because of #1260. Since upgrading, packet loss and jitter are significantly higher, at roughly 30 % and 1 ms respectively.

iperf3 -s -A {1:13} -p 5112 -B 192.168.1.10 -i 0.1 --one-off --forceflush
iperf3 --client 192.168.1.10 -A {1:13} -u -i 0.1 -b 300m --pacing-timer 1 -O 10 --R --forceflush

  • Version of iperf3:

iperf 3.6 and iperf 3.17.1 (cJSON 1.7.15)

  • Hardware:

Same hardware between the two tests, so unlikely to be a factor

  • Operating system (and distribution, if any):

4.19.0-26-amd64 #1 SMP Debian 4.19.304-1 (2024-01-09) x86_64 GNU/Linux

  • Other relevant information (for example, non-default compilers,
    libraries, cross-compiling, etc.):

Bug Report

  • Expected Behavior

No change in performance between the two versions

  • Actual Behavior

Significant performance hit
In addition, there was a significant increase in user-level CPU usage and a decrease in kernel CPU usage. Previously the split was about 10 % user to 90 % kernel; after the version change it is roughly 50 % : 50 %.

  1. Performance on 3.6

[screenshot]

  2. Performance on 3.17

[screenshot]

  • Steps to Reproduce

Switching back and forth between 3.6 and 3.17 consistently reproduced the same behaviour.

@davidBar-On
Contributor

Are you using a script that runs iperf3? I am asking because there are at least two incorrect parameters: --R should be -R, and the -A format is -A n[,m]. Also, I don't understand how this single command tests 24 ports, so I suspect that is handled by the script.
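
For example, assuming reverse mode on one core per side was the intent, the corrected client invocation would look something like this (port taken from the server command; the core numbers are only examples, where -A 1,13 pins the client to core 1 and the server to core 13):

iperf3 --client 192.168.1.10 -p 5112 -A 1,13 -u -i 0.1 -b 300m --pacing-timer 1 -O 10 -R --forceflush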

In addition, I assume that with both 3.6 and 3.17.1 the sending rate is 300M, so the issue is not that the 3.6 server's CPU could not keep up with the required sending rate.

In any case, this might be similar to issue #1707, which was fixed by #1787. Can you try building and running iperf3 from the master branch, which includes that fix?

@PhoenixV2
Author

PhoenixV2 commented Dec 10, 2024

You're correct - I run with a wrapper script that creates each instance of the connection. And I typo'd --R; it is meant to be -R.
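
For reference, the wrapper is roughly equivalent to the sketch below (port range, addresses, and core assignment are illustrative, not the exact script):

#!/bin/bash
# Rough sketch of the wrapper: start 24 parallel UDP reverse-mode clients,
# one port and one CPU core per instance (the 24 servers are started analogously).
BASE_PORT=5112
for i in $(seq 0 23); do
    port=$((BASE_PORT + i))
    core=$((i % $(nproc)))
    iperf3 --client 192.168.1.10 -p "$port" -A "$core" -u -b 300m \
           -i 0.1 --pacing-timer 1 -O 10 -R --forceflush \
           > "client_${port}.log" 2>&1 &
done
wait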

One note on building the latest master branch: on my system I still had to run ldconfig manually so that the shared-library cache would rebuild and recognise libiperf.so.0.
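
For anyone else hitting this, the build steps were roughly the following (default install prefix; run ./bootstrap.sh first if configure is not present in the checkout):

git clone https://github.com/esnet/iperf.git
cd iperf
./configure
make
sudo make install
sudo ldconfig   # needed on my system before libiperf.so.0 was found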

Running iperf 3.17.1+ (cJSON 1.7.15) from master, the CPU utilisation dropped significantly; it is no longer pinned at 100 % (averaging around 60 %), and the user-side CPU consumption is basically gone.

Jitter rates are back to the expected range and packet loss across all ports is minuscule now!
[screenshot]

Thanks for your help!

I'm curious as to when this latest master will be pushed to a released version?

@PhoenixV2
Author

Running a subsequent 6-hour test showed a distinct shift about 3 hours into the run, which resulted in a significant spike in jitter.
In addition, the CPU load on each core went back to 100 % and the user CPU load increased to ~63 % on each core.
[screenshot]

In addition, packet drops caused by the CPU failing to read the FIFOs in time skyrocketed:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth9: 3726709691180 2501152309    0    0    0     0          0         0 3724796041113 2499867504    0    0    0     0       0          0
 eth13: 2824138562800 1895403827    0 241243296 241243296     0          0         0 2931477489598 1967438521    0    0    0     0       0          0
...
 eth17:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth7: 3721595249194 2497719955    0 6326 6326     0          0         0 3725013189162 2500013350    0    0    0     0       0          0
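
For reference, the receive drop/fifo counters for a given interface can be sampled during the run with something like this (the interface name is just an example):

# Sample eth13's receive drop and fifo counters every 10 seconds
while true; do
    echo -n "$(date +%T) "
    awk '/eth13:/ { print "rx_drop=" $5, "rx_fifo=" $6 }' /proc/net/dev
    sleep 10
done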

@davidBar-On
Contributor

Testing on my machine with -P 24 -i 0.1, it seems there is a memory leak related to the statistics reports. I am not sure whether this is a bug or intentional (for keeping the statistics history), and I am still evaluating it.

I suspect that this memory leak may cause swapping once all memory has been used, and that this may be the cause of the high CPU usage after 3 hours. To test whether this is the case, can you run the test with -i 0.2 (or another value larger than 0.1)? If the memory leak is the cause, the problem should then appear only after about 6 hours.

It seems that each interval adds 400-500 bytes, so with 10 reports per second that is 4-5 KB/s, or about 250-300 KB/minute, or roughly 15 MB/hour per stream. Running 24 iperf3 instances in parallel, that would be about 300-400 MB/hour (on each client/server machine). Can you check the free memory while the test is running to see if these numbers are correct? If so, how much free memory is there at the beginning of the test (e.g. after it has been running for 2 minutes), and how long would it take for all free memory to be used? See the sketch below for one way to log this.
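
For example, something like this could log the free memory and the total RSS of the iperf3 processes once a minute while the test runs (a sketch, not part of iperf3 itself):

while true; do
    ts=$(date +%T)
    free_mb=$(free -m | awk '/^Mem:/ { print $4 }')
    rss_kb=$(ps -C iperf3 -o rss= | awk '{ s += $1 } END { print s }')
    echo "$ts free=${free_mb}MB iperf3_rss=${rss_kb}KB"
    sleep 60
done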

@davidBar-On
Contributor

Submitted PR #1814 with a suggested fix for the memory leak.

> I'm curious as to when this latest master will be pushed to a released version?

PR #1787 was merged into the master branch and is now part of release 3.18.
