Kangal 1.6.1 K8s 1.24 race condition jmeter master/worker #332
Comments
I was running into this (#228) and worked around the issue inelegantly by pulling a custom JMeter master artifact with a sleep in launcher.sh. Glad to know I am not the only one who has experienced it.
Also having this issue.
@hattivatt
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Referenced commits: "...before starting master process" (part of the fix for hellofresh/kangal#332)
I am bumping up against a race condition between the jmeter master and worker pods. It used to be that most kangal runs configured themselves and succeeded: the workers would be `running` before the master, and the master was able to register all of the workers. However, 1 in 4 masters would come up before all of the workers and register only a portion of them.

Recently I have found the masters are always `running` before the workers, even when I only have 1 worker. I used to see the workers in a `pending` state and then the master pod go `pending` 5-10 seconds later. But lately I've noticed the workers still in `init` while the master is already `running`. This leads to 0 workers detected, and I have to manually delete the master pod so it registers the worker pods. The backoff limit of the master job is hard coded to 1, so I can only kill the master pod once before the job is considered a failure.
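To make the ordering visible, one option is to poll both sets of pod phases side by side. The sketch below is only an illustration in Python driving kubectl; the namespace and label selectors are placeholders I picked, not labels Kangal actually sets.

```python
#!/usr/bin/env python3
"""Sketch: print worker vs. master pod phases every few seconds so the race
described above is visible (workers still Pending/Init while the master is
already Running). Namespace and label selectors are placeholders, not
Kangal's real labels. Stop with Ctrl-C.
"""
import json
import subprocess
import time

NAMESPACE = "loadtest-example"           # placeholder
WORKER_SELECTOR = "app=loadtest-worker"  # placeholder
MASTER_SELECTOR = "app=loadtest-master"  # placeholder


def phases(selector: str) -> list[str]:
    """Return the phase of every pod matching the label selector."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", NAMESPACE, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [item["status"]["phase"] for item in json.loads(out)["items"]]


while True:
    print(f"workers={phases(WORKER_SELECTOR)} master={phases(MASTER_SELECTOR)}")
    time.sleep(5)
```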
Configuration
Our jmeter pods tolerate a taint so they are scheduled onto nodes dedicated to load testing; Karpenter 0.33.1 handles provisioning those nodes for us. I am also using custom data.
Solution?
It seems like the master job shouldn't be scheduled until all of the workers are `running`.
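The commits referenced above ("...before starting master process") point at the same idea from the launcher side: have the master block until it can actually see its workers before jmeter starts. Below is only a hypothetical sketch of such a gate; it assumes the workers are discoverable through a headless Service DNS name, and the service name, worker count, and launcher path are placeholders rather than anything defined by Kangal.

```python
#!/usr/bin/env python3
"""Sketch: block until the expected number of worker pods resolve in DNS,
then hand off to the real master launcher.

Hypothetical values: the headless Service name, worker count, and launcher
path below are placeholders, not something defined by Kangal itself.
"""
import os
import socket
import sys
import time

WORKERS_SERVICE = "loadtest-workers.loadtest-example.svc.cluster.local"  # placeholder
EXPECTED_WORKERS = 4   # placeholder
POLL_SECONDS = 5


def resolved_workers() -> set[str]:
    """Return the set of worker IPs currently published for the headless Service."""
    try:
        # The port only parameterizes the lookup; 1099 is JMeter's default RMI port.
        infos = socket.getaddrinfo(WORKERS_SERVICE, 1099, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


# Wait until every expected worker is resolvable before the master starts.
while len(resolved_workers()) < EXPECTED_WORKERS:
    time.sleep(POLL_SECONDS)

# All workers visible; replace this process with the real master launcher (placeholder path).
os.execv("/usr/local/bin/launch-jmeter-master.sh",
         ["launch-jmeter-master.sh"] + sys.argv[1:])
```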
Workaround
To work around this, a script submits the kangal load test and waits for the master job to be created. The job is immediately patched to be suspended. The script then waits until all workers are `running`, and the master job is unsuspended. This consistently prevents the jmeter master from having too few workers registered.
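For anyone who wants to copy that flow, here is a rough sketch of it in Python driving kubectl (Job suspension is stable in Kubernetes 1.24, the version in the title). The namespace, Job name, label selector, and worker count are placeholders I invented; the real script also has to wait for Kangal to create the Job before patching it, which is omitted here.

```python
#!/usr/bin/env python3
"""Sketch of the workaround: suspend the jmeter master Job as soon as it
exists, wait for every worker pod to be Running, then unsuspend the Job.

Placeholders (not Kangal's real names): NAMESPACE, MASTER_JOB,
WORKER_SELECTOR, and EXPECTED_WORKERS must be adapted to your loadtest.
"""
import json
import subprocess
import time

NAMESPACE = "loadtest-example"           # placeholder
MASTER_JOB = "loadtest-master"           # placeholder Job name
WORKER_SELECTOR = "app=loadtest-worker"  # placeholder label selector
EXPECTED_WORKERS = 4                     # placeholder


def kubectl(*args: str) -> str:
    """Run kubectl in the loadtest namespace and return its stdout."""
    result = subprocess.run(
        ["kubectl", "-n", NAMESPACE, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def set_suspend(value: bool) -> None:
    """Patch .spec.suspend on the master Job."""
    patch = json.dumps({"spec": {"suspend": value}})
    kubectl("patch", "job", MASTER_JOB, "--type=merge", "-p", patch)


def workers_running() -> bool:
    """True once every expected worker pod reports the Running phase."""
    pods = json.loads(kubectl("get", "pods", "-l", WORKER_SELECTOR, "-o", "json"))["items"]
    phases = [pod["status"]["phase"] for pod in pods]
    return len(phases) >= EXPECTED_WORKERS and all(p == "Running" for p in phases)


# (The real script first waits for Kangal to create the master Job; omitted here.)

# 1. Suspend the master Job immediately so no master pod runs yet.
set_suspend(True)

# 2. Wait until every worker pod reports Running.
while not workers_running():
    time.sleep(5)

# 3. Release the master; it should now register all of the workers.
set_suspend(False)
```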
Conclusion
Is anyone running into this? If so, how are you dealing with it?
If it is widespread, can we look to the kangal controller for relief?