Kangal 1.6.1 K8s 1.24 race condition jmeter master/worker #332
Comments
I was running into this (#228) and worked around the issue inelegantly by pulling a custom JMeter master artifact with a sleep in launcher.sh. Glad to know I am not the only one who has experienced it.
Also having this issue.
@hattivatt
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Referenced commits: "...before starting master process" (part of the fix for hellofresh/kangal#332)
I am bumping up against a race condition between the jmeter master and worker pods. It used to be that most kangal runs configured themselves and succeeded: the workers would be `running` before the master, and the master was able to register all of the workers. However, 1 in 4 masters would come up before all of the workers and register only a portion of them.

Recently I have found the masters are always `running` before the workers, even when I only have 1 worker. I used to see the workers in a `pending` state and then the master pod go `pending` 5-10 seconds later. But lately I've noticed the workers still in `init` while the master is already `running`. This leads to 0 workers detected, and I have to manually delete the master pod so it registers the worker pods. The backoff limit of the master job is hard coded to 1, so I can only kill the master pod once before the job is considered a failure.
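To make the ordering visible, one option is to poll both sets of pod phases side by side. The sketch below is only an illustration in Python driving kubectl; the namespace and label selectors are placeholders I picked, not labels Kangal actually sets.

```python
#!/usr/bin/env python3
"""Sketch: print worker vs. master pod phases every few seconds so the race
described above is visible (workers still Pending/Init while the master is
already Running). Namespace and label selectors are placeholders, not
Kangal's real labels. Stop with Ctrl-C.
"""
import json
import subprocess
import time

NAMESPACE = "loadtest-example"           # placeholder
WORKER_SELECTOR = "app=loadtest-worker"  # placeholder
MASTER_SELECTOR = "app=loadtest-master"  # placeholder


def phases(selector: str) -> list[str]:
    """Return the phase of every pod matching the label selector."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", NAMESPACE, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [item["status"]["phase"] for item in json.loads(out)["items"]]


while True:
    print(f"workers={phases(WORKER_SELECTOR)} master={phases(MASTER_SELECTOR)}")
    time.sleep(5)
```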
Configuration
Our jmeter pods tolerate a taint so they are scheduled onto nodes dedicated to load testing; Karpenter 0.33.1 handles provisioning those nodes for us. I am also using custom data.
Solution?
It seems like the master job shouldn't be scheduled until all of the workers are `running`.
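The commits referenced above ("...before starting master process") point at the same idea from the launcher side: have the master block until it can actually see its workers before jmeter starts. Below is only a hypothetical sketch of such a gate; it assumes the workers are discoverable through a headless Service DNS name, and the service name, worker count, and launcher path are placeholders rather than anything defined by Kangal.

```python
#!/usr/bin/env python3
"""Sketch: block until the expected number of worker pods resolve in DNS,
then hand off to the real master launcher.

Hypothetical values: the headless Service name, worker count, and launcher
path below are placeholders, not something defined by Kangal itself.
"""
import os
import socket
import sys
import time

WORKERS_SERVICE = "loadtest-workers.loadtest-example.svc.cluster.local"  # placeholder
EXPECTED_WORKERS = 4   # placeholder
POLL_SECONDS = 5


def resolved_workers() -> set[str]:
    """Return the set of worker IPs currently published for the headless Service."""
    try:
        # The port only parameterizes the lookup; 1099 is JMeter's default RMI port.
        infos = socket.getaddrinfo(WORKERS_SERVICE, 1099, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


# Wait until every expected worker is resolvable before the master starts.
while len(resolved_workers()) < EXPECTED_WORKERS:
    time.sleep(POLL_SECONDS)

# All workers visible; replace this process with the real master launcher (placeholder path).
os.execv("/usr/local/bin/launch-jmeter-master.sh",
         ["launch-jmeter-master.sh"] + sys.argv[1:])
```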
Workaround
To work around this, a script submits the kangal load test and waits for the master job to be created. The job is immediately patched to be suspended. The script then waits until all workers are `running`, and the master job is unsuspended. This consistently prevents the jmeter master from having too few workers registered.
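For anyone who wants to copy that flow, here is a rough sketch of it in Python driving kubectl (Job suspension is stable in Kubernetes 1.24, the version in the title). The namespace, Job name, label selector, and worker count are placeholders I invented; the real script also has to wait for Kangal to create the Job before patching it, which is omitted here.

```python
#!/usr/bin/env python3
"""Sketch of the workaround: suspend the jmeter master Job as soon as it
exists, wait for every worker pod to be Running, then unsuspend the Job.

Placeholders (not Kangal's real names): NAMESPACE, MASTER_JOB,
WORKER_SELECTOR, and EXPECTED_WORKERS must be adapted to your loadtest.
"""
import json
import subprocess
import time

NAMESPACE = "loadtest-example"           # placeholder
MASTER_JOB = "loadtest-master"           # placeholder Job name
WORKER_SELECTOR = "app=loadtest-worker"  # placeholder label selector
EXPECTED_WORKERS = 4                     # placeholder


def kubectl(*args: str) -> str:
    """Run kubectl in the loadtest namespace and return its stdout."""
    result = subprocess.run(
        ["kubectl", "-n", NAMESPACE, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def set_suspend(value: bool) -> None:
    """Patch .spec.suspend on the master Job."""
    patch = json.dumps({"spec": {"suspend": value}})
    kubectl("patch", "job", MASTER_JOB, "--type=merge", "-p", patch)


def workers_running() -> bool:
    """True once every expected worker pod reports the Running phase."""
    pods = json.loads(kubectl("get", "pods", "-l", WORKER_SELECTOR, "-o", "json"))["items"]
    phases = [pod["status"]["phase"] for pod in pods]
    return len(phases) >= EXPECTED_WORKERS and all(p == "Running" for p in phases)


# (The real script first waits for Kangal to create the master Job; omitted here.)

# 1. Suspend the master Job immediately so no master pod runs yet.
set_suspend(True)

# 2. Wait until every worker pod reports Running.
while not workers_running():
    time.sleep(5)

# 3. Release the master; it should now register all of the workers.
set_suspend(False)
```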
Conclusion
Is anyone running into this? If so, how are you dealing with it?
If it is widespread, can we look to the kangal controller for relief?