Admitted RayJobs remain in pending state when manageJobsWithoutQueueName is true
#1568
Comments
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
I'm wondering if this is more related to #1434 or to the child-owner management. I think there have been numerous changes in Kueue to the child-parent management, so it would be good to re-test e2e to see whether this remains a problem. |
cc @dgrove-oss @andrewsykim who recently worked on related aspects of the problem. |
/remove-lifecycle stale |
/assign I can take care of this, but if any other contributor also wants to have a look, that is more than fine with me 😊 |
Assuming kuberay links the RayJob and the RayCluster via a controller ref, then I agree this should work now. |
The submitter Job that runs "ray job submit" may not be accounted for, but a reasonable workaround is adding labels to map that submitter Job to a specific local queue.
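A minimal sketch of the label-based workaround, assuming the submitter Job is mapped to a LocalQueue through Kueue's `kueue.x-k8s.io/queue-name` label. The helper name, the Job name, and the queue name `user-queue` are hypothetical, and how the label actually reaches the Job that KubeRay creates is not covered in this thread:

```python
import copy

# Label key Kueue uses to associate a job with a LocalQueue.
QUEUE_NAME_LABEL = "kueue.x-k8s.io/queue-name"


def with_queue_label(manifest: dict, local_queue: str) -> dict:
    """Return a copy of a Kubernetes manifest with the queue-name label set.

    Hypothetical helper for illustration; the input manifest is not modified.
    """
    labeled = copy.deepcopy(manifest)
    metadata = labeled.setdefault("metadata", {})
    # Tolerate a missing or null "labels" field.
    labels = metadata.get("labels") or {}
    labels[QUEUE_NAME_LABEL] = local_queue
    metadata["labels"] = labels
    return labeled


# Illustrative submitter Job manifest; the names are assumptions.
submitter_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "rayjob-sample-submitter"},
}
labeled_job = with_queue_label(submitter_job, "user-queue")
print(labeled_job["metadata"]["labels"])
```

The helper copies the manifest rather than mutating it, so the same template can be reused for different local queues.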
I've tested this on both
And I can see that after deploying the rayjob
The job is created and run
|
@mimowo: Closing this issue. In response to this:
What happened:
When a RayJob managed by Kueue configured with manageJobsWithoutQueueName is admitted, it remains in pending state. The Job that KubeRay creates to submit the actual job to the Ray cluster stays in suspended state.
What you expected to happen:
The RayJob should run successfully.
How to reproduce it (as minimally and precisely as possible):
Set manageJobsWithoutQueueName: true in the Kueue configuration.
Anything else we need to know?:
Relates to #1434.
Environment:
Kubernetes version (use kubectl version): v1.25.3
Kueue version (use git describe --tags --dirty --always): v0.6.0-devel-146-ged81667f-dirty