
feat: show RayCluster's total resources #1748

Merged: 1 commit merged into ray-project:master on Dec 23, 2023

Conversation

@davidxia (Contributor) commented Dec 13, 2023

Example

```
kubectl get rayclusters
NAME     DESIRED WORKERS   AVAILABLE WORKERS   CPUS     MEMORY     GPUS   STATUS   AGE
sample   2                 2                   45300m   147840Mi   4      ready    4m58s
```
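
The new columns are backed by new status fields and printer columns on the RayCluster CRD. A rough sketch of how that typically looks (the field names follow the controller snippet later in this thread, but the JSON tags and printcolumn paths here are guesses; the real definitions live in ray-operator/apis/ray/v1/raycluster_types.go):

```
package v1

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Sketch of the status fields backing the new kubectl columns.
type RayClusterStatus struct {
	DesiredCPUs   resource.Quantity `json:"desiredCPUs"`
	DesiredMemory resource.Quantity `json:"desiredMemory"`
	DesiredGPUs   resource.Quantity `json:"desiredGPUs"`
	DesiredTPUs   resource.Quantity `json:"desiredTPUs"`
}

// The extra kubectl columns come from kubebuilder printcolumn markers on the
// RayCluster type, along these lines (the JSON paths are guesses):
// +kubebuilder:printcolumn:name="CPUS",type="string",JSONPath=".status.desiredCPUs"
// +kubebuilder:printcolumn:name="MEMORY",type="string",JSONPath=".status.desiredMemory"
// +kubebuilder:printcolumn:name="GPUS",type="string",JSONPath=".status.desiredGPUs"
type RayCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Status            RayClusterStatus `json:"status,omitempty"`
}
```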

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@davidxia force-pushed the resources branch 3 times, most recently from 8fc88fb to 83b60b4 on December 21, 2023 at 22:00
@kevin85421 (Member) left a comment

Great! Just left some questions.

```
newInstance.Status.DesiredCPUs = totalResources[corev1.ResourceCPU]
newInstance.Status.DesiredMemory = totalResources[corev1.ResourceMemory]
newInstance.Status.DesiredGPUs = getGPUs(totalResources)
newInstance.Status.DesiredTPUs = totalResources[corev1.ResourceName("google.com/tpu")]
```
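
(For context, totalResources in this hunk is presumably the cluster's summed container resource requests: the head pod plus replicas × per-pod requests for each worker group. A rough sketch of that kind of accumulation, with illustrative names rather than the PR's actual helpers:)

```
package utils

import (
	corev1 "k8s.io/api/core/v1"
)

// sumContainerRequests adds up the resource requests of every container in a pod spec.
func sumContainerRequests(podSpec corev1.PodSpec) corev1.ResourceList {
	total := corev1.ResourceList{}
	for _, c := range podSpec.Containers {
		for name, q := range c.Resources.Requests {
			cur := total[name]
			cur.Add(q)
			total[name] = cur
		}
	}
	return total
}

// addPodRequests folds `replicas` copies of a pod's requests into total, which
// is how a worker group's contribution (replicas * per-pod requests) can be added.
func addPodRequests(total, perPod corev1.ResourceList, replicas int32) {
	for i := int32(0); i < replicas; i++ {
		for name, q := range perPod {
			cur := total[name]
			cur.Add(q)
			total[name] = cur
		}
	}
}
```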
@kevin85421 (Member)

@richardsliu would you mind reviewing this? I am not sure whether corev1.ResourceName("google.com/tpu") works or not.

@davidxia (Contributor, author)

I think it should. I use it in the unit test TestGetGPUs() below.
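
(For reference, the idea behind getGPUs is to fold every vendor-prefixed GPU resource into one quantity. A minimal sketch of that idea, assuming accelerator resource names end in "gpu", e.g. nvidia.com/gpu or amd.com/gpu; the actual implementation in this PR may differ:)

```
package utils

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// getGPUs sums every resource whose name has the "gpu" suffix, regardless of
// the vendor prefix (nvidia.com/gpu, amd.com/gpu, ...).
func getGPUs(totalResources corev1.ResourceList) resource.Quantity {
	var gpus resource.Quantity
	for name, quantity := range totalResources {
		if strings.HasSuffix(string(name), "gpu") {
			gpus.Add(quantity)
		}
	}
	return gpus
}
```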

ray-operator/apis/ray/v1/raycluster_types.go (resolved conversation)
```
@@ -2336,3 +2337,44 @@ func TestReconcile_Replicas_Optional(t *testing.T) {
		})
	}
}

func TestGetGPUs(t *testing.T) {
```
@davidxia (Contributor, author)
Added this unit test. Let me know if there are any other tests I should add.
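
(A rough shape of such a test, with made-up quantities and assuming the getGPUs signature sketched earlier in this thread; the real table of cases is in the diff above:)

```
package utils

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func TestGetGPUs(t *testing.T) {
	// Two GPU vendors plus non-GPU resources that must be ignored.
	totalResources := corev1.ResourceList{
		corev1.ResourceCPU:                    resource.MustParse("8"),
		corev1.ResourceMemory:                 resource.MustParse("32Gi"),
		corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("3"),
		corev1.ResourceName("amd.com/gpu"):    resource.MustParse("1"),
	}

	gpus := getGPUs(totalResources)
	if gpus.Value() != 4 {
		t.Errorf("expected 4 GPUs, got %s", gpus.String())
	}
}
```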

@davidxia marked this pull request as ready for review on December 21, 2023 at 22:37

@kevin85421 (Member) left a comment

LGTM. Do we need to update inconsistentRayClusterStatus? If the information needs to be updated in a timely manner, we may need to do so. Otherwise, we can leave the function the same, which could potentially reduce the workload on the Kubernetes API server.

@davidxia (Contributor, author)
> LGTM. Do we need to update inconsistentRayClusterStatus? If the information needs to be updated in a timely manner, we may need to do so. Otherwise, we can leave the function the same, which could potentially reduce the workload on the Kubernetes API server.

It looks like updating an existing RayCluster's head or worker resources doesn't recreate the Pods with the change. So it seems like we don't need to update that function since the original resource values will never change?

@kevin85421 (Member)
> It looks like updating an existing RayCluster's head or worker resources doesn't recreate the Pods with the change. So it seems like we don't need to update that function since the original resource values will never change?

That makes sense. I'm not sure whether there are any edge cases that could cause inconsistency between the total and the actual resource information. I believe that in most cases there will be no inconsistency between them with the existing inconsistentRayClusterStatus implementation. That's why I said "potentially" in #1748 (review). I will merge this PR after the CI passes.
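
(For readers following along: inconsistentRayClusterStatus gates whether a status update is written back to the Kubernetes API server. The per-field comparison under discussion would look roughly like this, using the status field names from the controller snippet earlier in this thread and assuming the KubeRay v1 API package import path; this is a sketch, not the function's actual body:)

```
package utils

import (
	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// desiredResourcesChanged reports whether the newly computed resource totals
// differ from the stored status; a status write is only worth an API-server
// call when some compared field actually changed.
func desiredResourcesChanged(oldStatus, newStatus rayv1.RayClusterStatus) bool {
	return oldStatus.DesiredCPUs.Cmp(newStatus.DesiredCPUs) != 0 ||
		oldStatus.DesiredMemory.Cmp(newStatus.DesiredMemory) != 0 ||
		oldStatus.DesiredGPUs.Cmp(newStatus.DesiredGPUs) != 0 ||
		oldStatus.DesiredTPUs.Cmp(newStatus.DesiredTPUs) != 0
}
```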

@kevin85421 merged commit 3066e42 into ray-project:master on Dec 23, 2023
25 checks passed