feat: show RayCluster's total resources #1748
Conversation
Commits 8fc88fb to 83b60b4
Great! Just left some questions.
```go
newInstance.Status.DesiredCPUs = totalResources[corev1.ResourceCPU]
newInstance.Status.DesiredMemory = totalResources[corev1.ResourceMemory]
newInstance.Status.DesiredGPUs = getGPUs(totalResources)
newInstance.Status.DesiredTPUs = totalResources[corev1.ResourceName("google.com/tpu")]
```
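For illustration, here is a minimal sketch (not the PR's actual code) of how a `totalResources` map could be built by summing container requests over a group's pod template and replica count; the `sumPodRequests` helper and the sample values are hypothetical. In the real controller the same summation would also need to cover the head group and every worker group.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// sumPodRequests adds every container's resource requests in the pod spec
// into total, once per replica of the group (hypothetical helper).
func sumPodRequests(total corev1.ResourceList, spec corev1.PodSpec, replicas int32) {
	for _, c := range spec.Containers {
		for name, q := range c.Resources.Requests {
			acc := total[name]
			for i := int32(0); i < replicas; i++ {
				acc.Add(q)
			}
			total[name] = acc
		}
	}
}

func main() {
	totalResources := corev1.ResourceList{}
	workerSpec := corev1.PodSpec{
		Containers: []corev1.Container{{
			Name: "ray-worker",
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceCPU:                    resource.MustParse("4"),
					corev1.ResourceMemory:                 resource.MustParse("8Gi"),
					corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("1"),
				},
			},
		}},
	}
	sumPodRequests(totalResources, workerSpec, 2) // two worker replicas

	cpu := totalResources[corev1.ResourceCPU]
	gpu := totalResources[corev1.ResourceName("nvidia.com/gpu")]
	fmt.Println(cpu.String(), gpu.String()) // "8" and "2"
}
```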
@richardsliu would you mind reviewing this? I am not sure whether `corev1.ResourceName("google.com/tpu")` works or not.
I think it should. I use it in the unit test `TestGetGPUs()` below.
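As a quick sanity check (my own snippet, not from the PR): `corev1.ResourceList` is just `map[corev1.ResourceName]resource.Quantity`, so any string converted to `corev1.ResourceName` is a valid key, and looking up a missing key simply yields a zero `Quantity`.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	rl := corev1.ResourceList{
		corev1.ResourceName("google.com/tpu"): resource.MustParse("8"),
	}

	tpu := rl[corev1.ResourceName("google.com/tpu")] // present: 8
	gpu := rl[corev1.ResourceName("nvidia.com/gpu")] // absent: zero Quantity
	fmt.Println(tpu.String(), gpu.String())          // "8" "0"
}
```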
```diff
@@ -2336,3 +2337,44 @@ func TestReconcile_Replicas_Optional(t *testing.T) {
 	})
 }
 }
+
+func TestGetGPUs(t *testing.T) {
```
Added this unit test. Let me know if there are any other tests I should add.
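For reference, a table-driven test along these lines could look like the sketch below. The `getGPUs` helper here is a hypothetical stand-in that sums every resource whose name ends in "gpu", which may differ from the PR's actual implementation.

```go
package main

import (
	"strings"
	"testing"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// getGPUs is a hypothetical helper that sums every resource whose name
// ends in "gpu" (e.g. nvidia.com/gpu, amd.com/gpu).
func getGPUs(total corev1.ResourceList) resource.Quantity {
	gpus := resource.Quantity{}
	for name, q := range total {
		if strings.HasSuffix(string(name), "gpu") {
			gpus.Add(q)
		}
	}
	return gpus
}

func TestGetGPUs(t *testing.T) {
	tests := []struct {
		name      string
		resources corev1.ResourceList
		want      string
	}{
		{"no GPUs", corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("4")}, "0"},
		{"nvidia GPUs", corev1.ResourceList{corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("2")}, "2"},
		{"mixed vendors", corev1.ResourceList{
			corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("2"),
			corev1.ResourceName("amd.com/gpu"):    resource.MustParse("1"),
		}, "3"},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got := getGPUs(tc.resources)
			if got.String() != tc.want {
				t.Errorf("getGPUs() = %s, want %s", got.String(), tc.want)
			}
		})
	}
}
```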
## Example

```
kubectl get rayclusters
NAME     DESIRED WORKERS   AVAILABLE WORKERS   CPUS     MEMORY     GPUS   STATUS   AGE
sample   2                 2                   45300m   147840Mi   4      ready    4m58s
```
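For context, extra `kubectl get` columns like these are typically exposed through kubebuilder printcolumn markers on the CRD type. The sketch below is hypothetical: the JSONPath values are guesses based on the status field names in this diff, not the PR's actual markers.

```go
package v1

// +kubebuilder:printcolumn:name="CPUS",type=string,JSONPath=".status.desiredCPUs"
// +kubebuilder:printcolumn:name="MEMORY",type=string,JSONPath=".status.desiredMemory"
// +kubebuilder:printcolumn:name="GPUS",type=string,JSONPath=".status.desiredGPUs"
// +kubebuilder:object:root=true
type RayCluster struct{} // fields elided; sketch only shows the markers
```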
LGTM. Do we need to update `inconsistentRayClusterStatus`? If the information needs to be updated in a timely manner, we may need to; otherwise, we can leave the function as is, which could reduce the load on the Kubernetes API server.
It looks like updating an existing RayCluster's head or worker resources doesn't recreate the Pods with the change, so it seems we don't need to update that function, since the original resource values will never change?
That makes sense. I'm not sure whether there are edge cases that could cause inconsistency between the total and actual resource information, but I believe that in most cases there will be no inconsistency between them with the existing logic.
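If the totals ever did need to participate in the inconsistency check, a sketch could look like the following. The `Status` type, its field names, and the `resourcesChanged` helper are assumptions modeled on the diff above, not KubeRay's actual `inconsistentRayClusterStatus`. Including such a check keeps the columns fresh at the cost of extra status writes; omitting it avoids additional load on the API server.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// Status mirrors the fields added in this PR (names assumed from the diff).
type Status struct {
	DesiredCPUs   resource.Quantity
	DesiredMemory resource.Quantity
	DesiredGPUs   resource.Quantity
	DesiredTPUs   resource.Quantity
}

// resourcesChanged reports whether any aggregated resource field differs
// between the old and new status (hypothetical helper).
func resourcesChanged(oldStatus, newStatus Status) bool {
	return oldStatus.DesiredCPUs.Cmp(newStatus.DesiredCPUs) != 0 ||
		oldStatus.DesiredMemory.Cmp(newStatus.DesiredMemory) != 0 ||
		oldStatus.DesiredGPUs.Cmp(newStatus.DesiredGPUs) != 0 ||
		oldStatus.DesiredTPUs.Cmp(newStatus.DesiredTPUs) != 0
}

func main() {
	oldStatus := Status{DesiredCPUs: resource.MustParse("8")}
	newStatus := Status{DesiredCPUs: resource.MustParse("16")}
	fmt.Println(resourcesChanged(oldStatus, newStatus)) // true
}
```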