Prometheus snapshot not created when kubetest times out #1086
Comments
I also took some notes about that run:
I had two things in mind as a remedy:
Even though 2 seems much more generic, I don't remember any situation where Cl2 would time out for a reason other than control-plane unavailability. Because of that, I think we should first implement 1 (which has other benefits: it makes things easier for people outside scalability, and it's a nice SLO to provide to users) and then see whether 2 (or the solutions you proposed) are really needed.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
The Prometheus snapshot is not created if kubetest times out. The snapshotting logic lives inside clusterloader, so when kubetest times out that logic is simply never executed. This is unfortunate, especially since timeouts are exactly the situations we usually want to debug.
I was hit by this issue when trying to debug: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/88342/pull-kubernetes-e2e-gce-large-performance/1230477531426066432/
I see two options: 1) move snapshotting outside of the test (e.g. similarly to log dumping) 2) reconsider using Cortex (or any other solution that allows live recording of metrics).
@mm4tt - WDYT?
/priority important-soon
/area clusterloader