
BUG: prometheus-k8s resource usage spikes suddenly and takes down every node one after another until all of them are unreachable! #2261

Open
xiasf opened this issue Nov 17, 2023 · 0 comments


xiasf commented Nov 17, 2023

KubeSphere version: v3.3.2

prom/prometheus:v2.34.0

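Problem description: After the cluster has been running for a while, nodes frequently hang, stop responding, and become unreachable. Once this happens to one node, the other nodes soon hang and drop off one after another. After digging through related material, I finally traced the problem to the prometheus-k8s workload. I tried limiting its resource usage, but the problem still occurred occasionally. In the end I stopped this workload, and the problem has not recurred since.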
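For reference, a minimal sketch of the kind of resource limit mentioned above. It assumes the prometheus-k8s workload is generated by Prometheus Operator from a `Prometheus` custom resource, in which case limits are set under `spec.resources` on that resource rather than on the StatefulSet directly. The helper name `build_resource_patch` and all CPU and memory values are hypothetical placeholders, not the values used in this cluster.

```python
import json


def build_resource_patch(cpu_request: str, memory_request: str,
                         cpu_limit: str, memory_limit: str) -> dict:
    """Build a JSON merge patch that sets spec.resources on a Prometheus custom resource.

    Hypothetical helper for illustration; the quantities are Kubernetes
    resource strings such as "500m" or "4Gi".
    """
    return {
        "spec": {
            "resources": {
                "requests": {"cpu": cpu_request, "memory": memory_request},
                "limits": {"cpu": cpu_limit, "memory": memory_limit},
            }
        }
    }


# Placeholder values (assumptions); size them for the actual node capacity.
patch = build_resource_patch("200m", "1Gi", "2", "4Gi")
patch_json = json.dumps(patch, indent=2)
print(patch_json)
```

The printed JSON could be passed to `kubectl patch` with `--type merge` against the Prometheus custom resource; its name and namespace depend on the installation and are not stated in this report. A memory limit only bounds the container, so Prometheus may still be OOM-killed and restart repeatedly, which would be consistent with the problem continuing to show up occasionally after limits were applied.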

Related material:

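Intermittently, every so often, all nodes are taken down: they go NotReady and hang completely, and can only be recovered by force-restarting them from the Alibaba Cloud console.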

https://blog.csdn.net/u012922005/article/details/127167188

https://blog.csdn.net/lyf0327/article/details/105868234

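Prometheus hits a TSDB write error, so on restart it frantically grabs CPU and memory, still fails to read the TSDB data, and eventually exhausts the node's system resources, which affects other services.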
