Add DCGMMonitor plugin document (#34)
* docs: add anomaly detection algorithm docs

* docs: format anomaly detection algorithm docs

* docs: add OpenAIMonitor and LangChainMonitor plugin document

* docs: add DCGMMonitor plugin document

* docs: add DCGMMonitor plugin document

---------

Co-authored-by: wsy327643 <[email protected]>
wangsiyuan-code and wsy327643 authored Nov 22, 2023
1 parent 94d02c5 commit 0850fc2
Showing 14 changed files with 47 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/src/cn/SUMMARY.md
@@ -11,6 +11,7 @@
- [JVM performance monitor](user-guide/integrations/jvm/jvm-performance-monitor.md)
- [OpenAIMonitor plugin](user-guide/integrations/openai/openai-monitor.md)
- [LangChainMonitor plugin](user-guide/integrations/langchain/langchain-monitor.md)
- [DCGMMonitor plugin](user-guide/integrations/dcgm/dcgm-monitor.md)

# Dev Guide
- [Build]()
1 change: 1 addition & 0 deletions docs/src/cn/user-guide/integrations/base.md
@@ -5,3 +5,4 @@
- [JVM performance monitor](jvm/jvm-performance-monitor.md)
- [OpenAIMonitor plugin](openai/openai-monitor.md)
- [LangChainMonitor plugin](langchain/langchain-monitor.md)
- [DCGMMonitor plugin](dcgm/dcgm-monitor.md)
20 changes: 20 additions & 0 deletions docs/src/cn/user-guide/integrations/dcgm/dcgm-monitor.md
@@ -0,0 +1,20 @@
# DCGMMonitor plugin
Deploy a k8s environment on your GPU machine and install dcgm-exporter and HoloInsight-Agent. For installation details, see the documentation:

[**dcgm-exporter**](https://github.com/NVIDIA/dcgm-exporter#quickstart-on-kubernetes)

[**holoinsight-agent**](https://traas-stack.github.io/holoinsight-docs/en/operations/deployment/k8s.html#deploy-holoinsight-agent)

Once both are installed, GPU data is collected by default.
Open the page http://localhost:8080/integration/agentComp?tenant=default

Install the DCGMMonitor plugin on the Integration Components page.
![dcgm1.png](dcgm1.png)
Click Preview.
![dcgm2.png](dcgm2.png)

A DCGM monitoring dashboard is then generated automatically for monitoring GPU information.
![dcgm3.png](dcgm3.png)

![dcgm4.png](dcgm4.png)

2 changes: 2 additions & 0 deletions docs/src/en/SUMMARY.md
@@ -25,6 +25,8 @@
- [JVM performance monitor](user-guide/integrations/jvm/jvm-performance-monitor.md)
- [OpenAIMonitor plugin](user-guide/integrations/openai/openai-monitor.md)
- [LangChainMonitor plugin](user-guide/integrations/langchain/langchain-monitor.md)
- [DCGMMonitor plugin](user-guide/integrations/dcgm/dcgm-monitor.md)


# Dev Guide
- [Project structure](dev-guide/project-structure.md)
1 change: 1 addition & 0 deletions docs/src/en/user-guide/integrations/base.md
@@ -5,3 +5,4 @@ This is a guide to using integration, including integration component libraries
- [JVM performance monitor](jvm/jvm-performance-monitor.md)
- [OpenAIMonitor plugin](openai/openai-monitor.md)
- [LangChainMonitor plugin](langchain/langchain-monitor.md)
- [DCGMMonitor plugin](dcgm/dcgm-monitor.md)
22 changes: 22 additions & 0 deletions docs/src/en/user-guide/integrations/dcgm/dcgm-monitor.md
@@ -0,0 +1,22 @@
# DCGMMonitor plugin
Deploy a k8s environment on your GPU machine and install dcgm-exporter and HoloInsight-Agent, as described in the documentation:

[**dcgm-exporter**](https://github.com/NVIDIA/dcgm-exporter#quickstart-on-kubernetes)

[**holoinsight-agent**](https://traas-stack.github.io/holoinsight-docs/en/operations/deployment/k8s.html#deploy-holoinsight-agent)

Once both are installed, GPU data is collected by default.
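
To check what kind of GPU data is being collected, it can help to look at the exporter's raw output. dcgm-exporter serves metrics in the Prometheus text format (its quickstart uses port 9400 and the `/metrics` path). The sketch below is not part of HoloInsight or dcgm-exporter: it is a minimal, hypothetical helper that parses such text and extracts one metric per GPU. `DCGM_FI_DEV_GPU_UTIL` and `DCGM_FI_DEV_FB_USED` are dcgm-exporter metric names; the sample readings, the `Hostname` value, and the function name `gpu_values` are made up for illustration.

```python
import re

# Sample text in Prometheus exposition format. The label values and
# readings are invented; real output carries more labels and metrics.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 37
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 82
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 1024
"""

# Simplified matcher: metric name, a label set in braces, then a value.
# It ignores samples without labels or with a trailing timestamp.
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def gpu_values(text, metric):
    """Return {gpu label: value} for one metric (hypothetical helper)."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        result[labels.get("gpu", "?")] = float(match.group("value"))
    return result


print(gpu_values(SAMPLE, "DCGM_FI_DEV_GPU_UTIL"))  # {'0': 37.0, '1': 82.0}
```

Against a live exporter, the text fetched from its metrics endpoint would take the place of `SAMPLE`; the endpoint address depends on how the exporter is deployed in your cluster.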

Open the page http://localhost:8080/integration/agentComp?tenant=default

Install the DCGMMonitor plugin on the Integration Components page.

![dcgm1.png](dcgm1.png)
Click Preview.
![dcgm2.png](dcgm2.png)

A DCGM monitoring dashboard is then generated automatically for monitoring GPU information.
![dcgm3.png](dcgm3.png)

![dcgm4.png](dcgm4.png)

