diff --git a/docs/docs-cn/source/1.guide_cn.md b/docs/docs-cn/source/1.guide.md
similarity index 100%
rename from docs/docs-cn/source/1.guide_cn.md
rename to docs/docs-cn/source/1.guide.md
diff --git a/docs/docs-cn/source/3.quick_start/1.quick_start.md b/docs/docs-cn/source/3.quick_start/1.quick_start.md
index a34f120be..cd34e8b16 100644
--- a/docs/docs-cn/source/3.quick_start/1.quick_start.md
+++ b/docs/docs-cn/source/3.quick_start/1.quick_start.md
@@ -19,15 +19,19 @@ cd tugraph-analytics/
 ```

 ## Run a Streaming Graph Job Locally
+
 This section shows how to run a real-time loop-detection graph computing job in a local environment.

-### Demo1 Reading data from a local file[1.quick_start_copy.md](..%2F..%2F..%2Fdocs-en%2Fsource%2F3.quick_start%2F1.quick_start_copy.md)
+### Demo1 Reading data from a local file
+
 1. Simply run the script:
+
 ```shell
 bin/gql_submit.sh --gql geaflow/geaflow-examples/gql/loop_detection_file_demo.sql
 ```

 Here loop_detection_file_demo.sql is a DSL job that queries all 4-hop loops in the graph in real time. Its content is as follows:
+
 ```sql
 set geaflow.dsl.window.size = 1;
 set geaflow.dsl.ignore.exception = true;
@@ -102,11 +106,13 @@ INSERT INTO tbl_result
 RETURN a.id as a_id, b.id as b_id, c.id as c_id, d.id as d_id, a.id as a1_id
 );
 ```
+
+This DSL reads vertex and edge data from the project resource file **demo_job_data.txt**, builds a graph, computes all 4-hop loops in the graph, and writes the vertex ids on each loop to /tmp/geaflow/demo_job_result. You can also customize the output path via the `geaflow.dsl.file.path` parameter.

 2. The output is as follows:
+
 ```
 2,3,4,1,2
 4,1,2,3,4
@@ -114,15 +120,17 @@
 1,2,3,4,1
 ```

+### Demo2 Reading data interactively from a socket
+
 You can also type data into the console yourself and build the graph in real time.
+
 1. Run the script:

 ```shell
 bin/gql_submit.sh --gql geaflow/geaflow-examples/gql/loop_detection_socket_demo.sql
 ```

+The main difference in loop_detection_socket_demo.sql is that its source table is read from a socket:

 ```sql
 CREATE TABLE IF NOT EXISTS tbl_source (
@@ -198,15 +206,15 @@ After the socket service starts, the console displays the following information:

 ![ide_socket_server_more](../../../static/img/quick_start/ide_socket_server_more.png)

+4. Access the visual dashboard page
+
+The local-mode process occupies local ports 8090 and 8088 and comes with a visualization page.

 Enter http://localhost:8090 in a browser to access the front-end page.

 ![dashboard_overview](../../../static/img/dashboard/dashboard_overview.png)

+For more dashboard-related content, see the documentation:
 [documentation](../7.deploy/3.dashboard.md)

 ## Quick Start with GeaFlow Console

 GeaFlow Console is the graph computing development platform provided by GeaFlow. We will show how to launch the GeaFlow Console platform inside a Docker container and submit a streaming graph computing job. Documentation:
 [documentation](2.quick_start_docker.md)

+## Quick Start with GeaFlow Kubernetes Operator
+
+GeaFlow Kubernetes Operator is a deployment tool that quickly deploys GeaFlow applications to a Kubernetes cluster.
+We will show how to install geaflow-kubernetes-operator via Helm, quickly submit a GeaFlow job with a YAML file,
+and visit the operator's dashboard page to check the status of the jobs in the cluster; a command sketch follows below. Documentation:
 [documentation](../7.deploy/2.quick_start_operator.md)
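+
+As an illustration of the overall flow only, here is a minimal command sketch; the chart location and job file name are assumptions for illustration, not the official steps, so follow the linked document for the actual commands:
+
+```shell
+# Install the operator from its Helm chart (the local chart path is an assumption)
+helm install geaflow-kubernetes-operator ./helm/geaflow-kubernetes-operator
+
+# Submit a GeaFlow job described by a YAML custom resource (the file name is an assumption)
+kubectl apply -f my-geaflow-job.yaml
+
+# Check that the operator and job pods are running
+kubectl get pods
+```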

 ## Visualizing Streaming Graph Computing Jobs with G6VP
diff --git a/docs/docs-en/source/7.deploy/5.install_llm.md b/docs/docs-en/source/7.deploy/5.install_llm.md
index 60d56ef39..c660b824e 100644
--- a/docs/docs-en/source/7.deploy/5.install_llm.md
+++ b/docs/docs-en/source/7.deploy/5.install_llm.md
@@ -1,13 +1,16 @@
 # LLM Local Deployment
+
 Users can deploy large models locally as a service. The complete process, from downloading the pre-trained model to deploying it as a service and debugging it, is described in the following steps.
 The user's machine must have Docker installed and must be granted access to the repository containing these large models.
-
-## Step 1: Download the Model File
-The pre-trained large model file has been uploaded to the [Hugging Face repository](https://huggingface.co/tugraph/CodeLlama-7b-GQL-hf). Please proceed with downloading and locally unzipping the model file.
-![hugging](../../static/img/llm_hugging_face.png)
-## Step 2: Prepare the Docker Container Environment
+## Step 1: Download the Model File
+
+The pre-trained large model file has been uploaded to the [Hugging Face repository](https://huggingface.co/tugraph/CodeLlama-7b-GQL-hf). Please download the model file and unzip it locally.
+![hugging](../../../static/img/llm_hugging_face.png)
+
+## Step 2: Prepare the Docker Container Environment
+
 1. Run the following command in the terminal to download the Docker image required for model serving:

 ```
 docker pull tugraph/llam_infer_service:0.0.1
@@ -15,23 +18,25 @@ docker images
 ```

 2. Run the following command to start the Docker container:

 ```
 docker run -it --name ${Container name} -v ${Local model path}:${Container model path} -p ${Local port}:${Container service port} -d ${Image name}

 // Such as
 docker run -it --name my-model-container -v /home/huggingface:/opt/huggingface -p 8000:8000 -d tugraph/llam_infer_service:0.0.1

 // Check whether the container is running properly
 docker ps
 ```

 Here we map the container's port 8000 to the local machine's port 8000, mount the directory containing the local model (/home/huggingface) to the container path (/opt/huggingface), and name the container my-model-container.

 ## Step 3: Model Service Deployment
+
 1. Model conversion
+
 ```
 // Enter the container you just created
 docker exec -it ${container_id} bash
@@ -40,11 +45,12 @@
 // Convert the model to ggml format
 cd /opt/llama_cpp
 python3 ./convert.py ${Container model path}
 ```
+
 When the execution is complete, a file with the prefix ggml-model is generated under the container model path.
-![undefined](../../static/img/llm_ggml_model.png)
+![undefined](../../../static/img/llm_ggml_model.png)

 2. Model quantization (optional)
-Take the llam2-7B model as an example: By default, the accuracy of the model converted by convert.py is F16 and the model size is 13.0GB. If the current machine resources cannot satisfy such a large model inference, the converted model can be further quantized by./quantize.
+   Take the Llama2-7B model as an example: by default, the model produced by convert.py has F16 precision and is 13.0GB in size. If the current machine's resources cannot support inference with such a large model, the converted model can be further quantized with ./quantize; a back-of-envelope size check follows the table below.

 ```
 // As shown below, q4_0 quantizes the original model to int4 and compresses the model size to 3.5GB
@@ -52,11 +58,13 @@
 cd /opt/llama_cpp
 ./quantize ${Default generated F16 model path} ${Quantized model path} q4_0
 ```
+
 The following are reference figures for the size and inference speed of the quantized model:
-![undefined](../../static/img/llm_quantization_table.png)
+![undefined](../../../static/img/llm_quantization_table.png)
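+
+As a rough sanity check on these sizes (a back-of-envelope sketch; the parameter count of about 6.7 billion for Llama2-7B is quoted from memory, not from this document):
+
+$$
+6.7\times10^{9}\ \text{weights}\times 2\ \text{bytes/weight (F16)}\approx 13.4\times10^{9}\ \text{bytes}\approx 13\ \text{GB}
+$$
+
+$$
+6.7\times10^{9}\ \text{weights}\times 0.5\ \text{bytes/weight (int4)}\approx 3.4\times10^{9}\ \text{bytes}
+$$
+
+The reported 3.5GB for q4_0 is slightly above the raw int4 figure because the format also stores a small scale factor per block of weights.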

 3. Model serving
-Run the following command to deploy the above generated model as a service, and specify the address and port of the service binding through the parameters:
+   Run the following command to deploy the model generated above as a service, specifying the address and port the service binds to through parameters:
+
 ```
 // Run ./server -h to view parameter details
 // ${ggml-model...file} is the generated file whose name starts with ggml-model
 cd /opt/llama_cpp
@@ -69,7 +77,7 @@
 ```

 4. Debugging service
-Send an http request to the service address, where "prompt" is the query statement and "content" is the inference result.
+   Send an HTTP request to the service address, where "prompt" is the query statement (here: "return Xiaohong's 10 friends who are older than 20") and "content" is the inference result; a scripted variant follows the screenshot below.

 ```
 curl --request POST \
@@ -77,8 +85,7 @@
 --header "Content-Type: application/json" \
 --data '{"prompt": "请返回小红的10个年龄大于20的朋友","n_predict": 128}'
 ```
+
 The following is the model inference result after service deployment:
-![undefined](../../static/img/llm_chat_result.png)
-
-
\ No newline at end of file
+![undefined](../../../static/img/llm_chat_result.png)
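+
+For scripted debugging it can help to extract only the inference text. A minimal sketch, assuming the llama.cpp server's /completion endpoint, the service bound to localhost:8000 as in Step 2, and jq installed on the machine; only the "content" field is documented above:
+
+```
+# Query the service and print only the "content" field of the JSON response
+# (localhost:8000 and the jq dependency are assumptions)
+curl -s --request POST \
+  --url http://localhost:8000/completion \
+  --header "Content-Type: application/json" \
+  --data '{"prompt": "请返回小红的10个年龄大于20的朋友","n_predict": 128}' \
+  | jq -r '.content'
+```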