diff --git a/.readthedocs.yaml b/.readthedocs.yaml deleted file mode 100644 index 6fe68c59a..000000000 --- a/.readthedocs.yaml +++ /dev/null @@ -1,19 +0,0 @@ -# .readthedocs.yaml -# Read the Docs configuration file -# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details - -# Required -version: 2 - -# Set the version of Python and other tools you might need -build: - os: ubuntu-22.04 - tools: - python: "3.8" -mkdocs: - configuration: mkdocs.yml - -# Optionally declare the Python requirements required to build your docs -python: - install: - - requirements: docs/requirements.txt \ No newline at end of file diff --git a/README.md b/README.md index 45add6337..9b9577393 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# TuGraph Analytics +# Guide [![Star](https://shields.io/github/stars/tugraph-family/tugraph-analytics?logo=startrek&label=Star&color=yellow)](https://github.com/TuGraph-family/tugraph-analytics/stargazers) [![Fork](https://shields.io/github/forks/tugraph-family/tugraph-analytics?logo=forgejo&label=Fork&color=orange)](https://github.com/TuGraph-family/tugraph-analytics/forks) @@ -17,7 +17,7 @@ ## Introduction **TuGraph Analytics** (alias: GeaFlow) is a distributed graph compute engine developed by Ant Group. It supports core capabilities such as trillion-level graph storage, hybrid graph and table processing, real-time graph computation, and interactive graph analysis. Currently, it is widely used in scenarios such as data warehousing acceleration, financial risk control, knowledge graph, and social networks. -For more information about GeaFlow: [GeaFlow Introduction](docs/docs-en/introduction.md) +For more information about GeaFlow: [GeaFlow Introduction](docs/docs-en/source/2.introduction.md) For GeaFlow design paper: [GeaFlow: A Graph Extended and Accelerated Dataflow System](https://dl.acm.org/doi/abs/10.1145/3589771) @@ -43,21 +43,21 @@ For GeaFlow design paper: [GeaFlow: A Graph Extended and Accelerated Dataflow Sy 3. Build Image:`./build.sh --all` 4. Start Container:`docker run -d --name geaflow-console -p 8888:8888 geaflow-console:0.1` -For more details:[Quick Start](docs/docs-cn/quick_start.md)。 +For more details:[Quick Start](docs/docs-cn/source/3.quick_start/1.quick_start.md)。 ## Development Manual GeaFlow supports two sets of programming interfaces: DSL and API. You can develop streaming graph computing jobs using GeaFlow's SQL extension language SQL+ISO/GQL or use GeaFlow's high-level API programming interface to develop applications in Java. -* DSL application development: [DSL Application Development](docs/docs-en/application-development/dsl/overview.md) -* API application development: [API Application Development](docs/docs-en/application-development/api/overview.md) +* DSL application development: [DSL Application Development](docs/docs-en/source/5.application-development/2.dsl/1.overview.md) +* API application development: [API Application Development](docs/docs-en/source/5.application-development/1.api/1.overview.md) ## Real-time Capabilities Compared with traditional stream processing engines such as Flink and Storm, which use tables as their data model for real-time processing, GeaFlow's graph-based data model has significant performance advantages when handling join relationship operations, especially complex multi-hops relationship operations like those involving 3 or more hops of join and complex loop searches. 
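As an illustrative sketch (the `transfer` table, the `account` label, and the ids are hypothetical, not taken from the benchmark below), a 3-hop expansion that needs chained self-joins on a table collapses into a single path pattern in GQL:

```sql
-- 3-hop expansion with table joins
SELECT t3.targetId
FROM transfer t1
JOIN transfer t2 ON t1.targetId = t2.srcId
JOIN transfer t3 ON t2.targetId = t3.srcId
WHERE t1.srcId = 1;

-- the equivalent path pattern on a graph
MATCH (a:account where a.id = 1) -[e:transfer]->{3, 3}(b)
RETURN b.id
```

Each extra hop adds another join (and its shuffle) to the table plan, while the graph plan only extends the traversal.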
-[![total_time](./docs/static/img/vs_join_total_time_en.jpg)](./docs/docs-en/principle/vs_join.md) +[![total_time](docs/static/img/vs_join_total_time_en.jpg)](docs/docs-en/source/reference/vs_join.md) -[Why using graphs for relational operations is more appealing than table joins?](./docs/docs-en/principle/vs_join.md) +[Why using graphs for relational operations is more appealing than table joins?](docs/docs-en/source/reference/vs_join.md) Association Analysis Demo Based on GQL: @@ -82,7 +82,7 @@ JOIN student s ON sc.srcId = s.id ## Contribution Thank you very much for contributing to GeaFlow, whether bug reporting, documentation improvement, or major feature development, we warmly welcome all contributions. -For more information: [Contribution](docs/docs-en/contribution.md). +For more information: [Contribution](docs/docs-en/source/9.contribution.md). ## Contact Us You can contact us through the following methods: diff --git a/README_cn.md b/README_cn.md index c22423cf0..20612ab53 100644 --- a/README_cn.md +++ b/README_cn.md @@ -1,4 +1,4 @@ -# TuGraph Analytics +# TuGraph Analytics 文档地图 [![Star](https://shields.io/github/stars/tugraph-family/tugraph-analytics?logo=startrek&label=Star&color=yellow)](https://github.com/TuGraph-family/tugraph-analytics/stargazers) [![Fork](https://shields.io/github/forks/tugraph-family/tugraph-analytics?logo=forgejo&label=Fork&color=orange)](https://github.com/TuGraph-family/tugraph-analytics/forks) @@ -17,7 +17,7 @@ ## 介绍 **TuGraph Analytics** (别名:GeaFlow) 是蚂蚁集团开源的流图计算引擎,支持万亿级图存储、图表混合处理、实时图计算、交互式图分析等核心能力,目前广泛应用于数仓加速、金融风控、知识图谱以及社交网络等场景。 -关于GeaFlow更多介绍请参考:[GeaFlow介绍文档](docs/docs-cn/introduction.md) +关于GeaFlow更多介绍请参考:[GeaFlow介绍文档](docs/docs-cn/source/2.introduction.md) GeaFlow设计论文参考:[GeaFlow: A Graph Extended and Accelerated Dataflow System](https://dl.acm.org/doi/abs/10.1145/3589771) @@ -43,21 +43,21 @@ GeaFlow设计论文参考:[GeaFlow: A Graph Extended and Accelerated Dataflow 3. 构建镜像:`./build.sh --all` 4. 
启动容器:`docker run -d --name geaflow-console -p 8888:8888 geaflow-console:0.1`

-更多详细内容请参考:[快速上手文档](docs/docs-cn/quick_start.md)。
+更多详细内容请参考:[快速上手文档](docs/docs-cn/source/3.quick_start/1.quick_start.md)。

## 开发手册

GeaFlow支持DSL和API两套编程接口,您既可以通过GeaFlow提供的类SQL扩展语言SQL+ISO/GQL进行流图计算作业的开发,也可以通过GeaFlow的高阶API编程接口通过Java语言进行应用开发。
-* DSL应用开发:[DSL开发文档](docs/docs-cn/application-development/dsl/overview.md)
-* API应用开发:[API开发文档](docs/docs-cn/application-development/api/guid.md)
+* DSL应用开发:[DSL开发文档](docs/docs-cn/source/5.application-development/2.dsl/1.overview.md)
+* API应用开发:[API开发文档](docs/docs-cn/source/5.application-development/1.api/1.overview.md)

## 实时能力

相比传统的流式计算引擎比如Flink、Storm这些以表为模型的实时处理系统而言,GeaFlow以图为数据模型,在处理Join关系运算,尤其是复杂多跳的关系运算如3跳以上的Join、复杂环路查找上具备极大的性能优势。

-[![total_time](./docs/static/img/vs_join_total_time_cn.jpg)](./docs/docs-cn/principle/vs_join.md)
+[![total_time](docs/static/img/vs_join_total_time_cn.jpg)](docs/docs-cn/source/reference/vs_join.md)

-[为什么使用图进行关联运算比表Join更具吸引力?](./docs/docs-cn/principle/vs_join.md)
+[为什么使用图进行关联运算比表Join更具吸引力?](docs/docs-cn/source/reference/vs_join.md)

基于GQL的关联分析Demo:

@@ -82,7 +82,7 @@ JOIN student s ON sc.srcId = s.id

## 参与贡献
非常感谢您参与到GeaFlow的贡献中来,无论是Bug反馈还是文档完善,或者是大的功能点贡献,我们都表示热烈的欢迎。

-具体请参考:[参与贡献文档](docs/docs-cn/contribution.md)。
+具体请参考:[参与贡献文档](docs/docs-cn/source/9.contribution.md)。

**如果您对GeaFlow感兴趣,欢迎给我们项目一颗[ ⭐️ ](https://github.com/TuGraph-family/tugraph-analytics)。**
diff --git a/docs/docs-cn/.readthedocs.yaml b/docs/docs-cn/.readthedocs.yaml
new file mode 100644
index 000000000..08266a1c1
--- /dev/null
+++ b/docs/docs-cn/.readthedocs.yaml
@@ -0,0 +1,30 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-20.04
+  tools:
+    python: "3.6"
+  # You can also specify other tool versions:
+  # nodejs: "19"
+  # rust: "1.64"
+  # golang: "1.19"
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+  builder: html
+  configuration: docs/docs-cn/source/conf.py
+
+# If using Sphinx, optionally build your docs in additional formats such as PDF
+# formats:
+#   - pdf
+
+# Optionally declare the Python requirements required to build your docs
+python:
+  install:
+    - requirements: docs/requirements.txt
\ No newline at end of file
diff --git a/docs/docs-cn/Makefile b/docs/docs-cn/Makefile
new file mode 100644
index 000000000..d0c3cbf10
--- /dev/null
+++ b/docs/docs-cn/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
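+# For example, "make html" expands to: sphinx-build -M html "source" "build".
+# Pass extra flags via SPHINXOPTS, e.g. make html SPHINXOPTS="-W" to treat warnings as errors.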
+%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/docs-cn/application-development/api/guid.md b/docs/docs-cn/application-development/api/guid.md deleted file mode 100644 index 92a786680..000000000 --- a/docs/docs-cn/application-development/api/guid.md +++ /dev/null @@ -1,8 +0,0 @@ -* [API概览](overview.md) -* Stream API - * [Source API](stream/source.md) - * [Process API](stream/process.md) - * [Sink API](stream/sink.md) -* Graph API - * [Compute API](graph/compute.md) - * [Traversal API](graph/traversal.md) \ No newline at end of file diff --git a/docs/docs-cn/concepts/glossary.md b/docs/docs-cn/concepts/glossary.md deleted file mode 100644 index 482ab7920..000000000 --- a/docs/docs-cn/concepts/glossary.md +++ /dev/null @@ -1,27 +0,0 @@ -# 名称解释 - -**图**:图用于展示不同变量之间的关系,通常包括节点(点)和边(线)两部分。节点代表一个个体或对象,边则代表它们之间的关系。图可以用来解释复杂的关系网络和信息流动,如社交网络、交通网络、物流网络等。常见的图形类型包括有向图、无向图、树形图、地图等。 - -**K8S**:k8s是[Kubernetes](https://kubernetes.io/)的简称,是一个开源的容器编排平台,提供了自动化部署、自动扩展、自动管理容器化应用程序的功能。它可以在各种云平台、物理服务器和虚拟机上运行,支持多种容器运行时,可以实现高可用性、负载均衡、自动扩容、自动修复等功能。 - -**Graph Processing**: Graph Processing是一种计算模型,用于处理图形数据结构的计算问题。图计算模型可以用于解决许多现实世界的问题,例如社交网络分析、网络流量分析、医疗诊断等,典型的系统有 [Apache Giraph](https://giraph.apache.org/), [Spark GraphX](https://spark.apache.org/docs/latest/graphx-programming-guide.html)。 - -**DSL**: DSL是领域特定语言(Domain-Specific Language)的缩写。它是一种针对特定领域或问题的编程语言,与通用编程语言不同,DSL主要关注于解决特定领域的问题,并针对该领域的特定需求进行优化。DSL可以使得编程更加简单、高效,同时也能够提高代码的可读性和可维护性。下面的**Gremlin**、**ISO-GQL**就是DSL中的一种。 - -**HLA**: HLA 是 High level language 的缩写,与**DSL**不同,它使用通用语言进行编程,Geaflow目前只支持Java程序编写。它主要通过计算引擎SDK进行程序编写,执行方式是将程序整体打包交给引擎执行,对比**DSL**,它的执行方式更加灵活,但相对应的编程也会更加复杂。 - -**Gremlin**: [Gremlin](https://tinkerpop.apache.org/gremlin.html)是一种图形遍历语言,用于在图形数据库中进行数据查询和操作。它是一种声明式的、基于图的编程语言,可以用于访问各种类型的图形数据库,如Apache TinkerPop、Neo4j等。它提供了一组灵活的API,可以帮助开发者在图形数据库中执行各种操作,如遍历、过滤、排序、连接、修改等。 - -**ISO-GQL**:[GQL](https://www.gqlstandards.org/)是面向属性图的标准查询语言,全称是“图形查询语言”,其为ISO/IEC国际标准数据库语言。GeaFlow不仅支持了Gremlin查询语言,而且还支持了GQL。 - -**Window**: 参考VLDB 2015 [Google Dataflow Model](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43864.pdf),窗口的概念在 Geaflow 中是其数据处理逻辑中的关键要素,用于统一有界和无界的数据处理。数据流统一被看成一个个窗口数据的集合,系统处理批次的粒度也就是窗口的粒度。 - -**Cycle**: GeaFlow Scheduler调度模型中的核心数据结构,一个cycle描述为可循环执行的基本单元,包含输入,中间计算和数据交换,输出的描述。由执行计划中的vertex group生成,支持嵌套。 - -**Event**: Runtime层调度和计算交互的核心数据结构,Scheduler将一系列Event集合构建成一个State Machine,将其分发到Worker上进行计算执行。其中有些Event是可执行的,即自身具备计算语义,整个调度和计算过程为异步执行。 - -**Graph Traversal** : Graph Traversal 是指遍历图数据结构中所有节点或者部分节点的过程,在特定的顺序下访问所有节点,主要是深度优先搜索(DFS)和 广度优先搜索(BFS)。用于解决许多问题,包括查找两个节点之间的最短路径、检测图中的循环等。 - -**Graph State**: GraphState 是用来存放Geaflow的图数据或者图迭代计算过程的中间结果,提供Exactly-Once语义,并提供作业级复用的能力。GraphState 分为 Static 和 Dynamic 两种,Static 的 GraphState 将整个图看做是一个完整的,所有操作都在一张全图上进行;Dynamic 的 GraphState 认为图动态变化的,由一个个时间切片构成,所有切片构成一个完整的图,所有的操作都是在切片上进行。 - -**Key State**: KeyState 用于存放计算过程中的中间结果,通常用于流式处理,例如执行aggregation时在KeyState中记录中间聚合结果。类似GraphState,Geaflow 会将 KeyState 定期持久化,因此KeyState也提供Exactly-Once语义。KeyState根据数据结果不同可以分为KeyValueState、KeyListState、KeyMapState等。 \ No newline at end of file diff --git a/docs/docs-cn/make.bat b/docs/docs-cn/make.bat new file mode 100644 index 000000000..7a1131f7b --- /dev/null +++ b/docs/docs-cn/make.bat @@ -0,0 +1,36 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build +set SPHINXOPTS=-W + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. 
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.https://www.sphinx-doc.org/ + exit /b 1 +) + +if "%1" == "" goto help + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/docs-cn/source/1.guide_cn.md b/docs/docs-cn/source/1.guide_cn.md new file mode 100644 index 000000000..23a654109 --- /dev/null +++ b/docs/docs-cn/source/1.guide_cn.md @@ -0,0 +1,41 @@ +# 文档地图 +这里是文档地图,帮助用户快速学习和使用TuGraph Analytics。 + +## 介绍 +**TuGraph Analytics** (别名:GeaFlow) 是蚂蚁集团开源的[**性能世界一流**](https://ldbcouncil.org/benchmarks/snb-bi/)的OLAP图数据库,支持万亿级图存储、图表混合处理、实时图计算、交互式图分析等核心能力,目前广泛应用于数仓加速、金融风控、知识图谱以及社交网络等场景。 + +关于GeaFlow更多介绍请参考:[GeaFlow介绍文档](2.introduction.md) + +GeaFlow设计论文参考:[GeaFlow: A Graph Extended and Accelerated Dataflow System](https://dl.acm.org/doi/abs/10.1145/3589771) + +## 快速上手 + +1. 准备Git、JDK8、Maven、Docker环境。 +2. 下载源码:`git clone https://github.com/TuGraph-family/tugraph-analytics` +3. 项目构建:`mvn clean install -DskipTests` +4. 测试任务:`./bin/gql_submit.sh --gql geaflow/geaflow-examples/gql/loop_detection.sql` +3. 构建镜像:`./build.sh --all` +4. 启动容器:`docker run -d --name geaflow-console -p 8888:8888 geaflow-console:0.1` + +更多详细内容请参考:[快速上手文档](3.quick_start/1.quick_start.md)。 + +## 开发手册 + +GeaFlow支持DSL和API两套编程接口,您既可以通过GeaFlow提供的类SQL扩展语言SQL+ISO/GQL进行流图计算作业的开发,也可以通过GeaFlow的高阶API编程接口通过Java语言进行应用开发。 +* DSL应用开发:[DSL开发文档](5.application-development/2.dsl/1.overview.md) +* API应用开发:[API开发文档](5.application-development/1.api/1.overview.md) + +## 实时能力 + +相比传统的流式计算引擎比如Flink、Storm这些以表为模型的实时处理系统而言,GeaFlow以图为数据模型,在处理Join关系运算,尤其是复杂多跳的关系运算如3跳以上的Join、复杂环路查找上具备极大的性能优势。 + +[![total_time](../../static/img/vs_join_total_time_cn.jpg)](reference/vs_join.md) + +## 合作伙伴 +| | | | +|------------------|------------------|------------------| +| [![HUST](../../static/img/partners/hust.png)](https://github.com/CGCL-codes/YiTu) | [![FU](../../static/img/partners/fu.png)](http://kw.fudan.edu.cn/) | ![ZJU](../../static/img/partners/zju.png) | +| [![WhaleOps](../../static/img/partners/whaleops.png)](http://www.whaleops.com/) | [![OceanBase](../../static/img/partners/oceanbase.png)](https://github.com/oceanbase/oceanbase) | [![SecretFlow](../../static/img/partners/secretflow.png)](https://github.com/secretflow/secretflow) | + + + diff --git a/docs/docs-cn/introduction.md b/docs/docs-cn/source/2.introduction.md similarity index 82% rename from docs/docs-cn/introduction.md rename to docs/docs-cn/source/2.introduction.md index 6d9386acb..066a9db06 100644 --- a/docs/docs-cn/introduction.md +++ b/docs/docs-cn/source/2.introduction.md @@ -1,4 +1,4 @@ -# GeaFlow简介 +# 产品简介 ## GeaFlow起源 早期的大数据分析主要以离线处理为主,以Hadoop为代表的技术栈很好的解决了大规模数据的分析问题。然而数据处理的时效性不足, @@ -21,12 +21,12 @@ GeaFlow整体架构如下所示: -![GeaFlow架构](../static/img/geaflow_arch_new.png) +![GeaFlow架构](../../static/img/geaflow_arch_new.png) -* [DSL层](./principle/dsl_principle.md):即语言层。GeaFlow设计了SQL+GQL的融合分析语言,支持对表模型和图模型统一处理。 -* [Framework层](./principle/framework_principle.md):即框架层。GeaFlow设计了面向Graph和Stream的两套API支持流、批、图融合计算,并实现了基于Cycle的统一分布式调度模型。 -* [State层](./principle/state_principle.md):即存储层。GeaFlow设计了面向Graph和KV的两套API支持表数据和图数据的混合存储,整体采用了Sharing Nothing的设计,并支持将数据持久化到远程存储。 -* 
[Console平台](./principle/console_principle.md):GeaFlow提供了一站式图研发平台,实现了图数据的建模、加工、分析能力,并提供了图作业的运维管控支持。 +* [DSL层](4.concepts/2.dsl_principle.md):即语言层。GeaFlow设计了SQL+GQL的融合分析语言,支持对表模型和图模型统一处理。 +* [Framework层](4.concepts/3.framework_principle.md):即框架层。GeaFlow设计了面向Graph和Stream的两套API支持流、批、图融合计算,并实现了基于Cycle的统一分布式调度模型。 +* [State层](4.concepts/4.state_principle.md):即存储层。GeaFlow设计了面向Graph和KV的两套API支持表数据和图数据的混合存储,整体采用了Sharing Nothing的设计,并支持将数据持久化到远程存储。 +* [Console平台](4.concepts/5.console_principle.md):GeaFlow提供了一站式图研发平台,实现了图数据的建模、加工、分析能力,并提供了图作业的运维管控支持。 * **执行环境**:GeaFlow可以运行在多种异构执行环境,如K8S、Ray以及本地模式。 ## 应用场景 @@ -39,9 +39,9 @@ GeaFlow以图作为数据模型,替代DWD层的宽表,可以实现数据实 ### 实时归因分析 在信息化的大背景下,对用户行为进行渠道归因和路径分析是流量分析领域中的核心所在。通过实时计算用户的有效行为路径,构建出完整的转化路径,能够快速帮助业务看清楚产品的价值,帮助运营及时调整运营思路。实时归因分析的核心要点是准确性和实效性。准确性要求在成本可控下保证用户行为路径分析的准确性;实效性则要求计算的实时性足够高,才能快速帮助业务决策。 基于GeaFlow流图计算引擎的能力可以很好的满足归因分析的准确性和时效性要求。如下图所示: -![归因分析](../static/img/guiyin_analysis.png) +![归因分析](../../static/img/guiyin_analysis.png) GeaFlow首先通过实时构图将用户行为日志转换成用户行为拓扑图,以用户作为图中的点,与其相关的每个行为构建成从该用户指向埋点页面的一条边.然后利用流图计算能力分析提前用户行为子图,在子图上基于归因路径匹配的规则进行匹配计算得出该成交行为相应用户的归因路径,并输出到下游系统。 ### 实时反套现 在信贷风控的场景下,如何进行信用卡反套现是一个典型的风控诉求。基于现有的套现模式分析,可以看到套现是一个环路子图,如何快速,高效在大图中快速判定套现,将极大的增加风险的识别效率。以下图为例,通过将实时交易流、转账流等输入数据源转换成实时交易图,然后根据风控策略对用户交易行为做图特征分析,比如环路检查等特征计算,实时提供给决策和监控平台进行反套现行为判定。通过GeaFlow实时构图和实时图计算能力,可以快速发现套现等异常交易行为,极大降低平台风险。 -![实时反套现](../static/img/fantaoxian.png) \ No newline at end of file +![实时反套现](../../static/img/fantaoxian.png) \ No newline at end of file diff --git a/docs/docs-cn/quick_start.md b/docs/docs-cn/source/3.quick_start/1.quick_start.md similarity index 91% rename from docs/docs-cn/quick_start.md rename to docs/docs-cn/source/3.quick_start/1.quick_start.md index 64b9e4d53..a1618aae1 100644 --- a/docs/docs-cn/quick_start.md +++ b/docs/docs-cn/source/3.quick_start/1.quick_start.md @@ -1,4 +1,4 @@ -# 快速上手(本地运行) +# 源码部署 ## 准备工作 @@ -129,7 +129,7 @@ bin/socket.sh socket 服务启动后,控制台显示如下信息: -![socket_start](../static/img/socket_start.png) +![socket_start](../../../static/img/socket_start.png) 3. 输入数据 @@ -155,7 +155,7 @@ socket 服务启动后,控制台显示如下信息: 可以看到 socket 控制台上显示计算出来的环路数据: -![ide_socket_server](../static/img/ide_socket_server.png) +![ide_socket_server](../../../static/img/ide_socket_server.png) 你也可以继续输入新的点边数据,查看最新计算结果,如输入一下数据: @@ -165,7 +165,7 @@ socket 服务启动后,控制台显示如下信息: 可以看到新的环路 3-4-5-6-3 被检查出来: -![ide_socket_server_more](../static/img/ide_socket_server_more.png) +![ide_socket_server_more](../../../static/img/ide_socket_server_more.png) 4. 
访问可视化dashboard页面 @@ -173,23 +173,23 @@ socket 服务启动后,控制台显示如下信息: 在浏览器中输入*http://localhost:8090*即可访问前端页面。 -![dashboard_overview](../static/img/dashboard_overview.png) +![dashboard_overview](../../../static/img/dashboard_overview.png) 关于更多dashboard相关的内容,请参考文档: -[文档](dashboard.md) +[文档](../7.deploy/3.dashboard.md) ## GeaFlow Console 快速上手 GeaFlow Console 是 GeaFlow 提供的图计算研发平台,我们将介绍如何在 Docker 容器里面启动 GeaFlow Console 平台,提交流图计算作业。文档地址: -[文档](quick_start_docker.md) +[文档](2.quick_start_docker.md) ## GeaFlow Kubernetes Operator快速上手 Geaflow Kubernetes Operator是一个可以快速将Geaflow应用部署到kubernetes集群中的部署工具。 我们将介绍如何通过Helm安装geaflow-kubernetes-operator,通过yaml文件快速提交geaflow作业, 并访问operator的dashboard页面查看集群下的作业状态。文档地址: -[文档](quick_start_operator.md) +[文档](../7.deploy/2.quick_start_operator.md) ## 使用 G6VP 进行流图计算作业可视化 G6VP 是一个可扩展的图可视分析平台,包括数据源管理、构图、图元素个性化配置、图可视分析等功能模块。使用 G6VP 能够很方便的对 Geaflow 计算结果进行可视化分析。文档地址: -[文档](visualization/collaborate_with_g6vp.md) +[文档](../7.deploy/4.collaborate_with_g6vp.md) diff --git a/docs/docs-cn/quick_start_docker.md b/docs/docs-cn/source/3.quick_start/2.quick_start_docker.md similarity index 88% rename from docs/docs-cn/quick_start_docker.md rename to docs/docs-cn/source/3.quick_start/2.quick_start_docker.md index 510201c25..caf1880eb 100644 --- a/docs/docs-cn/quick_start_docker.md +++ b/docs/docs-cn/source/3.quick_start/2.quick_start_docker.md @@ -1,9 +1,9 @@ -# 开始上手(GeaFlow Console运行) +# 白屏部署 ## 准备工作 1. 下载安装[Docker](https://docs.docker.com/engine/install/),调整Docker服务资源配置(Dashboard-Settings-Resources)后启动Docker服务: -![docker_pref](../static/img/docker_pref.png) +![docker_pref](../../../static/img/docker_pref.png) 2. 拉取GeaFlow Console镜像 @@ -38,7 +38,7 @@ docker images 。本地镜像名称为:**geaflow-console:0.1**,只需选择一种方式构建镜像即可。 ## Docker容器运行GeaFlow作业 -下面介绍在docker容器里面运行前面[本地模式运行](quick_start.md)介绍的流图作业。 +下面介绍在docker容器里面运行前面[本地模式运行](1.quick_start.md)介绍的流图作业。 1. 启动GeaFlow Console平台服务。 @@ -75,7 +75,7 @@ GeaflowApplication:61 - Started GeaflowApplication in 11.437 seconds (JVM runn 首位注册用户将默认被设置为管理员,以管理员身份登录,通过一键安装功能开始系统初始化。 -![install_welcome](../static/img/install_welcome.png) +![install_welcome](../../../static/img/install_welcome.png) 3. 配置运行时环境 @@ -86,31 +86,31 @@ GeaFlow首次运行需要配置运行时环境相关的配置,包括集群配 使用默认Container模式,即本地容器运行。 -![install_container](../static/img/install_container.png) +![install_container](../../../static/img/install_container.png) 3.2 运行时配置 本地运行模式下可以跳过这一步配置,使用系统默认配置,直接点下一步。 -![install_conainer_meta_config.png](../static/img/install_conainer_meta_config.png) +![install_conainer_meta_config.png](../../../static/img/install_conainer_meta_config.png) 3.3 数据存储配置 选择图数据存储位置,本地模式下选择LOCAL,填写一个本地目录。默认不需填写,直接点下一步。 -![install_storage_config](../static/img/install_storage_config.png) +![install_storage_config](../../../static/img/install_storage_config.png) 3.4 文件存储配置 该配置为GeaFlow引擎JAR、用户JAR文件的持久化存储,比如HDFS等。本地运行模式下和数据存储配置相同,选择LOCAL模式,填写一个本地目录。默认不需填写,直接点下一步。 -![install_jar_config](../static/img/install_jar_config.png) +![install_jar_config](../../../static/img/install_jar_config.png) 配置完成后点击一键安装按钮,安装成功后,管理员会自动切换到个人租户下的默认实例,并可以直接创建发布图计算任务。 4. 
创建图计算任务 进入图研发页面,选择左侧图任务Tab栏,点击右上角新增按钮,新建一个DSL作业。 -![create_job](../static/img/create_job.png) +![create_job](../../../static/img/create_job.png) 分别填写任务名称、任务描述和DSL内容。其中DSL内容和前面本地运行作业介绍的一样,只需修改DSL,**将tbl_source和tbl_result表的${your.host.ip}替换成本机ip**即可。 ```sql @@ -197,16 +197,16 @@ ifconfig ``` 找到eth0或者en0的网卡,其中ipv4的地址即为你本机的ip地址。 -![ip.png](../static/img/ip.png) +![ip.png](../../../static/img/ip.png) -![add_dsl_job](../static/img/add_dsl_job.png) +![add_dsl_job](../../../static/img/add_dsl_job.png) 创建完成作业后,点击发布按钮发布作业。 -![add_dsl_job](../static/img/job_list.png) +![add_dsl_job](../../../static/img/job_list.png) 然后进入作业管理页面,点击提交按钮提交作业执行。 -![task_detail](../static/img/task_detail.png) +![task_detail](../../../static/img/task_detail.png) 5. 启动socket服务输入数据 @@ -235,7 +235,7 @@ socket服务启动后,输入点边数据,计算结果会实时显示在屏 - 6,7,0.1 ``` -![ide_socket_server](../static/img/ide_socket_server.png) +![ide_socket_server](../../../static/img/ide_socket_server.png) ## K8S部署 -GeaFlow支持K8S部署, 部署详细文档请参考文档:[K8S部署](deploy/install_guide.md) \ No newline at end of file +GeaFlow支持K8S部署, 部署详细文档请参考文档:[K8S部署](../7.deploy/1.install_guide.md) \ No newline at end of file diff --git a/docs/docs-cn/source/3.quick_start/index.rst b/docs/docs-cn/source/3.quick_start/index.rst new file mode 100644 index 000000000..2d19854e4 --- /dev/null +++ b/docs/docs-cn/source/3.quick_start/index.rst @@ -0,0 +1,10 @@ +快速上手 +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.quick_start.md + 2.quick_start_docker.md \ No newline at end of file diff --git a/docs/docs-cn/source/4.concepts/1.glossary.md b/docs/docs-cn/source/4.concepts/1.glossary.md new file mode 100644 index 000000000..b19f2da1a --- /dev/null +++ b/docs/docs-cn/source/4.concepts/1.glossary.md @@ -0,0 +1,186 @@ +# 名词解释 + +**图**:图用于展示不同变量之间的关系,通常包括节点(点)和边(线)两部分。节点代表一个个体或对象,边则代表它们之间的关系。图可以用来解释复杂的关系网络和信息流动,如社交网络、交通网络、物流网络等。常见的图形类型包括有向图、无向图、树形图、地图等。 + +**K8S**:k8s是[Kubernetes](https://kubernetes.io/)的简称,是一个开源的容器编排平台,提供了自动化部署、自动扩展、自动管理容器化应用程序的功能。它可以在各种云平台、物理服务器和虚拟机上运行,支持多种容器运行时,可以实现高可用性、负载均衡、自动扩容、自动修复等功能。 + +**Graph Processing**: Graph Processing是一种计算模型,用于处理图形数据结构的计算问题。图计算模型可以用于解决许多现实世界的问题,例如社交网络分析、网络流量分析、医疗诊断等,典型的系统有 [Apache Giraph](https://giraph.apache.org/), [Spark GraphX](https://spark.apache.org/docs/latest/graphx-programming-guide.html)。 + +**DSL**: DSL是领域特定语言(Domain-Specific Language)的缩写。它是一种针对特定领域或问题的编程语言,与通用编程语言不同,DSL主要关注于解决特定领域的问题,并针对该领域的特定需求进行优化。DSL可以使得编程更加简单、高效,同时也能够提高代码的可读性和可维护性。下面的**Gremlin**、**ISO-GQL**就是DSL中的一种。 + +**HLA**: HLA 是 High level language 的缩写,与**DSL**不同,它使用通用语言进行编程,Geaflow目前只支持Java程序编写。它主要通过计算引擎SDK进行程序编写,执行方式是将程序整体打包交给引擎执行,对比**DSL**,它的执行方式更加灵活,但相对应的编程也会更加复杂。 + +**Gremlin**: [Gremlin](https://tinkerpop.apache.org/gremlin.html)是一种图形遍历语言,用于在图形数据库中进行数据查询和操作。它是一种声明式的、基于图的编程语言,可以用于访问各种类型的图形数据库,如Apache TinkerPop、Neo4j等。它提供了一组灵活的API,可以帮助开发者在图形数据库中执行各种操作,如遍历、过滤、排序、连接、修改等。 + +**ISO-GQL**:[GQL](https://www.gqlstandards.org/)是面向属性图的标准查询语言,全称是“图形查询语言”,其为ISO/IEC国际标准数据库语言。GeaFlow不仅支持了Gremlin查询语言,而且还支持了GQL。 + +**Window**: 参考VLDB 2015 [Google Dataflow Model](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43864.pdf),窗口的概念在 Geaflow 中是其数据处理逻辑中的关键要素,用于统一有界和无界的数据处理。数据流统一被看成一个个窗口数据的集合,系统处理批次的粒度也就是窗口的粒度。 + +**Cycle**: GeaFlow Scheduler调度模型中的核心数据结构,一个cycle描述为可循环执行的基本单元,包含输入,中间计算和数据交换,输出的描述。由执行计划中的vertex group生成,支持嵌套。 + +**Event**: Runtime层调度和计算交互的核心数据结构,Scheduler将一系列Event集合构建成一个State Machine,将其分发到Worker上进行计算执行。其中有些Event是可执行的,即自身具备计算语义,整个调度和计算过程为异步执行。 + +**Graph Traversal** : Graph Traversal 
是指遍历图数据结构中所有节点或者部分节点的过程,在特定的顺序下访问所有节点,主要是深度优先搜索(DFS)和 广度优先搜索(BFS)。用于解决许多问题,包括查找两个节点之间的最短路径、检测图中的循环等。 + +**Graph State**: GraphState 是用来存放Geaflow的图数据或者图迭代计算过程的中间结果,提供Exactly-Once语义,并提供作业级复用的能力。GraphState 分为 Static 和 Dynamic 两种,Static 的 GraphState 将整个图看做是一个完整的,所有操作都在一张全图上进行;Dynamic 的 GraphState 认为图动态变化的,由一个个时间切片构成,所有切片构成一个完整的图,所有的操作都是在切片上进行。 + +**Key State**: KeyState 用于存放计算过程中的中间结果,通常用于流式处理,例如执行aggregation时在KeyState中记录中间聚合结果。类似GraphState,Geaflow 会将 KeyState 定期持久化,因此KeyState也提供Exactly-Once语义。KeyState根据数据结果不同可以分为KeyValueState、KeyListState、KeyMapState等。 + +## Graph View + +### 基本概念 + +GraphView是Geaflow中最核心的数据抽象,表示基于图结构的虚拟视图。它是图物理存储的一种抽象,可以表示存储和操作在多个节点上图数据。在 Geaflow中,GraphView是一等公民,用户对图的所有操作都是基于GraphView,例如将分布式点、边流作为 GraphView 增量的点/边数据集,对当前 GraphView 生成快照,用户可以基于快照图或者动态的 GraphView 触发计算。 + + +### 功能描述 + +GraphView 主要有以下几个功能: +* 图操作,GraphView可以添加或删除点和边数据,亦可以进行查询和在基于某个时间点切片快照。 +* 图介质,GraphView可以存储到图数据库或其他存储介质(如文件系统、KV存储、宽表存储、native graph等)。 +* 图切分,GraphView还支持不同图切分方法。 +* 图计算,GraphView可以进行图的迭代遍历或者计算。 + +![graph_view|(4000x2500)](../../../static/img/graph_view.png) + +### 示例介绍 +定义一个 Social Network 的 GraphView, 描述人际关系。 + +DSL 代码 +```SQL +CREATE GRAPH social_network ( + Vertex person ( + id int ID, + name varchar + ), + Edge knows ( + person1 int SOURCE ID, + person2 int DESTINATION ID, + weight int + ) +) WITH ( + storeType='rocksdb', + shardCount = 128 +); +``` + + +HLA 代码 +```java +//build graph view. +final String graphName = "social_network"; +GraphViewDesc graphViewDesc = GraphViewBuilder + .createGraphView(graphName) + .withShardNum(128) + .withBackend(BackendType.RocksDB) + .withSchema(new GraphMetaType(IntegerType.INSTANCE, ValueVertex.class, + String.class, ValueEdge.class, Integer.class)) + .build(); + +// bind the graphview with pipeline1 +pipeline.withView(graphName, graphViewDesc); +pipeline.submit(new PipelineTask()); + +``` + +## Stream Graph + +### 基本概念 + +Streaming Graph指的是流式、动态、变化的图数据,同时在GeaFlow内部Streaming Graph也指对流式图的计算模式,它针对流式变化的图,基于图的变化进行图遍历、图匹配和图计算等操作。 + +基于GeaFlow框架,可以方便的针对流式变化的图动态计算。在GeaFlow中,我们抽象了Dynamic Graph和Static Graph两个核心的概念。 +* Dynamic Graph 是指流式变化的图,它是图在时间轴上不断变化的切片所组成,可以方便的研究图随着时间推移的演化过程。 +* Static Graph 是图在某个时间点的 Snapshot,相当于 Dynamic Graph 的一个时间切片。 + +### 功能描述 + + +Streaming Graph 主要有以下几个功能: + +* 支持流式地处理点、边数据,支持在最新的图做查询。 +* 支持持续不断的更新和查询图结构,支持图结构变化带来的增量数据处理。 +* 支持回溯历史,基于历史快照做查询。 +* 支持图计算的计算逻辑顺序,例如基于边的时间序做计算。 + + +### 示例介绍 + +读取点、边两个无限数据流增量构图,对于每次增量数据构图完成,会触发 traversal 计算,查找 'Bob' 的2度内的朋友随着时间推移的演进过程。 + +DSL 代码 +```SQL + +set geaflow.dsl.window.size = 1; + +CREATE TABLE table_knows ( + personId int, + friendId int, + weight int +) WITH ( + type='file', + geaflow.dsl.file.path = 'resource:///data/table_knows.txt' +); + +INSERT INTO social_network.knows +SELECT personId, friendId, weight +FROM table_knows; + +CREATE TABLE result ( + personName varchar, + friendName varchar, + weight int +) WITH ( + type='console' +); + +-- Graph View Name Defined in Graph View Concept -- +USE GRAPH social_network; +-- find person id 3's known persons triggered every window. +INSERT INTO result +SELECT + name, + known_name, + weight +FROM ( + MATCH (a:person where a.name = 'Bob') -[e:knows]->{1, 2}(b) + RETURN a.name as name, b.name as known_name, e.weight as weight +) +``` + +HLA 代码 + +```java +//build graph view. +final String graphName = "social_network"; +GraphViewDesc graphViewDesc = GraphViewBuilder.createGraphView(graphName).build(); +pipeline.withView(graphName, graphViewDesc); + +// submit pipeLine task. 
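+// The task below builds two windowed sources, appends them to the graph view
+// incrementally, and runs a 2-hop traversal starting from the vertex "Bob".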
+pipeline.submit(new PipelineTask() {
+    @Override
+    public void execute(IPipelineTaskContext pipelineTaskCxt) {
+
+        // build vertices streaming source. Generic types follow the GraphMetaType
+        // above: Integer vertex id, String vertex value, Integer edge value.
+        PStreamSource<IVertex<Integer, String>> persons =
+            pipelineTaskCxt.buildSource(
+                new CollectionSource<>(getVertices()), SizeTumblingWindow.of(5000));
+        // build edges streaming source.
+        PStreamSource<IEdge<Integer, Integer>> knows =
+            pipelineTaskCxt.buildSource(
+                new CollectionSource<>(getEdges()), SizeTumblingWindow.of(5000));
+        // build graphview by graph name.
+        PGraphView<Integer, String, Integer> socialNetwork =
+            pipelineTaskCxt.buildGraphView(graphName);
+        // incrementally build the graph view from the two sources.
+        PIncGraphView<Integer, String, Integer> incSocialNetwork =
+            socialNetwork.appendGraph(persons, knows);
+
+        // run a 2-hop traversal starting from the vertex "Bob".
+        incSocialNetwork.incrementalTraversal(new IncGraphTraversalAlgorithms(2))
+            .start("Bob")
+            .map(res -> String.format("%s,%s", res.getResponseId(), res.getResponse()))
+            .sink(new ConsoleSink<>());
+    }
+});
+```
\ No newline at end of file
diff --git a/docs/docs-cn/principle/dsl_principle.md b/docs/docs-cn/source/4.concepts/2.dsl_principle.md
similarity index 93%
rename from docs/docs-cn/principle/dsl_principle.md
rename to docs/docs-cn/source/4.concepts/2.dsl_principle.md
index 0a3d622dc..7c36c86c9 100644
--- a/docs/docs-cn/principle/dsl_principle.md
+++ b/docs/docs-cn/source/4.concepts/2.dsl_principle.md
@@ -4,7 +4,7 @@
GeaFlow DSL整体架构如下图所示:

-![DSL架构](../../static/img/dsl_arch_new.png)
+![DSL架构](../../../static/img/dsl_arch_new.png)

DSL层是一个典型的编译器技术架构,即语法分析、语义分析、中间代码生成(IR)、代码优化、目标代码生成(OBJ)的流程。
@@ -20,12 +20,12 @@ DSL层是一个典型的编译器技术架构,即语法分析、语义分析
## DSL主要执行流程
DSL的主要执行流程如下图所示:
-![DSL执行流程](../../static/img/dsl_workflow.png)
+![DSL执行流程](../../../static/img/dsl_workflow.png)

DSL文本首先经过Parser解析生成AST语法树,然后再经过Validator校验器做语义检查和类型推导生成校验后的AST语法树。接着通过Logical Plan转换器生成图表一体的逻辑执行计划。逻辑执行计划通过优化器进行优化处理生成优化后的逻辑执行计划,接下来由物理执行计划转换器转换成物理执行计划,物理执行计划通过DAG Builder生成图表一体的物理执行逻辑。GeaFlow DSL采用两级DAG结构来描述图表一体的物理执行逻辑。

## 两级DAG物理执行计划
和传统的分布式表数据处理引擎Storm、Flink和Spark的系统不同,GeaFlow是一个流图一体的分布式计算系统。其物理执行计划采用图表两级DAG结构,如下图所示:
-![DSl DAG结构](../../static/img/dsl_twice_level_dag.png)
+![DSL DAG结构](../../../static/img/dsl_twice_level_dag.png)
外层DAG包含表处理相关的算子以及图处理的迭代算子,为物理执行逻辑的主体部分,将图表的计算逻辑链接起来。内层DAG则将图计算的逻辑通过DAG方式展开,代表了图迭代计算具体执行方式。
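下面用一个示意查询说明两级DAG的分工(示意草图:tbl_result表与图schema均为假设,语法沿用本文档其他章节的SQL+GQL示例):

```sql
INSERT INTO tbl_result
SELECT known_name, count(*) as cnt
FROM (
  MATCH (a:person where a.name = 'Bob') -[e:knows]->{1, 2}(b)
  RETURN b.name as known_name
)
GROUP BY known_name
```

其中INSERT/SELECT/GROUP BY等表算子构成外层DAG,MATCH的逐跳展开(1~2跳)则作为迭代算子展开为内层DAG执行。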
diff --git a/docs/docs-cn/principle/framework_principle.md b/docs/docs-cn/source/4.concepts/3.framework_principle.md similarity index 96% rename from docs/docs-cn/principle/framework_principle.md rename to docs/docs-cn/source/4.concepts/3.framework_principle.md index 06d1910a1..052c5a16d 100644 --- a/docs/docs-cn/principle/framework_principle.md +++ b/docs/docs-cn/source/4.concepts/3.framework_principle.md @@ -4,7 +4,7 @@ GeaFlow Framework的架构如下图所示: -![framework_arch](../../static/img/framework_arch_new.png) +![framework_arch](../../../static/img/framework_arch_new.png) * **高阶API**:GeaFlow通过Environment接口适配异构的分布式执行环境(K8S、Ray、Local),使用Pipeline封装了用户的数据处理流程,使用Window抽象统一了流处理(无界Window)和批处理(有界Window)。Graph接口提供了静态图和动态图(流图)上的计算API,如append/snapshot/compute/traversal等,Stream接口提供了统一流批处理API,如map/reduce/join/keyBy等。 * **逻辑执行计划**:逻辑执行计划信息统一封装在PipelineGraph对象内,将高阶API对应的算子(Operator)组织在DAG中,算子一共分为5大类:SourceOperator对应数据源加载、OneInputOperator/TwoInputOperator对应传统的数据处理、IteratorOperator对应静态/动态图计算。DAG中的点(PipelineVertex)记录了算子(Operator)的关键信息,如类型、并发度、算子函数等信息,边(PipelineEdge)则记录了数据shuffle的关键信息,如Partition规则(forward/broadcast/key等)、编解码器等。 @@ -24,7 +24,7 @@ GeaFlow计算引擎核心模块主要包括执行计划生成和优化、统一C * ExecutionGraph 将PipelineGraph基于不同的计算模型,将一组可重复执行的vertex聚合到一起,构建对应的ExecutionGroup,每个group表示可以独立调度执行的单元,一个group可以由一个或者多个vertex构建,可以看做一个小的执行计划,group内部数据以pipeline模式交换,group之间数据以batch模式交换。group描述了具体的执行模式,支持嵌套,可以只执行一次,也可只执行多次,可以一次执行一个或者多个窗口的数据。group如下图所示。 ExecutionGroup最终会转换为调度执行的基本单元cycle。 - ![group.jpg](../../static/img/framework_dag.jpeg) + ![group.jpg](../../../static/img/framework_dag.jpeg) ### 调度模型 调度将基于ExecutionGraph定义的ExecutionGroup生成调度基本单元cycle。cycle描述为可循环执行的基本单元,包含输入,中间计算和数据交换,输出的描述。调度执行过程主要是: @@ -38,7 +38,7 @@ tail task: cycle数据流的结尾,处理完数据后,向调度发送event 其余非head/tail task: 中间执行task,接收上游输入数据,处理后直接发送给下游执行。 cycle调度执行的过程,就是不断发送event给head,并从tail收到返回event的过程,整个过程类似一个“循环”,如下图所示。调度根据不同的cycle类型,初始化调度状态,调度的过程也是状态变迁的过程,根据收到的event,决定下一轮要发送给head的event类型。 -![scheduler.jpg](../../static/img/framework_cyle.jpeg) +![scheduler.jpg](../../../static/img/framework_cyle.jpeg) ### Runtime执行 #### 整体介绍 @@ -48,7 +48,7 @@ Runtime模块负责GeaFlow所有模式任务(包括流批、静态/动态图 3. TaskRunner(也继承至AbstractTaskRunner)负责从taskQueue中获取TASK(Event),具体Event事件将交由Task进行处理,其整个生命周期包括:创建、处理及结束,对于异常的Task,可以直接中断。 a. Task创建和初始化会根据CreateTaskEvent事件来完成,Task生命周期结束会根据DestroyTaskEvent事件来完成。 b. 
其它类型的Event,都将通过相应的CommandEvent的execute()来完成具体计算语义层面的逻辑(例如:根据InitCycleEvent事件Worker将进行上下游依赖构建;根据LaunchSourceEvent事件Worker将触发source开始读数据等) - ![undefined](../../static/img/framework_scheduler.png) + ![undefined](../../../static/img/framework_scheduler.png) 当前Task中的TaskContext核心数据结构,主要包括:负责执行计算的Worker、负责下游节点从上游异步读取数据的FetchService以及负责将执行Worker产生的数据向下游输出的EmitterService。 * Worker:其主要负责流图数据的对齐处理以及将每批处理结束后相应的DoneEvent callback回Scheduler,Scheduler依据相应的DoneEvent进行后续的调度逻辑。 @@ -67,7 +67,7 @@ Runtime模块负责GeaFlow所有模式任务(包括流批、静态/动态图 对于运行时的所有组件进程,比如master/driver/container,都基于context初始化和运行。在新创建进程时,首先构建进程需要的context,每个进程在初始化时将context做持久化。当进程异常重启后,首先恢复context,然后基于context重新初始化进程。 #### 作业异常恢复 -![undefined](../../static/img/framework_failover.jpeg) +![undefined](../../../static/img/framework_failover.jpeg) * 作业分布式快照 调度器根据当前自身调度状态,确定对运行中的任务发送新的windowId,触发对新窗口的计算。当每个算子对应窗口计算结束后,如果需要对当前窗口上下文做快照,则将算子内对应状态持久化到存储中。 最终调度器收到某个窗口的所有任务执行结束的消息后,也会按需要对该调度元信息做一次快照并持久化,才标志这个窗口的处理最终成。当调度和算子恢复到这个窗口上下文时,则可以基于该窗口继续执行。 diff --git a/docs/docs-cn/principle/state_principle.md b/docs/docs-cn/source/4.concepts/4.state_principle.md similarity index 97% rename from docs/docs-cn/principle/state_principle.md rename to docs/docs-cn/source/4.concepts/4.state_principle.md index 22783dd12..0d54b897a 100644 --- a/docs/docs-cn/principle/state_principle.md +++ b/docs/docs-cn/source/4.concepts/4.state_principle.md @@ -11,7 +11,7 @@ Geaflow 中的状态是指图、流计算过程中的直接计算节点的中间 ### 架构图 -![state_arch](../../static/img/state_arch_new.png) +![state_arch](../../../static/img/state_arch_new.png) * **State API**:提供了面向KV存储API,如get/put/delete等。以及面向图存储的API,如V/E/VE,以及点/边的add/update/delete等。 * **State执行层**:通过KeyGroup的设计实现数据的Sharding和扩缩容能力,Accessor提供了面向不同读写策略和数据模型的IO抽象,StateOperator抽象了存储层SPI,如finish(刷盘)、archive(Checkpoint)、compact(压缩)、recover(恢复)等。另外,State提供了多种PushDown优化以加速IO访问效率。通过自定义内存管理和面向属性的二级索引也会提供大量的存储访问优化手段。 @@ -22,7 +22,7 @@ Geaflow 中的状态是指图、流计算过程中的直接计算节点的中间 在作业运行期间State的生命流程为: -![state_flow](../../static/img/state_flow.png) +![state_flow](../../../static/img/state_flow.png) 当FailOver的时候,会从最近一个持久化的数据恢复,以下是其详细流程。 @@ -61,7 +61,7 @@ Store type 的选择与存储性能也息息相关,例如对于Key State,如 ## State 类型 State大体可以区分为Graph State 和 Key State,分别对应不同的数据结构,同时映射到Store层的不同存储模型,例如对于rocksdb 的 store type,将会有KV、Graph等不同类型的存储模型。 -![state_type](../../static/img/state_type.png) +![state_type](../../../static/img/state_type.png) ### Graph State diff --git a/docs/docs-cn/principle/console_principle.md b/docs/docs-cn/source/4.concepts/5.console_principle.md similarity index 97% rename from docs/docs-cn/principle/console_principle.md rename to docs/docs-cn/source/4.concepts/5.console_principle.md index 4d50ae7ff..3639be44e 100644 --- a/docs/docs-cn/principle/console_principle.md +++ b/docs/docs-cn/source/4.concepts/5.console_principle.md @@ -4,7 +4,7 @@ GeaFlow Console提供了一站式图研发、运维的平台能力,同时为 ### 平台架构 -![console_arch](../../static/img/console_arch.png) +![console_arch](../../../static/img/console_arch.png) * **标准化API**:平台提供了标准化的RESTful API和认证机制,同时支持了页面端和应用端的统一API服务能力。 * **任务研发**:平台支持“关系-实体-属性”的图数据建模。基于字段映射配置,可以定义图数据传输任务,包括数据集成(Import)和数据分发(Export)。基于图表模型的图数据加工任务支持多样化的计算场景,如Traversal、Compute、Mining等。基于数据加速器的图数据服务,提供了多协议的实时分析能力,支持BI、可视化分析工具的接入集成。 @@ -17,7 +17,7 @@ GeaFlow Console提供了一站式图研发、运维的平台能力,同时为 GeaFlow支持多种异构环境执行,以常见的K8S部署环境为例,GeaFlow物理部署架构如下: -![deploy_arch](../../static/img/deploy_arch.png) +![deploy_arch](../../../static/img/deploy_arch.png) 在GeaFlow作业的全生命周期过程中,涉及的关键数据流程有: diff --git a/docs/docs-cn/concepts/graph_view.md b/docs/docs-cn/source/4.concepts/graph_view.md similarity index 96% rename from 
docs/docs-cn/concepts/graph_view.md rename to docs/docs-cn/source/4.concepts/graph_view.md index 46b3c9e41..915ddfdc9 100644 --- a/docs/docs-cn/concepts/graph_view.md +++ b/docs/docs-cn/source/4.concepts/graph_view.md @@ -13,7 +13,7 @@ GraphView 主要有以下几个功能: * 图切分,GraphView还支持不同图切分方法。 * 图计算,GraphView可以进行图的迭代遍历或者计算。 -![graph_view|(4000x2500)](../../static/img/graph_view.png) +![graph_view|(4000x2500)](../../../static/img/graph_view.png) ## 示例介绍 定义一个 Social Network 的 GraphView, 描述人际关系。 diff --git a/docs/docs-cn/source/4.concepts/index.rst b/docs/docs-cn/source/4.concepts/index.rst new file mode 100644 index 000000000..e90ea2f45 --- /dev/null +++ b/docs/docs-cn/source/4.concepts/index.rst @@ -0,0 +1,13 @@ +技术原理 +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.glossary.md + 2.dsl_principle.md + 3.framework_principle.md + 4.state_principle.md + 5.console_principle.md \ No newline at end of file diff --git a/docs/docs-cn/concepts/stream_graph.md b/docs/docs-cn/source/4.concepts/stream_graph.md similarity index 100% rename from docs/docs-cn/concepts/stream_graph.md rename to docs/docs-cn/source/4.concepts/stream_graph.md diff --git a/docs/docs-cn/application-development/api/overview.md b/docs/docs-cn/source/5.application-development/1.api/1.overview.md similarity index 98% rename from docs/docs-cn/application-development/api/overview.md rename to docs/docs-cn/source/5.application-development/1.api/1.overview.md index ae7fc3695..0ddf6dca1 100644 --- a/docs/docs-cn/application-development/api/overview.md +++ b/docs/docs-cn/source/5.application-development/1.api/1.overview.md @@ -1,9 +1,9 @@ -# API介绍 +# 简介 GeaFlow API是对高阶用户提供的开发接口,其支持Graph API和Stream API两种类型: * Graph API:Graph是GeaFlow框架的一等公民,当前GeaFlow框架提供了一套基于GraphView的图计算编程接口,包含图构建、图计算及遍历。在GeaFlow中支持Static Graph和Dynamic Graph两种类型。 * Static Graph API:静态图计算API,基于该类API可以进行全量的图计算或图遍历。 * Dynamic Graph API:动态图计算API,GeaFlow中GraphView是动态图的数据抽象,基于GraphView之上,可以进行动态图计算或图遍历。同时支持对Graphview生成Snapshot快照,基于Snapshot可以提供和Static Graph API一样的接口能力。 - ![api_arch](../../../static/img/api_arch.jpeg) + ![api_arch](../../../../static/img/api_arch.jpeg) * Stream API:GeaFlow提供了一套通用计算的编程接口,包括source构建、流批计算及sink输出。在GeaFlow中支持Batch和Stream两种类型。 * Batch API:批计算API,基于该类API可以进行批量计算。 * Stream API:流计算API,GeaFlow中StreamView是动态流的数据抽象,基于StreamView之上,可以进行流计算。 @@ -13,7 +13,7 @@ GeaFlow API是对高阶用户提供的开发接口,其支持Graph API和Stream * 对于批或静态图API来说,Window将采用AllWindow模式,一个窗口将读取全量数据,从而实现全量的计算。 -# Maven依赖 +## Maven依赖 开发GeaFlow API应用需要添加一下maven依赖: ```xml @@ -47,8 +47,8 @@ GeaFlow API是对高阶用户提供的开发接口,其支持Graph API和Stream ``` -# 功能概览 -## Graph API +## 功能概览 +### Graph API Graph API是GeaFlow中的一等公民,其提供了一套基于GraphView的图计算编程接口,包含图构建、图计算及遍历。具体的API说明如下表格所示: @@ -109,7 +109,7 @@ Graph API是GeaFlow中的一等公民,其提供了一套基于GraphView的图
-## Stream API +### Stream API Stream API提供了一套通用计算的编程接口,包括source构建、流批计算及sink输出。具体的API说明如下表格所示: @@ -182,14 +182,14 @@ Stream API提供了一套通用计算的编程接口,包括source构建、流
-# 典型示例 -## PageRank动态图计算示例介绍 -### PageRank的定义 +## 典型示例 +### PageRank动态图计算示例介绍 +#### PageRank的定义 PageRank算法最初作为互联网网页重要度的计算方法,1996年由Page和Brin提出,并用于谷歌搜索引擎的网页排序。事实上,PageRank 可以定义在任意有向图上,后来被应用到社会影响力分析、文本摘要等多个问题。 假设互联网是一个有向图,在其基础上定义随机游走模型,即一阶马尔可夫链,表示网页浏览者在互联网上随机浏览网页的过程。假设浏览者在每个网页依照连接出去的超链接以等概率跳转到下一个网页,并在网上持续不断进行这样的随机跳转,这个过程形成一阶马尔可夫链。PageRank表示这个马尔可夫链的平稳分布。每个网页的PageRank值就是平稳概率。 算法实现思路:1.假设图中每个点的初始影响值相同;2.计算每个点对其他点的跳转概率,并更新点的影响值;3.进行n次迭代计算,直到各点影响值不再变化,即收敛状态。 -### 实例代码 +#### 实例代码 ```java @@ -415,8 +415,8 @@ public class IncrGraphCompute { ``` -## PageRank静态图计算示例介绍 -### 实例代码 +### PageRank静态图计算示例介绍 +#### 实例代码 ```java @@ -551,8 +551,8 @@ public class PageRank { ``` -## WordCount批计算示例介绍 -### 实例代码 +### WordCount批计算示例介绍 +#### 实例代码 ```java @@ -654,8 +654,8 @@ public class WordCountStream { ``` -## KeyAgg流计算示例介绍 -### 实例代码 +### KeyAgg流计算示例介绍 +#### 实例代码 ```java diff --git a/docs/docs-cn/application-development/api/stream/source.md b/docs/docs-cn/source/5.application-development/1.api/2.stream/1.source.md similarity index 99% rename from docs/docs-cn/application-development/api/stream/source.md rename to docs/docs-cn/source/5.application-development/1.api/2.stream/1.source.md index 42e708f4f..f55b080f7 100644 --- a/docs/docs-cn/application-development/api/stream/source.md +++ b/docs/docs-cn/source/5.application-development/1.api/2.stream/1.source.md @@ -2,7 +2,7 @@ GeaFlow对外提供了Source API,在接口层面需要提供IWindow,用于构建相应的window source,用户可以通过实现SourceFunction来定义具体的源头读取逻辑。 -# 接口 +## 接口 | API | 接口说明 | 入参说明 | | -------- | -------- | -------- | @@ -22,7 +22,7 @@ GeaFlow对外提供了Source API,在接口层面需要提供IWindow,用于 SizeTumblingWindow.of(2)); ``` -# 示例 +## 示例 ```java public class WindowStreamWordCount { diff --git a/docs/docs-cn/application-development/api/stream/process.md b/docs/docs-cn/source/5.application-development/1.api/2.stream/2.process.md similarity index 99% rename from docs/docs-cn/application-development/api/stream/process.md rename to docs/docs-cn/source/5.application-development/1.api/2.stream/2.process.md index a12c6668b..d75f5541d 100644 --- a/docs/docs-cn/application-development/api/stream/process.md +++ b/docs/docs-cn/source/5.application-development/1.api/2.stream/2.process.md @@ -1,7 +1,7 @@ # Process API介绍 GeaFlow对外提供了一系列Process API,这些API和通用的流批类似,但不完全相同。我们在Source API中已有介绍,其构建出来的source是带有window的,因此GeaFlow所有的Process API也都带有window语义。 -# 接口 +## 接口 | API | 接口说明 | 入参说明 | | -------- | -------- | -------- | | PWindowStream map(MapFunction mapFunction) | 通过实现mapFunction,可以将输入的T转换成R向下游输出。 |mapFunction:用户自定义转换逻辑,T表示输入类型,R表示输出类型| @@ -16,7 +16,7 @@ GeaFlow对外提供了一系列Process API,这些API和通用的流批类似 -# 示例 +## 示例 ```java public class StreamUnionPipeline implements Serializable { diff --git a/docs/docs-cn/application-development/api/stream/sink.md b/docs/docs-cn/source/5.application-development/1.api/2.stream/3.sink.md similarity index 98% rename from docs/docs-cn/application-development/api/stream/sink.md rename to docs/docs-cn/source/5.application-development/1.api/2.stream/3.sink.md index 8c10733f8..1004f0dcc 100644 --- a/docs/docs-cn/application-development/api/stream/sink.md +++ b/docs/docs-cn/source/5.application-development/1.api/2.stream/3.sink.md @@ -1,7 +1,7 @@ # Sink API介绍 GeaFlow对外提供了Sink API,用于构建Window Sink,用户可以通过实现SinkFunction来定义具体的输出逻辑。 -# 接口 +## 接口 | API | 接口说明 | 入参说明 | | -------- | -------- | -------- | | PStreamSink sink(SinkFunction sinkFunction) | 将结果进行输出。 |sinkFunction:用户可以通过实现SinkFunction接口,用于定义其相应的输出语义。GeaFlow内部集成了几种sink function,例如:Console、File等。| @@ -11,7 +11,7 @@ GeaFlow对外提供了Sink API,用于构建Window Sink,用户可以通过实 
source.sink(v -> {LOGGER.info("result: {}", v)}); ``` -# 示例 +## 示例 ```java public class WindowStreamWordCount { diff --git a/docs/docs-cn/source/5.application-development/1.api/2.stream/index.rst b/docs/docs-cn/source/5.application-development/1.api/2.stream/index.rst new file mode 100644 index 000000000..22ee9b1f6 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/1.api/2.stream/index.rst @@ -0,0 +1,11 @@ +流API +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.source.md + 2.process.md + 3.sink.md \ No newline at end of file diff --git a/docs/docs-cn/application-development/api/graph/traversal.md b/docs/docs-cn/source/5.application-development/1.api/3.graph/1.traversal.md similarity index 99% rename from docs/docs-cn/application-development/api/graph/traversal.md rename to docs/docs-cn/source/5.application-development/1.api/3.graph/1.traversal.md index dcc6c30e3..791b2c527 100644 --- a/docs/docs-cn/application-development/api/graph/traversal.md +++ b/docs/docs-cn/source/5.application-development/1.api/3.graph/1.traversal.md @@ -1,8 +1,8 @@ # Traversal API介绍 GeaFlow对外提供了实现图遍历算法的接口,通过实现该接口进行子图遍历,全图遍历。用户可在遍历算法中选取点边继续遍历,并定义迭代次数。 -# 动态图 -## 接口 +## 动态图 +### 接口 | API | 接口说明 | 入参说明 | | --- | --- | --- | | void open(IncVertexCentricTraversalFuncContext vertexCentricFuncContext) | vertexCentricFunction进行open操作 | vertexCentricFuncContext:K表示vertexId的类型,VV表示vertex value类型,EV表示edge value类型,M表示图遍历中定义的消息类型,R表示遍历结果类型。 | @@ -55,7 +55,7 @@ GeaFlow对外提供了实现图遍历算法的接口,通过实现该接口进 } ``` -## 示例 +### 示例 ```java public class IncrGraphTraversalAll { @@ -205,9 +205,9 @@ public class IncrGraphTraversalAll { } ``` -# 静态图 +## 静态图 -## 接口 +### 接口 | API | 接口说明 | 入参说明 | | --- | --- | --- | | void open(VertexCentricTraversalFuncContext vertexCentricFuncContext) | vertexCentric function进行open操作 | vertexCentricFuncContext:K表示vertexId的类型,VV表示vertex value类型,EV表示edge value类型,M表示图遍历中定义的消息类型,R表示遍历结果类型。 | @@ -255,7 +255,7 @@ public interface VertexCentricTraversalFunction extends VertexC } ``` -## 示例 +### 示例 ```java public class StaticGraphTraversalAllExample { diff --git a/docs/docs-cn/application-development/api/graph/compute.md b/docs/docs-cn/source/5.application-development/1.api/3.graph/2.compute.md similarity index 99% rename from docs/docs-cn/application-development/api/graph/compute.md rename to docs/docs-cn/source/5.application-development/1.api/3.graph/2.compute.md index c1256bc9d..02d74e50b 100644 --- a/docs/docs-cn/application-development/api/graph/compute.md +++ b/docs/docs-cn/source/5.application-development/1.api/3.graph/2.compute.md @@ -1,8 +1,8 @@ # Compute API介绍 GeaFlow对外提供了实现图计算算法的接口,通过实现相应接口可进行静态图计算或动态图计算,用户可在compute算法中定义具体的计算逻辑及迭代最大次数。 -# 动态图 -## 接口 +## 动态图 +### 接口 | API | 接口说明 | 入参说明 | | --- | --- | --- | | void init(IncGraphComputeContext incGraphContext) | 图计算初始化接口 | incGraphContext: 增量动态图计算的上下文,K表示vertex id的类型,VV表示vertex value类型,EV表示edge value类型,M表示发送消息的类型。 | @@ -115,7 +115,7 @@ public interface IncVertexCentricComputeFunction extends } ``` -## 示例 +### 示例 ```java public class IncrGraphCompute { @@ -231,9 +231,9 @@ public class IncrGraphCompute { } ``` -# 静态图 +## 静态图 -## 接口 +### 接口 | API | 接口说明 | 入参说明 | | --- | --- | --- | | void init(VertexCentricComputeFuncContext vertexCentricFuncContext) | 迭代计算初始化接口 | vertexCentricFuncContext:静态图计算的上下文,K表示vertex id的类型,VV表示vertex value类型,EV表示edge value类型,M表示发送消息的类型。 | @@ -263,7 +263,7 @@ EV, M> { } ``` -## 示例 +### 示例 ```java public class StaticsGraphCompute { diff --git 
a/docs/docs-cn/source/5.application-development/1.api/3.graph/index.rst b/docs/docs-cn/source/5.application-development/1.api/3.graph/index.rst new file mode 100644 index 000000000..c30efeb49 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/1.api/3.graph/index.rst @@ -0,0 +1,10 @@ +图API +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.traversal.md + 2.compute.md \ No newline at end of file diff --git a/docs/docs-cn/source/5.application-development/1.api/index.rst b/docs/docs-cn/source/5.application-development/1.api/index.rst new file mode 100644 index 000000000..f91280b73 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/1.api/index.rst @@ -0,0 +1,11 @@ +API开发 +=========== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.overview.md + 2.stream/index.rst + 3.graph/index.rst diff --git a/docs/docs-cn/application-development/dsl/overview.md b/docs/docs-cn/source/5.application-development/2.dsl/1.overview.md similarity index 60% rename from docs/docs-cn/application-development/dsl/overview.md rename to docs/docs-cn/source/5.application-development/2.dsl/1.overview.md index fc4e25298..e576aa158 100644 --- a/docs/docs-cn/application-development/dsl/overview.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/1.overview.md @@ -1,7 +1,7 @@ -# 融合DSL概述 +# 简介 融合DSL是GeaFlow提供的图表一体的数据分析语言,支持标准SQL+ISO/GQL进行图表分析.通过融合DSL可以对表数据做关系运算处理,也可以对图数据做图匹配和图算法计算,同时也支持同时图表数据的联合处理. -# 融合DSL使用案例 +## 融合DSL使用案例 - **通过SQL处理GQL结果** @@ -39,7 +39,7 @@ 可以给GQL定义一个参数表,参数表的数据逐条触发GQL查询.GQL将分别返回每个参数对应的计算结果. -# Maven依赖 +## Maven依赖 * 开发UDF/UDAF/UDTF/UDGA需要添加以下依赖: ```xml @@ -59,34 +59,4 @@ ``` -# DSL语法文档 -* DSL语法 - * [DDL](reference/ddl.md) - * [DML](reference/dml.md) - * DQL - * [Select](reference/dql/select.md) - * [Union](reference/dql/union.md) - * [Match](reference/dql/match.md) - * [With](reference/dql/with.md) - * [USE](reference/use.md) -* 内置函数 - * [数学运算](build-in/math.md) - * [逻辑运算](build-in/logical.md) - * [字符串函数](build-in/string.md) - * [日期函数](build-in/date.md) - * [条件函数](build-in/condition.md) - * [聚合函数](build-in/aggregate.md) - * [表处理函数](build-in/table.md) -* 用户自定义函数 - * [UDF](udf/udf.md) - * [UDTF](udf/udtf.md) - * [UDAF](udf/udaf.md) - * [UDGA](udf/udga.md) -* Connector - * [Hive Connector](connector/hive.md) - * [File Connector](connector/file.md) - * [Kafka Connector](connector/kafka.md) - * [Pulsar Connector](connector/pulsar.md) - * [用户自定义Connector](connector/udc.md) - \ No newline at end of file diff --git a/docs/docs-cn/application-development/dsl/reference/use.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/1.dcl.md similarity index 92% rename from docs/docs-cn/application-development/dsl/reference/use.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/1.dcl.md index 3f677f3c1..414aeccae 100644 --- a/docs/docs-cn/application-development/dsl/reference/use.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/1.dcl.md @@ -1,11 +1,13 @@ -# Use Graph +# DCL + +## Use Graph 用户在执行Match语句之前需要通过Use Graph语句指定当前查询的图。 -## Syntax +### Syntax ```sql USE GRAPH Identifier ``` -## Example +### Example ```sql -- Set current using graph. 
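-- USE GRAPH must run before any MATCH statement; the queries after it execute against this graph.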
@@ -29,14 +31,14 @@ FROM ( ) ; ``` -# Use Instance +## Use Instance Instance类似于Hive/Mysql中的Database的概念。我们可以通过**Use Instance**命令指定当前语句的实例。 -## Syntax +### Syntax ```sql USE INSTANCE Identifier ``` -## Example +### Example ```sql Use instance geaflow; diff --git a/docs/docs-cn/application-development/dsl/reference/ddl.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/2.ddl.md similarity index 96% rename from docs/docs-cn/application-development/dsl/reference/ddl.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/2.ddl.md index e645071ef..b955bace6 100644 --- a/docs/docs-cn/application-development/dsl/reference/ddl.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/2.ddl.md @@ -1,6 +1,6 @@ # DDL -# 表相关DDL -## Create Table +## 表相关DDL +### Create Table 该命令用来创建一张表,GeaFlow将其识别为外部表并将元数据存储在Catalog中。 **Syntax** @@ -28,7 +28,7 @@ Create Table v_person_table ( ``` 这个例子创建了一张表**v_person_table**,包含id, name, age三列,表的存储类型为文件,并通过**geaflow.dsl.file.path**参数说明需要访问的文件存放在引擎资源的指定目录中。 -### 数据类型 +#### 数据类型 | 类型 | 说明 | | -------- | -------- | @@ -40,7 +40,7 @@ Create Table v_person_table ( | VARCHAR |字符串类型 | | TIMESTAMP | 时间戳类型 | -### 参数 +#### 参数 创建表的同时,可以使用WITH指定表的参数信息,其中type参数用于指定外部表的存储类型,其他参数为kv类型。 **Example** @@ -59,7 +59,7 @@ CREATE TABLE person ( 这个例子创建一个文件类型的表,并指定表参数。其中type指定表类型为文件; geaflow.dsl.file.path文件路径;geaflow.dsl.column.separator指定字段分隔列; geaflow.dsl.window.size指定每批次读取文件的行数。 -## Create View +### Create View **Syntax** ``` @@ -78,8 +78,8 @@ CREATE VIEW console_1 (a, b, c) AS SELECT id, name, age FROM v_person_table; ``` -# 图相关DDL -## Create Graph +## 图相关DDL +### Create Graph **Syntax** 一个图至少包含一对点边,点表必须包含一个id字段作为主键,边表必须包含srcId和targetId作为主键,边表还可以有一个时间戳字段标识时间。 @@ -149,8 +149,8 @@ CREATE GRAPH dy_modern ( 图的存储分片数通过shardCount配置项指定,图的存储分片数影响图计算时的并发数,设为更大的值可以利用更多机器并发计算,但所需资源数也将增长。 -# 自定义函数 -## Create Function +## 自定义函数 +### Create Function 这个命令用来引入一个自定义函数。 **Syntax** diff --git a/docs/docs-cn/application-development/dsl/reference/dml.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/3.dml.md similarity index 98% rename from docs/docs-cn/application-development/dsl/reference/dml.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/3.dml.md index 9b0f90f60..517365d97 100644 --- a/docs/docs-cn/application-development/dsl/reference/dml.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/3.dml.md @@ -1,5 +1,5 @@ # DML -# Insert Table +## Insert Table **Syntax** ``` @@ -32,10 +32,10 @@ RETURN a.id as a_id, e.weight as weight, b.id as b_id; 这个例子向**tbl_result**表导入一个走图查询语句返回的结果。 -# Insert Graph +## Insert Graph Insert命令还可以向图导入数据。与表不同,图使用GeaFlow自主维护的存储。 -# 插入点/边 +## 插入点/边 Insert命令向图中的点或边导入数据时,操作对象以点分的图名加点边名表示,支持字段的重新排序。 **Syntax** @@ -74,7 +74,7 @@ SELECT 1, 2, 0.2 ``` 这个例子向图**dy_modern**中的边**knows**导入一行数据。 -# 多表插入 +## 多表插入 有时源表需要同时插入到多个点或边中,特别是源表的外键表示一种关系时,往往需要转化为一类边,键值也将成为边的对端点。INSERT语句也支持这种单一源表,多目标点的插入。 **Syntax** diff --git a/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/index.rst b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/index.rst new file mode 100644 index 000000000..a46a4301b --- /dev/null +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/index.rst @@ -0,0 +1,9 @@ +DQL +==== + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + * \ No newline at end of file diff --git a/docs/docs-cn/application-development/dsl/reference/dql/match.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/match.md similarity index 100% rename from docs/docs-cn/application-development/dsl/reference/dql/match.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/match.md diff --git a/docs/docs-cn/application-development/dsl/reference/dql/select.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/select.md similarity index 88% rename from docs/docs-cn/application-development/dsl/reference/dql/select.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/select.md index 89c3133e8..3f17ad78b 100644 --- a/docs/docs-cn/application-development/dsl/reference/dql/select.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/select.md @@ -1,4 +1,7 @@ -# Select Syntax +# Select + +## Syntax + ```sql SELECT [ DISTINCT ] { * | expr (, expr )* } @@ -10,8 +13,8 @@ FROM { Table | SubQuery | Match } [ LIMIT number ] ``` -# Example -## Select +## Example +### Select ```sql SELECT id, name, age FROM user; @@ -20,19 +23,19 @@ SELECT DISTINCT id, name, age FROM user; SELECT price * 10 FROM trade; ``` -## From -### From Table +### From +#### From Table ```sql SELECT id, name, age FROM user where id > 10 ``` -### From SubQuery +#### From SubQuery ```sql SELECT id, name, age FROM ( SELECT * FROM user where id > 10 ) ``` -### From Match +#### From Match ```sql SELECT a_id, @@ -45,7 +48,7 @@ FROM ( ``` More information about match, please see the Match Syntax. -## Where +### Where ```sql SELECT id, name, age FROM user where id > 10; @@ -53,23 +56,23 @@ SELECT DISTINCT id, name, age FROM user where id > 10; SELECT price * 10 FROM trade where price > 20; ``` -## Group By +### Group By ```sql SELECT age, count(id) as cnt FROM user GROUP BY age; SELECT type, max(age), min(age), avg(age) FROM user GROUP BY type; ``` -## Having +### Having ```sql SELECT age, count(id) as cnt FROM user GROUP BY age Having count(id) > 10; ``` -## Order By +### Order By ```sql SELECT * from user order by age; SELECT age, count(id) as cnt FROM user GROUP BY age Having count(id) > 10 Order by cnt; ``` -## Limit +### Limit ```sql SELECT * from user order by age limit 10; diff --git a/docs/docs-cn/application-development/dsl/reference/dql/union.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/union.md similarity index 92% rename from docs/docs-cn/application-development/dsl/reference/dql/union.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/union.md index 3de7351d1..5473e4779 100644 --- a/docs/docs-cn/application-development/dsl/reference/dql/union.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/union.md @@ -1,11 +1,14 @@ -# Union Syntax +# Union + +## Syntax + ```sql select_statement UNION [ ALL ] select_statement ``` -# Example +## Example ```sql SELECT * FROM ( diff --git a/docs/docs-cn/application-development/dsl/reference/dql/with.md b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/with.md similarity index 93% rename from docs/docs-cn/application-development/dsl/reference/dql/with.md rename to docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/with.md index 08fa7ca39..1ec99ffcc 100644 --- a/docs/docs-cn/application-development/dsl/reference/dql/with.md +++ 
b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/4.dql/with.md @@ -1,10 +1,11 @@ -# With Syntax +# With +## Syntax With语句用于指定图计算的起点和相关参数集合,一般和Match语句配合使用,指定Match语句的起始点。 ```sql WITH Identifier AS '(' SubQuery ')' ``` -# Example +## Example ```sql SELECT diff --git a/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/index.rst b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/index.rst new file mode 100644 index 000000000..859d222c2 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/2.dsl/2.syntax/index.rst @@ -0,0 +1,12 @@ +语法文档 +=========== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.dcl.md + 2.ddl.md + 3.dml.md + 4.dql/index.rst diff --git a/docs/docs-cn/application-development/dsl/build-in/aggregate.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/aggregate.md similarity index 93% rename from docs/docs-cn/application-development/dsl/build-in/aggregate.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/aggregate.md index a7787d7e0..75607da54 100644 --- a/docs/docs-cn/application-development/dsl/build-in/aggregate.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/aggregate.md @@ -1,11 +1,13 @@ +# Aggregate + GeaFlow支持以下聚合函数: -* [COUNT](#COUNT) -* [MAX](#MAX) -* [MIN](#MIN) -* [SUM](#SUM) -* [AVG](#AVG) +* [count](#count) +* [max](#max) +* [min](#min) +* [sum](#sum) +* [avg](#avg) -# COUNT +## count **Syntax** ```sql @@ -22,7 +24,7 @@ select count(distinct id) from user; select count(1) from user; ``` -# MAX +## max **Syntax** ```sql @@ -41,7 +43,7 @@ select id, max(age) from user group by id; select max(name) from user; ``` -# MIN +## min **Syntax** ```sql @@ -60,7 +62,7 @@ select id, min(age) from user group by id; select min(name) from user; ``` -# SUM +## sum **Syntax** ```sql @@ -79,7 +81,7 @@ select sum(DISTINCT age) from user; select sum(1) from user; ``` -# AVG +## avg **Syntax** ```sql diff --git a/docs/docs-cn/application-development/dsl/build-in/condition.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/condition.md similarity index 96% rename from docs/docs-cn/application-development/dsl/build-in/condition.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/condition.md index 3936f8674..0c51db6b0 100644 --- a/docs/docs-cn/application-development/dsl/build-in/condition.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/condition.md @@ -1,8 +1,10 @@ +# Condition + GeaFlow支持**Case**和**If**条件函数。 * [Case](#Case) * [If](#If) -# Case +## Case **Syntax** ```sql @@ -42,7 +44,7 @@ CASE WHEN a = 1 THEN '1' END ``` -# If +## If **Syntax** ```sql diff --git a/docs/docs-cn/application-development/dsl/build-in/date.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/date.md similarity index 97% rename from docs/docs-cn/application-development/dsl/build-in/date.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/date.md index 09a6213b5..450dd9de2 100644 --- a/docs/docs-cn/application-development/dsl/build-in/date.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/date.md @@ -1,3 +1,5 @@ +# Date + GeaFlow支持以下日期函数: * [from_unixtime](#from_unixtime) * [from_unixtime_millis](#from_unixtime_millis) @@ -18,7 +20,7 @@ GeaFlow支持以下日期函数: * [date_part](#date_part) * [date_trunc](#date_trunc) -# from_unixtime +## from_unixtime **Syntax** ```sql @@ -38,7 +40,7 @@ from_unixtime(11111111) = '1970-05-09 22:25:11' from_unixtime(11111111, 'yyyy-MM-dd 
HH:mm:ss.SSSSSS') = '1970-05-09 22:25:11.000000' ``` -# from_unixtime_millis +## from_unixtime_millis **Syntax** ```sql @@ -58,7 +60,7 @@ from_unixtime_millis(11111111, 'yyyy-MM-dd HH:mm:ss') = '1970-01-01 11:05:11' from_unixtime_millis(11111111, 'yyyy-MM-dd HH:mm:ss.SSSSSS') = '1970-01-01 11:05:11.111000' ``` -# unix_timestamp +## unix_timestamp **Syntax** ```sql @@ -76,7 +78,7 @@ unix_timestamp('1987-06-05 00:11:22') = 549817882 unix_timestamp('1987-06-05 00:11', 'yyyy-MM-dd HH:mm') = 549817860 ``` -# unix_timestamp_millis +## unix_timestamp_millis **Syntax** ```sql @@ -93,7 +95,7 @@ unix_timestamp_millis('1987-06-05 00:11:22') = 549817882000 unix_timestamp_millis('1987-06-05', 'yyyy-mm-dd') = 536774760000 ``` -# isdate +## isdate **Syntax** ```sql @@ -111,7 +113,7 @@ isdate('xxxxxxxxxxxxx') = false isdate('1987-06-05 00:11:22', 'yyyy-MM-dd HH:mm:ss.SSSSSS') = false ``` -# now +## now **Syntax** ```sql @@ -129,7 +131,7 @@ now() now(1000) ``` -# day +## day **Syntax** ```sql @@ -144,7 +146,7 @@ int day(string dateString) day('1987-06-05 00:11:22') = 5 ``` -# weekday +## weekday **Syntax** ```sql @@ -159,7 +161,7 @@ int weekday(string dateString) weekday('1987-06-05 00:11:22') = 5 ``` -# lastday +## lastday **Syntax** ```sql @@ -174,7 +176,7 @@ string lastday(string dateString) lastday('1987-06-05') = '1987-06-30 00:00:00' ``` -# day_of_month +## day_of_month **Syntax** ```sql @@ -189,7 +191,7 @@ int day_of_month(string dateString) day_of_month('1987-06-05 00:11:22') = 5 ``` -# week_of_year +## week_of_year **Syntax** ```sql @@ -205,7 +207,7 @@ week_of_year('1987-06-05 00:11:22') = 23 ``` -# date_add +## date_add **Syntax** ```sql @@ -222,7 +224,7 @@ date_add('2017-09-25', 1) = '2017-09-26' date_add('2017-09-25', -1) = '2017-09-24' ``` -# date_sub +## date_sub **Syntax** ```sql @@ -239,7 +241,7 @@ date_sub('2017-09-25', 1) = '2017-09-24' date_sub('2017-09-25', -1) = '2017-09-26' ``` -# date_diff +## date_diff **Syntax** ```sql @@ -255,7 +257,7 @@ date_diff('2017-09-26', '2017-09-25') = 1 date_diff('2017-09-24', '2017-09-25') = -1 ``` -# add_months +## add_months **Syntax** ```sql @@ -272,7 +274,7 @@ add_months('2017-09-25', 1) = '2017-10-25' add_months('2017-09-25', -1) = '2017-08-25' ``` -# date_format +## date_format **Syntax** ```sql @@ -291,7 +293,7 @@ date_format('1987-06-05 00:11:22', 'MM-dd-yyyy') = '06-05-1987' date_format('00:11:22 1987-06-05', 'HH:mm:ss yyyy-MM-dd', 'MM-dd-yyyy') = '06-05-1987' ``` -# date_part +## date_part **Syntax** ```sql @@ -321,7 +323,7 @@ date_part('1987-06-05 00:11:22', 'ss') = 22 date_part('1987-06-05', 'ss') = 0 ``` -# date_trunc +## date_trunc **Syntax** ```sql diff --git a/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/index.rst b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/index.rst new file mode 100644 index 000000000..f2a64c0b1 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/index.rst @@ -0,0 +1,9 @@ +内置函数 +==== + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + * \ No newline at end of file diff --git a/docs/docs-cn/application-development/dsl/build-in/logical.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/logical.md similarity index 99% rename from docs/docs-cn/application-development/dsl/build-in/logical.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/logical.md index 402348028..364e218db 100644 --- a/docs/docs-cn/application-development/dsl/build-in/logical.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/logical.md @@ -1,3 +1,5 @@ +# Logical + Geaflow支持以下逻辑运算: 操作|描述 diff --git a/docs/docs-cn/application-development/dsl/build-in/math.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/math.md similarity index 99% rename from docs/docs-cn/application-development/dsl/build-in/math.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/math.md index 8a877cbad..73d82fa80 100644 --- a/docs/docs-cn/application-development/dsl/build-in/math.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/math.md @@ -1,3 +1,5 @@ +# Math + Geaflow 支持以下数学运算。 | 操作 | 说明 | diff --git a/docs/docs-cn/application-development/dsl/build-in/string.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/string.md similarity index 96% rename from docs/docs-cn/application-development/dsl/build-in/string.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/string.md index 76fc8b1da..12843451e 100644 --- a/docs/docs-cn/application-development/dsl/build-in/string.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/string.md @@ -1,3 +1,5 @@ +# String + GeaFlow支持以下字符串函数: * [ascii2str](#ascii2str) * [base64_decode](#base64_decode) @@ -27,7 +29,7 @@ GeaFlow支持以下字符串函数: * [urldecode](#urldecode) * [urlencode](#urlencode) -# ascii2str +## ascii2str **Syntax** ```sql @@ -44,7 +46,7 @@ ascii2str(66) = 'B' ascii2str(48) = '0' ``` -# base64_decode +## base64_decode **Syntax** ```sql @@ -61,7 +63,7 @@ base64_decode('dGVzdF9zdHJpbmc=') = 'test_string' base64_decode(null) = null ``` -# base64_encode +## base64_encode **Syntax** ```sql @@ -77,7 +79,7 @@ base64_encode('abc ') = 'YWJjIA==' base64_encode('test_string') = 'dGVzdF9zdHJpbmc=' ``` -# concat +## concat **Syntax** ```sql @@ -94,7 +96,7 @@ concat('1','2',null) = '12' concat(null) = null; ``` -# concat_ws +## concat_ws **Syntax** ```sql @@ -112,7 +114,7 @@ concat_ws(',','1',null,'c') = '1,,c' concat_ws(null, 'a','b','c') = 'abc' ``` -# hash +## hash **Syntax** ```sql @@ -128,7 +130,7 @@ hash('1') = 49 hash(2) = 2 ``` -# index_of +## index_of **Syntax** ```sql @@ -146,7 +148,7 @@ index_of('a test string', 'test') = 2 index_of(null, 'test') = -1 ``` -# instr +## instr **Syntax** ```sql @@ -168,7 +170,7 @@ instr('abc', 'a', 3, -1) = null instr('abc', null) = null ``` -# isBlank +## isBlank **Syntax** ```sql @@ -184,7 +186,7 @@ isBlank('test') = false isBlank(' ') = true ``` -# length +## length **Syntax** ```sql @@ -200,7 +202,7 @@ length('abc') = 3 length('abc ') = 5 ``` -# like +## like **Syntax** ```sql @@ -217,7 +219,7 @@ like('test', 'abc\\%') = false like('abc', 'a%bc') = true ``` -# lower +## lower **Syntax** ```sql @@ -233,7 +235,7 @@ lower('ABC') = 'abc' lower(null) = null ``` -# ltrim +## ltrim **Syntax** ```sql @@ -249,7 +251,7 @@ ltrim(' abc ') = 'abc ' ltrim(' test') = 'test' ``` -# regexp +## regexp **Syntax** ```sql @@ -266,7 +268,7 @@ regexp('a.b.c.d.e.f', '.d%') = false 
regexp('a.b.c.d.e.f', null) = null ``` -# regexp_count +## regexp_count **Syntax** ```sql @@ -284,7 +286,7 @@ regexp('ab1d2d3dsss', '[0-9]d', 8) = 0 regexp('ab1d2d3dsss', '.b') = 1 ``` -# regexp_extract +## regexp_extract **Syntax** ```sql @@ -301,7 +303,7 @@ regexp_extract('abchebar', 'abc(.*?)(bar)', 1) = 'he' regexp_extract('100-200', '(\d+)-(\d+)') = '100' ``` -# regexp_replace +## regexp_replace **Syntax** ```sql @@ -319,7 +321,7 @@ regexp_replace('adfabadfasdf', '[a]', '3') = '3df3b3df3sdf' ``` -# repeat +## repeat **Syntax** ```sql @@ -335,7 +337,7 @@ repeat('abc', 3) = 'abcabcabc' repeat(null, 4) = null ``` -# replace +## replace **Syntax** ```sql @@ -351,7 +353,7 @@ replace('test test', 'test', 'c') = 'c c' replace('test test', 'test', '') = ' ' ``` -# reverse +## reverse **Syntax** ```sql @@ -367,7 +369,7 @@ reverse('abc') = 'cba' reverse(null) = null ``` -# rtrim +## rtrim **Syntax** ```sql @@ -383,7 +385,7 @@ rtrim(' abc ') = ' abc' rtrim('test') = 'test' ``` -# space +## space **Syntax** ```sql @@ -399,7 +401,7 @@ space(5) = ' ' space(null) = null ``` -# split_ex +## split_ex **Syntax** ```sql @@ -417,7 +419,7 @@ split_ex('a.b.c.d.e', '.', -1) = null ``` -# substr +## substr **Syntax** ```sql @@ -434,7 +436,7 @@ substr('testString', 5, 10) = 'String' substr('testString', -6) = 'String' ``` -# trim +## trim **Syntax** ```sql @@ -450,7 +452,7 @@ trim(' abc ') = 'abc' trim('abc') = 'abc' ``` -# upper +## upper **Syntax** ```sql @@ -466,7 +468,7 @@ upper('abc') = 'ABC' upper(null) = null ``` -# urldecode +## urldecode **Syntax** ```sql @@ -482,7 +484,7 @@ urldecode('a%3d0%26c%3d1') = 'a=0&c=1' urldecode('a%3D2') = 'a=2' ``` -# urlencode +## urlencode **Syntax** ```sql diff --git a/docs/docs-cn/application-development/dsl/build-in/table.md b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/table.md similarity index 96% rename from docs/docs-cn/application-development/dsl/build-in/table.md rename to docs/docs-cn/source/5.application-development/2.dsl/3.build-in/table.md index b03f94d45..21149e8d0 100644 --- a/docs/docs-cn/application-development/dsl/build-in/table.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/3.build-in/table.md @@ -1,3 +1,5 @@ +# Table + TABLE 函数对每个输入返回若干行数据。 **Table Function Syntax** @@ -8,7 +10,7 @@ FROM (Table | SubQuery), LATERAL TABLE '('TableFunctionRef')' AS Identifier '(' Identifier (,Identifier)* ')' ``` -# split +## split **Syntax** ```sql diff --git a/docs/docs-cn/application-development/dsl/udf/udf.md b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/1.udf.md similarity index 97% rename from docs/docs-cn/application-development/dsl/udf/udf.md rename to docs/docs-cn/source/5.application-development/2.dsl/4.udf/1.udf.md index ddbaa1732..756f4a039 100644 --- a/docs/docs-cn/application-development/dsl/udf/udf.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/1.udf.md @@ -1,6 +1,6 @@ # UDF介绍 UDF(User Defined Function)将标量值映射到标量值。 -# 接口 +## 接口 ```java public abstract class UserDefinedFunction implements Serializable { @@ -23,7 +23,7 @@ public abstract class UDF extends UserDefinedFunction { } ``` -# 示例 +## 示例 ```java public class ConcatWS extends UDF { diff --git a/docs/docs-cn/application-development/dsl/udf/udaf.md b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/2.udaf.md similarity index 99% rename from docs/docs-cn/application-development/dsl/udf/udaf.md rename to docs/docs-cn/source/5.application-development/2.dsl/4.udf/2.udaf.md index a6ecf2c02..cee108553 100644 --- 
a/docs/docs-cn/application-development/dsl/udf/udaf.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/2.udaf.md @@ -1,6 +1,6 @@ # UDAF介绍 UDAF(User Define Aggregate Function)将多行数据聚合为单个值。 -# 接口 +## 接口 ```java public abstract class UserDefinedFunction implements Serializable { @@ -51,7 +51,7 @@ public abstract class UDAF extends UserDefinedFunction ``` -# 示例 +## 示例 ```java public class AvgDouble extends UDAF { diff --git a/docs/docs-cn/application-development/dsl/udf/udtf.md b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/3.udtf.md similarity index 99% rename from docs/docs-cn/application-development/dsl/udf/udtf.md rename to docs/docs-cn/source/5.application-development/2.dsl/4.udf/3.udtf.md index 6bd954b6f..3778d7584 100644 --- a/docs/docs-cn/application-development/dsl/udf/udtf.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/3.udtf.md @@ -1,6 +1,6 @@ # UDTF介绍 UDTF(User Defined Table Function)将输入扩展为多行。 -# 接口 +## 接口 ```java public abstract class UserDefinedFunction implements Serializable { @@ -45,7 +45,7 @@ public abstract class UDTF extends UserDefinedFunction { ``` 每个UDTF都应该有一个或多个**eval**方法。 -# 示例 +## 示例 ```java public class Split extends UDTF { diff --git a/docs/docs-cn/application-development/dsl/udf/udga.md b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/4.udga.md similarity index 99% rename from docs/docs-cn/application-development/dsl/udf/udga.md rename to docs/docs-cn/source/5.application-development/2.dsl/4.udf/4.udga.md index 4c0bee557..7f5a0e04a 100644 --- a/docs/docs-cn/application-development/dsl/udf/udga.md +++ b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/4.udga.md @@ -1,6 +1,6 @@ # UDGA介绍 UDGA(User Define Graph Algorithm)定义了一个图算法,例如SSSP(单源最短路径)和PageRank算法。 -# 接口 +## 接口 ```java /** @@ -30,7 +30,7 @@ public interface AlgorithmUserFunction extends Serializable { ``` -# 示例 +## 示例 ```java public class PageRank implements AlgorithmUserFunction { diff --git a/docs/docs-cn/source/5.application-development/2.dsl/4.udf/index.rst b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/index.rst new file mode 100644 index 000000000..ce88b0601 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/2.dsl/4.udf/index.rst @@ -0,0 +1,12 @@ +自定义函数 +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.udf.md + 2.udaf.md + 3.udtf.md + 4.udga.md \ No newline at end of file diff --git a/docs/docs-cn/source/5.application-development/2.dsl/index.rst b/docs/docs-cn/source/5.application-development/2.dsl/index.rst new file mode 100644 index 000000000..8b4b06e31 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/2.dsl/index.rst @@ -0,0 +1,12 @@ +DSL开发 +=========== + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.overview.md + 2.syntax/index.rst + 3.build-in/index.rst + 4.udf/index.rst diff --git a/docs/docs-cn/application-development/dsl/connector/common.md b/docs/docs-cn/source/5.application-development/3.connector/1.common.md similarity index 97% rename from docs/docs-cn/application-development/dsl/connector/common.md rename to docs/docs-cn/source/5.application-development/3.connector/1.common.md index fe2ac6dde..92b6e6446 100644 --- a/docs/docs-cn/application-development/dsl/connector/common.md +++ b/docs/docs-cn/source/5.application-development/3.connector/1.common.md @@ -1,7 +1,7 @@ -# Connector基础介绍 +# 简介 GeaFlow 支持从各类connector中读写数据,GeaFlow将它们都识别为外部表,并将元数据存储在Catalog中。 -# 语法 +## 语法 ```sql CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table ( @@ -23,7 +23,7 @@ WITH子句用于指定Connector的配置信息,其中的type字段必须,用 同时我们可以在WITH中添加表的参数,这些参数会覆盖掉外部(SQL文件中、作业参数中)的配置项,具有最高优先级。 -# 主要参数 +## 主要参数 | 参数名 | 是否必须 | 描述 | |------------------------------| -------- |---------------------------------------------------------------------------------------------------------------| @@ -32,7 +32,7 @@ WITH子句用于指定Connector的配置信息,其中的type字段必须,用 | geaflow.dsl.partitions.per.source.parallelism | 否 | 将Source的分片若干个编为一组,减少并发数关联的资源使用量。 | -# 示例 +## 示例 ```sql CREATE TABLE console_sink ( diff --git a/docs/docs-cn/application-development/dsl/connector/udc.md b/docs/docs-cn/source/5.application-development/3.connector/10.udc.md similarity index 99% rename from docs/docs-cn/application-development/dsl/connector/udc.md rename to docs/docs-cn/source/5.application-development/3.connector/10.udc.md index 88dc5e9fe..36e3f5ee6 100644 --- a/docs/docs-cn/application-development/dsl/connector/udc.md +++ b/docs/docs-cn/source/5.application-development/3.connector/10.udc.md @@ -117,9 +117,9 @@ public interface TableSink extends Serializable { } ``` -# 示例 +## 示例 下面是一个用于控制台的Table Connector的示例。 -## 实现 +### 实现 ```java public class ConsoleTableConnector implements TableWritableConnector { @@ -173,7 +173,7 @@ public class ConsoleTableSink implements TableSink { 在实现了 ConsoleTableConnector 后,您需要将完整的类名添加到 resources/META-INF.services/com.antgroup.geaflow.dsl.connector.api. 
TableConnector 文件中。该文件应列出所有实现了 TableConnector 接口的连接器类的全名,以便 GeaFlow 在启动时能够扫描到这些类,并将它们注册为可用的Connector。 -## 用法 +### 用法 ```sql CREATE TABLE file_source ( diff --git a/docs/docs-cn/application-development/dsl/connector/file.md b/docs/docs-cn/source/5.application-development/3.connector/2.file.md similarity index 97% rename from docs/docs-cn/application-development/dsl/connector/file.md rename to docs/docs-cn/source/5.application-development/3.connector/2.file.md index e67a4655b..7e5791ad5 100644 --- a/docs/docs-cn/application-development/dsl/connector/file.md +++ b/docs/docs-cn/source/5.application-development/3.connector/2.file.md @@ -1,6 +1,6 @@ # File Connector介绍 GeaFlow 支持从文件中读取数据,也支持向文件写入数据。 -# 语法 +## 语法 ```sql CREATE TABLE file_table ( @@ -12,7 +12,7 @@ CREATE TABLE file_table ( geaflow.dsl.file.path = '/path/to/file' ) ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- |------|------------------------------| @@ -23,7 +23,7 @@ CREATE TABLE file_table ( | geaflow.dsl.file.name.regex | 否 | 读取文件名称正则过滤规则,默认为空。 | | geaflow.dsl.file.format | 否 | 读写文件格式,支持parquet、txt,默认为txt。 | -# 示例 +## 示例 ```sql CREATE TABLE file_source ( diff --git a/docs/docs-cn/application-development/dsl/connector/console.md b/docs/docs-cn/source/5.application-development/3.connector/3.console.md similarity index 95% rename from docs/docs-cn/application-development/dsl/connector/console.md rename to docs/docs-cn/source/5.application-development/3.connector/3.console.md index a04da87c4..7ec733592 100644 --- a/docs/docs-cn/application-development/dsl/connector/console.md +++ b/docs/docs-cn/source/5.application-development/3.connector/3.console.md @@ -1,6 +1,6 @@ # Console Connector介绍 -# 语法 +## 语法 ```sql CREATE TABLE console_table ( @@ -12,13 +12,13 @@ CREATE TABLE console_table ( geaflow.dsl.console.skip = true ) ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- | -------- |------------------------| | geaflow.dsl.console.skip | 否 | 是否跳过日志,即无任何输出,默认为false | -# 示例 +## 示例 ```sql CREATE TABLE file_source ( diff --git a/docs/docs-cn/application-development/dsl/connector/jdbc.md b/docs/docs-cn/source/5.application-development/3.connector/4.jdbc.md similarity index 98% rename from docs/docs-cn/application-development/dsl/connector/jdbc.md rename to docs/docs-cn/source/5.application-development/3.connector/4.jdbc.md index c066570ac..10d0b54bf 100644 --- a/docs/docs-cn/application-development/dsl/connector/jdbc.md +++ b/docs/docs-cn/source/5.application-development/3.connector/4.jdbc.md @@ -1,6 +1,6 @@ # JDBC Connector介绍 JDBC Connector由社区贡献,支持读和写。 -# 语法 +## 语法 ```sql CREATE TABLE jdbc_table ( @@ -16,7 +16,7 @@ CREATE TABLE jdbc_table ( geaflow.dsl.jdbc.table.name = 'source_table' ); ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- |------|---------------------------------------------------| @@ -31,7 +31,7 @@ CREATE TABLE jdbc_table ( | geaflow.dsl.jdbc.partition.upperbound | 否 | The upperbound of JDBC partition, just used to decide the partition stride, not for filtering the rows in table. 
| -# 示例 +## 示例 ```sql set geaflow.dsl.jdbc.driver = 'org.h2.Driver'; diff --git a/docs/docs-cn/application-development/dsl/connector/hive.md b/docs/docs-cn/source/5.application-development/3.connector/5.hive.md similarity index 97% rename from docs/docs-cn/application-development/dsl/connector/hive.md rename to docs/docs-cn/source/5.application-development/3.connector/5.hive.md index 8df67d7bb..4509301e5 100644 --- a/docs/docs-cn/application-development/dsl/connector/hive.md +++ b/docs/docs-cn/source/5.application-development/3.connector/5.hive.md @@ -1,6 +1,6 @@ # Hive Connector介绍 GeaFlow 支持通过 Hive metastore 服务器读取 Hive 表中的数据。目前,我们支持 Hive 2.3.x系列版本。 -# 语法 +## 语法 ```sql CREATE TABLE hive_table ( @@ -14,7 +14,7 @@ CREATE TABLE hive_table ( geaflow.dsl.hive.metastore.uris = 'thrift://localhost:9083' ) ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- | ---- | -------- | @@ -23,7 +23,7 @@ CREATE TABLE hive_table ( | geaflow.dsl.hive.metastore.uris | 是 | 连接Hive元数据metastore的uri列表 | | geaflow.dsl.hive.splits.per.partition | 否 | 每个Hive分片的逻辑分片数量,默认为1 | -# 示例 +## 示例 ```sql CREATE TABLE hive_table ( diff --git a/docs/docs-cn/application-development/dsl/connector/kafka.md b/docs/docs-cn/source/5.application-development/3.connector/6.kafka.md similarity index 97% rename from docs/docs-cn/application-development/dsl/connector/kafka.md rename to docs/docs-cn/source/5.application-development/3.connector/6.kafka.md index 0265d5441..ff65ae8ee 100644 --- a/docs/docs-cn/application-development/dsl/connector/kafka.md +++ b/docs/docs-cn/source/5.application-development/3.connector/6.kafka.md @@ -1,6 +1,6 @@ # Kafka Connector介绍 GeaFlow 支持从 Kafka 中读取数据,并向 Kafka 写入数据。目前支持的 Kafka 版本为 2.4.1。 -# 语法 +## 语法 ```sql CREATE TABLE kafka_table ( @@ -13,7 +13,7 @@ CREATE TABLE kafka_table ( geaflow.dsl.kafka.topic = 'test-topic' ) ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- | -------- | -------- | @@ -22,7 +22,7 @@ CREATE TABLE kafka_table ( | geaflow.dsl.kafka.group.id | 否 | Kafka组(group id),默认是'default-group-id'.| -# 示例 +## 示例 ```sql CREATE TABLE kafka_source ( diff --git a/docs/docs-cn/application-development/dsl/connector/hbase.md b/docs/docs-cn/source/5.application-development/3.connector/7.hbase.md similarity index 97% rename from docs/docs-cn/application-development/dsl/connector/hbase.md rename to docs/docs-cn/source/5.application-development/3.connector/7.hbase.md index 5dfaeb4a7..95d928626 100644 --- a/docs/docs-cn/application-development/dsl/connector/hbase.md +++ b/docs/docs-cn/source/5.application-development/3.connector/7.hbase.md @@ -1,7 +1,7 @@ # Hbase Connector介绍 Hbase Connector由社区贡献,目前仅支持Sink。 -# 语法示例 +## 语法示例 ```sql CREATE TABLE hbase_table ( @@ -15,7 +15,7 @@ CREATE TABLE hbase_table ( geaflow.dsl.hbase.rowkey.column = 'id' ); ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- | -------- | -------- | @@ -27,7 +27,7 @@ CREATE TABLE hbase_table ( | geaflow.dsl.hbase.familyname.mapping | 否 | HBase column family name mapping. | | geaflow.dsl.hbase.buffersize | 否 | HBase writer buffer size. 
| -# 示例 +## 示例 ```sql CREATE TABLE file_source ( diff --git a/docs/docs-cn/application-development/dsl/connector/hudi.md b/docs/docs-cn/source/5.application-development/3.connector/8.hudi.md similarity index 97% rename from docs/docs-cn/application-development/dsl/connector/hudi.md rename to docs/docs-cn/source/5.application-development/3.connector/8.hudi.md index 5263c305d..bc5cd2a58 100644 --- a/docs/docs-cn/application-development/dsl/connector/hudi.md +++ b/docs/docs-cn/source/5.application-development/3.connector/8.hudi.md @@ -1,6 +1,6 @@ # Hudi Connector介绍 GeaFlow Hudi 目前支持从文件中读取数据。 -# 语法 +## 语法 ```sql CREATE TABLE IF NOT EXISTS hudi_person ( @@ -12,14 +12,14 @@ CREATE TABLE IF NOT EXISTS hudi_person ( geaflow.dsl.file.path='/path/to/hudi_person' ); ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- | -------- | -------- | | geaflow.dsl.file.path | 是 | 读取或写入的文件或文件夹的路径 | | geaflow.file.persistent.config.json | 否 | JSON格式的DFS配置,会覆盖系统环境配置。 | -# 示例 +## 示例 ```sql set geaflow.dsl.window.size = -1; diff --git a/docs/docs-cn/application-development/dsl/connector/pulsar.md b/docs/docs-cn/source/5.application-development/3.connector/9.pulsar.md similarity index 98% rename from docs/docs-cn/application-development/dsl/connector/pulsar.md rename to docs/docs-cn/source/5.application-development/3.connector/9.pulsar.md index e4fb0354b..e9489d788 100644 --- a/docs/docs-cn/application-development/dsl/connector/pulsar.md +++ b/docs/docs-cn/source/5.application-development/3.connector/9.pulsar.md @@ -1,6 +1,6 @@ # Pulsar Connector介绍 GeaFlow 支持从 Pulsar 中读取数据,并向 Pulsar 写入数据。目前支持的 Pulsar 版本为 2.8.1。 -# 语法 +## 语法 ```sql CREATE TABLE pulsar_table ( id BIGINT, @@ -14,7 +14,7 @@ CREATE TABLE pulsar_table ( `geaflow.dsl.pulsar.subscriptionInitialPosition` = 'latest' ) ``` -# 参数 +## 参数 | 参数名 | 是否必须 | 描述 | | -------- | -------- | -------- | @@ -25,7 +25,7 @@ CREATE TABLE pulsar_table ( 注意:pulsar connector不能指定一个分区topic, 如果你要消费某个分区的消息,请选择分区topic的子topic名称。 -# 示例1 +## 示例1 示例1是从pulsar从`topic_read`中读取数据并且将数据写入`topic_write`中。 ```sql CREATE TABLE pulsar_source ( @@ -52,7 +52,7 @@ CREATE TABLE pulsar_sink ( INSERT INTO pulsar_sink SELECT * FROM pulsar_source; ``` -# 示例2 +## 示例2 同样我们也可以进行四度环路检测。 ```sql set geaflow.dsl.window.size = 1; diff --git a/docs/docs-cn/source/5.application-development/3.connector/index.rst b/docs/docs-cn/source/5.application-development/3.connector/index.rst new file mode 100644 index 000000000..5ca142767 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/3.connector/index.rst @@ -0,0 +1,19 @@ +连接器(Connector) +==== + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.common.md + 2.file.md + 3.console.md + 4.jdbc.md + 5.hive.md + 6.kafka.md + 7.hbase.md + 8.hudi.md + 9.pulsar.md + 10.udc.md + diff --git a/docs/docs-cn/application-development/dsl/chat/chat_guide.md b/docs/docs-cn/source/5.application-development/4.chat_guide.md similarity index 99% rename from docs/docs-cn/application-development/dsl/chat/chat_guide.md rename to docs/docs-cn/source/5.application-development/4.chat_guide.md index f496303b6..62c8fb5b6 100644 --- a/docs/docs-cn/application-development/dsl/chat/chat_guide.md +++ b/docs/docs-cn/source/5.application-development/4.chat_guide.md @@ -1,4 +1,4 @@ -# GQL大模型语法提示手册 +# Text2GQL语法提示手册 本手册列出了GQL常见的语法点以及参考的提示方法,用户可参考提问示例让模型生成对应的语法的GQL语句。 | 语法 | 提问示例 | 结果 | diff --git a/docs/docs-cn/source/5.application-development/index.rst b/docs/docs-cn/source/5.application-development/index.rst new file mode 100644 index 000000000..b759d4c84 --- /dev/null +++ b/docs/docs-cn/source/5.application-development/index.rst @@ -0,0 +1,12 @@ +开发指南 +=========== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.api/index.rst + 2.dsl/index.rst + 3.connector/index.rst + 4.chat_guide.md \ No newline at end of file diff --git a/docs/docs-cn/deploy/install_guide.md b/docs/docs-cn/source/7.deploy/1.install_guide.md similarity index 80% rename from docs/docs-cn/deploy/install_guide.md rename to docs/docs-cn/source/7.deploy/1.install_guide.md index f16760cf0..09cb9a71f 100644 --- a/docs/docs-cn/deploy/install_guide.md +++ b/docs/docs-cn/source/7.deploy/1.install_guide.md @@ -1,36 +1,7 @@ -# 安装指南 +# K8S集群部署 ## 准备K8S环境 -这里以minikube为例,单机模拟K8S集群。如果已有K8S集群可以直接使用,跳过该部分。 - - -下载安装minikube -``` -# arm架构 -curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-arm64 -sudo install minikube-darwin-arm64 /usr/local/bin/minikube - -# x86架构 -curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 -sudo install minikube-darwin-amd64 /usr/local/bin/minikube -``` - -启动minikube和dashboard -``` -# 以docker为driver启动minikube -minikube start --driver=docker --ports=32761:32761 —image-mirror-country='cn' -# 启动minikube dashboard,会自动在浏览器中打开dashboard页面,如未打开复制终端给出的dashboard地址到浏览器打开 -minikube dashboard -``` - -**注意:** -注意不要关闭dashboard所属的terminal进程,后续操作请另起终端进行,否则api server进程会退出 - -如果希望在本地minikube环境使用GeaFlow,请确保minikube正常启动,GeaFlow引擎镜像会自动构建到minikube环境,否则构建到本地Docker环境,需要手工自行push到镜像仓库使用。 -```shell -# confirm host、kubelet、apiserver is running -minikube status -``` +这里以minikube为例,单机模拟K8S集群。如果已有K8S集群可以直接使用,跳过该部分。安装minikube可以参考[安装minikube](6.install_minikube.md)章节。 创建geaflow服务账号,否则程序无权限新建k8s资源(只有第一次需要) ```shell @@ -126,14 +97,14 @@ geaflow-console:1.0 ## 注册登录 首位注册用户将默认被设置为管理员,以管理员身份登录,通过一键安装功能开始系统初始化。 -![welcome](../../static/img/install_welcome.png) +![welcome](../../../static/img/install_welcome.png) ## 一键安装 管理员首次进入GeaFlow系统,会触发一键安装流程,为系统做初始化准备工作。 ### 集群配置 配置GeaFlow作业的运行时集群,推荐使用Kubernates。本地模式下默认为本地的代理地址${your.host.name}:8000,请确保本地已经启动minikube并设置好代理地址。如果设置K8S集群地址,请确保集群地址的连通性正常。 -![install_cluster_config](../../static/img/install_cluster_config.png) +![install_cluster_config](../../../static/img/install_cluster_config.png) K8S集群模式添加以下配置 ``` @@ -151,7 +122,7 @@ K8S集群模式添加以下配置 配置GeaFlow作业运行时元数据的存储,运行时元数据包含了作业的Pipeline、Cycle、位点、异常等信息,推荐使用MySQL。 配置GeaFlow作业运行时HA元数据的存储,HA元数据包含Master、Driver、Container等主要组件信息,推荐使用Redis。 配置GeaFlow作业运行时指标数据存储,用于作业指标监控,推荐使用InfluxDB。 -![undefined](../../static/img/install_meta_config.png) +![undefined](../../../static/img/install_meta_config.png) 
本地模式下,docker容器启动时会默认自动拉起MySQL、Redis和InfluxDB服务。 @@ -163,11 +134,11 @@ K8S集群模式添加以下配置 ### 数据存储配置 配置GeaFlow作业、图、表等数据的持久化存储,推荐使用HDFS。本地模式默认为容器内磁盘。 -![install_storage_config](../../static/img/install_storage_config.png) +![install_storage_config](../../../static/img/install_storage_config.png) ### 文件存储配置 配置GeaFlow引擎JAR、用户JAR文件的持久化存储,推荐使用HDFS。本地模式默认为容器内磁盘。 -![install_jar_config](../../static/img/install_jar_config.png) +![install_jar_config](../../../static/img/install_jar_config.png) 安装成功后,**管理员会自动切换到租户默认下的默认实例**,此时可以直接创建发布图计算任务。 @@ -180,26 +151,26 @@ GeaFlow控制台支持租户隔离,支持系统模式视角和租户模式视 ### 系统模式 用户以管理员身份登录后,会进入系统模式。此时可以进行系统的一键安装、系统管理等操作。 -![install_system_mode](../../static/img/install_system_mode.png) +![install_system_mode](../../../static/img/install_system_mode.png) 系统模式下,管理员可以管理集群、GeaFlow引擎版本、文件、用户、租户等信息。 ### 租户模式 用户正常登录后,会进入租户模式。此时可以进行图计算的研发和运维操作。 -![install_tenant_mode](../../static/img/install_tenant_mode.png) +![install_tenant_mode](../../../static/img/install_tenant_mode.png) 租户模式下,用户可以创建实例、图、表、图计算任务等研发资源,并可以发布图计算任务,提交图计算作业等操作。 ## 任务管理 添加图计算任务,使用SQL+GQL的方式描述图计算的业务逻辑。 -![install_task_manager](../../static/img/install_task_manager.png) +![install_task_manager](../../../static/img/install_task_manager.png) 任务创建后,点击发布,进入作业运维。 ## 作业运维 作业提交前,还可以调整默认生成的任务参数和集群参数,以方便对作业行为进行调整。 -![install_job_op](../../static/img/install_job_op.png) +![install_job_op](../../../static/img/install_job_op.png) 访问作业详情页面的其他标签页,查看作业的运行时、指标、容器、异常、日志等信息。 diff --git a/docs/docs-cn/quick_start_operator.md b/docs/docs-cn/source/7.deploy/2.quick_start_operator.md similarity index 95% rename from docs/docs-cn/quick_start_operator.md rename to docs/docs-cn/source/7.deploy/2.quick_start_operator.md index 7210f8fa0..1c5135e21 100644 --- a/docs/docs-cn/quick_start_operator.md +++ b/docs/docs-cn/source/7.deploy/2.quick_start_operator.md @@ -1,9 +1,7 @@ -# 开始上手(Geaflow Kubernetes Operator运行) +# K8S Operator部署 ## 准备工作 -1. 下载安装docker和minikube。参考文档:[安装minikube](deploy/install_minikube.md) - - +1. 下载安装docker和minikube。参考文档:[安装minikube](6.install_minikube.md) 2. 拉取GeaFlow镜像 ```shell @@ -66,20 +64,20 @@ docker images cd tugraph-analytics/geaflow-kubernetes-operator/ helm install geaflow-kubernetes-operator helm/geaflow-kubernetes-operator ``` -![img.png](../static/img/helm_install_operator.png) +![img.png](../../../static/img/helm_install_operator.png) 5. 在minikube dashboard中查看pod是否正常运行 -![img.png](../static/img/view_operator_pod.png) +![img.png](../../../static/img/view_operator_pod.png) 6. 将GeaFlow-Operator-Dashboard通过portforward代理到本地端口(默认为8089端口) 请将operator-pod-name替换为实际的operator pod名称。 ```shell kubectl port-forward ${operator-pod-name} 8089:8089 ``` -![img.png](../static/img/port_forward_operator.png) +![img.png](../../../static/img/port_forward_operator.png) 7. 
浏览器访问localhost:8089即可打开operator集群页面 -![img.png](../static/img/operator_dashboard.png) +![img.png](../../../static/img/operator_dashboard.png) ## 通过Geaflow Kubernetes Operator提交作业 @@ -212,7 +210,7 @@ spec: #### 通过geaflow-kubernetes-operator-dashboard查看 浏览器访问http://localhost:8089/,即可打开集群页面查看集群下所有geaflowjob作业列表和详情。 -![img.png](../static/img/operator_dashboard_jobs.png) +![img.png](../../../static/img/operator_dashboard_jobs.png) #### 通过命令行查看 执行以下命令可以查看CR的状态 diff --git a/docs/docs-cn/dashboard.md b/docs/docs-cn/source/7.deploy/3.dashboard.md similarity index 68% rename from docs/docs-cn/dashboard.md rename to docs/docs-cn/source/7.deploy/3.dashboard.md index 9a57e066e..1d4cd8435 100644 --- a/docs/docs-cn/dashboard.md +++ b/docs/docs-cn/source/7.deploy/3.dashboard.md @@ -1,4 +1,4 @@ -# GeaFlow Dashboard +# 作业Dashboard监控 ## 简介 Geaflow-dashboard为Geaflow提供作业级别的监控页面,可以轻松地通过dashboard查看作业的以下信息: * 作业的健康度(Container和Worker活跃度) @@ -13,12 +13,12 @@ Geaflow-dashboard为Geaflow提供作业级别的监控页面,可以轻松地 在本地或开发环境,也可以直接通过kubectl port-forward指令来直接映射到master pod的端口。 ### 以minikube为例 -1. 将作业部署到minikube中,部署作业方式参考[快速上手](quick_start.md)。 +1. 将作业部署到minikube中,部署作业方式参考[快速上手](../3.quick_start/1.quick_start.md)。 2. 打开minikube-dashboard,找到master的pod名称(或者在终端中输入以下命令获取)。 ```shell kubectl get pods ``` -![kubectl_get_pods.png](../static/img/kubectl_get_pods.png) +![kubectl_get_pods.png](../../../static/img/kubectl_get_pods.png) 3. 打开终端,输入以下命令,即可将pod容器内的8090端口映射到localhost的本机8090端口。 请将 **${your-master-pod-name}** 替换为你自己的pod名称 ```shell @@ -31,18 +31,18 @@ kubectl port-forward ${your-master-pod-name} 8090:8090 Overview页面会展示整个作业的健康状态。你可以在这里查看container和driver是否都在正常运行。 除此之外,Overview页面也会展示作业的Pipeline列表。 -![dashboard_overview.png](../static/img/dashboard_overview.png) +![dashboard_overview.png](../../../static/img/dashboard_overview.png) ### Pipeline列表 也可以通过侧边栏的Pipeline菜单进入页面。页面包括作业的每一项Pipeline的名称、开始时间和耗时。 耗时为0表示该Pipeline已开始执行,但尚未完成。 -![dashboard_pipelines.png](../static/img/dashboard_pipelines.png) +![dashboard_pipelines.png](../../../static/img/dashboard_pipelines.png) ### Cycle列表 点击Pipeline名称可以进入二级菜单,查看当前Pipeline下所有的Cycle列表的各项信息。 -![dashboard_cycles.png](../static/img/dashboard_cycles.png) +![dashboard_cycles.png](../../../static/img/dashboard_cycles.png) ### 作业组件详情 可以查看作业的各个组件(包括master、driver、container)的各项信息。 @@ -50,38 +50,38 @@ Overview页面会展示整个作业的健康状态。你可以在这里查看con 其中Driver详情展示所有driver的基础信息。Container详情展示所有Container的基础信息。 -![dashboard_containers.png](../static/img/dashboard_containers.png) -![dashboard_drivers.png](../static/img/dashboard_drivers.png) +![dashboard_containers.png](../../../static/img/dashboard_containers.png) +![dashboard_drivers.png](../../../static/img/dashboard_drivers.png) ### 组件运行时详情 通过点击左边栏的Master详情,或者通过点击Driver/Container详情中的组件名称,可以跳转到组件的运行时页面。 在运行时页面中,可以查看和操作以下内容。 * 查看容器的进程指标 -![dashboard_runtime_metrics.png](../static/img/dashboard_runtime_metrics.png) +![dashboard_runtime_metrics.png](../../../static/img/dashboard_runtime_metrics.png) * 查看实时的日志。这里以master为例介绍其中的日志文件。 * master.log:Master的java主进程日志。 * master.log.1 / master.log.2:Master的java主进程日志备份。 * agent.log:Master的agent服务日志。 * geaflow.log:进入容器后的shell启动脚本日志。 -![dashboard_runtime_logs.png](../static/img/dashboard_runtime_logs.png) -![dashboard_runtime_log_content.png](../static/img/dashboard_runtime_log_content.png) +![dashboard_runtime_logs.png](../../../static/img/dashboard_runtime_logs.png) +![dashboard_runtime_log_content.png](../../../static/img/dashboard_runtime_log_content.png) * 对进程进行CPU/ALLOC分析,生成火焰图。 火焰图分析类型可选择CPU或ALLOC,单次最多分析60秒,最多保留10份历史记录。 
-![dashboard_runtime_profiler_execute.png](../static/img/dashboard_runtime_profiler_execute.png) -![dashboard_runtime_profiler_history.png](../static/img/dashboard_runtime_profiler_history.png) -![dashboard_runtime_profiler_content.png](../static/img/dashboard_runtime_profiler_content.png) +![dashboard_runtime_profiler_execute.png](../../../static/img/dashboard_runtime_profiler_execute.png) +![dashboard_runtime_profiler_history.png](../../../static/img/dashboard_runtime_profiler_history.png) +![dashboard_runtime_profiler_content.png](../../../static/img/dashboard_runtime_profiler_content.png) * 对进程进行Thread Dump。保留最新一次dump的结果。 -![dashboard_runtime_thread_dump.png](../static/img/dashboard_runtime_thread_dump.png) -![dashboard_runtime_thread_dump_execute.png](../static/img/dashboard_runtime_thread_dump_execute.png) +![dashboard_runtime_thread_dump.png](../../../static/img/dashboard_runtime_thread_dump.png) +![dashboard_runtime_thread_dump_execute.png](../../../static/img/dashboard_runtime_thread_dump_execute.png) * 查看进程的所有配置项(仅master拥有此页面) -![dashboard_runtime_master_configuration.png](../static/img/dashboard_runtime_master_configuration.png) +![dashboard_runtime_master_configuration.png](../../../static/img/dashboard_runtime_master_configuration.png) ## 其他功能 @@ -92,9 +92,9 @@ Overview页面会展示整个作业的健康状态。你可以在这里查看con 重置时,点击“重置”按钮,列表会重新刷新。 -![dashboard_table_search.png](../static/img/dashboard_table_search.png) +![dashboard_table_search.png](../../../static/img/dashboard_table_search.png) ### 本地化 页面支持中英文切换,点击右上角的“文A”图标,即可选择语言。 -![dashboard_locale.png](../static/img/dashboard_locale.png) \ No newline at end of file +![dashboard_locale.png](../../../static/img/dashboard_locale.png) \ No newline at end of file diff --git a/docs/docs-cn/visualization/collaborate_with_g6vp.md b/docs/docs-cn/source/7.deploy/4.collaborate_with_g6vp.md similarity index 96% rename from docs/docs-cn/visualization/collaborate_with_g6vp.md rename to docs/docs-cn/source/7.deploy/4.collaborate_with_g6vp.md index 12df6defb..8337e5c9c 100644 --- a/docs/docs-cn/visualization/collaborate_with_g6vp.md +++ b/docs/docs-cn/source/7.deploy/4.collaborate_with_g6vp.md @@ -1,4 +1,4 @@ -# 🌈 [G6VP](https://github.com/antvis/g6vp) 现在支持与 Tugraph 协作实现流图作业可视化了! +# 🌈 [G6VP](https://github.com/antvis/g6vp) 图可视化 ## 仅需 5 步,即可呈现 🎊 diff --git a/docs/docs-cn/deploy/install_llm.md b/docs/docs-cn/source/7.deploy/5.install_llm.md similarity index 92% rename from docs/docs-cn/deploy/install_llm.md rename to docs/docs-cn/source/7.deploy/5.install_llm.md index d65fd28f7..2e0662bcd 100644 --- a/docs/docs-cn/deploy/install_llm.md +++ b/docs/docs-cn/source/7.deploy/5.install_llm.md @@ -1,9 +1,11 @@ +# LLM本地部署 + 用户可以在本地完成大模型的服务化部署,以下步骤描述了从下载预训练模型,服务化部署和调试的整个过程。用户机器确保已安装Docker,可访问大模型存储库。 ## 步骤1:下载大模型文件 我们已将预训练好的大模型文件,上传至[Hugging Face仓库](https://huggingface.co/tugraph/CodeLlama-7b-GQL-hf),下载模型文件后解压到本地。下载完成后,解压模型文件到指定的本地目录,如 /home/huggingface。 -![hugging](../../static/img/llm_hugging_face.png) +![hugging](../../../static/img/llm_hugging_face.png) ## 步骤2:准备 Docker 容器环境 1. 在终端运行以下命令下载模型服务化所需的 Docker 镜像: @@ -40,7 +42,7 @@ cd /opt/llama_cpp python3 ./convert.py ${容器模型路径} ``` 执行完成后, 会在容器模型路径下生成前缀为ggml-model的文件 -![undefined](../../static/img/llm_ggml_model.png) +![undefined](../../../static/img/llm_ggml_model.png) 2. 
模型量化(可选) 以llam2-7B模型为例:通过convert.py默认转换后的模型精度为F16,模型大小为13.0GB。如果当前机器资源无法满足这么大的模型推理,可通过./quantize对转换后的模型进一步量化。 @@ -51,7 +53,7 @@ cd /opt/llama_cpp ./quantize 默认生成的F16模型路径 量化后模型路径 q4_0 ``` 以下是量化后模型的大小和推理速度等参考指标: -![undefined](../../static/img/llm_quantization_table.png) +![undefined](../../../static/img/llm_quantization_table.png) 3. 模型服务化 通过以下命令,将上述生成的模型进行服务化的部署,通过参数指定服务绑定的地址和端口: @@ -76,5 +78,5 @@ curl --request POST \ --data '{"prompt": "请返回小红的10个年龄大于20的朋友","n_predict": 128}' ``` 如下是服务化部署后的模型推理结果: -![undefined](../../static/img/llm_chat_result.png) +![undefined](../../../static/img/llm_chat_result.png) \ No newline at end of file diff --git a/docs/docs-cn/deploy/install_minikube.md b/docs/docs-cn/source/7.deploy/6.install_minikube.md similarity index 100% rename from docs/docs-cn/deploy/install_minikube.md rename to docs/docs-cn/source/7.deploy/6.install_minikube.md diff --git a/docs/docs-cn/source/7.deploy/index.rst b/docs/docs-cn/source/7.deploy/index.rst new file mode 100644 index 000000000..50734e71d --- /dev/null +++ b/docs/docs-cn/source/7.deploy/index.rst @@ -0,0 +1,9 @@ +部署 +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + * \ No newline at end of file diff --git a/docs/docs-cn/contribution.md b/docs/docs-cn/source/9.contribution.md similarity index 89% rename from docs/docs-cn/contribution.md rename to docs/docs-cn/source/9.contribution.md index b5b53b5d3..3791531e1 100644 --- a/docs/docs-cn/contribution.md +++ b/docs/docs-cn/source/9.contribution.md @@ -26,4 +26,6 @@ ## 首次贡献 -GeaFlow项目ISSUE里面提供了一下简单的issue方便快速上手参与社区贡献,这些ISSUE被标签为 **good first issue** ,您可以选择感兴趣的问题参与贡献。 \ No newline at end of file +GeaFlow项目ISSUE里面提供了一下简单的issue方便快速上手参与社区贡献,这些ISSUE被标签为 **good first issue** ,您可以选择感兴趣的问题参与贡献。 + +如果您对GeaFlow感兴趣,欢迎给我们项目一颗 [⭐](https://github.com/TuGraph-family/tugraph-analytics) 。 \ No newline at end of file diff --git a/docs/docs-cn/source/conf.py b/docs/docs-cn/source/conf.py new file mode 100644 index 000000000..dd245d844 --- /dev/null +++ b/docs/docs-cn/source/conf.py @@ -0,0 +1,31 @@ +# Configuration file for the Sphinx documentation builder. 
+# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Project information ----------------------------------------------------- + +import os, subprocess, sys, shlex +project = 'TuGraph' +copyright = '2023, Ant Group' +author = 'Ant Group' + +# -- General configuration --------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration + +extensions = ['myst_parser', + 'sphinx_panels', + 'sphinx.ext.autodoc', + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode'] + +# templates_path = ['../../_templates'] +exclude_patterns = [] + + +# -- Options for HTML output ------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output + +html_theme = 'sphinx_rtd_theme' + +read_the_docs_build = os.environ.get('READTHEDOCS', None) == 'True' diff --git a/docs/docs-cn/source/contacts.md b/docs/docs-cn/source/contacts.md new file mode 100644 index 000000000..099d5a838 --- /dev/null +++ b/docs/docs-cn/source/contacts.md @@ -0,0 +1,4 @@ +# 联系我们 +您可以通过以下方式联系我们。 + +![contacts](../../static/img/contacts.png) \ No newline at end of file diff --git a/docs/docs-cn/source/index.rst b/docs/docs-cn/source/index.rst new file mode 100644 index 000000000..6fa3b54f7 --- /dev/null +++ b/docs/docs-cn/source/index.rst @@ -0,0 +1,18 @@ +geaflow +===================== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.guide_cn.md + 2.introduction.md + 3.quick_start/index.rst + 4.concepts/index.rst + 5.application-development/index.rst + 7.deploy/index.rst + 9.contribution.md + contacts.md + thanks.md + reference/index.rst diff --git a/docs/docs-cn/source/reference/index.rst b/docs/docs-cn/source/reference/index.rst new file mode 100644 index 000000000..53d563675 --- /dev/null +++ b/docs/docs-cn/source/reference/index.rst @@ -0,0 +1,9 @@ +参考资料 +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + * \ No newline at end of file diff --git a/docs/docs-cn/principle/vs_join.md b/docs/docs-cn/source/reference/vs_join.md similarity index 95% rename from docs/docs-cn/principle/vs_join.md rename to docs/docs-cn/source/reference/vs_join.md index e5fbecd12..9eeaad843 100644 --- a/docs/docs-cn/principle/vs_join.md +++ b/docs/docs-cn/source/reference/vs_join.md @@ -13,7 +13,7 @@ 无论是在批或流的计算系统中,Join操作都涉及大量shuffle和计算开销。同时,Join产生的中间结果由于关联会放大多份,造成数据量指数级膨胀和冗余,存储消耗大。 在下图的实验中,我们模拟了依次执行一跳、两跳和三跳关系运算的场景。足以见得,越是复杂的多跳关系计算,关系模型中Join的性能表现越差。在总时间对比中,利用图的Match计算能够节约超过90%的耗时。 -![total_time](../../static/img/vs_join_total_time_cn.jpg) +![total_time](../../../static/img/vs_join_total_time_cn.jpg)
图1
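下面用一个极简的示意例子直观对比两种写法:同一个三跳关系,关系模型需要对边表反复自连接,而图模型只需一次Match。其中的`edge`表与匿名点边均为示意,假设图已构建完成,并非某个固定的GeaFlow样例。

```sql
-- 关系模型:每多一跳就多一次自连接,每次Join都伴随shuffle和中间结果物化
SELECT e1.srcId, e3.targetId
FROM edge e1
JOIN edge e2 ON e1.targetId = e2.srcId
JOIN edge e3 ON e2.targetId = e3.srcId;

-- 图模型:同样的三跳关系只需一次Match,沿邻接结构直接展开,无需反复Join
SELECT a_id, d_id
FROM (
  MATCH (a)-[e1]->(b)-[e2]->(c)-[e3]->(d)
  RETURN a.id AS a_id, d.id AS d_id
);
```

图1的实验即是在这类多跳查询上对比两种方案的总耗时。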
### 痛点二:数据冗余,时效性低 @@ -32,7 +32,7 @@ 相比宽表的关系物化方式,由于图结构本身的点边聚合性,构图表现得十分节约。 下图是GeaFlow中高性能构图的表现,可见构图操作本身极为迅速,且由于图可以分片的特性,具有十分良好的可扩展性。 -![insert_throuput](../../static/img/insert_throuput_cn.jpg) +![insert_throuput](../../../static/img/insert_throuput_cn.jpg)
图2
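作为参考,下面给出一个示意性的增量构图写法(图名、点边结构均为假设,仅用于说明):点、边随源表数据窗口的到达被持续写入图存储,而不是物化成宽表。

```sql
-- 声明图结构:点边按分片存储,天然具备水平扩展能力
CREATE GRAPH IF NOT EXISTS trade_graph (
  Vertex account (
    id bigint ID,
    name varchar
  ),
  Edge transfer (
    srcId bigint SOURCE ID,
    targetId bigint DESTINATION ID,
    amt double
  )
) WITH (
  storeType = 'rocksdb',
  shardCount = 4
);

-- 增量构图:源表数据持续插入图中(对应图1中insert to graph部分的开销)
INSERT INTO trade_graph.account(id, name)
SELECT id, name FROM account_source;

INSERT INTO trade_graph.transfer(srcId, targetId, amt)
SELECT src_id, target_id, amt FROM transfer_source;
```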
在图一的实验中也可以发现,实质上我们用少量的插入图(青色的insert to graph部分开销)耗时,换取了图建模方式对之后关联查询的加速效果。 @@ -44,7 +44,7 @@ GeaFlow提供融合GQL和SQL样式的查询语言,这是一种图表一体的数据分析语言,继承自标准SQL+ISO/GQL,可以方便进行图表分析。 -![code_style](../../static/img/code_style.jpg) +![code_style](../../../static/img/code_style.jpg)
图3
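图3所示的融合风格可以用下面的简化例子体会(图名、标签均为示意):Match返回的结果集直接作为表,参与后续的过滤、分组、聚合等SQL运算。

```sql
-- 图查询结果当作普通表使用:统计每个人的好友数量
SELECT owner_id, COUNT(friend_id) AS friend_cnt
FROM (
  MATCH (a:person)-[e:knows]->(b:person)
  RETURN a.id AS owner_id, b.id AS friend_id
)
GROUP BY owner_id;
```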
**在融合DSL中,图计算的结果与表查询等价,都可以像表数据一样做关系运算处理。**这意味着图3中GQL和SQL两种描述都可以达到类似的效果,极大灵活了用户的查询表达能力。 @@ -58,7 +58,7 @@ GeaFlow(品牌名TuGraph-Analytics)是蚂蚁集团开源的分布式流式图计 TuGraph-Analytics已经于2023年6月正式对外开源,开放其以图为数据模型的流批一体计算核心能力。相比传统的流式计算引擎,如Flink、Storm这些以表为模型的实时处理系统,GeaFlow以自研图存储为底座,流批一体计算引擎为矛,融合GQL/SQL DSL语言为旗帜,在复杂多度的关系运算上具备极大的优势。 -![insert_throuput](../../static/img/query_throuput_cn.jpg) +![insert_throuput](../../../static/img/query_throuput_cn.jpg)
图4
图4展示了GeaFlow使用Match算子在图上进行多跳关联查询,相比Flink的Join算子带来的实时吞吐提升。在复杂多跳场景下,现有的流式计算引擎已经基本不能胜任实时处理。而图模型的存在,则突破这一限制,扩展了实时流计算的应用场景。 \ No newline at end of file diff --git a/docs/docs-cn/source/thanks.md b/docs/docs-cn/source/thanks.md new file mode 100644 index 000000000..f2b0592f9 --- /dev/null +++ b/docs/docs-cn/source/thanks.md @@ -0,0 +1,2 @@ +# 致谢 +GeaFlow开发过程中部分模块参考了一些业界优秀的开源项目,包括Apache Flink、Apache Spark以及Apache Calcite等, 这里表示特别的感谢。 \ No newline at end of file diff --git a/docs/docs-en/.readthedocs.yaml b/docs/docs-en/.readthedocs.yaml new file mode 100644 index 000000000..150ebe06c --- /dev/null +++ b/docs/docs-en/.readthedocs.yaml @@ -0,0 +1,30 @@ +# .readthedocs.yaml +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +# Required +version: 2 + +# Set the version of Python and other tools you might need +build: + os: ubuntu-20.04 + tools: + python: "3.6" + # You can also specify other tool versions: + # nodejs: "19" + # rust: "1.64" + # golang: "1.19" + +# Build documentation in the docs/ directory with Sphinx +sphinx: + builder: html + configuration: docs/docs-en/source/conf.py + +# If using Sphinx, optionally build your docs in additional formats such as PDF +# formats: +# - pdf + +# Optionally declare the Python requirements required to build your docs +python: + install: + - requirements: docs/requirements.txt \ No newline at end of file diff --git a/docs/docs-en/Makefile b/docs/docs-en/Makefile new file mode 100644 index 000000000..d0c3cbf10 --- /dev/null +++ b/docs/docs-en/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/docs-en/application-development/api/guid.md b/docs/docs-en/application-development/api/guid.md deleted file mode 100644 index 5731cc577..000000000 --- a/docs/docs-en/application-development/api/guid.md +++ /dev/null @@ -1,8 +0,0 @@ -* [API Overview](overview.md) -* Stream API - * [Source API](stream/source.md) - * [Process API](stream/process.md) - * [Sink API](stream/sink.md) -* Graph API - * [Compute API](graph/compute.md) - * [Traversal API](graph/traversal.md) \ No newline at end of file diff --git a/docs/docs-en/concepts/glossary.md b/docs/docs-en/concepts/glossary.md deleted file mode 100644 index 519371e5b..000000000 --- a/docs/docs-en/concepts/glossary.md +++ /dev/null @@ -1,15 +0,0 @@ -**K8S**:k8s is short for Kubernetes, which is an open-source container orchestration platform that provides automated deployment, scaling, and management of containerized applications. It can run on various cloud platforms, physical servers, and virtual machines, and supports multiple container runtimes, enabling high availability, load balancing, automatic scaling, and automatic repair, and other functions. - -**Graph Processing**: Graph Processing is a computing model used to solve computational problems related to graph data structures. 
The graph computing model can be applied to solve many real-world problems, such as social network analysis, network traffic analysis, medical diagnosis, and more. - -**ISO-GQL**:GQL is a standard query language for property graphs, which stands for "Graph Query Language", and is an ISO/IEC international standard database language. In addition to supporting the Gremlin query language, GeaFlow also supports GQL. This means that GeaFlow users can use the GQL language to query and analyze their graph data, thereby enhancing their graph data processing capabilities. - -**Cycle**: The GeaFlow Scheduler is a core data structure in the scheduling model. A cycle is described as a basic unit that can be executed repeatedly, and it includes a description of input, intermediate calculations, data exchange, and output. It is generated by the vertex groups in the execution plan and supports nesting. - -**Event**: The core data structure for the interaction between scheduling and computation at the Runtime layer is the Scheduler. The Scheduler constructs a state machine from a series of event sets and distributes it to workers for computation and execution. Some of these events are executable, meaning they have their own computational semantics, and the entire scheduling and computation process is executed asynchronously. - -**Graph Traversal** : Graph Traversal refers to the process of traversing all nodes or some nodes in a graph data structure, visiting all nodes in a specific order, mainly using depth-first search (DFS) and breadth-first search (BFS). It is used to solve many problems, including finding the shortest path between two nodes, detecting cycles in a graph, and so on. - -**Graph State**: GraphState is used to store the graph data or intermediate results of graph iteration calculations in Geaflow. It provides exactly-once semantics and the ability to reuse jobs at the job level. GraphState can be divided into two types: Static and Dynamic. Static GraphState views the entire graph as a complete entity, and all operations are performed on a complete graph. Dynamic GraphState assumes that the graph is dynamically changing and is composed of time slices, and all slices make up a complete graph, and all operations are performed on the slices. - -**Key State**: KeyState is used to store intermediate results during the calculation process and is generally used for stream processing, such as recording intermediate aggregation results in KeyState when performing aggregation. Similar to GraphState, Geaflow regularly persists KeyState, so KeyState also provides exactly-once semantics. Depending on the data result, KeyState can be divided into KeyValueState, KeyListState, KeyMapState, and so on. \ No newline at end of file diff --git a/docs/docs-en/make.bat b/docs/docs-en/make.bat new file mode 100644 index 000000000..7a1131f7b --- /dev/null +++ b/docs/docs-en/make.bat @@ -0,0 +1,36 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build +set SPHINXOPTS=-W + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. 
+ echo.If you don't have Sphinx installed, grab it from
+ echo.https://www.sphinx-doc.org/
+ exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/docs-en/source/1.guide.md b/docs/docs-en/source/1.guide.md
new file mode 100644
index 000000000..a3a6b97f0
--- /dev/null
+++ b/docs/docs-en/source/1.guide.md
@@ -0,0 +1,40 @@
+# Guide
+Here is the documentation map to help users quickly learn and use GeaFlow.
+
+## Introduction
+**TuGraph Analytics** (alias: GeaFlow) is the [**fastest**](https://ldbcouncil.org/benchmarks/snb-bi/) open-source OLAP graph database developed by Ant Group. It supports core capabilities such as trillion-level graph storage, hybrid graph and table processing, real-time graph computation, and interactive graph analysis. Currently, it is widely used in scenarios such as data warehousing acceleration, financial risk control, knowledge graph, and social networks.
+
+For more information about GeaFlow: [GeaFlow Introduction](2.introduction.md)
+
+For GeaFlow design paper: [GeaFlow: A Graph Extended and Accelerated Dataflow System](https://dl.acm.org/doi/abs/10.1145/3589771)
+
+## Quick Start
+
+1. Prepare the Git, JDK8, Maven, and Docker environments.
+2. Download Code: `git clone https://github.com/TuGraph-family/tugraph-analytics`
+3. Build Project: `mvn clean install -DskipTests`
+4. Test Job: `./bin/gql_submit.sh --gql geaflow/geaflow-examples/gql/loop_detection.sql`
+5. Build Image: `./build.sh --all`
+6. Start Container: `docker run -d --name geaflow-console -p 8888:8888 geaflow-console:0.1`
+
+For more details: [Quick Start](3.quick_start/1.quick_start.md)
+
+## Development Manual
+
+GeaFlow supports two sets of programming interfaces: DSL and API. You can develop streaming graph computing jobs using GeaFlow's SQL extension language SQL+ISO/GQL, or use GeaFlow's high-level API programming interface to develop applications in Java.
+* DSL application development: [DSL Application Development](5.application-development/2.dsl/1.overview.md)
+* API application development: [API Application Development](5.application-development/1.api/1.overview.md)
+
+## Real-time Capabilities
+
+Compared with traditional stream processing engines such as Flink and Storm, which use tables as their data model for real-time processing, GeaFlow's graph-based data model has significant performance advantages when handling join relationship operations, especially complex multi-hop relationship operations like those involving 3 or more hops of join and complex loop searches. 
+ +[![total_time](../../static/img/vs_join_total_time_en.jpg)](reference/vs_join.md) + + +## Partners +| | | | +|------------------|------------------|------------------| +| [![HUST](../../static/img/partners/hust.png)](https://github.com/CGCL-codes/YiTu) | [![FU](../../static/img/partners/fu.png)](http://kw.fudan.edu.cn/) | ![ZJU](../../static/img/partners/zju.png) | +| [![WhaleOps](../../static/img/partners/whaleops.png)](http://www.whaleops.com/) | [![OceanBase](../../static/img/partners/oceanbase.png)](https://github.com/oceanbase/oceanbase) | [![SecretFlow](../../static/img/partners/secretflow.png)](https://github.com/secretflow/secretflow) | + diff --git a/docs/docs-en/introduction.md b/docs/docs-en/source/2.introduction.md similarity index 84% rename from docs/docs-en/introduction.md rename to docs/docs-en/source/2.introduction.md index af308148f..5ec7c550e 100644 --- a/docs/docs-en/introduction.md +++ b/docs/docs-en/source/2.introduction.md @@ -1,6 +1,6 @@ # Introduction -## Background introduction +## Background Introduction Early big data processing mainly relied on offline processing, with technologies like Hadoop effectively solving the problem of analyzing large-scale data. However, processing efficiency was inadequate for high real-time demand scenarios. The emergence of stream computing engines, represented by Storm, effectively addressed the issue of real-time data processing, improving processing efficiency. However, Storm itself does not provide state management capabilities and is powerless in handling stateful computations such as aggregation. The emergence of Flink effectively addressed this shortcoming by introducing state management and checkpoint mechanisms, achieving efficient stateful stream computing capabilities. @@ -14,12 +14,12 @@ are a large number of join operations, and how to improve the efficiency and per The overall architecture of GeaFlow is as follows: -![geaflow_arch](../static/img/geaflow_arch_new.png) +![geaflow_arch](../../static/img/geaflow_arch_new.png) -* [DSL Layer](./principle/dsl_principle.md): GeaFlow has designed a fusion analysis language, SQL+GQL, which supports unified processing of table models and graph models. -* [Framework Layer](./principle/framework_principle.md): GeaFlow has designed two sets of APIs for graph and stream, supporting the fusion computation of streaming, batch, and graph processing. It has also implemented a unified distributed scheduling model based on Cycle. -* [State Layer](./principle/state_principle.md): GeaFlow has designed two sets of APIs for graph and key-value storage, supporting the mixed storage of table data and graph data. The overall design follows the Sharing Nothing principle and supports the persistence of data to remote storage. -* [Console Platform](./principle/console_principle.md): GeaFlow provides an all-in-one graph development platform, implementing the capabilities of graph data modeling, processing, and analysis. It also provides operational and control support for graph tasks. +* [DSL Layer](4.concepts/2.dsl_principle.md): GeaFlow has designed a fusion analysis language, SQL+GQL, which supports unified processing of table models and graph models. +* [Framework Layer](4.concepts/3.framework_principle.md): GeaFlow has designed two sets of APIs for graph and stream, supporting the fusion computation of streaming, batch, and graph processing. It has also implemented a unified distributed scheduling model based on Cycle. 
+* [State Layer](4.concepts/4.state_principle.md): GeaFlow has designed two sets of APIs for graph and key-value storage, supporting the mixed storage of table data and graph data. The overall design follows the Sharing Nothing principle and supports the persistence of data to remote storage. +* [Console Platform](4.concepts/5.console_principle.md): GeaFlow provides an all-in-one graph development platform, implementing the capabilities of graph data modeling, processing, and analysis. It also provides operational and control support for graph tasks. * **Execution Environment**: GeaFlow can run in various heterogeneous execution environments, such as K8S, Ray, and local mode. ## Application Scenarios @@ -30,9 +30,9 @@ In data warehouse scenarios, there are a large number of join operations, and in ### Real-time Attribution Analysis Under the background of informationization, channel attribution and path analysis of user behavior are the core of traffic analysis. By calculating the effective behavior path of users in real-time, and constructing a complete conversion path, it can quickly help businesses understand the value of products and assist operations in adjusting their strategies in a timely manner. The core points of real-time attribution analysis are accuracy and effectiveness. Accuracy requires ensuring the accuracy of user behavior path analysis under controllable costs. Effectiveness requires high real-time calculation to quickly assist business decision-making. Based on the capabilities of the GeaFlow's streaming computing, accurate and timely attribution analysis can be achieved. The following figure shows how this is accomplished: -![attribution_analysis](../static/img/guiyin_analysis.png) +![attribution_analysis](../../static/img/guiyin_analysis.png) Firstly, GeaFlow converts the user behavior logs into a user behavior topology graph in real-time, with users as the vertex and every behavior related to them as an the edge towards the buried page. Then, GeaFlow analyzes the subgraph of user behavior in advance using its streaming computing capability, and based on the attribution path matching rule, matches and calculates the attribution path of the corresponding user for the transaction behavior, and outputs it to the downstream systems. ### Real-time Anti-Crash System In the context of credit risk management, detecting credit card cashing-out fraud is a typical risk management requirement. Based on analysis of existing cashing-out patterns, it can be seen that cashing-out is a loop subgraph. How to efficiently and quickly identify cashing-out in a large graph can greatly increase the efficiency of risk identification. Taking the following graph as an example, by transforming real-time transaction flows and transfer flows from input data sources into a real-time transaction graph, and then performing graph feature analysis on user transaction behavior based on risk management policies, such as loop checking and other feature calculations, real-time detection of cashing-out can be provided to decision-making and monitoring platforms. GeaFlow's real-time graph construction and calculation abilities can quickly identify abnormal transactional behaviors such as cashing-out, greatly reducing platform risk. 
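As a hedged sketch of this pattern (the graph name, labels, and result table below are illustrative assumptions, not the production risk policy), such a loop check can be expressed as a cycle match over the real-time transfer graph:

```sql
-- Detect 3-hop transfer loops that start and end at the same account.
INSERT INTO tbl_loop_result
SELECT a_id, b_id, c_id
FROM (
  MATCH (a:account)-[e1:transfer]->(b:account)-[e2:transfer]->(c:account)-[e3:transfer]->(a2:account)
  WHERE a.id = a2.id
  RETURN a.id AS a_id, b.id AS b_id, c.id AS c_id
);
```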
-![real-anti-crash](../static/img/fantaoxian.png) \ No newline at end of file +![real-anti-crash](../../static/img/fantaoxian.png) \ No newline at end of file diff --git a/docs/docs-en/quick_start.md b/docs/docs-en/source/3.quick_start/1.quick_start.md similarity index 90% rename from docs/docs-en/quick_start.md rename to docs/docs-en/source/3.quick_start/1.quick_start.md index b96e900f8..19362f269 100644 --- a/docs/docs-en/quick_start.md +++ b/docs/docs-en/source/3.quick_start/1.quick_start.md @@ -121,7 +121,7 @@ bin/socket.sh ``` After the socket service is started, the following information is displayed on the console: -![socket_start](../static/img/socket_start.png) +![socket_start](../../../static/img/socket_start.png) 3. Input data @@ -146,7 +146,7 @@ The input data is as follows: the "." in front of the data represents a point da ``` We can see the calculated loop data displayed on the socket console: -![ide_socket_server](../static/img/ide_socket_server.png) +![ide_socket_server](../../../static/img/ide_socket_server.png) You can also continue to enter new point edge data to view the latest calculation results, such as entering the following data: @@ -155,29 +155,29 @@ You can also continue to enter new point edge data to view the latest calculatio ``` We can see that the new loop 3-4-5-6-3 is checked out: -![ide_socket_server_more](../static/img/ide_socket_server_more.png) +![ide_socket_server_more](../../../static/img/ide_socket_server_more.png) 4. Access the dashboard page The local mode will use the local 8090 and 8088 ports and comes with a dashboard page. Visit *http://localhost:8090* in the browser to access the front-end page. -![dashboard_overview](../static/img/dashboard_overview.png) +![dashboard_overview](../../../static/img/dashboard_overview.png) For more dashboard related content, please refer to the documentation: -[Dashboard](dashboard.md) +[Dashboard](../7.deploy/3.dashboard.md) ## Running in GeaFlow Console GeaFlow Console is a graph computing research and development platform provided by GeaFlow. In this document, we will introduce how to launch the GeaFlow Console platform in a Docker container and submit graph computing jobs. -Document address: [Running in Docker](quick_start_docker.md) +Document address: [Running in Docker](2.quick_start_docker.md) ## Running with GeaFlow Kubernetes Operator Geaflow Kubernetes Operator is a deployment tool that can quickly deploy Geaflow applications to kubernetes clusters. We will introduce how to install geaflow-kubernetes-operator through Helm and quickly submit geaflow jobs through yaml files, and in addition, how to visit the operator's dashboard page to view the job details in the cluster. -Document address: [Running By kubernetes operator](quick_start_operator.md) +Document address: [Running By kubernetes operator](../7.deploy/2.quick_start_operator.md) ## Visualization of flow graph computation jobs using G6VP -G6VP is an extensible visual analysis platform, including data source management, composition, personalized configuration of graphic elements, visual analysis and other functional modules. Using G6VP, it is easy to visualize the results of Geaflow calculations. Document address: [Document](visualization/collaborate_with_g6vp.md) \ No newline at end of file +G6VP is an extensible visual analysis platform, including data source management, composition, personalized configuration of graphic elements, visual analysis and other functional modules. 
Using G6VP, it is easy to visualize the results of Geaflow calculations. Document address: [Document](../7.deploy/4.collaborate_with_g6vp.md) \ No newline at end of file diff --git a/docs/docs-en/quick_start_docker.md b/docs/docs-en/source/3.quick_start/2.quick_start_docker.md similarity index 89% rename from docs/docs-en/quick_start_docker.md rename to docs/docs-en/source/3.quick_start/2.quick_start_docker.md index c1a94acf2..66fac2e78 100644 --- a/docs/docs-en/quick_start_docker.md +++ b/docs/docs-en/source/3.quick_start/2.quick_start_docker.md @@ -2,7 +2,7 @@ ## Prepare 1. Install Docker and adjust Docker service resource settings (Dashboard-Settings-Resources), then start Docker service: -![docker_pref](../static/img/docker_pref.png) +![docker_pref](../../../static/img/docker_pref.png) 2. Pull GeaFlow Image @@ -78,7 +78,7 @@ GeaflowApplication:61 - Started GeaflowApplication in 11.437 seconds (JVM runn The first registered user will be set as the default administrator. Log in as an administrator and use the one-click installation feature to start system initialization. -![install_welcome](../static/img/install_welcome_en.png) +![install_welcome](../../../static/img/install_welcome_en.png) 3. Config GeaFlow @@ -89,32 +89,32 @@ GeaFlow requires configuration of runtime environment settings during its initia Click "Next" and use the default Container mode, which is local container mode. -![install_container](../static/img/install_container_en.png) +![install_container](../../../static/img/install_container_en.png) 3.2 Runtime Config For local runtime mode, you can skip this step and use the default system settings by clicking "Next" directly. -![install_conainer_meta_config.png](../static/img/install_container_meta_config_en.png) +![install_conainer_meta_config.png](../../../static/img/install_container_meta_config_en.png) 3.3 Storage Config Select the graph data storage location. For local mode, select LOCAL and enter a local directory. If you don't need to fill it out, click "Next" directly. -![install_storage_config](../static/img/install_storage_config_en.png) +![install_storage_config](../../../static/img/install_storage_config_en.png) 3.4 File Config This configuration is for the persistent storage of GeaFlow engine JAR and user JAR files, such as in HDFS. For local runtime mode, it is the same as the data storage configuration, so select LOCAL mode and enter a local directory. If you don't need to fill it out, click "Next" directly. -![install_jar_config](../static/img/install_jar_config_en.png) +![install_jar_config](../../../static/img/install_jar_config_en.png) After completing the configuration, click the one-click installation button. After successful installation, the administrator will automatically switch to the default instance under the personal tenant and can directly create and publish graph computing tasks. 4. Create Job Go to the `DEVELOPMENT` page, select the `Jobs` tab on the left, click the "Add" button in the upper right corner, and create a new DSL job. -![create_job](../static/img/create_job_en.png) +![create_job](../../../static/img/create_job_en.png) Fill in the job name, task description, and DSL content. The DSL content is the same as described in the local runtime job section earlier, just modify the DSL and replace tbl_source and tbl_result tables with ${your.host.ip}. 
```sql
@@ -195,14 +195,14 @@ FROM (
);
```

-![add_dsl_job](../static/img/add_dsl_job_en.png)
+![add_dsl_job](../../../static/img/add_dsl_job_en.png)

After creating the job, click the "Publish" button to publish the job.

-![add_dsl_job](../static/img/job_list_en.png)
+![add_dsl_job](../../../static/img/job_list_en.png)

Then go to the Job Management page and click the "Submit" button to submit the task for execution.

-![task_detail](../static/img/task_detail_en.png)
+![task_detail](../../../static/img/task_detail_en.png)

5. Start the socket service and input data

@@ -231,7 +231,7 @@ After the socket service is started, enter the vertex-edge data, and the calcula
- 6,7,0.1
```

-![ide_socket_server](../static/img/ide_socket_server.png)
+![ide_socket_server](../../../static/img/ide_socket_server.png)

## K8S Deployment
-GeaFlow supports K8S deployment. For details about the deployment mode, see the document: [K8S Deployment](deploy/install_guide.md).
\ No newline at end of file
+GeaFlow supports K8S deployment. For details about the deployment mode, see the document: [K8S Deployment](../7.deploy/1.install_guide.md).
\ No newline at end of file
diff --git a/docs/docs-en/source/3.quick_start/index.rst b/docs/docs-en/source/3.quick_start/index.rst
new file mode 100644
index 000000000..45bcae1d4
--- /dev/null
+++ b/docs/docs-en/source/3.quick_start/index.rst
@@ -0,0 +1,10 @@
+Quick Start
+===========
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   :glob:
+
+   1.quick_start.md
+   2.quick_start_docker.md
\ No newline at end of file
diff --git a/docs/docs-en/source/4.concepts/1.glossary.md b/docs/docs-en/source/4.concepts/1.glossary.md
new file mode 100644
index 000000000..46696bfbf
--- /dev/null
+++ b/docs/docs-en/source/4.concepts/1.glossary.md
@@ -0,0 +1,182 @@
+# Glossary
+
+**K8S**: K8S is short for Kubernetes, an open-source container orchestration platform that provides automated deployment, scaling, and management of containerized applications. It can run on various cloud platforms, physical servers, and virtual machines, supports multiple container runtimes, and provides high availability, load balancing, automatic scaling, automatic repair, and other functions.
+
+**Graph Processing**: Graph Processing is a computing model used to solve computational problems related to graph data structures. The graph computing model can be applied to solve many real-world problems, such as social network analysis, network traffic analysis, medical diagnosis, and more.
+
+**ISO-GQL**: GQL is a standard query language for property graphs, which stands for "Graph Query Language", and is an ISO/IEC international standard database language. In addition to supporting the Gremlin query language, GeaFlow also supports GQL. This means that GeaFlow users can use the GQL language to query and analyze their graph data, thereby enhancing their graph data processing capabilities.
+
+**Cycle**: Cycle is the core data structure of the GeaFlow scheduling model. A cycle is a basic unit that can be executed repeatedly, and it includes a description of input, intermediate calculations, data exchange, and output. It is generated from the vertex groups in the execution plan and supports nesting.
+
+**Event**: Event is the core data structure for the interaction between scheduling and computation at the Runtime layer. The Scheduler constructs a state machine from a series of events and distributes them to workers for computation and execution.
Some of these events are executable, meaning they have their own computational semantics, and the entire scheduling and computation process is executed asynchronously. + +**Graph Traversal** : Graph Traversal refers to the process of traversing all nodes or some nodes in a graph data structure, visiting all nodes in a specific order, mainly using depth-first search (DFS) and breadth-first search (BFS). It is used to solve many problems, including finding the shortest path between two nodes, detecting cycles in a graph, and so on. + +**Graph State**: GraphState is used to store the graph data or intermediate results of graph iteration calculations in Geaflow. It provides exactly-once semantics and the ability to reuse jobs at the job level. GraphState can be divided into two types: Static and Dynamic. Static GraphState views the entire graph as a complete entity, and all operations are performed on a complete graph. Dynamic GraphState assumes that the graph is dynamically changing and is composed of time slices, and all slices make up a complete graph, and all operations are performed on the slices. + +**Key State**: KeyState is used to store intermediate results during the calculation process and is generally used for stream processing, such as recording intermediate aggregation results in KeyState when performing aggregation. Similar to GraphState, Geaflow regularly persists KeyState, so KeyState also provides exactly-once semantics. Depending on the data result, KeyState can be divided into KeyValueState, KeyListState, KeyMapState, and so on. + +## Graph View + +### Fundamental Conception + +GraphView is the critical core data abstraction in Geaflow, representing a virtual view based on graph structure. It is an abstraction of graph physical storage, which can represent the storage and operation of graph data on multiple nodes. In Geaflow, GraphView is a first-class citizen, and all user operations on the graph are based on GraphView. For example, distributing point and edge streams as GraphView incremental point/edge data sets, generating snapshots for the current GraphView, and triggering calculations based on snapshot graphs or dynamic GraphViews. + +### Functional Description + +GraphView has the following main functions: +* Graph manipulation:it can add or delete vertex and edge data, as well as perform queries and take snapshots based on a specific time slice. +* Graph storage: it can be stored in a graph database or other storage media (such as a file system, KV storage, wide-table storage, native graph, etc.). +* Graph partitioning: it also supports different graph partitioning methods. +* Graph computation: it can perform iterative traversal or computation on the graph. + +![graph_view|(4000x2500)](../../../static/img/graph_view.png) + +### Example Introduction + +Define a GraphView for a social network that describes interpersonal relationships. + +DSL Code +```SQL +CREATE GRAPH social_network ( + Vertex person ( + id int ID, + name varchar + ), + Edge knows ( + person1 int SOURCE ID, + person2 int DESTINATION ID, + weight int + ) +) WITH ( + storeType='rocksdb', + shardCount = 128 +); +``` + + +HLA Code +```java +//build graph view. 
+final String graphName = "social_network";
+GraphViewDesc graphViewDesc = GraphViewBuilder
+    .createGraphView(graphName)
+    .withShardNum(128)
+    .withBackend(BackendType.RocksDB)
+    .withSchema(new GraphMetaType(IntegerType.INSTANCE, ValueVertex.class,
+        String.class, ValueEdge.class, Integer.class))
+    .build();
+
+// bind the graph view with the pipeline
+pipeline.withView(graphName, graphViewDesc);
+pipeline.submit(new PipelineTask());
+
+```
+
+## Stream Graph
+
+### Fundamental Conception
+
+The term "Streaming Graph" refers to graph data that is stream-based, dynamic, and constantly changing. Within the context of GeaFlow, Streaming Graph also refers to the computing mode for streaming graphs, which is designed for graphs that undergo streaming changes, and performs operations such as graph traversal, graph matching, and graph computation based on graph changes.
+
+Based on the GeaFlow framework, it is easy to perform dynamic computation on streaming graphs. In GeaFlow, we have abstracted two core concepts: Dynamic Graph and Static Graph.
+
+* Static Graph refers to a static graph, in which the nodes and edges are fixed at a certain point in time and do not change. Computation on a Static Graph is based on the static structure of the entire graph, so conventional graph algorithms and processing can be used for computation.
+
+* Dynamic Graph refers to a dynamic graph, where nodes and edges are constantly changing. When the status of a node or edge changes, Dynamic Graph updates the graph structure promptly and performs computation on the new graph structure. In a Dynamic Graph, nodes and edges can have dynamic attributes, which can also change with the graph. Computation on a Dynamic Graph is based on the real-time structure and attributes of the graph, so special algorithms and processing are required for computation.
+
+GeaFlow provides various computation modes and algorithms based on Dynamic Graph and Static Graph to facilitate users' choices and usage based on different needs. At the same time, GeaFlow also supports custom algorithms and processing, so users can extend and optimize algorithms according to their own needs.
+
+### Functional Description
+
+Streaming Graph mainly has the following features:
+
+* Supports streaming processing of node and edge data, but the overall graph is static.
+* Supports continuous updates and queries of the graph structure, and can handle incremental data processing caused by changes in the graph structure.
+* Supports backtracking history and can be queried based on historical snapshots.
+* Supports ordering of the graph's computation logic, such as processing edges in time order.
+
+Through real-time graph data flow and changes, Streaming Graph can dynamically implement graph calculations and analysis, and has a wide range of applications. For example, in the fields of social network analysis, financial risk control, and Internet of Things data analysis, Streaming Graph has broad application prospects.
+
+### Example Introduction
+
+When building a Streaming Graph, new nodes and edges can be added to the graph continuously through an incremental data stream, thus dynamically constructing the graph. At the same time, each time an increment of the graph has been built, it can trigger a traversal computation that tracks how Bob's 2-degree friends evolve over time, as the DSL and HLA examples below show.
+
+DSL code
+```SQL
+
+set geaflow.dsl.window.size = 1;
+
+CREATE TABLE table_knows (
+  personId int,
+  friendId int,
+  weight int
+) WITH (
+  type='file',
+  geaflow.dsl.file.path = 'resource:///data/table_knows.txt'
+);
+
+INSERT INTO social_network.knows
+SELECT personId, friendId, weight
+FROM table_knows;
+
+CREATE TABLE result (
+  personName varchar,
+  friendName varchar,
+  weight int
+) WITH (
+  type='console'
+);
+
+-- Graph View Name Defined in Graph View Concept --
+USE GRAPH social_network;
+-- find the persons known by 'Bob' within 1 to 2 hops, triggered every window.
+INSERT INTO result
+SELECT
+  name,
+  known_name,
+  weight
+FROM (
+  MATCH (a:person where a.name = 'Bob') -[e:knows]->{1, 2}(b)
+  RETURN a.name as name, b.name as known_name, e.weight as weight
+)
+```
+
+HLA code
+
+```java
+// build graph view.
+final String graphName = "social_network";
+GraphViewDesc graphViewDesc = GraphViewBuilder.createGraphView(graphName).build();
+pipeline.withView(graphName, graphViewDesc);
+
+// submit pipeline task.
+pipeline.submit(new PipelineTask() {
+    @Override
+    public void execute(IPipelineTaskContext pipelineTaskCxt) {
+
+        // build vertices streaming source.
+        PStreamSource<IVertex<Integer, String>> vertices =
+            pipelineTaskCxt.buildSource(
+                new CollectionSource<>(getVertices()), SizeTumblingWindow.of(5000));
+        // build edges streaming source.
+        PStreamSource<IEdge<Integer, Integer>> edges =
+            pipelineTaskCxt.buildSource(
+                new CollectionSource<>(getEdges()), SizeTumblingWindow.of(5000));
+        // build graph view by graph name.
+        PGraphView<Integer, String, Integer> socialNetwork =
+            pipelineTaskCxt.buildGraphView(graphName);
+        // incrementally build the graph view.
+        PIncGraphView<Integer, String, Integer> incSocialNetwork =
+            socialNetwork.appendGraph(vertices, edges);
+
+        // traversal starting from 'Bob'.
+        incSocialNetwork.incrementalTraversal(new IncGraphTraversalAlgorithms(2))
+            .start("Bob")
+            .map(res -> String.format("%s,%s", res.getResponseId(), res.getResponse()))
+            .sink(new ConsoleSink<>());
+    }
+});
+```
\ No newline at end of file
diff --git a/docs/docs-en/principle/dsl_principle.md b/docs/docs-en/source/4.concepts/2.dsl_principle.md
similarity index 90%
rename from docs/docs-en/principle/dsl_principle.md
rename to docs/docs-en/source/4.concepts/2.dsl_principle.md
index 9d7bf5166..938c8e047 100644
--- a/docs/docs-en/principle/dsl_principle.md
+++ b/docs/docs-en/source/4.concepts/2.dsl_principle.md
@@ -1,8 +1,10 @@
-# GeaFlow DSL Architecture
+# DSL Principle Introduction
+
+## GeaFlow DSL Architecture

The overall architecture of GeaFlow DSL is shown in the following figure:

-![dsl_arch](../../static/img/dsl_arch_new.png)
+![dsl_arch](../../../static/img/dsl_arch_new.png)

DSL Layer is a typical compiler technology architecture, which consists of syntax analysis, semantic analysis, intermediate code generation (IR), code optimization, and target code generation (OBJ).

@@ -15,12 +17,12 @@ DSL Layer is a typical compiler technology architecture, which consists of synta
* **Custom Functions**: GeaFlow provides a wide range of built-in system functions, and users can also register custom functions as needed.
* **Custom Plugins**: GeaFlow allows users to extend their own Connector types to support different data sources and data formats.
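As a concrete illustration of the custom-function extension point mentioned above, the sketch below registers and calls a UDF from GeaFlow SQL. The class name, jar name, and column names are hypothetical placeholders, and the exact `CREATE FUNCTION` clause spelling should be checked against the DDL reference:

```sql
-- Hypothetical UDF registration and use; all names are placeholders.
CREATE FUNCTION concat_name AS 'com.example.udf.ConcatName' USING 'concat_name.jar';

SELECT concat_name(firstName, lastName) AS fullName
FROM v_person_table;
```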
-# Main Execution Flow of DSL
+## Main Execution Flow of DSL

The main execution flow of DSL is illustrated in the following figure:

-![dsl_workflow](../../static/img/dsl_workflow.png)
+![dsl_workflow](../../../static/img/dsl_workflow.png)

The DSL text is first parsed by the Parser to generate the AST syntax tree, and then the Validator performs semantic checking and type inference to produce a validated AST syntax tree. The graph and table fused logical execution plan is then generated by the Logical Plan transformer. The logical execution plan is optimized by the Optimizer to produce an optimized logical execution plan. The physical execution plan is then generated by the Physical Plan transformer, and the physical execution logic is generated by the DAG Builder. GeaFlow DSL uses a two-level DAG structure to describe the physical execution logic of the stream graph.

-# Two-level DAG Physical Execution Plan
+## Two-level DAG Physical Execution Plan

Unlike traditional distributed table data processing engines such as Storm, Flink, and Spark, GeaFlow is a distributed computing system that integrates stream and graph processing. Its physical execution plan uses a two-level DAG structure for the stream graph, as shown in the following figure:

-![dsl_twice_level_dag](../../static/img/dsl_twice_level_dag.png)
+![dsl_twice_level_dag](../../../static/img/dsl_twice_level_dag.png)

The outer DAG contains operators for table processing and iterative operators for graph processing; it is the main part of the physical execution logic and links the stream and graph computing logic together. The inner DAG expands the graph computation logic into a DAG of its own, representing the specific execution of iterative graph computation.
diff --git a/docs/docs-en/principle/framework_principle.md b/docs/docs-en/source/4.concepts/3.framework_principle.md
similarity index 95%
rename from docs/docs-en/principle/framework_principle.md
rename to docs/docs-en/source/4.concepts/3.framework_principle.md
index 2fdd79213..771cfe06f 100644
--- a/docs/docs-en/principle/framework_principle.md
+++ b/docs/docs-en/source/4.concepts/3.framework_principle.md
@@ -2,7 +2,7 @@
The architecture of GeaFlow Framework is shown in the following diagram:

-![framework_arch](../../static/img/framework_arch_new.png)
+![framework_arch](../../../static/img/framework_arch_new.png)

* **High Level API**: GeaFlow adapts to heterogeneous distributed execution environments (K8S, Ray, Local) through the Environment interface. It encapsulates the user's data processing flow using Pipelines and abstracts stream processing (unbounded window) and batch processing (bounded window) using Windows. The Graph interface provides computation APIs on static graphs and dynamic graphs (streaming graphs), such as append/snapshot/compute/traversal, while the Stream interface provides unified stream and batch processing APIs, such as map/reduce/join/keyBy, etc.
* **Logical Execution Plan**: The logical execution plan information is encapsulated in the PipelineGraph object. It organizes the high-level API operators in a directed acyclic graph. Operators are categorized into 5 types: SourceOperator for data source loading, OneInputOperator/TwoInputOperator for traditional data processing, and IteratorOperator for static/dynamic graph computation.
The vertices (PipelineVertex) in the DAG store crucial information about operators, such as type, parallelism, and operator functions, while the edges (PipelineEdge) record key information about data shuffling, such as partition rules (forward/broadcast/key), and encoders/decoders. @@ -11,10 +11,10 @@ The architecture of GeaFlow Framework is shown in the following diagram: * **Runtime Components**: GeaFlow's runtime launches the Client, Master, Driver, and Container components. When the Client submits a Pipeline to the Driver, it triggers the construction of the execution plan, task allocation (resources are provided by ResourceManagement), and scheduler. Each Container can run multiple Worker components, and data exchange between different Worker components is done through the Shuffle module. All workers need to regularly send heartbeats (HeartbeatManagement) to the Master and report runtime metric information to the time-series database. Additionally, GeaFlow's runtime provides fault tolerance mechanisms (FailOver) to continue execution in case of exceptions/interruptions. -# Computing Engine +## Computing Engine The core modules of GeaFlow computing engine mainly include execution plan generation and optimization, unified cycle scheduling, and worker runtime execution. The following is an introduction to these core modules. -## Execution Plan +### Execution Plan For the submitted PipelineTask in the stream graph scenario, a unified execution plan model is constructed, and different execution modes are aggregated together as a group for scheduling to provide a unified execution unit. * PipelineGraph @@ -22,9 +22,9 @@ For the submitted PipelineTask in the stream graph scenario, a unified execution * ExecutionGraph The ExecutionGraph aggregates a group of executable vertices together to build the corresponding ExecutionGroup based on different calculation models. Each group represents an independent schedulable unit. A group can be built from one or more vertices and can be considered as a small execution plan. The data exchange within the group is done in pipeline mode, while batch mode is used between groups. The group describes the specific execution mode, supports nesting, can be executed once or multiple times, and can execute data from one or more windows. The group is shown in the following figure. ExecutionGroup is ultimately transformed into the basic unit cycle for scheduling and execution. - ![group.jpg](../../static/img/framework_dag.jpeg) + ![group.jpg](../../../static/img/framework_dag.jpeg) -## Scheduling Model +### Scheduling Model Scheduling generates scheduling basic units cycles based on ExecutionGroup defined in ExecutionGraph. A cycle is a basic unit that can be executed repeatedly and contains descriptions of input, intermediate calculations, data exchange, and output. The scheduling and execution process is mainly as follows: 1. Divide the execution plan into a group of cycles. If there is no data dependency between cycles, they can be executed in series or in parallel. 2. According to the scheduling data policy of the cycle, the cycle is scheduled and executed in order. @@ -34,35 +34,35 @@ Scheduling generates scheduling basic units cycles based on ExecutionGroup defin Tail task: the end of the cycle data stream. After processing the data, the tail task sends an event to the scheduler, indicating that a round of calculation is completed. 
Other non-head/tail tasks: intermediate execution tasks that receive input data from upstream, process it, and send it directly to the downstream. The scheduling and execution process of the cycle is like a "loop," continuously sending events to the head and receiving return events from the tail, as shown in the following figure. The scheduling initializes the scheduling state according to the type of the cycle, and the scheduling process is also a process of state transition. Based on the received event, the scheduling decides the type of event to be sent to the head for the next round. - ![scheduler.jpg](../../static/img/framework_cyle.jpeg) + ![scheduler.jpg](../../../static/img/framework_cyle.jpeg) -## Runtime Execution -### Overall Introduction +### Runtime Execution +#### Overall Introduction Runtime module is responsible for the specific computation and execution of all GeaFlow mode tasks, including streaming-batch, static/dynamic graph. Its entire worker process is as follows: 1. Scheduler is responsible for sending various types of events to Container for processing. 2. Dispatcher (inherited from AbstractTaskRunner) is responsible for receiving events sent by Scheduler and distributing them to specified TaskRunners according to their workerId. 3. TaskRunner (also inherited from AbstractTaskRunner) is responsible for fetching TASK(Event) from taskQueue, and the specific Event will be processed by Task. The whole lifecycle of Task includes creation, processing and ending. For abnormal Tasks, they can be directly interrupted. a. Task creation and initialization will be completed according to CreateTaskEvent. The lifecycle of Task will end according to DestroyTaskEvent. b. Other types of events will be completed on the semantic level of specific calculation through execute() method of corresponding CommandEvent. For example, according to InitCycleEvent, Worker will build upstream and downstream dependencies. According to LaunchSourceEvent, Worker will trigger source to start reading data, etc. - ![undefined](../../static/img/framework_scheduler.png) + ![undefined](../../../static/img/framework_scheduler.png) The core data structure in the current TaskContext includes the Worker responsible for executing computations, the FetchService responsible for asynchronous data reading from upstream nodes to downstream, and the EmitterService responsible for outputting data generated by the execution Worker to downstream. * Worker: mainly responsible for aligning and processing data in the flow graph, and calling back the corresponding DoneEvent to Scheduler after each batch processing, so that Scheduler can perform subsequent scheduling logic according to the DoneEvent. * FetchService: responsible for asynchronously pulling data from the upstream channel, and putting it into the worker processing queue through the Listener registered by the worker. * EmitterService: responsible for partition writing the data generated by the Worker into the corresponding Channel. -### Command Event +#### Command Event * Command Events can be divided into two types: * Normal Command Events, which do not have specific execute logic and are usually used to trigger the start and end of Task or Cycle lifecycles. * Executable Command Events, which have their own execute logic, such as Cycle initialization, data reading in the Source node, computation in the processing node, and cleanup after the Cycle ends. 
* In the scheduling module, various types of events will be constructed into a State Machine for the lifecycle management of the entire scheduling task. * Developers can extend Events and implement corresponding execute logic according to design needs, and add them to the State Machine. Then Scheduler can automatically schedule and execute them as expected. -## Fault Tolerance And Exception Recovery -### Cluster Component Fault Tolerance +### Fault Tolerance And Exception Recovery +#### Cluster Component Fault Tolerance For all runtime component processes, such as master/driver/container, they are initialized and run based on context. When creating a new process, the context required by the process is constructed first, and each process persists the context during initialization. When a process restarts abnormally, the context is recovered first, and then the process is re-initialized based on the context. -### Job Exception Recovery -![undefined](../../static/img/framework_failover.jpeg) +#### Job Exception Recovery +![undefined](../../../static/img/framework_failover.jpeg) * Distributed snapshot of job The Scheduler determines the new windowId to be sent to the running tasks based on its current scheduling state, triggering the computation of the new window. When each operator completes the calculation of the corresponding window, if a snapshot of the current window context needs to be taken, the corresponding state within the operator will be persisted to storage. Finally, after the Scheduler receives the message that all tasks for a certain window have completed, it will take a snapshot of the scheduling metadata as needed and persist it. Only then is the processing of this window considered finished. When the Scheduler and the operator recover to this window context, they can continue to execute based on that window. diff --git a/docs/docs-en/principle/state_principle.md b/docs/docs-en/source/4.concepts/4.state_principle.md similarity index 96% rename from docs/docs-en/principle/state_principle.md rename to docs/docs-en/source/4.concepts/4.state_principle.md index 68630356d..d9c6d5cf3 100644 --- a/docs/docs-en/principle/state_principle.md +++ b/docs/docs-en/source/4.concepts/4.state_principle.md @@ -1,4 +1,6 @@ -# State Management in Geaflow +# Introduction to State Principles + +## State Management in Geaflow In Geaflow, state refers to the intermediate calculation results of directly calculated nodes during graph and flow computation processes. This intermediate result may be organized source data information or some results generated by calculation. State management is responsible for the storage and access of this data as well as consistency assurance. As the central data hub of Geaflow, its functional model, performance, and reliability directly affect the entire use process of Geaflow, and it is the foundation of the entire system. * In terms of the function, it supports Geaflow's real-time, multi-mode dynamic graph engine, including low-latency flow graph fusion computation, high-performance long-cycle graph simulation, large-scale dynamic graph exploration, and more. @@ -9,9 +11,9 @@ In Geaflow, state refers to the intermediate calculation results of directly cal Therefore, we have the following architecture diagram, which is flexible and supports multiple pluggable components. 
-## State Architecture
+### State Architecture

-![state_arch](../../static/img/state_arch_new.png)
+![state_arch](../../../static/img/state_arch_new.png)

* **State API**: Provides an API for key-value storage operations, such as get/put/delete. It also provides APIs for graph storage, such as V/E/VE, and operations for adding/updating/deleting vertices and edges.
* **State Execution Layer**: Implements data sharding and scalability through the design of KeyGroups. Accessor provides IO abstractions for different read/write strategies and data models. StateOperator abstracts the storage layer SPI, including operations like finish (flushing to disk), archive (checkpoint), compact (compression), recover (recovery), etc. Additionally, State provides various pushdown optimizations to accelerate IO access efficiency. Customized memory management and attribute-based secondary indexing are also provided for storage access optimization.
@@ -22,7 +24,7 @@ Therefore, we have the following architecture diagram, which is flexible and sup

The life cycle of State is shown below:

-![state_flow](../../static/img/state_flow.png)
+![state_flow](../../../static/img/state_flow.png)

When failover happens, State recovers from the last persisted data. The following is the detailed process.

@@ -56,19 +58,19 @@ In each computing task, users need to periodically do a checkpoint to persist th

When an exception occurs, the framework layer will perform FailOver, and State will automatically roll back to the latest state. Depending on the choice of the persistence layer as mentioned above, State data will be retrieved and loaded from the corresponding distributed file storage or object storage.

-# The Types of State
+## The Types of State

State can be roughly divided into Graph State and Key State, corresponding to different data structures and mapping to different storage models in the Store layer. For example, for the RocksDB store type, there will be different types of storage models such as KV and Graph.

-![state_type](../../static/img/state_type.png)
+![state_type](../../../static/img/state_type.png)

-## Graph State
+### Graph State

Graph State can be further classified into StaticGraph and DynamicGraph, based on whether it is a dynamic graph or not. The difference is that StaticGraph treats the entire graph as a complete entity, and all operations are performed on the complete graph. On the other hand, DynamicGraph considers the graph to be dynamic, consisting of slice graphs that together form a complete graph.

-### Static Graph State
+#### Static Graph State

StaticGraphState API is divided into several parts, including query, upsert, delete, and manage.

@@ -80,13 +82,13 @@ StaticGraphState API is divided into several parts, including query, upsert, del

* manage: Divided into operator and other operations. Operator is the data operation of the State, which can perform flushing persistence or recovery. Other operations include obtaining information such as summary and metrics.

-### Dynamic Graph State
+#### Dynamic Graph State

DynamicGraphState API is similar to StaticGraphState, but each node and edge is associated with a version number. At the same time, Dynamic Graph State also adds version-related queries, which can obtain all versions or the latest version corresponding to certain nodes, and can obtain the specific values of each version.
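Ahead of the Key State overview that follows, here is a minimal, self-contained Java sketch of the key-value flavor of state. The interface is modeled on the KeyValueState concept described in this section; GeaFlow's real state interfaces may differ in naming and signatures:

```java
// Sketch of the KeyValueState idea: get/put access to keyed intermediate
// results, persisted at checkpoints so recovery yields exactly-once results.
interface KeyValueState<K, V> { // assumed shape, for illustration only
    V get(K key);
    void put(K key, V value);
}

class WordCountAggregator {
    private final KeyValueState<String, Long> counts;

    WordCountAggregator(KeyValueState<String, Long> counts) {
        this.counts = counts;
    }

    // Called once per input word; the running total lives in state, so it
    // survives failover when the framework restores the last checkpoint.
    void process(String word) {
        Long current = counts.get(word);
        counts.put(word, current == null ? 1L : current + 1L);
    }
}
```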
-## Key State +### Key State KeyState API is divided into several parts, including: diff --git a/docs/docs-en/principle/console_principle.md b/docs/docs-en/source/4.concepts/5.console_principle.md similarity index 97% rename from docs/docs-en/principle/console_principle.md rename to docs/docs-en/source/4.concepts/5.console_principle.md index 31644236b..cf85900cd 100644 --- a/docs/docs-en/principle/console_principle.md +++ b/docs/docs-en/source/4.concepts/5.console_principle.md @@ -4,7 +4,7 @@ GeaFlow Console provides a unified platform for graph development and operations ### Platform Architecture -![console_arch](../../static/img/console_arch.png) +![console_arch](../../../static/img/console_arch.png) * **RESTful API**: The platform provides standardized RESTful APIs and authentication mechanisms, supporting unified API services for web and application clients. * **Job Development**: The platform supports graph data modeling based on the "Relationship-Entity-Attribute" paradigm. Based on field mapping configurations, users can define graph data integration (Import) and distribution (Export) tasks. Graph data processing tasks based on graph models support diverse computational scenarios, such as Traversal, Compute, Mining, etc. Graph data services based on data accelerators provide real-time analysis capabilities supporting multiple protocols and integration with BI and visualization analysis tools. @@ -17,7 +17,7 @@ GeaFlow Console provides a unified platform for graph development and operations GeaFlow supports execution in various heterogeneous environments. Taking the common K8S deployment environment as an example, the physical deployment architecture of GeaFlow is as follows: -![deploy_arch](../../static/img/deploy_arch.png) +![deploy_arch](../../../static/img/deploy_arch.png) During the full lifecycle of a GeaFlow task, there are several key phases: diff --git a/docs/docs-en/concepts/graph_view.md b/docs/docs-en/source/4.concepts/graph_view.md similarity index 96% rename from docs/docs-en/concepts/graph_view.md rename to docs/docs-en/source/4.concepts/graph_view.md index 3d459a45d..c2063349a 100644 --- a/docs/docs-en/concepts/graph_view.md +++ b/docs/docs-en/source/4.concepts/graph_view.md @@ -1,3 +1,5 @@ +# Graph view + ## Fundamental Conception GraphView is the critical core data abstraction in Geaflow, representing a virtual view based on graph structure. It is an abstraction of graph physical storage, which can represent the storage and operation of graph data on multiple nodes. In Geaflow, GraphView is a first-class citizen, and all user operations on the graph are based on GraphView. For example, distributing point and edge streams as GraphView incremental point/edge data sets, generating snapshots for the current GraphView, and triggering calculations based on snapshot graphs or dynamic GraphViews. @@ -10,7 +12,7 @@ GraphView has the following main functions: * Graph partitioning: it also supports different graph partitioning methods. * Graph computation: it can perform iterative traversal or computation on the graph. -![graph_view|(4000x2500)](../../static/img/graph_view.png) +![graph_view|(4000x2500)](../../../static/img/graph_view.png) ## Example Introduction diff --git a/docs/docs-en/source/4.concepts/index.rst b/docs/docs-en/source/4.concepts/index.rst new file mode 100644 index 000000000..64e864d30 --- /dev/null +++ b/docs/docs-en/source/4.concepts/index.rst @@ -0,0 +1,13 @@ +Concepts +==== + +.. 
toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   :glob:
+
+   1.glossary.md
+   2.dsl_principle.md
+   3.framework_principle.md
+   4.state_principle.md
+   5.console_principle.md
\ No newline at end of file
diff --git a/docs/docs-en/concepts/stream_graph.md b/docs/docs-en/source/4.concepts/stream_graph.md
similarity index 99%
rename from docs/docs-en/concepts/stream_graph.md
rename to docs/docs-en/source/4.concepts/stream_graph.md
index 5ba265b74..97c1260e2 100644
--- a/docs/docs-en/concepts/stream_graph.md
+++ b/docs/docs-en/source/4.concepts/stream_graph.md
@@ -1,3 +1,5 @@
+# Stream Graph
+
## Fundamental Conception

The term "Streaming Graph" refers to graph data that is stream-based, dynamic, and constantly changing. Within the context of GeaFlow, Streaming Graph also refers to the computing mode for streaming graphs, which is designed for graphs that undergo streaming changes, and performs operations such as graph traversal, graph matching, and graph computation based on graph changes.
diff --git a/docs/docs-en/application-development/api/overview.md b/docs/docs-en/source/5.application-development/1.api/1.overview.md
similarity index 98%
rename from docs/docs-en/application-development/api/overview.md
rename to docs/docs-en/source/5.application-development/1.api/1.overview.md
index 6308430ae..c81940876 100644
--- a/docs/docs-en/application-development/api/overview.md
+++ b/docs/docs-en/source/5.application-development/1.api/1.overview.md
@@ -3,7 +3,7 @@ GeaFlow API is a development interface provided for advanced users, which suppor
* Graph API: Graph is a first-class citizen of the GeaFlow framework. Currently, the GeaFlow framework provides a set of graph computing programming interfaces based on GraphView, including graph construction, graph computation, and traversal. In GeaFlow, both static and dynamic graphs are supported.
    * Static Graph API: Static graph computing API. Full graph computing or traversal can be performed based on this API.
    * Dynamic Graph API: Dynamic graph computing API. GraphView is the data abstraction of a dynamic graph in GeaFlow. Based on GraphView, dynamic graph computing or traversal can be performed. At the same time, support for Snapshot generation from GraphView is provided. The Snapshot can provide the same interface capability as the Static Graph API.
-  ![api_arch](../../../static/img/api_arch.jpeg)
+  ![api_arch](../../../../static/img/api_arch.jpeg)
* The Stream API: GeaFlow provides a set of programming interfaces for general computing, including source construction, streaming batch computing, and sink output. In GeaFlow, both Batch and Stream are supported.
    * Batch API: Batch computing API, which can perform batch computation based on this API.
    * Stream API: Streaming computing API. StreamView is the data abstraction of a dynamic stream in GeaFlow. Based on StreamView, streaming computing can be performed.
@@ -12,7 +12,7 @@ From the introduction of the two types of APIs, it can be seen that GeaFlow unif
* For streaming or dynamic graph APIs, the Window can be split by size, and each window reads a certain size of data to achieve incremental computation.
* For batch or static graph APIs, the Window will use the AllWindow mode, and a window will read the full amount of data to achieve full computation.
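The window split described in these two bullets can be made concrete with a short sketch. `SizeTumblingWindow` and `CollectionSource` appear in the examples of this documentation; obtaining the all-window via `AllWindow.getInstance()` is an assumption here:

```java
// Same Source API, different window: the window type decides whether the
// job runs incrementally (stream/dynamic graph) or over all data (batch).
PStreamSource<String> streaming = pipelineTaskCxt.buildSource(
    new CollectionSource<>(lines),   // lines: an in-memory List<String>
    SizeTumblingWindow.of(2));       // incremental: 2 records per window

PStreamSource<String> batch = pipelineTaskCxt.buildSource(
    new CollectionSource<>(lines),
    AllWindow.getInstance());        // full: one window reads all the data
```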
-# Maven依赖 +## Maven Dependency To develop a GeaFlow API application, you need to add maven dependencies: ```xml @@ -47,8 +47,8 @@ To develop a GeaFlow API application, you need to add maven dependencies: ``` -# Overview Of Functions -## Graph API +## Overview Of Functions +### Graph API Graph API is a first-class citizen in GeaFlow, which provides a set of graph computing programming interfaces based on GraphView, including graph construction, graph computation, and traversal. The specific API descriptions are shown in the following table: @@ -108,7 +108,7 @@ Graph API is a first-class citizen in GeaFlow, which provides a set of graph com
-## Stream API
+### Stream API

The Stream API provides a set of programming interfaces for general computation, including source construction, stream and batch computing, and sink output. The specific API documentation is shown in the table below:

@@ -183,14 +183,14 @@ The Stream API provides a set of programming interfaces for general computation,

-# Typical Example
-## Introduction to PageRank dynamic graph computing example
-### The Definition Of PageRank
+## Typical Example
+### Introduction to PageRank Dynamic Graph Computing Example
+#### The Definition Of PageRank
The PageRank algorithm was originally used to calculate the importance of Internet pages. It was proposed by Page and Brin in 1996 and used for ranking web pages in the Google search engine. In fact, PageRank can be defined on any directed graph and was later applied to many problems such as social influence analysis and text summarization. Assuming that the Internet is a directed graph, a random walk model, namely a first-order Markov chain, is defined on it, representing the process of a visitor browsing web pages on the Internet at random. It is assumed that the visitor jumps with equal probability to one of the pages hyperlinked from the current page, and keeps performing such random jumps on the Internet; this process forms a first-order Markov chain. PageRank is the stationary distribution of this Markov chain, and the PageRank value of each page is its stationary probability. The implementation of the algorithm is as follows:
1. Assume that the initial influence value of each vertex in the graph is the same;
2. Calculate the jump probability of each vertex to other vertices, and update the influence value of each vertex;
3. Perform n iterations until the influence value of each vertex no longer changes, that is, the convergence state.
-### Example Code
+#### Example Code

```java
@@ -416,8 +416,8 @@ public class IncrGraphCompute {
}
```

-## Introduction to PageRank static graph computing example
-### Example Code
+### Introduction to PageRank Static Graph Computing Example
+#### Example Code

```java
@@ -549,8 +549,8 @@ public class PageRank {
}
```

-## Introduction to WordCount batch computation example
-### Example Code
+### Introduction to WordCount Batch Computation Example
+#### Example Code

```java
@@ -654,8 +654,8 @@ public class WordCountStream {
}
```

-## Introduction to KeyAgg stream computation example
-### Example Code
+### Introduction to KeyAgg Stream Computation Example
+#### Example Code

```java
diff --git a/docs/docs-en/application-development/api/stream/source.md b/docs/docs-en/source/5.application-development/1.api/2.stream/1.source.md
similarity index 99%
rename from docs/docs-en/application-development/api/stream/source.md
rename to docs/docs-en/source/5.application-development/1.api/2.stream/1.source.md
index 39fad8e67..9f3a0489d 100644
--- a/docs/docs-en/application-development/api/stream/source.md
+++ b/docs/docs-en/source/5.application-development/1.api/2.stream/1.source.md
@@ -2,7 +2,7 @@
GeaFlow provides the Source API to the public, and an IWindow needs to be provided at the interface level to build the corresponding window source. Users can define the specific source reading logic by implementing SourceFunction.
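Before the interface table below, a hypothetical custom source may help illustrate the SourceFunction idea: the sketch emits a numeric range in small batches. The `SourceFunction` and `SourceContext` shapes here are assumptions for illustration, not the verified GeaFlow contract:

```java
// Hypothetical custom source emitting the numbers 0..999 in small batches.
public class RangeSource implements SourceFunction<Long> {

    private long next = 0;
    private static final long MAX = 1000;

    @Override
    public void init(int parallel, int index) {
        // nothing to initialize for this in-memory source
    }

    @Override
    public boolean fetch(IWindow<Long> window, SourceContext<Long> ctx) throws Exception {
        // emit a small batch per fetch call
        for (int i = 0; i < 10 && next < MAX; i++) {
            ctx.collect(next++);
        }
        return next < MAX; // false once the source is exhausted
    }

    @Override
    public void close() {
    }
}
```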
-# Interface +## Interface | API | Interface Description | Input Parameter Description | | -------- | -------- | -------- | @@ -21,7 +21,7 @@ To build a window source, users can generally use the buildSource interface prov SizeTumblingWindow.of(2)); ``` -# Example +## Example ```java public class WindowStreamWordCount { diff --git a/docs/docs-en/application-development/api/stream/process.md b/docs/docs-en/source/5.application-development/1.api/2.stream/2.process.md similarity index 99% rename from docs/docs-en/application-development/api/stream/process.md rename to docs/docs-en/source/5.application-development/1.api/2.stream/2.process.md index 678d46098..5b64bfd24 100644 --- a/docs/docs-en/application-development/api/stream/process.md +++ b/docs/docs-en/source/5.application-development/1.api/2.stream/2.process.md @@ -1,7 +1,7 @@ # Process Introduction GeaFlow provides a series of Process APIs to the public, which are similar to general stream batch but not identical. As already introduced in the Source API, the source constructed from it has window semantics. Therefore, all GeaFlow Process APIs also have window semantics. -# Interface +## Interface | API | Interface Description | Input Parameter Description | | -------- | -------- | -------- | | PWindowStream map(MapFunction mapFunction) | By implementing mapFunction, input T can be transformed into R and output to downstream |mapFunction:Users define their own conversion logic, T represents input type, and R represents output type| @@ -16,7 +16,7 @@ GeaFlow provides a series of Process APIs to the public, which are similar to ge -# Example +## Example ```java public class StreamUnionPipeline implements Serializable { diff --git a/docs/docs-en/application-development/api/stream/sink.md b/docs/docs-en/source/5.application-development/1.api/2.stream/3.sink.md similarity index 98% rename from docs/docs-en/application-development/api/stream/sink.md rename to docs/docs-en/source/5.application-development/1.api/2.stream/3.sink.md index cd030746f..ed2a5e1ff 100644 --- a/docs/docs-en/application-development/api/stream/sink.md +++ b/docs/docs-en/source/5.application-development/1.api/2.stream/3.sink.md @@ -1,7 +1,7 @@ # Sink Introduction GeaFlow provides Sink API to the public, used to build Window Sink. Users can define specific output logic by implementing SinkFunction. -# Interface +## Interface | API | Interface Description | Input Parameter Description | | -------- | -------- | -------- | | PStreamSink sink(SinkFunction sinkFunction) | Output the result |SinkFunction: Users can define their respective output semantics by implementing the SinkFunction interface. GeaFlow has integrated several sink functions internally, such as Console, File, etc.| @@ -11,7 +11,7 @@ GeaFlow provides Sink API to the public, used to build Window Sink. Users can de source.sink(v -> {LOGGER.info("result: {}", v)}); ``` -# Example +## Example ```java public class WindowStreamWordCount { diff --git a/docs/docs-en/source/5.application-development/1.api/2.stream/index.rst b/docs/docs-en/source/5.application-development/1.api/2.stream/index.rst new file mode 100644 index 000000000..0173956f6 --- /dev/null +++ b/docs/docs-en/source/5.application-development/1.api/2.stream/index.rst @@ -0,0 +1,11 @@ +Stream +==== + +.. 
toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   :glob:
+
+   1.source.md
+   2.process.md
+   3.sink.md
\ No newline at end of file
diff --git a/docs/docs-en/application-development/api/graph/traversal.md b/docs/docs-en/source/5.application-development/1.api/3.graph/1.traversal.md
similarity index 99%
rename from docs/docs-en/application-development/api/graph/traversal.md
rename to docs/docs-en/source/5.application-development/1.api/3.graph/1.traversal.md
index df8b58443..f8c43e379 100644
--- a/docs/docs-en/application-development/api/graph/traversal.md
+++ b/docs/docs-en/source/5.application-development/1.api/3.graph/1.traversal.md
@@ -1,8 +1,8 @@
# Graph Traversal Introduction
GeaFlow provides interfaces for implementing graph traversal algorithms, which can be used for subgraph traversal and full graph traversal. Users can choose to continue traversing vertices or edges in the traversal algorithm and define the number of iterations.

-# Dynamic Graph
-## Interface
+## Dynamic Graph
+### Interface
| API | Interface Description | Input Parameter Description |
| -------- | -------- | -------- |
| void open(IncVertexCentricTraversalFuncContext vertexCentricFuncContext) | Perform the open operation of vertexCentricFunction | vertexCentricFuncContext: where K represents the type of vertex ID, VV represents the type of vertex value, EV represents the type of edge value, M represents the type of message defined in graph traversal, and R represents the type of traversal result |
@@ -53,7 +53,7 @@ GeaFlow provides interfaces for implementing graph traversal algorithms, which c
}
```

-## Example
+### Example
```java
public class IncrGraphTraversalAll {
@@ -202,9 +202,9 @@ public class IncrGraphTraversalAll {
}
```

-# Statical Graph
+## Static Graph

-## Interface
+### Interface
| API | Interface Description | Input Parameter Description |
| -------- | -------- | -------- |
| void open(VertexCentricTraversalFuncContext vertexCentricFuncContext) | Perform open operation using vertexCentric function | vertexCentricFuncContext: K represents the type of vertex ID, VV represents the type of vertex value, EV represents the type of edge value, M represents the type of message defined in graph traversal, and R represents the type of traversal result |
@@ -251,7 +251,7 @@ public interface VertexCentricTraversalFunction extends VertexC
}
```

-## Example
+### Example
```java
public class StaticGraphTraversalAllExample {
    private static final Logger LOGGER =
diff --git a/docs/docs-en/application-development/api/graph/compute.md b/docs/docs-en/source/5.application-development/1.api/3.graph/2.compute.md
similarity index 99%
rename from docs/docs-en/application-development/api/graph/compute.md
rename to docs/docs-en/source/5.application-development/1.api/3.graph/2.compute.md
index 8f2fb3ad0..94f7c585a 100644
--- a/docs/docs-en/application-development/api/graph/compute.md
+++ b/docs/docs-en/source/5.application-development/1.api/3.graph/2.compute.md
@@ -1,8 +1,8 @@
# Graph Compute Introduction
GeaFlow provides interfaces for implementing graph computing algorithms; static or dynamic graph computing can be performed by implementing the corresponding interfaces. Users can define the specific computing logic and the maximum number of iterations in the compute algorithm.
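As an illustration of the compute interfaces listed below, here is a hedged sketch of a vertex-centric function that propagates the minimum label to compute connected components. The accessor methods on the context are assumptions modeled on the interface names in this section:

```java
// Hedged sketch: each round, a vertex keeps the smallest label it has seen
// and forwards it to its neighbors; after convergence, vertices in one
// connected component share a label.
public class MinLabelFunction
        implements VertexCentricComputeFunction<Long, Long, Long, Long> {

    private VertexCentricComputeFuncContext<Long, Long, Long, Long> context;

    @Override
    public void init(VertexCentricComputeFuncContext<Long, Long, Long, Long> ctx) {
        this.context = ctx;
    }

    @Override
    public void compute(Long vertexId, Iterator<Long> messages) {
        long min = context.vertex().get().getValue(); // assumed accessor
        while (messages.hasNext()) {
            min = Math.min(min, messages.next());
        }
        context.setNewVertexValue(min);      // assumed accessor
        context.sendMessageToNeighbors(min); // assumed accessor
    }

    @Override
    public void finish() {
    }
}
```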
-# Dynamic Graph -## Interface +## Dynamic Graph +### Interface | API | Interface Description | Input Parameter Description | | -------- | -------- | -------- | | void init(IncGraphComputeContext incGraphContext) | Graph computing initialization interface | incGraphContext: Context for incremental dynamic graph computing, where K represents the type of vertex ID, VV represents the type of vertex value, EV represents the type of edge value, and M represents the type of message to be sent | @@ -116,7 +116,7 @@ public interface IncVertexCentricComputeFunction extends } ``` -## Example +### Example ```java public class IncrGraphCompute { @@ -232,8 +232,8 @@ public class IncrGraphCompute { } ``` -# Static Graph -## Interface +## Static Graph +### Interface | API | Interface Description | Input Parameter Description | | -------- | -------- | -------- | | void init(VertexCentricComputeFuncContext vertexCentricFuncContext) | Iterative computing initialization interface | vertexCentricFuncContext: Context for static graph computing, where K represents the type of vertex ID, VV represents the type of vertex value, EV represents the type of edge value, and M represents the type of message to be sent | @@ -263,7 +263,7 @@ EV, M> { } ``` -## Example +### Example ```java public class StaticsGraphCompute { diff --git a/docs/docs-en/source/5.application-development/1.api/3.graph/index.rst b/docs/docs-en/source/5.application-development/1.api/3.graph/index.rst new file mode 100644 index 000000000..d434f16d0 --- /dev/null +++ b/docs/docs-en/source/5.application-development/1.api/3.graph/index.rst @@ -0,0 +1,10 @@ +Graph +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.traversal.md + 2.compute.md \ No newline at end of file diff --git a/docs/docs-en/source/5.application-development/1.api/index.rst b/docs/docs-en/source/5.application-development/1.api/index.rst new file mode 100644 index 000000000..2b56c8e3b --- /dev/null +++ b/docs/docs-en/source/5.application-development/1.api/index.rst @@ -0,0 +1,11 @@ +API +=========== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.overview.md + 2.stream/index.rst + 3.graph/index.rst diff --git a/docs/docs-en/application-development/dsl/overview.md b/docs/docs-en/source/5.application-development/2.dsl/1.overview.md similarity index 62% rename from docs/docs-en/application-development/dsl/overview.md rename to docs/docs-en/source/5.application-development/2.dsl/1.overview.md index 461027443..7ce856c89 100644 --- a/docs/docs-en/application-development/dsl/overview.md +++ b/docs/docs-en/source/5.application-development/2.dsl/1.overview.md @@ -1,7 +1,7 @@ # Hybrid-DSL Introduction Hybrid-DSL is a data analysis language provided by GeaFlow, which supports standard SQL+ISO/GQL for analysis on graph and tables. Through Hybrid-DSL, relational operations can be performed on table data, and graph matching and graph algorithm calculation can be performed on graph data. It also supports processing table and graph data at the same time. -# Hybrid-DSL Cases +## Hybrid-DSL Cases - **Process GQL return results through SQL** @@ -39,7 +39,7 @@ Hybrid-DSL is a data analysis language provided by GeaFlow, which supports stand It is possible to define a parameter table for GQL, where the data in the parameter table triggers GQL queries one by one. GQL will return the computation results corresponding to each parameter separately. 
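To complement the two cases above, here is a small sketch of the first pattern, post-processing GQL match results with plain SQL aggregation; it reuses the `social_network` graph defined earlier in these documents:

```sql
-- Aggregate over match results: total 'knows' weight pointing at each person.
USE GRAPH social_network;

SELECT known_name, SUM(weight) AS total_weight
FROM (
  MATCH (a:person)-[e:knows]->(b:person)
  RETURN a.name AS name, b.name AS known_name, e.weight AS weight
)
GROUP BY known_name;
```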
-# Maven依赖 +## Maven Dependency * Developing UDF/UDAF/UDTF/UDGA requires adding the following dependencies: ```xml @@ -59,34 +59,3 @@ Hybrid-DSL is a data analysis language provided by GeaFlow, which supports stand 0.1 ``` - -# DSL Syntax Documents -* DSL Syntax - * [DDL](reference/ddl.md) - * [DML](reference/dml.md) - * DQL - * [Select](reference/dql/select.md) - * [Union](reference/dql/union.md) - * [Match](reference/dql/match.md) - * [With](reference/dql/with.md) - * [USE](reference/use.md) -* Build-in Functions - * [Math Operation](build-in/math.md) - * [Logical Operation](build-in/logical.md) - * [String Function](build-in/string.md) - * [Date Function](build-in/date.md) - * [Condition Function](build-in/condition.md) - * [Aggregate Function](build-in/aggregate.md) - * [Table Function](build-in/table.md) -* User Defined Functions - * [UDF](udf/udf.md) - * [UDTF](udf/udtf.md) - * [UDAF](udf/udaf.md) - * [UDGA](udf/udga.md) -* Connector - * [Hive Connector](connector/hive.md) - * [File Connector](connector/file.md) - * [Kafka Connector](connector/kafka.md) - * [Pulsar Connector](connector/pulsar.md) - * [User Defined Connector](connector/udc.md) - \ No newline at end of file diff --git a/docs/docs-en/application-development/dsl/reference/use.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/1.dcl.md similarity index 100% rename from docs/docs-en/application-development/dsl/reference/use.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/1.dcl.md diff --git a/docs/docs-en/application-development/dsl/reference/ddl.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/2.ddl.md similarity index 97% rename from docs/docs-en/application-development/dsl/reference/ddl.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/2.ddl.md index 2cdd5c3cd..b44469646 100644 --- a/docs/docs-en/application-development/dsl/reference/ddl.md +++ b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/2.ddl.md @@ -1,6 +1,6 @@ # DDL -# Table -## Create Table +## Table +### Create Table This command is used to create a table in GeaFlow, which belongs to external tables and is stored in catalog. **Syntax** @@ -30,7 +30,7 @@ This example creates a table called **v_person_table**, which includes three col -### Data Type +#### Data Type | Type Name | Value | | -------- | -------- | @@ -41,7 +41,7 @@ This example creates a table called **v_person_table**, which includes three col | DOUBLE | Range: -2^1024 ~ +2^1024 | | VARCHAR | Variable length character string | -### External Tables +#### External Tables When creating a table, the **WITH** keyword can be used to associate the table with its configs, which only apply to that table. By specifying configs, GeaFlow's table can access external tables. The **type** config is used to specify the storage type of the external table. **Example** @@ -68,7 +68,7 @@ From top to bottom, "**type**" specifies the storage type is file. For details of the external table types and their corresponding usage supported by GeaFlow, please refer to the Connector section. -## Create View +### Create View This command is used to create a temporary table view that represents the query result. **Syntax** @@ -88,8 +88,8 @@ CREATE VIEW console_1 (a, b, c) AS SELECT id, name, age FROM v_person_table; ``` -# Graph -## Create Graph +## Graph +### Create Graph This command is used to create a graph. For graphs, GeaFlow has self-maintained storage. 
**Syntax** @@ -156,7 +156,7 @@ CREATE GRAPH dy_modern ( This example creates a graph "**dy_modern**" that is divided into two shards and stored in RocksDB. The graph has two types of nodes and edges. Nodes "**Person**" and "**Software**" both have an "**id**" field of type long as the identifier, and they also have a "name" field. Nodes of type "Person" have an additional field "age", and nodes of type "Software" have a field "lang". Edges of type "**knows**" and "**created**" have "**srcId**" and "**targetId**" fields of type long as source and destination identifiers respectively. They do not have a timestamp, but both have a "weight" field. -### Vertex/Edge Type Rules +#### Vertex/Edge Type Rules In theory, the vertex and edge fields can be named arbitrarily and assigned any type. However, an unreasonable graph schema can cause difficult problems for subsequent type calculations. For example, "**Match (p)**" matches any type of vertex p, and for the dy_modern graph, which only has two types of vertex, it is equivalent to "**Match (p:person|software)**".Here, the **union calculation** of two types of nodes generates the "**person|software**" type, which is very common in graph traversal. @@ -176,8 +176,8 @@ The storage type of a graph can be specified in the configuration list associate The number of storage shards for a graph can be specified using the "**shardCount**" config. The number of storage shards affects the parallelism of graph traversal. Setting a larger value can utilize more machines for parrallel computation, but will also require more resources. -# Function -## Create Function +## Function +### Create Function This command is used to import a user defined function. diff --git a/docs/docs-en/application-development/dsl/reference/dml.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/3.dml.md similarity index 98% rename from docs/docs-en/application-development/dsl/reference/dml.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/3.dml.md index 4af79c784..80e4d17f5 100644 --- a/docs/docs-en/application-development/dsl/reference/dml.md +++ b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/3.dml.md @@ -1,5 +1,5 @@ # DML -# Insert Table +## Insert Table **Syntax** ``` INSERT INTO
@@ -31,10 +31,10 @@ RETURN a.id as a_id, e.weight as weight, b.id as b_id; This example inserts a result returned by a graph traversal query into the **tbl_result** table. -# Insert Graph +## Insert Graph Insert command can also insert data into the graph. Unlike tables, graphs use storage self-maintained by GeaFlow. -# Insert vertex/edge +## Insert Vertex/Edge When insert data into vertex or edge in the graph using the _INSERT_ command, the target node is represented by the graph name and vertex/edge name separated by a dot, and supports reordering of fields. **Syntax** @@ -74,7 +74,7 @@ SELECT 1, 2, 0.2 ``` This example inserts one row into the edge **knows** in the graph **dy_modern**. -# Multi table insert +## Multi Table Insert Sometimes the source table needs to be inserted into multiple nodes simultaneously, especially when the foreign key of the source table represents a relationship, which often needs to be transformed into a type of edge, and the foreign key value will also become the opposite endpoint of the edge. The INSERT statement also supports this type of insertion where a single source table has multiple target nodes. **Syntax** diff --git a/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/index.rst b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/index.rst new file mode 100644 index 000000000..a46a4301b --- /dev/null +++ b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/index.rst @@ -0,0 +1,9 @@ +DQL +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + * \ No newline at end of file diff --git a/docs/docs-en/application-development/dsl/reference/dql/match.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/match.md similarity index 100% rename from docs/docs-en/application-development/dsl/reference/dql/match.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/match.md diff --git a/docs/docs-en/application-development/dsl/reference/dql/select.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/select.md similarity index 100% rename from docs/docs-en/application-development/dsl/reference/dql/select.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/select.md diff --git a/docs/docs-en/application-development/dsl/reference/dql/union.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/union.md similarity index 100% rename from docs/docs-en/application-development/dsl/reference/dql/union.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/union.md diff --git a/docs/docs-en/application-development/dsl/reference/dql/with.md b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/with.md similarity index 100% rename from docs/docs-en/application-development/dsl/reference/dql/with.md rename to docs/docs-en/source/5.application-development/2.dsl/2.syntax/4.dql/with.md diff --git a/docs/docs-en/source/5.application-development/2.dsl/2.syntax/index.rst b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/index.rst new file mode 100644 index 000000000..39605f89d --- /dev/null +++ b/docs/docs-en/source/5.application-development/2.dsl/2.syntax/index.rst @@ -0,0 +1,12 @@ +Syntax +=========== + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.dcl.md + 2.ddl.md + 3.dml.md + 4.dql/index.rst \ No newline at end of file diff --git a/docs/docs-en/application-development/dsl/build-in/aggregate.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/aggregate.md similarity index 97% rename from docs/docs-en/application-development/dsl/build-in/aggregate.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/aggregate.md index 07b3c023e..88aba94ce 100644 --- a/docs/docs-en/application-development/dsl/build-in/aggregate.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/aggregate.md @@ -1,3 +1,5 @@ +# Aggregate + GeaFlow support the following aggregate functions: * [COUNT](#COUNT) * [MAX](#MAX) @@ -5,7 +7,7 @@ GeaFlow support the following aggregate functions: * [SUM](#SUM) * [AVG](#AVG) -# COUNT +## COUNT **Syntax** ```sql @@ -22,7 +24,7 @@ select count(distinct id) from user; select count(1) from user; ``` -# MAX +## MAX **Syntax** ```sql @@ -41,7 +43,7 @@ select id, max(age) from user group by id; select max(name) from user; ``` -# MIN +## MIN **Syntax** ```sql @@ -60,7 +62,7 @@ select id, min(age) from user group by id; select min(name) from user; ``` -# SUM +## SUM **Syntax** ```sql @@ -79,7 +81,7 @@ select sum(DISTINCT age) from user; select sum(1) from user; ``` -# AVG +## AVG **Syntax** ```sql diff --git a/docs/docs-en/application-development/dsl/build-in/condition.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/condition.md similarity index 96% rename from docs/docs-en/application-development/dsl/build-in/condition.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/condition.md index b8070b288..922bf1491 100644 --- a/docs/docs-en/application-development/dsl/build-in/condition.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/condition.md @@ -1,8 +1,10 @@ +# Condition + GeaFlow support both **Case** And **If** condition functions. 
* [Case](#Case) * [If](#If) -# Case +## Case **Syntax** ```sql @@ -42,7 +44,7 @@ CASE WHEN a = 1 THEN '1' END ``` -# If +## If **Syntax** ```sql diff --git a/docs/docs-en/application-development/dsl/build-in/date.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/date.md similarity index 97% rename from docs/docs-en/application-development/dsl/build-in/date.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/date.md index a25c2a0cc..0b61380fb 100644 --- a/docs/docs-en/application-development/dsl/build-in/date.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/date.md @@ -1,3 +1,5 @@ +# Date + GeaFlow support the following date function: * [from_unixtime](#from_unixtime) * [from_unixtime_millis](#from_unixtime_millis) @@ -18,7 +20,7 @@ GeaFlow support the following date function: * [date_part](#date_part) * [date_trunc](#date_trunc) -# from_unixtime +## from_unixtime **Syntax** ```sql @@ -38,7 +40,7 @@ from_unixtime(11111111) = '1970-05-09 22:25:11' from_unixtime(11111111, 'yyyy-MM-dd HH:mm:ss.SSSSSS') = '1970-05-09 22:25:11.000000' ``` -# from_unixtime_millis +## from_unixtime_millis **Syntax** ```sql @@ -58,7 +60,7 @@ from_unixtime_millis(11111111, 'yyyy-MM-dd HH:mm:ss') = '1970-01-01 11:05:11' from_unixtime_millis(11111111, 'yyyy-MM-dd HH:mm:ss.SSSSSS') = '1970-01-01 11:05:11.111000' ``` -# unix_timestamp +## unix_timestamp **Syntax** ```sql @@ -76,7 +78,7 @@ unix_timestamp('1987-06-05 00:11:22') = 549817882 unix_timestamp('1987-06-05 00:11', 'yyyy-MM-dd HH:mm') = 549817860 ``` -# unix_timestamp_millis +## unix_timestamp_millis **Syntax** ```sql @@ -93,7 +95,7 @@ unix_timestamp_millis('1987-06-05 00:11:22') = 549817882000 unix_timestamp_millis('1987-06-05', 'yyyy-mm-dd') = 536774760000 ``` -# isdate +## isdate **Syntax** ```sql @@ -111,7 +113,7 @@ isdate('xxxxxxxxxxxxx') = false isdate('1987-06-05 00:11:22', 'yyyy-MM-dd HH:mm:ss.SSSSSS') = false ``` -# now +## now **Syntax** ```sql @@ -129,7 +131,7 @@ now() now(1000) ``` -# day +## day **Syntax** ```sql @@ -144,7 +146,7 @@ Returns the day of date. The default format is "yyyy-MM-dd" or "yyyy-MM-dd HH:mm day('1987-06-05 00:11:22') = 5 ``` -# weekday +## weekday **Syntax** ```sql @@ -159,7 +161,7 @@ Returns the weekday of date. The default format is "yyyy-MM-dd" or "yyyy-MM-dd H weekday('1987-06-05 00:11:22') = 5 ``` -# lastday +## lastday **Syntax** ```sql @@ -174,7 +176,7 @@ Returns the last day of the month which the date belongs to. The default format lastday('1987-06-05') = '1987-06-30 00:00:00' ``` -# day_of_month +## day_of_month **Syntax** ```sql @@ -189,7 +191,7 @@ Returns the date of the month of date. 
The default format is "yyyy-MM-dd" or "yy day_of_month('1987-06-05 00:11:22') = 5 ``` -# week_of_year +## week_of_year **Syntax** ```sql @@ -205,7 +207,7 @@ week_of_year('1987-06-05 00:11:22') = 23 ``` -# date_add +## date_add **Syntax** ```sql @@ -222,7 +224,7 @@ date_add('2017-09-25', 1) = '2017-09-26' date_add('2017-09-25', -1) = '2017-09-24' ``` -# date_sub +## date_sub **Syntax** ```sql @@ -239,7 +241,7 @@ date_sub('2017-09-25', 1) = '2017-09-24' date_sub('2017-09-25', -1) = '2017-09-26' ``` -# date_diff +## date_diff **Syntax** ```sql @@ -255,7 +257,7 @@ date_diff('2017-09-26', '2017-09-25') = 1 date_diff('2017-09-24', '2017-09-25') = -1 ``` -# add_months +## add_months **Syntax** ```sql @@ -272,7 +274,7 @@ add_months('2017-09-25', 1) = '2017-10-25' add_months('2017-09-25', -1) = '2017-08-25' ``` -# date_format +## date_format **Syntax** ```sql @@ -291,7 +293,7 @@ date_format('1987-06-05 00:11:22', 'MM-dd-yyyy') = '06-05-1987' date_format('00:11:22 1987-06-05', 'HH:mm:ss yyyy-MM-dd', 'MM-dd-yyyy') = '06-05-1987' ``` -# date_part +## date_part **Syntax** ```sql @@ -321,7 +323,7 @@ date_part('1987-06-05 00:11:22', 'ss') = 22 date_part('1987-06-05', 'ss') = 0 ``` -# date_trunc +## date_trunc **Syntax** ```sql diff --git a/docs/docs-en/source/5.application-development/2.dsl/3.build-in/index.rst b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/index.rst new file mode 100644 index 000000000..1ae5ae1bd --- /dev/null +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/index.rst @@ -0,0 +1,9 @@ +Build-In +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + * \ No newline at end of file diff --git a/docs/docs-en/application-development/dsl/build-in/logical.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/logical.md similarity index 99% rename from docs/docs-en/application-development/dsl/build-in/logical.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/logical.md index 95a2eb25a..0cd14630b 100644 --- a/docs/docs-en/application-development/dsl/build-in/logical.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/logical.md @@ -1,3 +1,5 @@ +# Logical + Geaflow supports the following logical operations. Operation|Description diff --git a/docs/docs-en/application-development/dsl/build-in/math.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/math.md similarity index 99% rename from docs/docs-en/application-development/dsl/build-in/math.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/math.md index d36017990..d3c7add18 100644 --- a/docs/docs-en/application-development/dsl/build-in/math.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/math.md @@ -1,3 +1,5 @@ +# Math + Geaflow supports the following mathematical operations. | Operation | Description | diff --git a/docs/docs-en/application-development/dsl/build-in/string.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/string.md similarity index 96% rename from docs/docs-en/application-development/dsl/build-in/string.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/string.md index fca824046..7eff6e8c9 100644 --- a/docs/docs-en/application-development/dsl/build-in/string.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/string.md @@ -1,3 +1,5 @@ +# String + GeaFlow support the following string functions. 
* [ascii2str](#ascii2str) * [base64_decode](#base64_decode) @@ -27,7 +29,7 @@ GeaFlow support the following string functions. * [urldecode](#urldecode) * [urlencode](#urlencode) -# ascii2str +## ascii2str **Syntax** ```sql @@ -44,7 +46,7 @@ ascii2str(66) = 'B' ascii2str(48) = '0' ``` -# base64_decode +## base64_decode **Syntax** ```sql @@ -61,7 +63,7 @@ base64_decode('dGVzdF9zdHJpbmc=') = 'test_string' base64_decode(null) = null ``` -# base64_encode +## base64_encode **Syntax** ```sql @@ -77,7 +79,7 @@ base64_encode('abc ') = 'YWJjIA==' base64_encode('test_string') = 'dGVzdF9zdHJpbmc=' ``` -# concat +## concat **Syntax** ```sql @@ -94,7 +96,7 @@ concat('1','2',null) = '12' concat(null) = null; ``` -# concat_ws +## concat_ws **Syntax** ```sql @@ -112,7 +114,7 @@ concat_ws(',','1',null,'c') = '1,,c' concat_ws(null, 'a','b','c') = 'abc' ``` -# hash +## hash **Syntax** ```sql @@ -128,7 +130,7 @@ hash('1') = 49 hash(2) = 2 ``` -# index_of +## index_of **Syntax** ```sql @@ -146,7 +148,7 @@ index_of('a test string', 'test') = 2 index_of(null, 'test') = -1 ``` -# instr +## instr **Syntax** ```sql @@ -168,7 +170,7 @@ instr('abc', 'a', 3, -1) = null instr('abc', null) = null ``` -# isBlank +## isBlank **Syntax** ```sql @@ -184,7 +186,7 @@ isBlank('test') = false isBlank(' ') = true ``` -# length +## length **Syntax** ```sql @@ -200,7 +202,7 @@ length('abc') = 3 length('abc ') = 5 ``` -# like +## like **Syntax** ```sql @@ -217,7 +219,7 @@ like('test', 'abc\\%') = false like('abc', 'a%bc') = true ``` -# lower +## lower **Syntax** ```sql @@ -233,7 +235,7 @@ lower('ABC') = 'abc' lower(null) = null ``` -# ltrim +## ltrim **Syntax** ```sql @@ -249,7 +251,7 @@ ltrim(' abc ') = 'abc ' ltrim(' test') = 'test' ``` -# regexp +## regexp **Syntax** ```sql @@ -266,7 +268,7 @@ regexp('a.b.c.d.e.f', '.d%') = false regexp('a.b.c.d.e.f', null) = null ``` -# regexp_count +## regexp_count **Syntax** ```sql @@ -284,7 +286,7 @@ regexp('ab1d2d3dsss', '[0-9]d', 8) = 0 regexp('ab1d2d3dsss', '.b') = 1 ``` -# regexp_extract +## regexp_extract **Syntax** ```sql @@ -301,7 +303,7 @@ regexp_extract('abchebar', 'abc(.*?)(bar)', 1) = 'he' regexp_extract('100-200', '(\d+)-(\d+)') = '100' ``` -# regexp_replace +## regexp_replace **Syntax** ```sql @@ -319,7 +321,7 @@ regexp_replace('adfabadfasdf', '[a]', '3') = '3df3b3df3sdf' ``` -# repeat +## repeat **Syntax** ```sql @@ -335,7 +337,7 @@ repeat('abc', 3) = 'abcabcabc' repeat(null, 4) = null ``` -# replace +## replace **Syntax** ```sql @@ -351,7 +353,7 @@ replace('test test', 'test', 'c') = 'c c' replace('test test', 'test', '') = ' ' ``` -# reverse +## reverse **Syntax** ```sql @@ -367,7 +369,7 @@ reverse('abc') = 'cba' reverse(null) = null ``` -# rtrim +## rtrim **Syntax** ```sql @@ -383,7 +385,7 @@ rtrim(' abc ') = ' abc' rtrim('test') = 'test' ``` -# space +## space **Syntax** ```sql @@ -399,7 +401,7 @@ space(5) = ' ' space(null) = null ``` -# split_ex +## split_ex **Syntax** ```sql @@ -417,7 +419,7 @@ split_ex('a.b.c.d.e', '.', -1) = null ``` -# substr +## substr **Syntax** ```sql @@ -434,7 +436,7 @@ substr('testString', 5, 10) = 'String' substr('testString', -6) = 'String' ``` -# trim +## trim **Syntax** ```sql @@ -450,7 +452,7 @@ trim(' abc ') = 'abc' trim('abc') = 'abc' ``` -# upper +## upper **Syntax** ```sql @@ -466,7 +468,7 @@ upper('abc') = 'ABC' upper(null) = null ``` -# urldecode +## urldecode **Syntax** ```sql @@ -482,7 +484,7 @@ urldecode('a%3d0%26c%3d1') = 'a=0&c=1' urldecode('a%3D2') = 'a=2' ``` -# urlencode +## urlencode **Syntax** ```sql diff --git 
a/docs/docs-en/application-development/dsl/build-in/table.md b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/table.md similarity index 97% rename from docs/docs-en/application-development/dsl/build-in/table.md rename to docs/docs-en/source/5.application-development/2.dsl/3.build-in/table.md index aa938e65b..77be8a7da 100644 --- a/docs/docs-en/application-development/dsl/build-in/table.md +++ b/docs/docs-en/source/5.application-development/2.dsl/3.build-in/table.md @@ -1,3 +1,5 @@ +# Table + The table function returns a list of rows for each input. **Table Function Syntax** @@ -7,7 +9,7 @@ FROM (Table | SubQuery), LATERAL TABLE '('TableFunctionRef')' AS Identifier '(' Identifier (,Identifier)* ')' ``` -# split +## Split **Syntax** ```sql diff --git a/docs/docs-en/application-development/dsl/udf/udf.md b/docs/docs-en/source/5.application-development/2.dsl/4.udf/1.udf.md similarity index 97% rename from docs/docs-en/application-development/dsl/udf/udf.md rename to docs/docs-en/source/5.application-development/2.dsl/4.udf/1.udf.md index 75d98f1e9..28b26a6d6 100644 --- a/docs/docs-en/application-development/dsl/udf/udf.md +++ b/docs/docs-en/source/5.application-development/2.dsl/4.udf/1.udf.md @@ -1,6 +1,6 @@ # UDF Introduction The UDF (User Defined Function) map scalar values to a scalar value. -# Interface +## Interface ```java public abstract class UserDefinedFunction implements Serializable { @@ -23,7 +23,7 @@ public abstract class UDF extends UserDefinedFunction { } ``` -# Example +## Example ```java public class ConcatWS extends UDF { diff --git a/docs/docs-en/application-development/dsl/udf/udaf.md b/docs/docs-en/source/5.application-development/2.dsl/4.udf/2.udaf.md similarity index 99% rename from docs/docs-en/application-development/dsl/udf/udaf.md rename to docs/docs-en/source/5.application-development/2.dsl/4.udf/2.udaf.md index c3e94a3e3..f1800dd29 100644 --- a/docs/docs-en/application-development/dsl/udf/udaf.md +++ b/docs/docs-en/source/5.application-development/2.dsl/4.udf/2.udaf.md @@ -1,6 +1,6 @@ # UDAF Introduction The UDAF (User Defined Aggregate Function) aggregate multi-rows to a single value. -# Interface +## Interface ```java public abstract class UserDefinedFunction implements Serializable { @@ -51,7 +51,7 @@ public abstract class UDAF extends UserDefinedFunction ``` -# Example +## Example ```java public class AvgDouble extends UDAF { diff --git a/docs/docs-en/application-development/dsl/udf/udtf.md b/docs/docs-en/source/5.application-development/2.dsl/4.udf/3.udtf.md similarity index 98% rename from docs/docs-en/application-development/dsl/udf/udtf.md rename to docs/docs-en/source/5.application-development/2.dsl/4.udf/3.udtf.md index dabb5768f..5a7b7be8a 100644 --- a/docs/docs-en/application-development/dsl/udf/udtf.md +++ b/docs/docs-en/source/5.application-development/2.dsl/4.udf/3.udtf.md @@ -1,6 +1,6 @@ # UDTF Introduction The UDTF (User Defined Table Function) expand the input to multi-line rows. -# Interface +## Interface ```java public abstract class UserDefinedFunction implements Serializable { @@ -45,7 +45,7 @@ public abstract class UDTF extends UserDefinedFunction { ``` Each UDTF should have one or more **eval** method. 
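Before the Java `Split` example that follows, it may help to see how a table function is invoked from the DSL. The sketch below follows the `LATERAL TABLE` grammar given in the table-function section above; the `CREATE FUNCTION` form, the `my_split` name, the class path, and the `my_table` source table are illustrative assumptions rather than verbatim API.

```sql
-- Hypothetical registration of the UDTF (class path is a placeholder).
CREATE FUNCTION my_split AS 'com.example.udf.Split';

-- Expand each input row into multiple rows through LATERAL TABLE,
-- aliasing the function output as v(part).
SELECT id, part
FROM my_table, LATERAL TABLE(my_split(text)) AS v(part);
```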
-# Example +## Example ```java public class Split extends UDTF { diff --git a/docs/docs-en/application-development/dsl/udf/udga.md b/docs/docs-en/source/5.application-development/2.dsl/4.udf/4.udga.md similarity index 99% rename from docs/docs-en/application-development/dsl/udf/udga.md rename to docs/docs-en/source/5.application-development/2.dsl/4.udf/4.udga.md index a22ef6ec1..8c84fb301 100644 --- a/docs/docs-en/application-development/dsl/udf/udga.md +++ b/docs/docs-en/source/5.application-development/2.dsl/4.udf/4.udga.md @@ -1,6 +1,6 @@ # UDGA Introduction The UDGA (User Defined Graph Algorithm) defined a graph algorithm.e.g. sssp, pagerank. -# Interface +## Interface ```java /** @@ -30,7 +30,7 @@ public interface AlgorithmUserFunction extends Serializable { ``` -# Example +## Example ```java public class PageRank implements AlgorithmUserFunction { diff --git a/docs/docs-en/source/5.application-development/2.dsl/4.udf/index.rst b/docs/docs-en/source/5.application-development/2.dsl/4.udf/index.rst new file mode 100644 index 000000000..64577447a --- /dev/null +++ b/docs/docs-en/source/5.application-development/2.dsl/4.udf/index.rst @@ -0,0 +1,12 @@ +UDF +==== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.udf.md + 2.udaf.md + 3.udtf.md + 4.udga.md \ No newline at end of file diff --git a/docs/docs-en/source/5.application-development/2.dsl/index.rst b/docs/docs-en/source/5.application-development/2.dsl/index.rst new file mode 100644 index 000000000..8ecfa5db1 --- /dev/null +++ b/docs/docs-en/source/5.application-development/2.dsl/index.rst @@ -0,0 +1,12 @@ +DSL +=========== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.overview.md + 2.syntax/index.rst + 3.build-in/index.rst + 4.udf/index.rst diff --git a/docs/docs-en/application-development/dsl/connector/common.md b/docs/docs-en/source/5.application-development/3.connector/1.common.md similarity index 98% rename from docs/docs-en/application-development/dsl/connector/common.md rename to docs/docs-en/source/5.application-development/3.connector/1.common.md index f51ae58ce..3e98d0ae5 100644 --- a/docs/docs-en/application-development/dsl/connector/common.md +++ b/docs/docs-en/source/5.application-development/3.connector/1.common.md @@ -1,7 +1,7 @@ # Introduction to Connector Basics GeaFlow supports reading and writing data from various connectors. GeaFlow identifies them as external tables and stores the metadata in the Catalog. -# Syntax +## Syntax ```sql CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table ( @@ -25,7 +25,7 @@ The WITH clause is used to specify the configuration information for the Connect Additionally, we can add table parameters in the WITH clause. These parameters will override the external (SQL file, job parameters) configurations and have the highest priority. -# Common Options +## Common Options | Key | Required | Description | |-----------------------------------------------|----------|-------------------------------------------------| @@ -34,7 +34,7 @@ Additionally, we can add table parameters in the WITH clause. These parameters w | geaflow.dsl.partitions.per.source.parallelism | false | Groups several shards of the Source together to reduce the resource usage associated with concurrency. 
| -# Example +## Example ```sql CREATE TABLE console_sink ( diff --git a/docs/docs-en/application-development/dsl/connector/udc.md b/docs/docs-en/source/5.application-development/3.connector/10.udc.md similarity index 98% rename from docs/docs-en/application-development/dsl/connector/udc.md rename to docs/docs-en/source/5.application-development/3.connector/10.udc.md index d2b4f6343..02ad2133e 100644 --- a/docs/docs-en/application-development/dsl/connector/udc.md +++ b/docs/docs-en/source/5.application-development/3.connector/10.udc.md @@ -116,10 +116,10 @@ public interface TableSink extends Serializable { } ``` -# Example +## Example Here is an example for console table connector. -## Implement TableConnector +### Implement TableConnector ```java public class ConsoleTableConnector implements TableWritableConnector { @@ -173,7 +173,7 @@ public class ConsoleTableSink implements TableSink { After implement the **ConsoleTableConnector**, you should put the full class name to the **resources/META-INF.services/com.antgroup.geaflow.dsl.connector.api.TableConnector** -## Usage +### Usage ```java CREATE TABLE file_source ( diff --git a/docs/docs-en/application-development/dsl/connector/file.md b/docs/docs-en/source/5.application-development/3.connector/2.file.md similarity index 97% rename from docs/docs-en/application-development/dsl/connector/file.md rename to docs/docs-en/source/5.application-development/3.connector/2.file.md index a60ae1328..541107618 100644 --- a/docs/docs-en/application-development/dsl/connector/file.md +++ b/docs/docs-en/source/5.application-development/3.connector/2.file.md @@ -1,6 +1,6 @@ # File Connector Introduction GeaFlow support read data from file and write data to file. -# Syntax +## Syntax ```sql CREATE TABLE file_table ( @@ -12,7 +12,7 @@ CREATE TABLE file_table ( geaflow.dsl.file.path = '/path/to/file' ) ``` -# Options +## Options | Key | Required | Description | | -------- | -------- | -------- | @@ -23,7 +23,7 @@ CREATE TABLE file_table ( | geaflow.dsl.file.name.regex | false | The regular expression filter rule for file name reading is empty by default. | | geaflow.dsl.file.format | false | The file format for reading and writing supports Parquet and TXT, with the default format being TXT. | -# Example +## Example ```sql CREATE TABLE file_source ( diff --git a/docs/docs-en/application-development/dsl/connector/console.md b/docs/docs-en/source/5.application-development/3.connector/3.console.md similarity index 95% rename from docs/docs-en/application-development/dsl/connector/console.md rename to docs/docs-en/source/5.application-development/3.connector/3.console.md index 82fbf7a3e..507093019 100644 --- a/docs/docs-en/application-development/dsl/connector/console.md +++ b/docs/docs-en/source/5.application-development/3.connector/3.console.md @@ -1,6 +1,6 @@ # Console Connector Introduction -# Syntax +## Syntax ```sql CREATE TABLE console_table ( @@ -12,13 +12,13 @@ CREATE TABLE console_table ( geaflow.dsl.console.skip = true ) ``` -# Options +## Options | Key | Required | Description | | -------- | -------- |------------------------| | geaflow.dsl.console.skip | false | Whether to skip the log, i.e., no output at all. The default value is false. 
| -# Example +## Example ```sql CREATE TABLE file_source ( diff --git a/docs/docs-en/application-development/dsl/connector/jdbc.md b/docs/docs-en/source/5.application-development/3.connector/4.jdbc.md similarity index 98% rename from docs/docs-en/application-development/dsl/connector/jdbc.md rename to docs/docs-en/source/5.application-development/3.connector/4.jdbc.md index dad7b8e92..ca472c34c 100644 --- a/docs/docs-en/application-development/dsl/connector/jdbc.md +++ b/docs/docs-en/source/5.application-development/3.connector/4.jdbc.md @@ -1,6 +1,6 @@ # JDBC Connector Introduction The JDBC Connector is contributed by the community and supports both reading and writing operations. -# Syntax +## Syntax ```sql CREATE TABLE jdbc_table ( @@ -16,7 +16,7 @@ CREATE TABLE jdbc_table ( geaflow.dsl.jdbc.table.name = 'source_table' ); ``` -# Options +## Options | Key | Required | Description | | -------- |------|---------------------------------------------------| @@ -31,7 +31,7 @@ CREATE TABLE jdbc_table ( | geaflow.dsl.jdbc.partition.upperbound | false | The upperbound of JDBC partition, just used to decide the partition stride, not for filtering the rows in table. | -# Example +## Example ```sql set geaflow.dsl.jdbc.driver = 'org.h2.Driver'; diff --git a/docs/docs-en/application-development/dsl/connector/hive.md b/docs/docs-en/source/5.application-development/3.connector/5.hive.md similarity index 97% rename from docs/docs-en/application-development/dsl/connector/hive.md rename to docs/docs-en/source/5.application-development/3.connector/5.hive.md index 200692d96..015203049 100644 --- a/docs/docs-en/application-development/dsl/connector/hive.md +++ b/docs/docs-en/source/5.application-development/3.connector/5.hive.md @@ -1,6 +1,6 @@ # Hive Connector Introduction GeaFlow support read data from hive table through the hive metastore server. Currently we support Hive 2.3.x version. -# Syntax +## Syntax ```sql CREATE TABLE hive_table ( @@ -14,7 +14,7 @@ CREATE TABLE hive_table ( geaflow.dsl.hive.metastore.uris = 'thrift://localhost:9083' ) ``` -# Options +## Options | Key | Required | Description | | -------- | -------- | -------- | @@ -23,7 +23,7 @@ CREATE TABLE hive_table ( | geaflow.dsl.hive.metastore.uris | true | The hive metastore uris | | geaflow.dsl.hive.splits.per.partition | false | The number of splits for each hive partition.Default value is 1. | -# Example +## Example ```sql CREATE TABLE hive_table ( diff --git a/docs/docs-en/application-development/dsl/connector/kafka.md b/docs/docs-en/source/5.application-development/3.connector/6.kafka.md similarity index 97% rename from docs/docs-en/application-development/dsl/connector/kafka.md rename to docs/docs-en/source/5.application-development/3.connector/6.kafka.md index fe41638ac..17550d153 100644 --- a/docs/docs-en/application-development/dsl/connector/kafka.md +++ b/docs/docs-en/source/5.application-development/3.connector/6.kafka.md @@ -1,6 +1,6 @@ # Kafka Connector Introduction GeaFlow support read data from kafka and write data to kafka. Currently support kafka version is 2.4.1. -# Syntax +## Syntax ```sql CREATE TABLE kafka_table ( @@ -13,7 +13,7 @@ CREATE TABLE kafka_table ( geaflow.dsl.kafka.topic = 'test-topic' ) ``` -# Options +## Options | Key | Required | Description | | -------- | -------- | -------- | @@ -22,7 +22,7 @@ CREATE TABLE kafka_table ( | geaflow.dsl.kafka.group.id | false | The kafka group id. 
Default value is: 'default-group-id'.| -# Example +## Example ```sql CREATE TABLE kafka_source ( diff --git a/docs/docs-en/application-development/dsl/connector/hbase.md b/docs/docs-en/source/5.application-development/3.connector/7.hbase.md similarity index 97% rename from docs/docs-en/application-development/dsl/connector/hbase.md rename to docs/docs-en/source/5.application-development/3.connector/7.hbase.md index 2339dee51..cb8d2b6ee 100644 --- a/docs/docs-en/application-development/dsl/connector/hbase.md +++ b/docs/docs-en/source/5.application-development/3.connector/7.hbase.md @@ -1,7 +1,7 @@ # Hbase Connector Introduction The HBase Connector is contributed by the community and supports Sink yet. -# Syntax +## Syntax ```sql CREATE TABLE hbase_table ( @@ -15,7 +15,7 @@ CREATE TABLE hbase_table ( geaflow.dsl.hbase.rowkey.column = 'id' ); ``` -# Options +## Options | Key | Required | Description | | -------- | -------- | -------- | @@ -27,7 +27,7 @@ CREATE TABLE hbase_table ( | geaflow.dsl.hbase.familyname.mapping | false | HBase column family name mapping. | | geaflow.dsl.hbase.buffersize | false | HBase writer buffer size. | -# Example +## Example ```sql CREATE TABLE file_source ( diff --git a/docs/docs-en/application-development/dsl/connector/hudi.md b/docs/docs-en/source/5.application-development/3.connector/8.hudi.md similarity index 97% rename from docs/docs-en/application-development/dsl/connector/hudi.md rename to docs/docs-en/source/5.application-development/3.connector/8.hudi.md index 330350c1f..a97aa141b 100644 --- a/docs/docs-en/application-development/dsl/connector/hudi.md +++ b/docs/docs-en/source/5.application-development/3.connector/8.hudi.md @@ -1,6 +1,6 @@ # Hudi Connector Introduction GeaFlow Hudi currently supports reading data from files. -# Syntax +## Syntax ```sql CREATE TABLE IF NOT EXISTS hudi_person ( @@ -12,14 +12,14 @@ CREATE TABLE IF NOT EXISTS hudi_person ( geaflow.dsl.file.path='/path/to/hudi_person' ); ``` -# Options +## Options | Key | Required | Description | | -------- | -------- | -------- | | geaflow.dsl.file.path | true | The path of the file or folder to read from or write to. | | geaflow.file.persistent.config.json | false | JSON-formatted DFS configuration, which will override the system environment configuration. | -# Example +## Example ```sql set geaflow.dsl.window.size = -1; diff --git a/docs/docs-en/application-development/dsl/connector/pulsar.md b/docs/docs-en/source/5.application-development/3.connector/9.pulsar.md similarity index 98% rename from docs/docs-en/application-development/dsl/connector/pulsar.md rename to docs/docs-en/source/5.application-development/3.connector/9.pulsar.md index daf49ce5d..16010d322 100644 --- a/docs/docs-en/application-development/dsl/connector/pulsar.md +++ b/docs/docs-en/source/5.application-development/3.connector/9.pulsar.md @@ -1,6 +1,6 @@ # Pulsar Connector Introduction GeaFlow supports reading data from Pulsar and writing data to Pulsar. The currently supported Pulsar version is 2.8.1. -# Syntax +## Syntax ```sql CREATE TABLE pulsar_table ( id BIGINT, @@ -14,7 +14,7 @@ CREATE TABLE pulsar_table ( `geaflow.dsl.pulsar.subscriptionInitialPosition` = 'latest' ) ``` -# Options +## Options | Key | Required | Description | | -------- | -------- | -------- | @@ -24,7 +24,7 @@ CREATE TABLE pulsar_table ( | geaflow.dsl.pulsar.subscriptionInitialPosition | No | The initial position of consumer, default is 'latest'.| Note: Pulsar connector cannot specify a partition topic. 
If you want to consume messages for a certain partition, please select the sub topic name of the partition topic. -# Example1 +## Example1 Example 1 is from pulsar to `topic_read` data and write it to the `topic_write`. ```sql CREATE TABLE pulsar_source ( @@ -51,7 +51,7 @@ CREATE TABLE pulsar_sink ( INSERT INTO pulsar_sink SELECT * FROM pulsar_source; ``` -# Example2 +## Example2 Similarly, we can also perform a fourth degree loop detection. ```sql set geaflow.dsl.window.size = 1; diff --git a/docs/docs-en/source/5.application-development/3.connector/index.rst b/docs/docs-en/source/5.application-development/3.connector/index.rst new file mode 100644 index 000000000..7e696edc4 --- /dev/null +++ b/docs/docs-en/source/5.application-development/3.connector/index.rst @@ -0,0 +1,18 @@ +Connector +========= + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.common.md + 2.file.md + 3.console.md + 4.jdbc.md + 5.hive.md + 6.kafka.md + 7.hbase.md + 8.hudi.md + 9.pulsar.md + 10.udc.md \ No newline at end of file diff --git a/docs/docs-en/source/5.application-development/4.chat_guide.md b/docs/docs-en/source/5.application-development/4.chat_guide.md new file mode 100644 index 000000000..47473d72d --- /dev/null +++ b/docs/docs-en/source/5.application-development/4.chat_guide.md @@ -0,0 +1,21 @@ +# Text2GQL Syntax Manual +This manual enumerates common syntax elements of GQL along with reference prompts, enabling users to formulate GQL statements by referring to the provided example queries. +
+| Syntax | Query Example | Result |
+| -------- | -------- | -------- |
+| Find Vertex | Locate vertices of type person | match(a:person) return a |
+| Find Edge | Return all edges labeled as knows | match(a)-[e:knows]->(b) return e |
+| Find Relationships | Query 10 universities located in Beijing<br>Identify 5 students related to Teacher Xiao Zhang | match(a:city where a.name = '北京')<-[:belong]-(b:university) return b limit 10<br>match(a:teacher where a.name='Xiao Zhang')-[e]-(b:student) return b limit 5 |
+| Find Multi-Degree Relationships | Find people known by friends of Student Xiao Wang<br>Retrieve departments connected to universities, then students linked to those departments, and courses chosen by those students<br>Identify software co-created by Tencent and Google, return 5 results | match(a:student where a.name = 'Xiao Wang')-[e:friend]->(b)-[e2:knows]->(c:person) return c<br><br>match(a:university)-[e:has]->(b:department)-[e2:has]->(c:student)-[e3:selects]->(d:course) return d<br><br>match(a:company where a.name='Tencent')-[e:creates]->(b:software)<-[e2:creates]-(c:company where c.name='Google') return b.name limit 5 |
+| Loop | From person Zhang Siqi, traverse through pay edges, reach vertices within 2 to 4 degrees | match(a:person where a.name='Zhang Siqi')-[e:pay]->{2,4}(b:person) return b |
+| Loop | Identify 3-hop cycles involving persons who know Li Hong | match(a:person where name = 'Li Hong')-[e:knows]->{1,2}(b)->(a) return a.id, b.id as b_id |
+| Filter Criteria | Find people known by Xiaohong, aged over 20, earning more than 5000<br>Fetch 10 nodes not female, shorter than 160cm, or with an id greater than 5 | match(a:person where a.name='Xiaohong')-[e:knows]->(b:person where b.age > 20 and b.salary > 5000) return b<br><br>match(a where (a.gender <> 'female' and a.height < 160) or a.id > 5) return a limit 10 |
+| Let Single Value | Query software created by Ant Group, set minPrice of software equal to its minimum price, return company id and software's minPrice | match(a:company where a.name = 'Ant Group')-[e:creates]->(b:software) let b.minPrice = MIN(b.price) return a.id, b.minPrice |
+| Let Subquery | Find employees of Ant Group, set their countSalary equal to the sum of salaries of those who know them, then find the software they purchase<br>Identify the country that city id 10 belongs to, assign the average count of companies related to the country as avgCnt | match(a:company where a.name = 'Ant Group')-[e:employee]->(b:person) let b.countSalary = SUM((b:person)<-[e2:knows]-(c:person) => c.salary) match(b:person)-[e3:buy]->(d:software) return b.countSalary, d<br><br>match(a:city where id = '10')-[e:belong]->(b:country)<-[e2:belong]-(c:company) let b.avgCnt = AVG(c.peopleNumber) return b |
+| Function Call | Invoke SSSP function with 'arg1', 10 as inputs, return id and distance | match(a:person) call sssp(a, 10) yield (id, distance) return id, distance |
+| order | Return software created by companies, sorted by company scale descending and software price ascending | match(a:company)-[e:creates]->(b:software) return a.scale,b.price order by a.scale desc, b.price asc |
+| group by | Find people known by Xiaohong, grouped by gender, return max salary<br>For Peking University affiliates, return the average count of people per company, grouped by company scale | match(a:person where a.name = 'Xiaohong')-[e:knows]->(b:person) return MAX(b.salary) group by b.gender<br><br>match(a:university where a.name='北京大学')-[e]-(b:company) return AVG(b.peopleNumber) group by b.scale |
+| join | Find all people liked by Zheng Wei and all who know him, return together<br>Find schools related to person Alice, denote as X, further find companies and persons associated with X | match(a:person where a.name = 'Zheng Wei')-[e:likes]->(b:person),(a:person where a.name = 'Zheng Wei')<-[e2:knows]-(c:person) return a, b, c<br><br>match(a:person where a.name = 'alice')-[e]-(b:school), (b:school)-[e2]-(c:company),(b:school)-[e3]-(d:person) return a, b, c, d |
+| Schema Query with Graph (Automatically appended in Console) | Using this graph schema: CREATE GRAPH g ( Vertex film ( id int ID, name varchar, category varchar, value int ), Vertex cinema ( id int ID, name varchar, address varchar, size int ), Vertex person ( id int ID, name varchar, age int, gender varchar, height int, salary int ), Vertex comment ( id int ID, name varchar, createTime bigint, wordCount int ), Vertex tag ( id int ID, name varchar, value int ), Edge person_likes_comment ( srcId int FROM person SOURCE ID, targetId int FROM comment DESTINATION ID, weight double, f0 int, f1 boolean ), Edge cinema_releases_film ( srcId int FROM cinema SOURCE ID, targetId int FROM film DESTINATION ID, weight double, f0 int, f1 boolean ), Edge person_watch_film ( srcId int FROM person SOURCE ID, targetId int FROM film DESTINATION ID, weight double, f0 int, f1 boolean, timeStamp bigint ), Edge film_has_tag ( srcId int FROM film SOURCE ID, targetId int FROM tag DESTINATION ID, weight double, f0 int, f1 boolean ), Edge person_creates_comment ( srcId int FROM person SOURCE ID, targetId int FROM comment DESTINATION ID, weight double, f0 int, f1 boolean, timeStamp bigint ), Edge comment_belong_film ( srcId int FROM comment SOURCE ID, targetId int FROM film DESTINATION ID, weight double, f0 int, f1 boolean )); Find comments created by Sun Mei and liked by Sun Jiancong, return all | match(a:person where a.name = 'Sun Mei')-[e:person_creates_comment]->(b:comment),(c:person where c.name = 'Sun Jiancong')-[e2:person_likes_comment]->(d:comment) return a, b, c, d |
+| Multi-Query with Graph Schema | Using this graph schema: CREATE GRAPH g ( Vertex book ( id int ID, name varchar, category varchar, price int, wordCount int, createTime bigint ), Vertex publisher ( id int ID, name varchar, age int, gender varchar, height int, salary int ), Vertex reader ( id int ID, name varchar, age int, gender varchar, height int, salary int ), Vertex author ( id int ID, name varchar, age int, gender varchar, height int, salary int ), Edge author_write_book ( srcId int FROM author SOURCE ID, targetId int FROM book DESTINATION ID, weight double, f0 int, f1 boolean, timeStamp bigint ), Edge publisher_publish_book ( srcId int FROM publisher SOURCE ID, targetId int FROM book DESTINATION ID, weight double, f0 int, f1 boolean, timeStamp bigint ), Edge book_refers_book ( srcId int FROM book SOURCE ID, targetId int FROM book DESTINATION ID, weight double, f0 int, f1 boolean ), Edge reader_likes_book ( srcId int FROM reader SOURCE ID, targetId int FROM book DESTINATION ID, weight double, f0 int, f1 boolean ), Edge author_knows_author ( srcId int FROM author SOURCE ID, targetId int FROM author DESTINATION ID, weight double, f0 int, f1 boolean )); Execute 4 queries: 1. Writers known by Huang Jiacong; 2. Edges labeled author_knows_author; 3. IDs of books related to "Computer Networks"; 4. 152 books related to both He Xue and Zhang Jiancong | Queries: 1: match(a:author)<-[e:author_knows_author]-(b:author where b.name='Huang Jiacong') return a, b; 2: match(a:author)-[e:author_knows_author]->(b:author) return e; 3: match(a:book where a.name='Computer Networks')-[e]-(b:book) return b.id; 4: match(a where a.name='He Xue')-[e]->(b:book)<-[e2]-(c where c.name='Zhang Jiancong') return b limit 152; |
diff --git a/docs/docs-en/source/5.application-development/index.rst b/docs/docs-en/source/5.application-development/index.rst new file mode 100644 index 000000000..58f0927bd --- /dev/null +++ b/docs/docs-en/source/5.application-development/index.rst @@ -0,0 +1,12 @@ +Development +=========== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + :glob: + + 1.api/index.rst + 2.dsl/index.rst + 3.connector/index.rst + 4.chat_guide.md diff --git a/docs/docs-en/deploy/install_guide.md b/docs/docs-en/source/7.deploy/1.install_guide.md similarity index 80% rename from docs/docs-en/deploy/install_guide.md rename to docs/docs-en/source/7.deploy/1.install_guide.md index 2258025ee..dc3173f9b 100644 --- a/docs/docs-en/deploy/install_guide.md +++ b/docs/docs-en/source/7.deploy/1.install_guide.md @@ -1,40 +1,7 @@ -# Install Guide -## Prepare K8S environment +# Kubernetes Cluster Deployment +## Prepare K8S Environment -Here we will use minikube as an example to simulate a K8S cluster on a single machine. - -If you already have a K8S cluster, you can skip this part and use it directly. - -Download and install minikube. -``` -# ARM architecture -curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-arm64 -sudo install minikube-darwin-arm64 /usr/local/bin/minikube - -# x86 architecture -curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 -sudo install minikube-darwin-amd64 /usr/local/bin/minikube -``` - -Start minikube and dashboard. -``` -# Start minikube with docker as the driver -minikube start --driver=docker --ports=32761:32761 —image-mirror-country='cn' -# Starting minikube dashboard will automatically open the dashboard page in the browser. -# If it doesn't open, copy the dashboard address provided in the terminal and open it in the browser. -minikube dashboard -``` - -**Note:** -Do not to close the terminal process that the dashboard belongs to. For subsequent operations, -please start a new terminal. Otherwise, the API server process will exit. - -If you want to use GeaFlow on the local minikube environment, please make sure that minikube is running properly. GeaFlow engine image will be automatically built into the minikube environment. Otherwise, it will be built into the local Docker environment and you need to manually push it to the image repository for use. - -```shell -# confirm host、kubelet、apiserver is running -minikube status -``` +Here, we use minikube as an example to simulate a Kubernetes cluster on a single machine. If you already have a Kubernetes cluster set up, you may proceed directly to the next steps and skip this section. For instructions on installing minikube, refer to the [Installing Minikube](6.install_minikube.md) chapter. Create a geaflow service account, otherwise the program has no permission to create new K8S resources (only needed for the first time)
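A typical way to create that account is sketched below. The `geaflow` account name follows the text above; binding it to the built-in `cluster-admin` role in the `default` namespace is an assumption suited to a single-machine minikube setup, so scope the role down for shared clusters:

```shell
# Sketch: account name follows the docs; role and namespace are assumptions.
kubectl create serviceaccount geaflow
kubectl create clusterrolebinding geaflow-role-binding \
  --clusterrole=cluster-admin \
  --serviceaccount=default:geaflow
```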
-![install_welcome](../../static/img/install_welcome_en.png) +![install_welcome](../../../static/img/install_welcome_en.png) ## System Initialization When the administrator logs into the GeaFlow system for the first time, the one-click installation process will be triggered to prepare the system for initialization. @@ -145,7 +112,7 @@ When the administrator logs into the GeaFlow system for the first time, the one- ### Cluster Configuration Configure the runtime cluster for GeaFlow jobs, and it is recommended to use Kubernetes. In local mode, the default proxy address is ${your.host.name}:8000. Please make sure that minikube has been started locally and the proxy address has been set. If you set the address of the K8S cluster, please ensure that the connectivity of the address is normal. -![install_cluster_config](../../static/img/install_cluster_config_en.png) +![install_cluster_config](../../../static/img/install_cluster_config_en.png) Add the following configuration for K8S cluster: ``` @@ -166,7 +133,7 @@ Configure the storage of HA metadata for GeaFlow job runtime. HA metadata includ Configure the storage of metric data for GeaFlow job runtime, which is used for job metric monitoring. It is recommended to use InfluxDB. -![undefined](../../static/img/install_meta_config_en.png) +![undefined](../../../static/img/install_meta_config_en.png) In local mode, when the docker container starts, MySQL, Redis, and InfluxDB services will be automatically pulled up by default. @@ -179,12 +146,12 @@ Add the following configuration for K8S cluster: ### Data Storage Configuration Configure the persistent storage of GeaFlow job, graph, and table data, and it is recommended to use HDFS. In local mode, the default is the disk inside the container. -![install_storage_config](../../static/img/install_storage_config_en.png) +![install_storage_config](../../../static/img/install_storage_config_en.png) ### File Storage Configuration Configure the persistent storage of GeaFlow engine JAR and user JAR files, and it is recommended to use HDFS. In local mode, the default is the disk inside the container. -![install_jar_config](../../static/img/install_jar_config_en.png) +![install_jar_config](../../../static/img/install_jar_config_en.png) After the installation is successful, **the administrator will automatically switch to the default instance under the default tenant**, and you can directly create and publish graph computation tasks at this time. @@ -197,28 +164,28 @@ The user icon menu in the upper right corner of the page provides a quick mode s ### System Mode When the user logs in as an administrator, they will enter the system mode. At this time, you can perform system operations such as one-click installation and system management. -![install_system_mode](../../static/img/install_system_mode_en.jpg) +![install_system_mode](../../../static/img/install_system_mode_en.jpg) In system mode, the administrator can manage information such as clusters, GeaFlow engine versions, files, users, tenants, etc. ### Tenant Mode After normal user login, they will enter tenant mode. At this time, they can perform graph computing development and maintenance operations. -![install_tenant_mode](../../static/img/install_tenant_mode_en.jpg) +![install_tenant_mode](../../../static/img/install_tenant_mode_en.jpg) In tenant mode, users can create development resources such as instances, graphs, tables, and graph computing tasks. 
They can also publish graph computing tasks, submit graph computing jobs, and perform other related operations. ## Task Management Add a graph computing task and describe the business logic of graph computing using SQL+GQL. -![install_task_manager](../../static/img/install_task_manager_en.png) +![install_task_manager](../../../static/img/install_task_manager_en.png) After creating a task, click on "publish" to enter the job operation and maintenance interface. ## Task Maintenance Before submitting a job, you can also adjust the default task parameters and cluster parameters to facilitate adjustments to job behavior. -![install_job_op](../../static/img/install_job_op_en.png) +![install_job_op](../../../static/img/install_job_op_en.png) Access other tabs on the job details page to view information about job runtime, metrics, containers, exceptions, logs, and more. diff --git a/docs/docs-en/quick_start_operator.md b/docs/docs-en/source/7.deploy/2.quick_start_operator.md similarity index 92% rename from docs/docs-en/quick_start_operator.md rename to docs/docs-en/source/7.deploy/2.quick_start_operator.md index 039249c2d..880c1572d 100644 --- a/docs/docs-en/quick_start_operator.md +++ b/docs/docs-en/source/7.deploy/2.quick_start_operator.md @@ -1,7 +1,7 @@ -# Quick Start(Running with Geaflow Kubernetes Operator) +# Deploying Kubernetes Operator ## Prepare -1. Install Docker and adjust Docker service resource settings (Dashboard-Settings-Resources), then start Docker service: +1. Download and install Docker and Minikube. Refer to the documentation: [Installing Minikube](6.install_minikube.md). 2. Pull GeaFlow Image @@ -72,23 +72,23 @@ tag** to use the correct image. cd tugraph-analytics/geaflow-kubernetes-operator/ helm install geaflow-kubernetes-operator helm/geaflow-kubernetes-operator ``` -![img.png](../static/img/helm_install_operator.png) +![img.png](../../../static/img/helm_install_operator.png) 5. Check whether the pod is running normally in the minikube dashboard -![img.png](../static/img/view_operator_pod.png) +![img.png](../../../static/img/view_operator_pod.png) 6. Proxy GeaFlow-Operator-Dashboard to the local port through port-forward (default port is 8089) Please replace **${operator-pod-name}** with the actual operator pod name. ```shell kubectl port-forward ${operator-pod-name} 8089:8089 ``` -![img.png](../static/img/port_forward_operator.png) +![img.png](../../../static/img/port_forward_operator.png) 7. Visit localhost:8089 with your browser to open the operator cluster page. -![img.png](../static/img/operator_dashboard.png) +![img.png](../../../static/img/operator_dashboard.png) -## Submit geaflow job by Geaflow Kubernetes Operator +## Submit Geaflow Job by Geaflow Kubernetes Operator After geaflow-kubernetes-operator is successfully deployed and run, you can write the job's yaml file and submit the job. First, we write a yaml file of geaflow's built-in sample job. @@ -173,7 +173,7 @@ cd tugraph-analysis/geaflow-kubernetes-operator/example kubectl apply example_hla.yml ``` -### Submit HLA jobs +### Submit HLA Jobs When submitting HLA jobs, you need to pay extra attention to the following parameters: * entryClass is required. * udfJars is optional. If you need, please fill in the url address of your own file. @@ -188,7 +188,7 @@ spec: url: http://localhost:8888/download/myUdf.jar ``` -### Submit DSL job +### Submit DSL Job When submitting DSL jobs, you need to pay extra attention to the following parameters: * Do not fill in entryClass, leave it blank. 
* gqlFile is required, please fill in the name and url address of your file. @@ -214,13 +214,13 @@ refer to the sample files in the project respectively: * example/example-dsl.yml * example/example-hla.yml -### Query job status +### Query Job Status #### Query by dashboard page of geaflow-kubernetes-operator Visit http://localhost:8089, you can open the cluster overview page to view the list and details of all geaflow job CRs in the cluster. -![img.png](../static/img/operator_dashboard_jobs.png) +![img.png](../../../static/img/operator_dashboard_jobs.png) -#### Query by command +#### Query by Command Run the following command to view the status of CR ```shell kubectl get geaflowjob geaflow-example diff --git a/docs/docs-en/dashboard.md b/docs/docs-en/source/7.deploy/3.dashboard.md similarity index 67% rename from docs/docs-en/dashboard.md rename to docs/docs-en/source/7.deploy/3.dashboard.md index 51782400d..5b7295430 100644 --- a/docs/docs-en/dashboard.md +++ b/docs/docs-en/source/7.deploy/3.dashboard.md @@ -1,4 +1,4 @@ -#GeaFlow Dashboard +# GeaFlow Dashboard ## Introduction Geaflow-dashboard provides a job-level monitoring page for Geaflow. You can easily view the following information of the job through the dashboard: * Job health (Container and Worker activity) @@ -8,17 +8,17 @@ Geaflow-dashboard provides a job-level monitoring page for Geaflow. You can easi * Flame graph of each component of the job * Thread Dump of each component of the job -## How to access the page +## Access The Page When the job is running in a k8s cluster, the HTTP service can be exposed externally through the master's service and can be accessed directly through the service. In the local or development environment, you can also directly map to the master pod port through the kubectl port-forward command. -### Take minikube as an example -1. Deploy the job to minikube. For how to deploy the job, please refer to [Quick Start](quick_start.md). +### Take Minikube as An Example +1. Deploy the job to minikube. For how to deploy the job, please refer to [Quick Start](../3.quick_start/1.quick_start.md). 2. Open minikube-dashboard and find the pod name of the master (or enter the following command in the terminal to obtain it). ```shell kubectl get pods ``` -![kubectl_get_pods.png](../static/img/kubectl_get_pods.png) +![kubectl_get_pods.png](../../../static/img/kubectl_get_pods.png) 3. Open the terminal and enter the following command to map the 8090 port in the pod container to the localhost's local port 8090. Please replace **${your-master-pod-name}** with your own pod name. ```shell @@ -31,71 +31,71 @@ kubectl port-forward ${your-master-pod-name} 8090:8090 The Overview page displays the health status of the entire job. You can check here whether the container and driver are running normally. In addition, the Overview page will also display the Pipeline list of the job. -![dashboard_overview.png](../static/img/dashboard_overview.png) +![dashboard_overview.png](../../../static/img/dashboard_overview.png) -### Pipeline list +### Pipeline List You can also enter the page through the Pipeline menu in the sidebar. The page includes the name, start time, and cost time of each Pipeline of the job. If the cost is 0, means that the Pipeline has started execution but has not yet completed. 
-![dashboard_pipelines.png](../static/img/dashboard_pipelines.png)
+![dashboard_pipelines.png](../../../static/img/dashboard_pipelines.png)

-### Cycle list
+### Cycle List
Click on the Pipeline name to enter the secondary menu and view the list of all Cycles under the current Pipeline.
-![dashboard_cycles.png](../static/img/dashboard_cycles.png)
+![dashboard_cycles.png](../../../static/img/dashboard_cycles.png)

-### Job component details
+### Job Component Details
You can view detailed information about each component of the job (including master, driver, and container). It can be accessed via the menu in the sidebar.
The Driver details display the basic information of all drivers. The Container details display the basic information of all Containers.
-![dashboard_containers.png](../static/img/dashboard_containers.png)
-![dashboard_drivers.png](../static/img/dashboard_drivers.png)
+![dashboard_containers.png](../../../static/img/dashboard_containers.png)
+![dashboard_drivers.png](../../../static/img/dashboard_drivers.png)

-### Component runtime details
+### Component Runtime Details
By clicking the Master details in the left column, or by clicking the component name in the Driver/Container details, you can jump to the component's runtime page. On the runtime page, you can view the following:
* View the process metrics of the component
-![dashboard_runtime_metrics.png](../static/img/dashboard_runtime_metrics.png)
+![dashboard_runtime_metrics.png](../../../static/img/dashboard_runtime_metrics.png)
* View real-time logs of the component. Here we take the master as an example to introduce the log files.
  * master.log: Java main process log of the master.
  * master.log.1/master.log.2: Java main process log backups of the master.
  * agent.log: Agent service log of the master.
  * geaflow.log: shell startup script log after entering the container.
-![dashboard_runtime_logs.png](../static/img/dashboard_runtime_logs.png)
-![dashboard_runtime_log_content.png](../static/img/dashboard_runtime_log_content.png)
+![dashboard_runtime_logs.png](../../../static/img/dashboard_runtime_logs.png)
+![dashboard_runtime_log_content.png](../../../static/img/dashboard_runtime_log_content.png)
* Perform CPU/ALLOC analysis on the process and generate a flame graph. The analysis type can be either CPU or ALLOC. A single analysis can last up to 60 seconds, and a maximum of 10 historical records are retained.
-![dashboard_runtime_profiler_execute.png](../static/img/dashboard_runtime_profiler_execute.png)
-![dashboard_runtime_profiler_history.png](../static/img/dashboard_runtime_profiler_history.png)
-![dashboard_runtime_profiler_content.png](../static/img/dashboard_runtime_profiler_content.png)
+![dashboard_runtime_profiler_execute.png](../../../static/img/dashboard_runtime_profiler_execute.png)
+![dashboard_runtime_profiler_history.png](../../../static/img/dashboard_runtime_profiler_history.png)
+![dashboard_runtime_profiler_content.png](../../../static/img/dashboard_runtime_profiler_content.png)
* Perform a Thread Dump on the process. Only the result of the latest dump is kept.
-![dashboard_runtime_thread_dump.png](../static/img/dashboard_runtime_thread_dump.png)
-![dashboard_runtime_thread_dump_execute.png](../static/img/dashboard_runtime_thread_dump_execute.png)
+![dashboard_runtime_thread_dump.png](../../../static/img/dashboard_runtime_thread_dump.png)
+![dashboard_runtime_thread_dump_execute.png](../../../static/img/dashboard_runtime_thread_dump_execute.png)
* View all configurations of the process (only the master has this page)
-![dashboard_runtime_master_configuration.png](../static/img/dashboard_runtime_master_configuration.png)
+![dashboard_runtime_master_configuration.png](../../../static/img/dashboard_runtime_master_configuration.png)

-## Other functions
-### List sorting and search
+## Other Functions
+### List Sorting and Search
Some list columns can be sorted and searched. To search, click the "Search" icon, enter keywords, and click the "Search" button. To reset, click the "Reset" button and the list will be refreshed.
-![dashboard_table_search.png](../static/img/dashboard_table_search.png)
+![dashboard_table_search.png](../../../static/img/dashboard_table_search.png)

### Localization
-The page supports switching between Chinese and English. Click the "Text A" icon in the upper right corner to select the language.![dashboard_locale.png](../static/img/dashboard_locale.png)
\ No newline at end of file
+The page supports switching between Chinese and English. Click the "Text A" icon in the upper right corner to select the language.![dashboard_locale.png](../../../static/img/dashboard_locale.png)
\ No newline at end of file
diff --git a/docs/docs-en/visualization/collaborate_with_g6vp.md b/docs/docs-en/source/7.deploy/4.collaborate_with_g6vp.md
similarity index 95%
rename from docs/docs-en/visualization/collaborate_with_g6vp.md
rename to docs/docs-en/source/7.deploy/4.collaborate_with_g6vp.md
index cee400399..cb63e30d9 100644
--- a/docs/docs-en/visualization/collaborate_with_g6vp.md
+++ b/docs/docs-en/source/7.deploy/4.collaborate_with_g6vp.md
@@ -1,6 +1,6 @@
-# 🌈 [G6VP](https://github.com/antvis/g6vp) now supports visualization of flow graph jobs in collaboration with Tugraph!
+# 🌈 [G6VP](https://github.com/antvis/g6vp) Graph Visualization

-## Just 5 steps to present 🎊
+## Just 5 Steps to Present 🎊

### 1. Start the GeaFlow computing job and Socket service

@@ -17,7 +17,7 @@ When the terminal outputs the following, Tugraph Analytics is ready to establish

> If any problem occurs during service startup, see https://github.com/TuGraph-family/tugraph-analytics/issues/1

-### 2. Create a G6VP project
+### 2. Create a G6VP Project

Enter [New Canvas](https://insight.antv.antgroup.com/#/workbook/create), enter a workbook name. We will manually add the vertex and edge data later, so choose a case data set here, and use the **minimalist template**
diff --git a/docs/docs-en/deploy/install_llm.md b/docs/docs-en/source/7.deploy/5.install_llm.md
similarity index 96%
rename from docs/docs-en/deploy/install_llm.md
rename to docs/docs-en/source/7.deploy/5.install_llm.md
index 3de386623..60d56ef39 100644
--- a/docs/docs-en/deploy/install_llm.md
+++ b/docs/docs-en/source/7.deploy/5.install_llm.md
@@ -1,10 +1,11 @@
+# LLM Local Deployment
Users can deploy large models locally as a service. The complete process, encompassing downloading pre-trained models, deploying them as a service, and debugging, is described in the following steps.
It is essential for the user's machine to have Docker installed and be granted access to the repository containing these large models.
- ## Step 1: Download the model file
+ ## Step 1: Download the Model File
The pre-trained large model file has been uploaded to the [Hugging Face repository](https://huggingface.co/tugraph/CodeLlama-7b-GQL-hf). Please proceed with downloading and locally unzipping the model file.
![hugging](../../static/img/llm_hugging_face.png)
- ## Step 2: Prepare the Docker container environment
+ ## Step 2: Prepare the Docker Container Environment
1. Run the following command in the terminal to download the Docker image required for model servicing:
```
@@ -29,7 +30,7 @@
docker ps
```
Here, we map the container's port 8000 to the local machine's port 8000, mount the local model directory (/home/huggingface) to the container path (/opt/huggingface), and set the container name to my-model-container.

-## Step 3: Model service deployment
+## Step 3: Model Service Deployment
1. Model transformation
```
// Enter the container you just created
diff --git a/docs/docs-en/deploy/install_minikube.md b/docs/docs-en/source/7.deploy/6.install_minikube.md
similarity index 100%
rename from docs/docs-en/deploy/install_minikube.md
rename to docs/docs-en/source/7.deploy/6.install_minikube.md
diff --git a/docs/docs-en/source/7.deploy/index.rst b/docs/docs-en/source/7.deploy/index.rst
new file mode 100644
index 000000000..95fedbd96
--- /dev/null
+++ b/docs/docs-en/source/7.deploy/index.rst
@@ -0,0 +1,9 @@
+Deployment
+==========
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   :glob:
+
+   *
\ No newline at end of file
diff --git a/docs/docs-en/contribution.md b/docs/docs-en/source/9.contribution.md
similarity index 97%
rename from docs/docs-en/contribution.md
rename to docs/docs-en/source/9.contribution.md
index eb20fc32b..5d24d2490 100644
--- a/docs/docs-en/contribution.md
+++ b/docs/docs-en/source/9.contribution.md
@@ -1,4 +1,4 @@
-# How To Contribute
+# Contribution

## Code Contribution Process

@@ -22,7 +22,7 @@ After local code development and testing are completed, you can submit a PR to t
The community committer will provide feedback on code specifications, code logic, etc. on GitHub. You need to respond to the issues raised and make the corresponding modifications. After several rounds of feedback and modification, the community will eventually accept your PR and merge it into the master branch.

-## First contribution
+## First Contribution
The GeaFlow project provides a few simple issues for quick participation in community contributions. These issues are labeled **good first issues**, and you can pick the ones you are interested in to contribute.
\ No newline at end of file
diff --git a/docs/docs-en/source/conf.py b/docs/docs-en/source/conf.py
new file mode 100644
index 000000000..0a3e26f5a
--- /dev/null
+++ b/docs/docs-en/source/conf.py
@@ -0,0 +1,31 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+
+import os
+project = 'TuGraph'
+copyright = '2023, Ant Group'
+author = 'Ant Group'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = ['myst_parser',
+              'sphinx_panels',
+              'sphinx.ext.autodoc',
+              'sphinx.ext.napoleon',
+              'sphinx.ext.viewcode']
+
+templates_path = ['../../_templates']
+exclude_patterns = []
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'sphinx_rtd_theme'
+
+read_the_docs_build = os.environ.get('READTHEDOCS', None) == 'True'
diff --git a/docs/docs-en/source/contacts.md b/docs/docs-en/source/contacts.md
new file mode 100644
index 000000000..105c6d79a
--- /dev/null
+++ b/docs/docs-en/source/contacts.md
@@ -0,0 +1,4 @@
+# Contacts
+You can contact us through the following methods:
+
+![contacts](../../static/img/contacts.png)
\ No newline at end of file
diff --git a/docs/docs-en/source/index.rst b/docs/docs-en/source/index.rst
new file mode 100644
index 000000000..3478bcaa4
--- /dev/null
+++ b/docs/docs-en/source/index.rst
@@ -0,0 +1,18 @@
+geaflow
+=====================
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   :glob:
+
+   1.guide.md
+   2.introduction.md
+   3.quick_start/index.rst
+   4.concepts/index.rst
+   5.application-development/index.rst
+   7.deploy/index.rst
+   9.contribution.md
+   contacts.md
+   thanks.md
+   reference/index.rst
\ No newline at end of file
diff --git a/docs/docs-en/source/reference/index.rst b/docs/docs-en/source/reference/index.rst
new file mode 100644
index 000000000..a1a2a503d
--- /dev/null
+++ b/docs/docs-en/source/reference/index.rst
@@ -0,0 +1,9 @@
+References
+==========
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+   :glob:
+
+   *
\ No newline at end of file
diff --git a/docs/docs-en/principle/vs_join.md b/docs/docs-en/source/reference/vs_join.md
similarity index 96%
rename from docs/docs-en/principle/vs_join.md
rename to docs/docs-en/source/reference/vs_join.md
index f01c014de..815b2b624 100644
--- a/docs/docs-en/principle/vs_join.md
+++ b/docs/docs-en/source/reference/vs_join.md
@@ -14,7 +14,7 @@ In both batch and streaming computational systems, Join operations involve a sig
In the experiment shown in the figure below, we simulated scenarios of performing one-hop, two-hop, and three-hop relationship operations in sequence. It is clear that the more complex the multi-hop relationship calculation, the worse the performance of join operations in the relational model. In terms of total time, using graph-based Match calculations can save more than 90% of the time.
-![total_time](../../static/img/vs_join_total_time_en.jpg)
+![total_time](../../../static/img/vs_join_total_time_en.jpg)
Figure 1
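To make the pain point concrete, the sketch below contrasts a three-hop relationship query written as table joins with the equivalent graph match. The `person` table/vertex type and the `knows` edge table are hypothetical, and the match clause is only a rough GQL-style illustration rather than verbatim GeaFlow syntax: each extra hop adds another join (and another blow-up of intermediate results) on the table side, while it merely extends the pattern by one edge on the graph side.

```sql
-- Three hops as table joins: every hop joins the edge table again,
-- so intermediate results grow with each hop (hypothetical schema).
SELECT p1.id, p4.id
FROM person p1
JOIN knows k1 ON p1.id = k1.srcId
JOIN person p2 ON k1.targetId = p2.id
JOIN knows k2 ON p2.id = k2.srcId
JOIN person p3 ON k2.targetId = p3.id
JOIN knows k3 ON p3.id = k3.srcId
JOIN person p4 ON k3.targetId = p4.id;

-- The same three hops as a graph match: the pattern grows by one edge
-- per hop and is evaluated by traversing adjacency directly.
MATCH (p1:person)-[:knows]->(p2:person)-[:knows]->(p3:person)-[:knows]->(p4:person)
RETURN p1.id, p4.id;
```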
### Pain point 2: Data redundancy and low timeliness
@@ -33,7 +33,7 @@ Obviously, **the process of constructing a graph is essentially the extraction o
Compared with materializing relationships into wide tables, constructing a graph is very efficient, because the graph structure itself aggregates the vertices and edges. The figure below shows GeaFlow's graph-construction performance: the construction operation itself is extremely fast, and thanks to graph sharding it also scales very well.
-![insert_throuput](../../static/img/insert_throuput_en.jpg)
+![insert_throuput](../../../static/img/insert_throuput_en.jpg)
Figure 2
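As a rough sketch of what "constructing the graph" involves, the DDL below declares a sharded property graph in GeaFlow's SQL+GQL style and fills it from a source table. The `social` graph, the `person`/`knows` schema, and the `person_source` table are hypothetical, and the exact options may differ from the released DSL syntax; the point is that vertices and edges are declared and sharded together, which is what makes construction fast and scalable.

```sql
-- Hypothetical property graph: vertex and edge types are declared together,
-- and shardCount spreads the graph across partitions for parallel loading.
CREATE GRAPH social (
  Vertex person (
    id BIGINT ID,
    name VARCHAR
  ),
  Edge knows (
    srcId BIGINT SOURCE ID,
    targetId BIGINT DESTINATION ID,
    weight DOUBLE
  )
) WITH (
  storeType = 'rocksdb',
  shardCount = 4
);

-- Constructing the graph is then just inserting source rows into it.
INSERT INTO social.person(id, name)
SELECT id, name FROM person_source;
```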
In the experiment shown in Figure 1, it can also be observed that we spent a small amount of time inserting the graph (the cost of the green "insert to graph" part) in exchange for the acceleration that graph modeling brings to subsequent join queries.
@@ -44,7 +44,7 @@ Analytical systems based on table modeling only support SQL join for relationsh
GeaFlow provides a query language that combines GQL and SQL styles. It is a data analysis language for both graphs and tables, inherited from standard SQL+ISO/GQL, that makes graph and table analysis easy.
-![code_style](../../static/img/code_style.jpg)
+![code_style](../../../static/img/code_style.jpg)
Figure 3
**In the DSL, the results of graph computation and table queries are equivalent and can be processed like table data for relationship operations.** This means that both the GQL and SQL descriptions in Figure 3 can achieve similar effects, greatly enhancing the query expression capability of users.
@@ -57,7 +57,7 @@ GeaFlow (branded as TuGraph-Analytics) is an open-source distributed streaming g
TuGraph-Analytics was officially open-sourced in June 2023, opening up its core capability of graph-based streaming and batch computing. Compared to traditional streaming computing engines such as Flink and Storm, which are based on table models for real-time processing, GeaFlow is built on self-developed graph storage, a unified stream-batch computing engine, and an integrated GQL/SQL DSL, which gives it significant advantages in complex multi-hop relationship operations.
-![insert_throuput](../../static/img/query_throuput_en.jpg)
+![insert_throuput](../../../static/img/query_throuput_en.jpg)
Figure 4
Figure 4 shows the real-time throughput improvement that GeaFlow's Match operator achieves for multi-hop relationship queries on a graph, compared with the Join operator in Flink. In complex multi-hop scenarios, existing streaming computing engines are basically unable to handle real-time processing. The graph model breaks through this limitation and extends the application scenarios of real-time stream computing.
\ No newline at end of file
diff --git a/docs/docs-en/source/thanks.md b/docs/docs-en/source/thanks.md
new file mode 100644
index 000000000..37db8e972
--- /dev/null
+++ b/docs/docs-en/source/thanks.md
@@ -0,0 +1,2 @@
+# Thanks
+Thank you very much for contributing to GeaFlow. Whether it is bug reporting, documentation improvement, or major feature development, we warmly welcome all contributions.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
deleted file mode 100644
index 273686c4e..000000000
--- a/docs/index.md
+++ /dev/null
@@ -1,7 +0,0 @@
-# GeaFlow English Document Guide
-
-{%
-include-markdown "../README.md"
-start=""
-end=""
-%}
\ No newline at end of file
diff --git a/docs/index_cn.md b/docs/index_cn.md
deleted file mode 100644
index 29e93e038..000000000
--- a/docs/index_cn.md
+++ /dev/null
@@ -1,7 +0,0 @@
-# GeaFlow 中文文档导航页
-
-{%
-include-markdown "../README_cn.md"
-start=""
-end=""
-%}
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 013df645b..0f5564609 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,68 +1,5 @@
-#
-# This file is autogenerated by pip-compile with python 3.10
-# To update, run:
-#
-#    pip-compile docs/requirements.in
-#
-click==8.1.3
-    # via mkdocs
-ghp-import==2.1.0
-    # via mkdocs
-griffe==0.22.0
-    # via mkdocstrings-python
-importlib-metadata==4.12.0
-    # via mkdocs
-jinja2==3.1.2
-    # via
-    #   mkdocs
-    #   mkdocstrings
-markdown==3.3.7
-    # via
-    #   markdown-include
-    #   mkdocs
-    #   mkdocs-autorefs
-    #   mkdocstrings
-    #   pymdown-extensions
-markdown-include==0.6.0
-    # via -r docs/requirements.in
-mkdocs-include-markdown-plugin==4.0.4
-mkdocs-bootswatch==1.1
-markupsafe==2.1.1
-    # via
-    #   jinja2
-    #   mkdocstrings
-mergedeep==1.3.4
-    # via mkdocs
-mkdocs==1.3.0
-    # via
-    #   -r docs/requirements.in
-    #   mkdocs-autorefs
-    #   mkdocstrings
-mkdocs-autorefs==0.4.1
-    # via mkdocstrings
-mkdocstrings[python]==0.19.0
-    # via
-    #   -r docs/requirements.in
-    #   mkdocstrings-python
-mkdocstrings-python==0.7.1
-    # via mkdocstrings
-packaging==21.3
-    # via mkdocs
-pymdown-extensions==9.5
-    # via mkdocstrings
-pyparsing==3.0.9
-    # via packaging
-python-dateutil==2.8.2
-    # via ghp-import
-pyyaml==6.0
-    # via
-    #   mkdocs
-    #   pyyaml-env-tag
-pyyaml-env-tag==0.1
-    # via mkdocs
-six==1.16.0
-    # via python-dateutil
-watchdog==2.1.9
-    # via mkdocs
-zipp==3.8.0
-    # via importlib-metadata
\ No newline at end of file
+sphinx
+myst-parser
+sphinx-panels
+sphinx-rtd-theme
+breathe
diff --git a/mkdocs.yml b/mkdocs.yml
deleted file mode 100644
index 352881a83..000000000
--- a/mkdocs.yml
+++ /dev/null
@@ -1,143 +0,0 @@
-site_name: GeaFlow Docs
-
-theme:
-  name: cerulean
-  nav_style: dark
-
-plugins:
-  - include-markdown
-
-nav:
-  - 'index_cn.md'
-  - 中文文档:
-    - 'docs-cn/introduction.md'
-    - 'docs-cn/quick_start.md'
-    - 'docs-cn/quick_start_docker.md'
-    - 概念:
-      - 'docs-cn/concepts/glossary.md'
-      - 'docs-cn/concepts/graph_view.md'
-      - 'docs-cn/concepts/stream_graph.md'
-    - GeaFlow应用开发:
-      - API:
-        - 'docs-cn/application-development/api/overview.md'
-        - 'docs-cn/application-development/api/guid.md'
-        - 图:
-          - 'docs-cn/application-development/api/graph/compute.md'
-          - 'docs-cn/application-development/api/graph/traversal.md'
-        - 流:
-          - 'docs-cn/application-development/api/stream/process.md'
-          - 'docs-cn/application-development/api/stream/sink.md'
-          - 'docs-cn/application-development/api/stream/source.md'
-      - DSL:
-        - 'docs-cn/application-development/dsl/overview.md'
-        - 内置函数:
-          - 'docs-cn/application-development/dsl/build-in/aggregate.md'
-          - 'docs-cn/application-development/dsl/build-in/condition.md'
-          - 'docs-cn/application-development/dsl/build-in/date.md'
-          - 'docs-cn/application-development/dsl/build-in/logical.md'
-          - 'docs-cn/application-development/dsl/build-in/math.md'
-          - 'docs-cn/application-development/dsl/build-in/string.md'
-          - 'docs-cn/application-development/dsl/build-in/table.md'
-        - 连接器(Connector):
-          - 'docs-cn/application-development/dsl/connector/common.md'
-          - 'docs-cn/application-development/dsl/connector/file.md'
-          - 'docs-cn/application-development/dsl/connector/console.md'
-          - 'docs-cn/application-development/dsl/connector/jdbc.md'
-          - 'docs-cn/application-development/dsl/connector/hive.md'
-          - 'docs-cn/application-development/dsl/connector/kafka.md'
-          - 'docs-cn/application-development/dsl/connector/hbase.md'
-          - 'docs-cn/application-development/dsl/connector/hudi.md'
-          - 'docs-cn/application-development/dsl/connector/udc.md'
-        - 语法文档:
-          - 'docs-cn/application-development/dsl/reference/ddl.md'
-          - 'docs-cn/application-development/dsl/reference/dml.md'
-          - DQL:
-            - 'docs-cn/application-development/dsl/reference/dql/match.md'
-            - 'docs-cn/application-development/dsl/reference/dql/select.md'
-            - 'docs-cn/application-development/dsl/reference/dql/union.md'
-            - 'docs-cn/application-development/dsl/reference/dql/with.md'
-          - 'docs-cn/application-development/dsl/reference/use.md'
-        - UDF:
-          - 'docs-cn/application-development/dsl/udf/udaf.md'
-          - 'docs-cn/application-development/dsl/udf/udf.md'
-          - 'docs-cn/application-development/dsl/udf/udga.md'
-          - 'docs-cn/application-development/dsl/udf/udtf.md'
-    - 部署:
-      - 'docs-cn/deploy/install_guide.md'
-    - 原理:
-      - 'docs-cn/principle/dsl_principle.md'
-      - 'docs-cn/principle/framework_principle.md'
-      - 'docs-cn/principle/state_principle.md'
-      - 'docs-cn/principle/vs_join.md'
-    - './docs-cn/contribution.md'
-
-  - 'index.md'
-  - English Document:
-    - 'docs-en/introduction.md'
-    - 'docs-en/quick_start.md'
-    - 'docs-en/quick_start_docker.md'
-    - Concepts:
-      - 'docs-en/concepts/glossary.md'
-      - 'docs-en/concepts/graph_view.md'
-      - 'docs-en/concepts/stream_graph.md'
-    - Application-Development:
-      - API:
-        - 'docs-en/application-development/api/overview.md'
-        - 'docs-en/application-development/api/guid.md'
-        - Graph:
-          - 'docs-en/application-development/api/graph/compute.md'
-          - 'docs-en/application-development/api/graph/traversal.md'
-        - Stream:
-          - 'docs-en/application-development/api/stream/process.md'
-          - 'docs-en/application-development/api/stream/sink.md'
-          - 'docs-en/application-development/api/stream/source.md'
-      - DSL:
-        - 'docs-en/application-development/dsl/overview.md'
-        - Build-In:
-          - 'docs-en/application-development/dsl/build-in/aggregate.md'
-          - 'docs-en/application-development/dsl/build-in/condition.md'
-          - 'docs-en/application-development/dsl/build-in/date.md'
-          - 'docs-en/application-development/dsl/build-in/logical.md'
-          - 'docs-en/application-development/dsl/build-in/math.md'
-          - 'docs-en/application-development/dsl/build-in/string.md'
-          - 'docs-en/application-development/dsl/build-in/table.md'
-        - Connector:
-          - 'docs-en/application-development/dsl/connector/common.md'
-          - 'docs-en/application-development/dsl/connector/file.md'
-          - 'docs-en/application-development/dsl/connector/console.md'
-          - 'docs-en/application-development/dsl/connector/jdbc.md'
-          - 'docs-en/application-development/dsl/connector/hive.md'
-          - 'docs-en/application-development/dsl/connector/kafka.md'
-          - 'docs-en/application-development/dsl/connector/hbase.md'
-          - 'docs-en/application-development/dsl/connector/hudi.md'
-          - 'docs-en/application-development/dsl/connector/udc.md'
-        - Reference:
-          - 'docs-en/application-development/dsl/reference/ddl.md'
-          - 'docs-en/application-development/dsl/reference/dml.md'
-          - DQL:
-            - 'docs-en/application-development/dsl/reference/dql/match.md'
-            - 'docs-en/application-development/dsl/reference/dql/select.md'
-            - 'docs-en/application-development/dsl/reference/dql/union.md'
-            - 'docs-en/application-development/dsl/reference/dql/with.md'
-          - 'docs-en/application-development/dsl/reference/use.md'
-        - UDF:
-          - 'docs-en/application-development/dsl/udf/udaf.md'
-          - 'docs-en/application-development/dsl/udf/udf.md'
-          - 'docs-en/application-development/dsl/udf/udga.md'
-          - 'docs-en/application-development/dsl/udf/udtf.md'
-    - Deploy:
-      - 'docs-en/deploy/install_guide.md'
-    - Principle:
-      - 'docs-en/principle/dsl_principle.md'
-      - 'docs-en/principle/framework_principle.md'
-      - 'docs-en/principle/state_principle.md'
-      - 'docs-en/principle/vs_join.md'
-    - './docs-en/contribution.md'
-
-markdown_extensions:
-  - toc:
-      toc_depth: 4
-
-python_version: 3.8 # use Python 3.8
-jinja2:
-  version: 3.0.0 # use Jinja2 3.0.0
\ No newline at end of file