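From b649e61ca4a0f7305a2591d5612245b6de0b0d63 Mon Sep 17 00:00:00 2001
From: randomJoe211 <69501902+randomJoe211@users.noreply.github.com>
Date: Tue, 9 Nov 2021 14:28:47 +0800
Subject: [PATCH 1/3] Add docs for Exchange Enterprise edition

---
 .../about-exchange/ex-ug-limitations.md       |   4 +-
 .../about-exchange/ex-ug-what-is-exchange.md  |  15 +-
 docs-2.0/nebula-exchange/ex-ug-compile.md     |  24 ++-
 .../parameter-reference/ex-ug-parameter.md    |  20 +++
 .../use-exchange/ex-ug-export-from-nebula.md  | 144 ++++++++++++++++++
 mkdocs.yml                                    |   3 +-
 6 files changed, 200 insertions(+), 10 deletions(-)
 create mode 100644 docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md

diff --git a/docs-2.0/nebula-exchange/about-exchange/ex-ug-limitations.md b/docs-2.0/nebula-exchange/about-exchange/ex-ug-limitations.md
index 61567681988..57b2d93282f 100644
--- a/docs-2.0/nebula-exchange/about-exchange/ex-ug-limitations.md
+++ b/docs-2.0/nebula-exchange/about-exchange/ex-ug-limitations.md
@@ -2,9 +2,9 @@

 This topic describes some limitations of Exchange 2.x.

-## Nebula Graph version
+## Version compatibility

-The correspondence between Nebula Exchange versions (that is, JAR package versions) and Nebula Graph versions is as follows.
+The correspondence between Nebula Exchange versions (that is, JAR package versions) and Nebula Graph core versions is as follows.

 |Exchange client version|Nebula Graph version|
 |:---|:---|
diff --git a/docs-2.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange.md b/docs-2.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange.md
index b4df90d667b..098350f8251 100644
--- a/docs-2.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange.md
+++ b/docs-2.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange.md
@@ -6,6 +6,10 @@ Exchange consists of three parts: Reader, Processor, and Writer. Reader reads different sources

 ![Nebula Graph® Exchange consists of Reader, Processor, and Writer, and can migrate data of many different formats and sources to Nebula Graph](../figs/ex-ug-003.png "The data conversion and migration process of Nebula Graph® Exchange")

+## Editions
+
+Exchange is available in two editions: Community Edition and Enterprise Edition. The Community Edition is developed as open source on [GitHub](https://github.com/vesoft-inc/nebula-exchange). The Enterprise Edition supports all Community Edition features and adds extra features. For details, see [Edition comparison](https://nebula-graph.com.cn/pricing/).
+
 ## Use cases

 Exchange applies to the following scenarios:
@@ -16,6 +20,11 @@ Exchange applies to the following scenarios:

 - You need to generate SST files that Nebula Graph can recognize from large volumes of data, and then import them into the Nebula Graph database.

+- You need to export the data stored in Nebula Graph.
+
+    !!! enterpriseonly
+        Only Exchange Enterprise Edition supports exporting data from Nebula Graph.
+
 ## Advantages

 Exchange has the following advantages:
@@ -40,7 +49,7 @@ Exchange has the following advantages:

 ## Data sources

-Exchange {{exchange.release}} supports converting data in the following formats or from the following sources into vertex and edge data that Nebula Graph can recognize, and then importing it into Nebula Graph in the form of **nGQL** statements:
+Exchange {{exchange.release}} supports converting data in the following formats or from the following sources into vertex and edge data that Nebula Graph can recognize, and then importing it into Nebula Graph in the form of nGQL statements:

 - Data stored in HDFS or locally:
   - [Apache Parquet](../use-exchange/ex-ug-import-from-parquet.md)
@@ -65,7 +74,9 @@ Exchange {{exchange.release}} supports converting data in the following formats or from the following sources into Ne

 - Publish/subscribe messaging platform: [Apache Pulsar 2.4.5](../use-exchange/ex-ug-import-from-pulsar.md)

-In addition to importing data in the form of nGQL statements, Exchange also supports generating **SST files** from the data source and then [importing the SST files](../use-exchange/ex-ug-import-from-sst.md) through Console.
+In addition to importing data in the form of nGQL statements, Exchange also supports generating SST files from the data source and then [importing the SST files](../use-exchange/ex-ug-import-from-sst.md) through Console.
+
+Additionally, Exchange Enterprise Edition supports using Nebula Graph as the source and [exporting data to CSV files](../use-exchange/ex-ug-export-from-nebula.md).

 ## Video

diff --git a/docs-2.0/nebula-exchange/ex-ug-compile.md b/docs-2.0/nebula-exchange/ex-ug-compile.md
index 2abfc782d0f..eb8ae9e88bf 100644
--- a/docs-2.0/nebula-exchange/ex-ug-compile.md
+++ b/docs-2.0/nebula-exchange/ex-ug-compile.md
@@ -1,15 +1,29 @@
-# Compile Exchange
+# Get Nebula Exchange

-This topic describes how to compile Nebula Exchange. You can also directly [download](https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/) the compiled `.jar` file.
+This topic describes how to get the JAR file of Nebula Exchange.

-## Preparation
+## Download the JAR file directly
+
+The JAR file of Exchange Community Edition can be [downloaded](https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/) directly.
+
+To download Exchange Enterprise Edition, first [get the Nebula Graph Enterprise Edition package](https://nebula-graph.com.cn/pricing/).
+
+## Get the JAR file by compiling the source code
+
+The JAR file of Exchange Community Edition can also be obtained by compiling the source code. The following describes how to compile the Exchange source code.
+
+!!! enterpriseonly
+
+    Exchange Enterprise Edition is available only in the Nebula Graph Enterprise Edition package.
+
+### Prerequisites

 - Install [Maven](https://maven.apache.org/download.cgi).

 - Download [pulsar-spark-connector_2.11](https://oss-cdn.nebula-graph.com.cn/jar-packages/pulsar-spark-connector_2.11.zip) and decompress it into the `io/streamnative/connectors` directory of the local Maven repository.

-## Compile Exchange
+### Steps

 1. 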
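Clone the `nebula-exchange` repository in the root directory.

@@ -57,7 +71,7 @@

 When migrating data, you can refer to the configuration file [`target/classes/application.conf`](https://github.com/vesoft-inc/nebula-exchange/blob/master/nebula-exchange/src/main/resources/application.conf).

-## Failed to download dependency packages
+### Failed to download dependency packages

 If downloading dependency packages fails during compilation:

diff --git a/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md b/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
index 7a3e81099b9..f8da3fa9137 100644
--- a/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
+++ b/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
@@ -175,6 +175,18 @@
 |:---|:---|:---|:---|:---|
 |`tags.path`|string|-|Yes|Specifies the path of the source file from which SST files are generated.|

+### Parameters specific to the Nebula Graph source
+
+!!! enterpriseonly
+
+    Parameters specific to the Nebula Graph source are used to export Nebula Graph data and are supported only by Exchange Enterprise Edition.
+
+|Parameter|Data type|Default value|Required|Description|
+|:---|:---|:---|:---|:---|
+|`tags.path`|string|`"hdfs://namenode:9000/path/vertex"`|Yes|Specifies the storage path of the CSV files. The specified path must not exist; Exchange creates it automatically. When storing to an HDFS server, the path format is the same as the default value, for example `"hdfs://192.168.8.177:9000/vertex/player"`. When storing locally, the path format is `"file:///path/vertex"`, for example `"file:///home/nebula/vertex/player"`. If there are multiple tags, a different directory must be set for each tag.|
+|`tags.noField`|bool|`false`|Yes|When the value is `true`, only VIDs are exported, without property data. When the value is `false`, VIDs and property data are exported.|
+|`tags.return.fields`|list|`[]`|Yes|Specifies the properties to export. For example, to export the `name` and `age` properties, set the value to `["name","age"]`. This parameter takes effect only when the value of `tags.noField` is `false`.|
+
 ## Edge configurations

 Edge configurations differ by data source. There are many general parameters and some source-specific parameters. You need to configure both the general parameters and the parameters specific to your data source.
@@ -195,3 +207,11 @@
 |`edges.ranking`|int|-|No|The column of rank values. If not specified, all rank values default to `0`.|
 |`edges.batch`|int|`256`|Yes|The maximum number of edges written to Nebula Graph in a single batch.|
 |`edges.partition`|int|`32`|Yes|The number of Spark partitions.|
+
+### Parameters specific to the Nebula Graph source
+
+|Parameter|Data type|Default value|Required|Description|
+|:---|:---|:---|:---|:---|
+|`edges.path`|string|`"hdfs://namenode:9000/path/edge"`|Yes|Specifies the storage path of the CSV files. The specified path must not yet exist; Exchange creates it automatically. When storing to an HDFS server, the path format is the same as the default value, for example `"hdfs://192.168.8.177:9000/edge/follow"`. When storing locally, the path format is `"file:///path/edge"`, for example `"file:///home/nebula/edge/follow"`. If there are multiple edge types, a different directory must be set for each edge type.|
+|`edges.noField`|bool|`false`|Yes|When the value is `true`, only the source vertex VID, destination vertex VID, and rank are exported, without property data. When the value is `false`, the source vertex VID, destination vertex VID, rank, and property data are exported.|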
+|`edges.return.fields`|list|`[]`|Yes|Specifies the properties to export. For example, to export the `start_year` and `end_year` properties, set the value to `["start_year","end_year"]`. This parameter takes effect only when the value of `edges.noField` is `false`.|
diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
new file mode 100644
index 00000000000..300d5d88d5e
--- /dev/null
+++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
@@ -0,0 +1,144 @@
+# Export data from Nebula Graph
+
+This topic uses an example to show how to use Exchange to export data from Nebula Graph to CSV files.
+
+!!! enterpriseonly
+
+    Only Exchange Enterprise Edition supports exporting Nebula Graph data to CSV files.
+
+## Environment preparation
+
+This example was completed in a virtual machine running Linux. The hardware and software prepared before exporting data are as follows.
+
+### Hardware
+
+| Type | Information |
+| - | - |
+| CPU | 4 Intel(R) Xeon(R) Platinum 8260 CPU @ 2.30GHz |
+| Memory | 16G |
+| Disk | 50G |
+
+### System
+
+CentOS 7.9.2009
+
+### Software
+
+| Name | Version |
+| - | - |
+| JDK | 1.8.0 |
+| Hadoop | 2.10.1 |
+| Scala | 2.12.11 |
+| Spark | 2.4.7 |
+| Nebula Graph | {{nebula.release}} |
+
+### Dataset
+
+In this example, the Nebula Graph instance used as the data source stores the [basketballplayer dataset](https://docs.nebula-graph.io/2.0/basketballplayer-2.X.ngql). Its schema elements are shown in the following table.
+
+| Element | Name | Properties |
+| :--- | :--- | :--- |
+| Tag | `player` | `name string, age int` |
+| Tag | `team` | `name string` |
+| Edge type | `follow` | `degree int` |
+| Edge type | `serve` | `start_year int, end_year int` |
+
+## Steps
+
+1. Get the JAR file of Exchange Enterprise Edition from the [Nebula Graph Enterprise Edition package](https://nebula-graph.com.cn/pricing/).
+
+2. Modify the configuration file.
+
+    Exchange Enterprise Edition provides a configuration file template, `export_application.conf`, dedicated to exporting Nebula Graph data. For a description of each configuration item, see [Exchange configurations](../parameter-reference/ex-ug-parameter.md). The core content of the configuration file used in this example is as follows:
+
+    ```conf
+    ...
+
+      # Processing tags
+      # There are tag config examples for different dataSources.
+      tags: [
+        # export NebulaGraph tag data to csv, only support export to CSV for now.
+        {
+          name: player
+          type: {
+            source: Nebula
+            sink: CSV
+          }
+          # the path to save the NebulaGraph data, make sure the path doesn't exist.
+          path:"hdfs://192.168.8.177:9000/vertex/player"
+          # if no need to export any properties when export NebulaGraph tag data
+          # if noField is configured true, just export vertexId
+          noField:false
+          # define to export what properties when export NebulaGraph tag data
+          # if return.fields is configured as empty list, then export all properties
+          return.fields:[]
+          # nebula space partition number
+          partition:10
+        }
+
+    ...
+
+      ]
+
+      # Processing edges
+      # There are edge config examples for different dataSources.
+      edges: [
+        # export NebulaGraph tag data to csv, only support export to CSV for now.
+        {
+          name: follow
+          type: {
+            source: Nebula
+            sink: CSV
+          }
+          # the path to save the NebulaGraph data, make sure the path doesn't exist.
+          path:"hdfs://192.168.8.177:9000/edge/follow"
+          # if no need to export any properties when export NebulaGraph edge data
+          # if noField is configured true, just export src,dst,rank
+          noField:false
+          # define to export what properties when export NebulaGraph edfe data
+          # if return.fields is configured as empty list, then export all properties
+          return.fields:[]
+          # nebula space partition number
+          partition:10
+        }
+
+    ...
+
+      ]
+    }
+    ```
+
+3. Export the data from Nebula Graph with the following command.
+
+    ```bash
+    /bin/spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange nebula-exchange-x.y.z.jar_path> -c
+    ```
+
+    The export command used in this example is as follows.
+
+    ```bash
+    $ ./spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange \
+    ~/exchange-ent/nebula-exchange-ent-{{exchange.release}}.jar -c ~/exchange-ent/export_application.conf
+    ```
+
+4. Check the exported data.
+
+    1. 
Check whether the CSV files were generated under the target path.
+
+       ```bash
+       $ hadoop fs -ls /vertex/player
+       Found 11 items
+       -rw-r--r--   3 nebula supergroup          0 2021-11-05 07:36 /vertex/player/_SUCCESS
+       -rw-r--r--   3 nebula supergroup        160 2021-11-05 07:36 /vertex/player/part-00000-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        163 2021-11-05 07:36 /vertex/player/part-00001-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        172 2021-11-05 07:36 /vertex/player/part-00002-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        172 2021-11-05 07:36 /vertex/player/part-00003-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        144 2021-11-05 07:36 /vertex/player/part-00004-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        173 2021-11-05 07:36 /vertex/player/part-00005-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        160 2021-11-05 07:36 /vertex/player/part-00006-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        148 2021-11-05 07:36 /vertex/player/part-00007-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        125 2021-11-05 07:36 /vertex/player/part-00008-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       -rw-r--r--   3 nebula supergroup        119 2021-11-05 07:36 /vertex/player/part-00009-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
+       ```
+
+    2. 
Check the contents of the CSV files to confirm that the data was exported successfully.
diff --git a/mkdocs.yml b/mkdocs.yml
index 8c309babdd7..29b50b655cf 100755
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -412,7 +412,7 @@ nav:
     - About Nebula Exchange:
       - What is Nebula Exchange: nebula-exchange/about-exchange/ex-ug-what-is-exchange.md
       - Limitations: nebula-exchange/about-exchange/ex-ug-limitations.md
-    - Compile Exchange: nebula-exchange/ex-ug-compile.md
+    - Get Nebula Exchange: nebula-exchange/ex-ug-compile.md
     - Parameter reference:
       - Import command parameters: nebula-exchange/parameter-reference/ex-ug-para-import-command.md
       - Configuration parameters: nebula-exchange/parameter-reference/ex-ug-parameter.md
@@ -430,6 +430,7 @@ nav:
       - Import data from Pulsar: nebula-exchange/use-exchange/ex-ug-import-from-pulsar.md
       - Import data from Kafka: nebula-exchange/use-exchange/ex-ug-import-from-kafka.md
       - Import data from SST files: nebula-exchange/use-exchange/ex-ug-import-from-sst.md
+      - Export data from Nebula Graph: nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
     - Exchange FAQ: nebula-exchange/ex-ug-FAQ.md

 # - Nebula Operator:

From bbb1635debf7b1af40d17320de44f830e56fa906 Mon Sep 17 00:00:00 2001
From: randomJoe211 <69501902+randomJoe211@users.noreply.github.com>
Date: Wed, 10 Nov 2021 16:16:13 +0800
Subject: [PATCH 2/3] Update ex-ug-parameter.md

---
 docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md b/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
index f8da3fa9137..2da3f076b9a 100644
--- a/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
+++ b/docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
@@ -212,6 +212,6 @@

 |Parameter|Data type|Default value|Required|Description|
 |:---|:---|:---|:---|:---|
-|`edges.path`|string|`"hdfs://namenode:9000/path/edge"`|Yes|Specifies the storage path of the CSV files. The specified path must not yet exist; Exchange creates it automatically. When storing to an HDFS server, the path format is the same as the default value, for example `"hdfs://192.168.8.177:9000/edge/follow"`. When storing locally, the path format is `"file:///path/edge"`, for example `"file:///home/nebula/edge/follow"`. If there are multiple edge types, a different directory must be set for each edge type.|
+|`edges.path`|string|`"hdfs://namenode:9000/path/edge"`|Yes|Specifies the storage path of the CSV files. The specified path must not exist; Exchange creates it automatically. When storing to an HDFS server, the path format is the same as the default value, for example `"hdfs://192.168.8.177:9000/edge/follow"`. When storing locally, the path format is `"file:///path/edge"`, for example `"file:///home/nebula/edge/follow"`. If there are multiple edge types, a different directory must be set for each edge type.|
 |`edges.noField`|bool|`false`|Yes|When the value is `true`, only the source vertex VID, destination vertex VID, and rank are exported, without property data. When the value is `false`, the source vertex VID, destination vertex VID, rank, and property data are exported.|
 |`edges.return.fields`|list|`[]`|Yes|Specifies the properties to export. For example, to export the `start_year` and `end_year` properties, set the value to `["start_year","end_year"]`. This parameter takes effect only when the value of `edges.noField` is `false`.|

From 73276c063428b2b36acad2b5012cb82ee00de65d Mon Sep 17 00:00:00 2001
From: randomJoe211 <69501902+randomJoe211@users.noreply.github.com>
Date: Wed, 10 Nov 2021 16:19:24 +0800
Subject: [PATCH 3/3] Update ex-ug-export-from-nebula.md

---
 .../nebula-exchange/use-exchange/ex-ug-export-from-nebula.md  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md b/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
index 300d5d88d5e..aed63d96ea0 100644
--- a/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
+++ b/docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
@@ -69,7 +69,7 @@ CentOS 7.9.2009
           # if no need to export any properties when export NebulaGraph tag data
           # if noField is configured true, just export vertexId
           noField:false
-          # define to export what properties when export NebulaGraph tag data
+          # define properties to export from NebulaGraph tag data
           # if return.fields is configured as empty list, then export all properties
           return.fields:[]
           # nebula space partition number
@@ -95,7 +95,7 @@ CentOS 7.9.2009
           # if no need to export any properties when export NebulaGraph edge data
           # if noField is configured true, just export src,dst,rank
           noField:false
-          # define to export what properties when export NebulaGraph edfe data
+          # define properties to export from NebulaGraph edge data
           # if return.fields is configured as empty list, then export
all properties
           return.fields:[]
           # nebula space partition number