Add exchange ent doc #1199

Merged
merged 3 commits into from
Nov 10, 2021
4 changes: 2 additions & 2 deletions docs-2.0/nebula-exchange/about-exchange/ex-ug-limitations.md
@@ -2,9 +2,9 @@

This topic describes some limitations of using Exchange 2.x.

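## Nebula Graph version
## Version compatibility

The correspondence between Nebula Exchange versions (that is, JAR package versions) and Nebula Graph versions is as follows
The correspondence between Nebula Exchange versions (that is, JAR package versions) and Nebula Graph core versions is as follows

|Exchange client version|Nebula Graph version|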
|:---|:---|
15 changes: 13 additions & 2 deletions docs-2.0/nebula-exchange/about-exchange/ex-ug-what-is-exchange.md
@@ -6,6 +6,10 @@ Exchange consists of three parts: Reader, Processor, and Writer. The Reader reads from different sources

![Nebula Graph® Exchange consists of Reader, Processor, and Writer, and can migrate data of many different formats and sources to Nebula Graph](../figs/ex-ug-003.png "The data conversion and migration process of Nebula Graph® Exchange")

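## Editions

Exchange is available in two editions: the Community Edition and the Enterprise Edition. The Community Edition is developed as open source on [GitHub](https://github.com/vesoft-inc/nebula-exchange). The Enterprise Edition supports all Community Edition features and adds extra ones. For details, see [Edition comparison](https://nebula-graph.com.cn/pricing/).

## Scenarios

Exchange applies to the following scenarios:
@@ -16,6 +20,11 @@ Exchange applies to the following scenarios: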

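- You need to generate SST files that Nebula Graph can recognize from large volumes of data, and then import them into the Nebula Graph database.

- You need to export the data stored in Nebula Graph.

!!! enterpriseonly
    Only the Enterprise Edition of Exchange supports exporting data from Nebula Graph.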

## Advantages

Exchange has the following advantages:
@@ -40,7 +49,7 @@ Exchange has the following advantages:

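## Data sources

Exchange {{exchange.release}} supports converting data in the following formats or from the following sources into vertex and edge data that Nebula Graph can recognize, and then importing it into Nebula Graph as **nGQL** statements:
Exchange {{exchange.release}} supports converting data in the following formats or from the following sources into vertex and edge data that Nebula Graph can recognize, and then importing it into Nebula Graph as nGQL statements:

- Data stored in HDFS or locally: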
- [Apache Parquet](../use-exchange/ex-ug-import-from-parquet.md)
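@@ -65,7 +74,9 @@ Exchange {{exchange.release}} supports converting data in the following formats or from the following sources into Ne

- Publish/subscribe messaging platform: [Apache Pulsar 2.4.5](../use-exchange/ex-ug-import-from-pulsar.md)

In addition to importing data as nGQL statements, Exchange also supports generating **SST files** from the source data and then [importing the SST files](../use-exchange/ex-ug-import-from-sst.md) through Console.
In addition to importing data as nGQL statements, Exchange also supports generating SST files from the source data and then [importing the SST files](../use-exchange/ex-ug-import-from-sst.md) through Console.

In addition, the Enterprise Edition of Exchange supports using Nebula Graph as the source and [exporting data to CSV files](../use-exchange/ex-ug-export-from-nebula.md).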

## Video

24 changes: 19 additions & 5 deletions docs-2.0/nebula-exchange/ex-ug-compile.md
@@ -1,15 +1,29 @@
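# Compile Exchange
# Get Nebula Exchange

This topic describes how to compile Nebula Exchange. You can also directly [download](https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/) the compiled `.jar` file
This topic describes how to get the JAR file of Nebula Exchange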

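## Preparation
## Download the JAR file directly

The JAR file of the Community Edition of Exchange can be [downloaded](https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/) directly.

To download the Enterprise Edition of Exchange, first [get the Nebula Graph Enterprise Edition package](https://nebula-graph.com.cn/pricing/).

## Get the JAR file by compiling the source code

The JAR file of the Community Edition of Exchange can also be obtained by compiling the source code. The following describes how to compile the Exchange source code.

!!! enterpriseonly

    The Enterprise Edition of Exchange is only available in the Nebula Graph Enterprise Edition package.

### Prerequisites

- Install [Maven](https://maven.apache.org/download.cgi).

<!-- The Maven repository that hosted pulsar was officially shut down on May 31 and no new location has been found yet. Once one is found, this item can be removed. -->
- Download [pulsar-spark-connector_2.11](https://oss-cdn.nebula-graph.com.cn/jar-packages/pulsar-spark-connector_2.11.zip) and unzip it into the `io/streamnative/connectors` directory of the local Maven repository.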
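
The unzip step in the last prerequisite could be scripted roughly as follows. This is a minimal sketch, not part of the Exchange tooling: the function names `connector_dir` and `install_pulsar_connector` are hypothetical, and it assumes the local Maven repository is at Maven's default location `~/.m2/repository` and that the archive has already been downloaded from the link above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the connector directory inside a local Maven repository.
# Assumption: the repository defaults to ~/.m2/repository when no argument is given.
connector_dir() {
  local repo="${1:-$HOME/.m2/repository}"
  printf '%s\n' "$repo/io/streamnative/connectors"
}

# Hypothetical helper: unpack the downloaded archive into that directory.
install_pulsar_connector() {
  local zip_file="$1"
  local dest
  dest="$(connector_dir "${2:-}")"
  mkdir -p "$dest"
  # -o overwrites existing files without prompting; -d sets the target directory.
  unzip -o "$zip_file" -d "$dest"
}

# Usage:
#   install_pulsar_connector ./pulsar-spark-connector_2.11.zip
```

If the local repository was relocated (for example through `settings.xml`), pass its path as the second argument.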

## Compile Exchange
### Steps

1. Clone the repository `nebula-exchange` in the root directory.

@@ -57,7 +71,7 @@

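When migrating data, you can refer to the configuration file [`target/classes/application.conf`](https://github.com/vesoft-inc/nebula-exchange/blob/master/nebula-exchange/src/main/resources/application.conf).

## Failed to download dependency packages
### Failed to download dependency packages

If downloading dependency packages fails during compilation:
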
20 changes: 20 additions & 0 deletions docs-2.0/nebula-exchange/parameter-reference/ex-ug-parameter.md
@@ -175,6 +175,18 @@
|:---|:---|:---|:---|:---|
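|`tags.path`|string|-|Yes|Specifies the path of the source files from which SST files are generated.|

### Parameters specific to Nebula Graph sources

!!! enterpriseonly

    The parameters specific to Nebula Graph sources are used to export Nebula Graph data and are supported only by the Enterprise Edition of Exchange.

|Parameter|Data type|Default value|Required|Description|
|:---|:---|:---|:---|:---|
|`tags.path`|string|`"hdfs://namenode:9000/path/vertex"`|Yes|Specifies the storage path of the CSV files. The path must not already exist; Exchange creates it automatically. When storing to an HDFS server, the path format is the same as the default value, for example `"hdfs://192.168.8.177:9000/vertex/player"`. When storing locally, the path format is `"file:///path/vertex"`, for example `"file:///home/nebula/vertex/player"`. If there are multiple Tags, a different directory must be set for each Tag.|
|`tags.noField`|bool|`false`|Yes|If `true`, only VIDs are exported, without property data. If `false`, both VIDs and property data are exported.|
|`tags.return.fields`|list|`[]`|Yes|Specifies the properties to export. For example, to export the `name` and `age` properties, set the value to `["name","age"]`. This parameter takes effect only when `tags.noField` is `false`.|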
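
As an illustration of how these three parameters combine, the following fragment sketches a Tag entry that exports only the `name` and `age` properties of `player` to a local directory. It is assembled from the example values in the table and from the shape of the `export_application.conf` template, so treat it as a sketch rather than a verified configuration. Inside a `tags` entry the parameters appear without the `tags.` prefix.

```conf
tags: [
  {
    name: player
    type: {
      source: Nebula
      sink: CSV
    }
    # Local target directory. It must not exist before the export runs.
    path: "file:///home/nebula/vertex/player"
    # false: export VIDs together with property data.
    noField: false
    # Export only these two properties of the player Tag.
    return.fields: ["name", "age"]
    # Nebula space partition number, same value as in the template example.
    partition: 10
  }
]
```

According to the comments in the template, leaving `return.fields` as an empty list exports all properties.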

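## Edge configuration

The edge configuration differs between data sources. There are many general parameters and some source-specific parameters; you need to configure both the general parameters and the parameters specific to the data source in use.
@@ -195,3 +207,11 @@
|`edges.ranking`|int|-|No|The column of rank values. If not specified, all rank values default to `0`.|
|`edges.batch`|int|`256`|Yes|The maximum number of edges written to Nebula Graph in a single batch.|
|`edges.partition`|int|`32`|Yes|The number of Spark partitions.|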

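### Parameters specific to Nebula Graph sources

|Parameter|Data type|Default value|Required|Description|
|:---|:---|:---|:---|:---|
|`edges.path`|string|`"hdfs://namenode:9000/path/edge"`|Yes|Specifies the storage path of the CSV files. The path must not already exist; Exchange creates it automatically. When storing to an HDFS server, the path format is the same as the default value, for example `"hdfs://192.168.8.177:9000/edge/follow"`. When storing locally, the path format is `"file:///path/edge"`, for example `"file:///home/nebula/edge/follow"`. If there are multiple Edge types, a different directory must be set for each Edge type.|
|`edges.noField`|bool|`false`|Yes|If `true`, only the source vertex VID, destination vertex VID, and Rank are exported, without property data. If `false`, the source vertex VID, destination vertex VID, Rank, and property data are exported.|
|`edges.return.fields`|list|`[]`|Yes|Specifies the properties to export. For example, to export the `start_year` and `end_year` properties, set the value to `["start_year","end_year"]`. This parameter takes effect only when `edges.noField` is `false`.|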
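
Because both `tags.path` and `edges.path` must point to a location that does not exist yet, it can help to check the target before starting an export. The following is a minimal sketch of such a check; the function name `export_path_is_free` is hypothetical, and the HDFS branch assumes the `hadoop` CLI is on `PATH` and configured for the cluster.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit 0) only if the export target does not exist yet.
# Exit 1 means the path already exists; exit 2 means the check could not be performed.
export_path_is_free() {
  local target="$1"
  case "$target" in
    file://*)
      # Strip the scheme to get the local filesystem path.
      [ ! -e "${target#file://}" ]
      ;;
    hdfs://*)
      command -v hadoop >/dev/null 2>&1 || { echo "hadoop CLI not found" >&2; return 2; }
      # `hadoop fs -test -e` exits 0 when the path exists.
      if hadoop fs -test -e "$target"; then return 1; fi
      ;;
    *)
      echo "unsupported path scheme: $target" >&2
      return 2
      ;;
  esac
}

# Usage:
#   export_path_is_free "hdfs://192.168.8.177:9000/edge/follow" || echo "choose another path"
```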
144 changes: 144 additions & 0 deletions docs-2.0/nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
@@ -0,0 +1,144 @@
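# Export data from Nebula Graph

This topic uses an example to show how to use Exchange to export data from Nebula Graph to CSV files.

!!! enterpriseonly

    Only the Enterprise Edition of Exchange supports exporting Nebula Graph data to CSV files.

## Environment preparation

This example was completed in a virtual machine running Linux. The hardware and software prepared before exporting the data are as follows.

### Hardware

| Type | Information |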
| - | - |
| CPU | 4 Intel(R) Xeon(R) Platinum 8260 CPU @ 2.30GHz |
| Memory | 16G |
| Disk | 50G |

### System

CentOS 7.9.2009

### Software

| Name | Version |
| - | - |
| JDK | 1.8.0 |
| Hadoop | 2.10.1 |
| Scala | 2.12.11 |
| Spark | 2.4.7 |
| Nebula Graph | {{nebula.release}} |

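### Dataset

In this example, the Nebula Graph instance used as the data source stores the [basketballplayer dataset](https://docs.nebula-graph.io/2.0/basketballplayer-2.X.ngql). Its Schema elements are shown in the following table.

| Element | Name | Properties |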
| :--- | :--- | :--- |
| Tag | `player` | `name string, age int` |
| Tag | `team` | `name string` |
| Edge type | `follow` | `degree int` |
| Edge type | `serve` | `start_year int, end_year int` |

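## Steps

1. Get the JAR file of the Enterprise Edition of Exchange from the [Nebula Graph Enterprise Edition package](https://nebula-graph.com.cn/pricing/).

2. Modify the configuration file.

The Enterprise Edition of Exchange provides `export_application.conf`, a configuration file template dedicated to exporting Nebula Graph data. For a description of each configuration item, see [Exchange configuration](../parameter-reference/ex-ug-parameter.md). The core content of the configuration file used in this example is as follows: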

```conf
...

# Processing tags
# There are tag config examples for different dataSources.
tags: [
  # Export NebulaGraph tag data to CSV. Only export to CSV is supported for now.
  {
    name: player
    type: {
      source: Nebula
      sink: CSV
    }
    # The path to save the NebulaGraph data. Make sure the path doesn't exist.
    path:"hdfs://192.168.8.177:9000/vertex/player"
    # Whether to skip properties when exporting NebulaGraph tag data.
    # If noField is set to true, only the vertexId is exported.
    noField:false
    # Define the properties to export from NebulaGraph tag data.
    # If return.fields is an empty list, all properties are exported.
    return.fields:[]
    # nebula space partition number
    partition:10
  }

...

]

# Processing edges
# There are edge config examples for different dataSources.
edges: [
  # Export NebulaGraph edge data to CSV. Only export to CSV is supported for now.
  {
    name: follow
    type: {
      source: Nebula
      sink: CSV
    }
    # The path to save the NebulaGraph data. Make sure the path doesn't exist.
    path:"hdfs://192.168.8.177:9000/edge/follow"
    # Whether to skip properties when exporting NebulaGraph edge data.
    # If noField is set to true, only src, dst, and rank are exported.
    noField:false
    # Define the properties to export from NebulaGraph edge data.
    # If return.fields is an empty list, all properties are exported.
    return.fields:[]
    # nebula space partition number
    partition:10
  }

...

]
}
```

3. Use the following command to export the data from Nebula Graph.

```bash
<spark_install_path>/bin/spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-x.y.z.jar_path> -c <export_application.conf_path>
```

The export command used in this example is as follows.

```bash
$ ./spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange \
~/exchange-ent/nebula-exchange-ent-{{exchange.release}}.jar -c ~/exchange-ent/export_application.conf
```

4. Check the exported data.

1. Check whether the CSV files were generated in the target path.

```bash
$ hadoop fs -ls /vertex/player
Found 11 items
-rw-r--r-- 3 nebula supergroup 0 2021-11-05 07:36 /vertex/player/_SUCCESS
-rw-r--r-- 3 nebula supergroup 160 2021-11-05 07:36 /vertex/player/part-00000-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 163 2021-11-05 07:36 /vertex/player/part-00001-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 172 2021-11-05 07:36 /vertex/player/part-00002-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 172 2021-11-05 07:36 /vertex/player/part-00003-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 144 2021-11-05 07:36 /vertex/player/part-00004-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 173 2021-11-05 07:36 /vertex/player/part-00005-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 160 2021-11-05 07:36 /vertex/player/part-00006-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 148 2021-11-05 07:36 /vertex/player/part-00007-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 125 2021-11-05 07:36 /vertex/player/part-00008-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
-rw-r--r-- 3 nebula supergroup 119 2021-11-05 07:36 /vertex/player/part-00009-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
```

2. Check the contents of the CSV files to confirm that the data was exported successfully.
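
When the data is exported to a local directory (a `file:///` path) instead of HDFS, the two checks above could be scripted roughly as follows. This is a sketch under assumptions: the function name `check_export_dir` is hypothetical, and it relies on the output layout visible in the listing above, namely a `_SUCCESS` marker next to `part-*.csv` files.

```bash
#!/usr/bin/env bash

# Hypothetical helper: verify a local export directory.
# Succeeds only if the _SUCCESS marker and at least one part-*.csv file exist.
check_export_dir() {
  local dir="$1"
  [ -f "$dir/_SUCCESS" ] || { echo "missing _SUCCESS in $dir" >&2; return 1; }
  local f
  for f in "$dir"/part-*.csv; do
    # The glob stays literal when nothing matches, so test for a real file.
    [ -f "$f" ] && return 0
  done
  echo "no part-*.csv files in $dir" >&2
  return 1
}

# Usage:
#   check_export_dir /home/nebula/vertex/player && head -n 5 /home/nebula/vertex/player/part-*.csv
```

For an HDFS target such as the one in this example, the file contents can be printed with `hadoop fs -cat`, for instance `hadoop fs -cat '/vertex/player/part-00000-*.csv'`.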
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -412,7 +412,7 @@ nav:
- Introduction to Nebula Exchange:
- What is Nebula Exchange: nebula-exchange/about-exchange/ex-ug-what-is-exchange.md
- Limitations: nebula-exchange/about-exchange/ex-ug-limitations.md
- Compile Exchange: nebula-exchange/ex-ug-compile.md
- Get Nebula Exchange: nebula-exchange/ex-ug-compile.md
- Parameter reference:
- Import command parameters: nebula-exchange/parameter-reference/ex-ug-para-import-command.md
- Configuration parameters: nebula-exchange/parameter-reference/ex-ug-parameter.md
@@ -430,6 +430,7 @@ nav:
- Import data from Pulsar: nebula-exchange/use-exchange/ex-ug-import-from-pulsar.md
- Import data from Kafka: nebula-exchange/use-exchange/ex-ug-import-from-kafka.md
- Import data from SST files: nebula-exchange/use-exchange/ex-ug-import-from-sst.md
- Export data from Nebula Graph: nebula-exchange/use-exchange/ex-ug-export-from-nebula.md
- Exchange FAQ: nebula-exchange/ex-ug-FAQ.md

# - Nebula Operator: