Merge branch 'dev' into cdc-1
MonsterChenzhuo committed Jun 20, 2023
2 parents 4ea8d58 + 6e8de0e commit 0451628
Showing 65 changed files with 849 additions and 94 deletions.
README.md (2 changes: 1 addition & 1 deletion)
@@ -129,7 +129,7 @@ volume average daily, and later writing the data to Clickhouse.

- Tencent Cloud

-Collecting various logs from business services into Apache Kafka, some of the data in Apache Kafka is consumed and extracted through Seatunnel, and then store into Clickhouse.
+Collecting various logs from business services into Apache Kafka, some of the data in Apache Kafka is consumed and extracted through SeaTunnel, and then store into Clickhouse.

For more use cases, please refer to: https://seatunnel.apache.org/blog

config/v2.batch.config.template (8 changes: 4 additions & 4 deletions)
@@ -15,7 +15,7 @@
# limitations under the License.
#
######
-###### This config file is a demonstration of streaming processing in Seatunnel config
+###### This config file is a demonstration of streaming processing in SeaTunnel config
######

env {
@@ -41,14 +41,14 @@ source {
}
}

-# If you would like to get more information about how to configure Seatunnel and see full list of source plugins,
+# If you would like to get more information about how to configure SeaTunnel and see full list of source plugins,
# please go to https://seatunnel.apache.org/docs/category/source-v2
}

sink {
Console {
}

-# If you would like to get more information about how to configure Seatunnel and see full list of sink plugins,
+# If you would like to get more information about how to configure SeaTunnel and see full list of sink plugins,
# please go to https://seatunnel.apache.org/docs/category/sink-v2
}
}
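
For orientation, a complete minimal batch pipeline in this template's env/source/sink shape might look like the sketch below. It is an assumption built from the template structure, using the stock FakeSource and Console connectors; option names such as row.num and result_table_name can differ between SeaTunnel versions.

```hocon
env {
  # bounded job: read everything once, then stop
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"  # name downstream plugins can reference
    row.num = 16                # generate 16 synthetic rows
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  # print each row to stdout for quick verification
  Console {
  }
}
```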
config/v2.streaming.conf.template (8 changes: 4 additions & 4 deletions)
@@ -15,7 +15,7 @@
# limitations under the License.
#
######
-###### This config file is a demonstration of streaming processing in Seatunnel config
+###### This config file is a demonstration of streaming processing in SeaTunnel config
######

env {
@@ -41,14 +41,14 @@ source {
}
}

-# If you would like to get more information about how to configure Seatunnel and see full list of source plugins,
+# If you would like to get more information about how to configure SeaTunnel and see full list of source plugins,
# please go to https://seatunnel.apache.org/docs/category/source-v2
}

sink {
Console {
}

-# If you would like to get more information about how to configure Seatunnel and see full list of sink plugins,
+# If you would like to get more information about how to configure SeaTunnel and see full list of sink plugins,
# please go to https://seatunnel.apache.org/docs/category/sink-v2
}
}
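
The streaming template shares this layout; the substantive difference is the env block, sketched below. checkpoint.interval (milliseconds) is an assumed option name, so verify it against the documentation for your release.

```hocon
env {
  execution.parallelism = 1
  job.mode = "STREAMING"      # keep consuming instead of stopping after one pass
  checkpoint.interval = 2000  # assumed option: snapshot state every 2 s for recovery
}
```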
docs/en/about.md (2 changes: 1 addition & 1 deletion)
@@ -1,4 +1,4 @@
-# About Seatunnel
+# About SeaTunnel

<img src="https://seatunnel.apache.org/image/logo.png" alt="seatunnel logo" width="200px" height="200px" align="right" />

docs/en/concept/connector-v2-features.md (2 changes: 1 addition & 1 deletion)
@@ -3,7 +3,7 @@
## Differences Between Connector V2 And Connector v1

Since https://github.com/apache/seatunnel/issues/1608 We Added Connector V2 Features.
-Connector V2 is a connector defined based on the Seatunnel Connector API interface. Unlike Connector V1, Connector V2 supports the following features.
+Connector V2 is a connector defined based on the SeaTunnel Connector API interface. Unlike Connector V1, Connector V2 supports the following features.

* **Multi Engine Support** SeaTunnel Connector API is an engine independent API. The connectors developed based on this API can run in multiple engines. Currently, Flink and Spark are supported, and we will support other engines in the future.
* **Multi Engine Version Support** Decoupling the connector from the engine through the translation layer solves the problem that most connectors need to modify the code in order to support a new version of the underlying engine.
docs/en/connector-v2/formats/canal-json.md (6 changes: 3 additions & 3 deletions)
@@ -4,14 +4,14 @@ Changelog-Data-Capture Format Format: Serialization Schema Format: Deserializati

Canal is a CDC (Changelog Data Capture) tool that can stream changes in real-time from MySQL into other systems. Canal provides a unified format schema for changelog and supports to serialize messages using JSON and protobuf (protobuf is the default format for Canal).

-Seatunnel supports to interpret Canal JSON messages as INSERT/UPDATE/DELETE messages into seatunnel system. This is useful in many cases to leverage this feature, such as
+SeaTunnel supports to interpret Canal JSON messages as INSERT/UPDATE/DELETE messages into seatunnel system. This is useful in many cases to leverage this feature, such as

synchronizing incremental data from databases to other systems
auditing logs
real-time materialized views on databases
temporal join changing history of a database table and so on.

-Seatunnel also supports to encode the INSERT/UPDATE/DELETE messages in Seatunnel as Canal JSON messages, and emit to storage like Kafka. However, currently Seatunnel can’t combine UPDATE_BEFORE and UPDATE_AFTER into a single UPDATE message. Therefore, Seatunnel encodes UPDATE_BEFORE and UPDATE_AFTER as DELETE and INSERT Canal messages.
+SeaTunnel also supports to encode the INSERT/UPDATE/DELETE messages in SeaTunnel as Canal JSON messages, and emit to storage like Kafka. However, currently SeaTunnel can’t combine UPDATE_BEFORE and UPDATE_AFTER into a single UPDATE message. Therefore, SeaTunnel encodes UPDATE_BEFORE and UPDATE_AFTER as DELETE and INSERT Canal messages.

# Format Options

@@ -73,7 +73,7 @@ Note: please refer to Canal documentation about the meaning of each fields.

The MySQL products table has 4 columns (id, name, description and weight).
The above JSON message is an update change event on the products table where the weight value of the row with id = 111 is changed from 5.18 to 5.15.
-Assuming the messages have been synchronized to Kafka topic products_binlog, then we can use the following Seatunnel to consume this topic and interpret the change events.
+Assuming the messages have been synchronized to Kafka topic products_binlog, then we can use the following SeaTunnel to consume this topic and interpret the change events.

```bash
env {
```
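
The example above is cut off by the diff view; a sketch of what the full consumer plausibly looks like, assuming the Kafka connector accepts format = canal_json and using the four products columns named earlier (the broker address and start_mode value are illustrative):

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Kafka {
    bootstrap.servers = "kafkaCluster:9092"  # illustrative broker address
    topic = "products_binlog"
    result_table_name = "kafka_name"
    start_mode = earliest                    # replay the topic from the beginning
    # the four columns of the MySQL products table described above
    schema = {
      fields {
        id = "int"
        name = "string"
        description = "string"
        weight = "string"
      }
    }
    format = canal_json                      # decode records as Canal JSON change events
  }
}

sink {
  Console {
  }
}
```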
@@ -1,6 +1,6 @@
# CDC compatible debezium-json

-Seatunnel supports to interpret cdc record as Debezium-JSON messages publish to mq(kafka) system.
+SeaTunnel supports to interpret cdc record as Debezium-JSON messages publish to mq(kafka) system.

This is useful in many cases to leverage this feature, such as compatible with the debezium ecosystem.

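
For the reverse direction, a sketch of publishing SeaTunnel change records to Kafka as Debezium JSON; format = debezium_json and the other option names are assumptions based on the Kafka sink connector, not taken from this diff:

```hocon
sink {
  Kafka {
    bootstrap.servers = "kafkaCluster:9092"  # illustrative broker address
    topic = "products_changes"
    format = debezium_json  # assumed option: encode rows as Debezium-JSON change events
  }
}
```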
docs/en/connector-v2/sink/MongoDB.md (6 changes: 3 additions & 3 deletions)
@@ -34,9 +34,9 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor
Data Type Mapping
-----------------

-The following table lists the field data type mapping from MongoDB BSON type to Seatunnel data type.
+The following table lists the field data type mapping from MongoDB BSON type to SeaTunnel data type.

-| Seatunnel Data Type | MongoDB BSON Type |
+| SeaTunnel Data Type | MongoDB BSON Type |
|---------------------|-------------------|
| STRING | ObjectId |
| STRING | String |
@@ -199,7 +199,7 @@ The necessity for using transactions can be greatly avoided by designing systems
By specifying a clear primary key and using the upsert method, exactly-once write semantics can be achieved.

If upsert-key is defined in the configuration, the MongoDB sink will use upsert semantics instead of regular INSERT statements. We combine the primary keys declared in upsert-key as the MongoDB reserved primary key and use upsert mode for writing to ensure idempotent writes.
-In the event of a failure, Seatunnel jobs will recover from the last successful checkpoint and reprocess, which may result in duplicate message processing during recovery. It is highly recommended to use upsert mode, as it helps to avoid violating database primary key constraints and generating duplicate data if records need to be reprocessed.
+In the event of a failure, SeaTunnel jobs will recover from the last successful checkpoint and reprocess, which may result in duplicate message processing during recovery. It is highly recommended to use upsert mode, as it helps to avoid violating database primary key constraints and generating duplicate data if records need to be reprocessed.

```bash
sink {
```
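
The sink block above is cut off; a sketch of an upsert-mode configuration, where upsert-key comes from the surrounding text and every other field (connection string, database, collection) is an illustrative assumption:

```hocon
sink {
  MongoDB {
    uri = "mongodb://user:password@127.0.0.1:27017"  # illustrative connection string
    database = "test_db"
    collection = "users"
    # declaring a logical primary key switches writes to idempotent upserts,
    # so records replayed after checkpoint recovery do not create duplicates
    upsert-key = ["user_id"]
  }
}
```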
docs/en/connector-v2/sink/Mysql.md (4 changes: 2 additions & 2 deletions)
@@ -6,7 +6,7 @@

> Spark<br/>
> Flink<br/>
-> Seatunnel Zeta<br/>
+> SeaTunnel Zeta<br/>
## Key Features

@@ -34,7 +34,7 @@ semantics (using XA transaction guarantee).
## Data Type Mapping

-| Mysql Data type | Seatunnel Data type |
+| Mysql Data type | SeaTunnel Data type |
|-----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| BIT(1)<br/>INT UNSIGNED | BOOLEAN |
| TINYINT<br/>TINYINT UNSIGNED<br/>SMALLINT<br/>SMALLINT UNSIGNED<br/>MEDIUMINT<br/>MEDIUMINT UNSIGNED<br/>INT<br/>INTEGER<br/>YEAR | INT |
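
As a companion to the mapping table, a minimal MySQL sink sketch; the url/driver/query pattern follows the generic JDBC connector and all values here are illustrative assumptions:

```hocon
sink {
  jdbc {
    url = "jdbc:mysql://localhost:3306/test"  # illustrative database
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "123456"
    # placeholders are filled from the upstream row's fields in order
    query = "insert into test_table(name, age) values(?, ?)"
  }
}
```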
docs/en/connector-v2/sink/Snowflake.md (4 changes: 2 additions & 2 deletions)
@@ -6,7 +6,7 @@
>
> Spark<br/>
> Flink<br/>
-> Seatunnel Zeta<br/>
+> SeaTunnel Zeta<br/>
>
## Key features

@@ -30,7 +30,7 @@ Write data through jdbc. Support Batch mode and Streaming mode, support concurre
>
## Data Type Mapping

-| Snowflake Data type | Seatunnel Data type |
+| Snowflake Data type | SeaTunnel Data type |
|-----------------------------------------------------------------------------|---------------------|
| BOOLEAN | BOOLEAN |
| TINYINT<br/>SMALLINT<br/>BYTEINT<br/> | SHORT_TYPE |
docs/en/connector-v2/source/FtpFile.md (2 changes: 1 addition & 1 deletion)
@@ -121,7 +121,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files
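
To place the option in context, a sketch of an FtpFile source with header skipping; apart from skip_header_row_number itself, every field here (host, credentials, path, format) is an illustrative assumption:

```hocon
source {
  FtpFile {
    host = "ftp.example.com"    # illustrative server
    port = 21
    user = "seatunnel"
    password = "secret"
    path = "/data/input"
    file_format_type = "csv"
    skip_header_row_number = 2  # drop the first 2 header lines of each file
  }
}
```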

### schema [config]

docs/en/connector-v2/source/HdfsFile.md (2 changes: 1 addition & 1 deletion)
@@ -110,7 +110,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files

### file_format_type [string]

docs/en/connector-v2/source/Klaviyo.md (2 changes: 1 addition & 1 deletion)
@@ -282,7 +282,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
```hocon
Klaviyo {
url = "https://a.klaviyo.com/api/lists/"
-private_key = "Seatunnel-test"
+private_key = "SeaTunnel-test"
revision = "2020-10-17"
method = "GET"
format = "json"
```
docs/en/connector-v2/source/Lemlist.md (2 changes: 1 addition & 1 deletion)
@@ -278,7 +278,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
```hocon
Lemlist {
url = "https://api.lemlist.com/api/campaigns"
-password = "Seatunnel-test"
+password = "SeaTunnel-test"
schema {
fields {
_id = string
```
docs/en/connector-v2/source/LocalFile.md (2 changes: 1 addition & 1 deletion)
@@ -106,7 +106,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files

### file_format_type [string]

docs/en/connector-v2/source/MongoDB.md (8 changes: 4 additions & 4 deletions)
@@ -38,9 +38,9 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor
Data Type Mapping
-----------------

-The following table lists the field data type mapping from MongoDB BSON type to Seatunnel data type.
+The following table lists the field data type mapping from MongoDB BSON type to SeaTunnel data type.

-| MongoDB BSON type | Seatunnel Data type |
+| MongoDB BSON type | SeaTunnel Data type |
|-------------------|---------------------|
| ObjectId | STRING |
| String | STRING |
@@ -55,9 +55,9 @@ The following table lists the field data type mapping from MongoDB BSON type to
| Object | ROW |
| Array | ARRAY |

-For specific types in MongoDB, we use Extended JSON format to map them to Seatunnel STRING type.
+For specific types in MongoDB, we use Extended JSON format to map them to SeaTunnel STRING type.

-| MongoDB BSON type | Seatunnel STRING |
+| MongoDB BSON type | SeaTunnel STRING |
|-------------------|----------------------------------------------------------------------------------------------|
| Symbol | {"_value": {"$symbol": "12"}} |
| RegularExpression | {"_value": {"$regularExpression": {"pattern": "^9$", "options": "i"}}} |
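
A sketch of a MongoDB source reading into the mapped types above; the uri/database/collection option names mirror the MongoDB sink and the schema fields are illustrative assumptions:

```hocon
source {
  MongoDB {
    uri = "mongodb://127.0.0.1:27017"  # illustrative connection string
    database = "test_db"
    collection = "users"
    # declare the SeaTunnel-side types the BSON fields map to
    schema = {
      fields {
        id = "string"    # ObjectId maps to STRING per the table above
        name = "string"
        age = "int"
      }
    }
  }
}
```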
docs/en/connector-v2/source/Mysql.md (4 changes: 2 additions & 2 deletions)
@@ -6,7 +6,7 @@

> Spark<br/>
> Flink<br/>
-> Seatunnel Zeta<br/>
+> SeaTunnel Zeta<br/>
## Key Features

@@ -36,7 +36,7 @@ Read external data source data through JDBC.
## Data Type Mapping

-| Mysql Data type | Seatunnel Data type |
+| Mysql Data type | SeaTunnel Data type |
|-----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| BIT(1)<br/>INT UNSIGNED | BOOLEAN |
| TINYINT<br/>TINYINT UNSIGNED<br/>SMALLINT<br/>SMALLINT UNSIGNED<br/>MEDIUMINT<br/>MEDIUMINT UNSIGNED<br/>INT<br/>INTEGER<br/>YEAR | INT |
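
And the source direction, mirroring the sink sketch earlier: the same assumed url/driver options, with query as a SELECT whose result columns map through the table above:

```hocon
source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test"  # illustrative database
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "123456"
    query = "select name, age from test_table"  # columns map via the table above
  }
}
```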
docs/en/connector-v2/source/Notion.md (2 changes: 1 addition & 1 deletion)
@@ -283,7 +283,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
```hocon
Notion {
url = "https://api.notion.com/v1/users"
-password = "Seatunnel-test"
+password = "SeaTunnel-test"
version = "2022-06-28"
content_field = "$.results.*"
schema = {
```
docs/en/connector-v2/source/OneSignal.md (2 changes: 1 addition & 1 deletion)
@@ -280,7 +280,7 @@ Source plugin common parameters, please refer to [Source Common Options](common-
OneSignal {
url = "https://onesignal.com/api/v1/apps"
-password = "Seatunnel-test"
+password = "SeaTunnel-test"
schema = {
fields {
id = string
docs/en/connector-v2/source/OssFile.md (2 changes: 1 addition & 1 deletion)
@@ -113,7 +113,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files

### file_format_type [string]

docs/en/connector-v2/source/OssJindoFile.md (2 changes: 1 addition & 1 deletion)
@@ -113,7 +113,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files

### file_format_type [string]

docs/en/connector-v2/source/S3File.md (2 changes: 1 addition & 1 deletion)
@@ -124,7 +124,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files

### file_format_type [string]

docs/en/connector-v2/source/SftpFile.md (2 changes: 1 addition & 1 deletion)
@@ -120,7 +120,7 @@ For example, set like following:

`skip_header_row_number = 2`

-then Seatunnel will skip the first 2 lines from source files
+then SeaTunnel will skip the first 2 lines from source files

### schema [config]

docs/en/connector-v2/source/Snowflake.md (4 changes: 2 additions & 2 deletions)
@@ -6,7 +6,7 @@
>
> Spark<br/>
> Flink<br/>
-> Seatunnel Zeta<br/>
+> SeaTunnel Zeta<br/>
>
## Key features

@@ -36,7 +36,7 @@ Read external data source data through JDBC.
>
## Data Type Mapping

-| Snowflake Data type | Seatunnel Data type |
+| Snowflake Data type | SeaTunnel Data type |
|-----------------------------------------------------------------------------|---------------------|
| BOOLEAN | BOOLEAN |
| TINYINT<br/>SMALLINT<br/>BYTEINT<br/> | SHORT_TYPE |
docs/en/faq.md (2 changes: 1 addition & 1 deletion)
@@ -334,7 +334,7 @@ Just configure hdfs-site.xml properly. Refer to: https://www.cnblogs.com/suanec/

## I want to learn the source code of SeaTunnel. Where should I start?

-SeaTunnel has a completely abstract and structured code implementation, and many people have chosen SeaTunnel As a way to learn Spark. You can learn the source code from the main program entry: Seatunnel.java
+SeaTunnel has a completely abstract and structured code implementation, and many people have chosen SeaTunnel As a way to learn Spark. You can learn the source code from the main program entry: SeaTunnel.java

## When SeaTunnel developers develop their own plugins, do they need to understand the SeaTunnel code? Should these plugins be integrated into the SeaTunnel project?

docs/en/seatunnel-engine/rest-api.md (2 changes: 1 addition & 1 deletion)
@@ -5,7 +5,7 @@ sidebar_position: 7

# REST API

-Seatunnel has a monitoring API that can be used to query status and statistics of running jobs, as well as recent
+SeaTunnel has a monitoring API that can be used to query status and statistics of running jobs, as well as recent
completed jobs. The monitoring API is a REST-ful API that accepts HTTP requests and responds with JSON data.

## Overview
docs/en/start-v2/kubernetes/kubernetes.mdx (4 changes: 2 additions & 2 deletions)
@@ -78,7 +78,7 @@ minikube image load seatunnel:2.3.0-flink-1.13
]}>
<TabItem value="flink">

The steps below provide a quick walk-through on setting up the Flink Kubernetes Operator.
You can refer to [Flink Kubernetes Operator - Quick Start](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/) for more details.

> Notice: All the Kubernetes resources bellow are created in default namespace.
@@ -206,7 +206,7 @@ spec:
path: seatunnel.streaming.conf
job:
jarURI: local:///opt/seatunnel/starter/seatunnel-flink-starter.jar
-entryClass: org.apache.seatunnel.core.starter.flink.SeatunnelFlink
+entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
args: ["--config", "/data/seatunnel.streaming.conf"]
parallelism: 2
upgradeMode: stateless