
Commit

Merge pull request #483 from Altinity/482-rearrange-main-readme-documentation

482 rearrange main readme documentation
subkanthi authored Mar 1, 2024
2 parents 4ba7c72 + fba2e81 commit 9cccbd3
Showing 6 changed files with 254 additions and 82 deletions.
16 changes: 6 additions & 10 deletions README.md
@@ -36,14 +36,16 @@ for analysis.

## Getting Started

See [QuickStart Guide (Lightweight)](doc/quickstart.md).
[QuickStart Guide: Lightweight (MySQL)](doc/quickstart.md)\
[QuickStart Guide: Lightweight (PostgreSQL)](doc/quickstart_postgres.md)\
[QuickStart Guide: Kafka](doc/quickstart_kafka.md)

## Blog Articles

The first two are good tutorials on MySQL and PostgreSQL, respectively.

- [Altinity Sink connector for ClickHouse](https://altinity.com/blog/fast-mysql-to-clickhouse-replication-announcing-the-altinity-sink-connector-for-clickhouse)
- [Replicating PostgreSQL to ClickHouse](https://altinity.com/blog/replicating-data-from-postgresql-to-clickhouse-with-the-altinity-sink-connector)
- [Altinity Sink Connector (MySQL)](https://altinity.com/blog/fast-mysql-to-clickhouse-replication-announcing-the-altinity-sink-connector-for-clickhouse)
- [Altinity Sink Connector (PostgreSQL)](https://altinity.com/blog/replicating-data-from-postgresql-to-clickhouse-with-the-altinity-sink-connector)
- [ClickHouse as an analytic extension for MySQL](https://altinity.com/blog/using-clickhouse-as-an-analytic-extension-for-mysql?utm_campaign=Brand&utm_content=224583767&utm_medium=social&utm_source=linkedin&hss_channel=lcp-10955938)

## Reference Documentation
@@ -54,12 +56,6 @@ The first two are good tutorials on MySQL and PostgreSQL, respectively.
* [Lightweight Sink Connector CLI](doc/sink_connector_cli.md)
* [Mutable Data Handling](doc/mutable_data.md)

### Installation

* [Sink Connector Setup (Kafka)](doc/setup.md)
* [Sink Connector Configuration (Kafka & Lightweight)](doc/configuration.md)
* [Debezium Setup](doc/debezium_setup.md)

### Operations

* [Monitoring](doc/Monitoring.md)
@@ -71,7 +67,7 @@ The first two are good tutorials on MySQL and PostgreSQL, respectively.

## Roadmap

See [2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401).
[2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)

## Help

72 changes: 0 additions & 72 deletions doc/configuration.md
@@ -1,37 +1,3 @@
# Sink Connector Configuration Settings

## Lightweight Sink Connector

### Setting Values for Docker Operation

Update [`config.yml`](https://github.com/Altinity/clickhouse-sink-connector/blob/develop/sink-connector-lightweight/docker/config.yml):

1. Update the **MySQL information** in `config.yml`: `database.hostname`, `database.port`, `database.user`, and `database.password`.
2. Update the **ClickHouse information** in `config.yml`: `clickhouse.server.url`, `clickhouse.server.user`, `clickhouse.server.password`, and `clickhouse.server.port`.
   Also update the **ClickHouse information** in the fields used to store the offset information: `offset.storage.jdbc.url`, `offset.storage.jdbc.user`, `offset.storage.jdbc.password`, `schema.history.internal.jdbc.url`, `schema.history.internal.jdbc.user`, and `schema.history.internal.jdbc.password`.
3. Update the MySQL databases to be replicated: `database.include.list`.
4. Add table filters: `table.include.list`.
5. Set `snapshot.mode` to `initial` if you would like to replicate existing records, or to `schema_only` to replicate the schema and only the records modified after the connector is started.
6. Start replication by running the JAR file (`java -jar clickhouse-debezium-embedded-1.0-SNAPSHOT.jar <yaml_config_file>`) or Docker. A minimal example configuration is sketched below.
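A minimal `config.yml` sketch covering steps 1–5 (all hostnames, credentials, and database/table names below are illustrative placeholders, not values from this repository):
```
# Illustrative values only -- replace with your own hosts and credentials.
database.hostname: "mysql-host"
database.port: "3306"
database.user: "root"
database.password: "root"

clickhouse.server.url: "clickhouse-host"
clickhouse.server.user: "default"
clickhouse.server.password: "secret"
clickhouse.server.port: "8123"

offset.storage.jdbc.url: "jdbc:clickhouse://clickhouse-host:8123/altinity_sink_connector"
offset.storage.jdbc.user: "default"
offset.storage.jdbc.password: "secret"
schema.history.internal.jdbc.url: "jdbc:clickhouse://clickhouse-host:8123/altinity_sink_connector"
schema.history.internal.jdbc.user: "default"
schema.history.internal.jdbc.password: "secret"

database.include.list: "test"
table.include.list: "test.employees"

# Use schema_only to replicate only changes made after the connector starts.
snapshot.mode: "initial"
```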
**ClickHouse HTTPS servers**
For `https` servers, make sure `clickhouse.server.url` includes `https`.
Also add `?ssl=true` and port `8443` to both the `offset.storage.jdbc.url` and `schema.history.internal.jdbc.url` configuration variables.
Example: **ClickHouse Cloud**
```
clickhouse.server.url: "https://cloud_url"
offset.storage.jdbc.url: "jdbc:clickhouse://cloud_url:8443/altinity_sink_connector?ssl=true"
schema.history.internal.jdbc.url: "jdbc:clickhouse://cloud_url:8443/altinity_sink_connector?ssl=true"
```

### MySQL Configuration
[MySQL Configuration](sink-connector-lightweight/docker/config.yml)

### PostgreSQL Configuration
For AWS RDS users, you may need to add a heartbeat interval and heartbeat query to keep the WAL logs from constantly growing in size. See:
- https://stackoverflow.com/questions/76415644/postgresql-wal-log-limiting-rds
- https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-wal-disk-space

[PostgreSQL Configuration](sink-connector-lightweight/docker/config_postgres.yml)
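For reference, a sketch of the Debezium heartbeat options that can be added to `config_postgres.yml` for this purpose (the interval and the heartbeat table used in the query are illustrative assumptions; the table must already exist in the source database):
```
# Illustrative values -- the heartbeat table is an assumed example, create it first.
heartbeat.interval.ms: "60000"
heartbeat.action.query: "INSERT INTO heartbeat (ts) VALUES (now())"
```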

### Configuration Reference
| Configuration | Description |
@@ -64,41 +30,3 @@ https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-wal-disk-space
| restart.event.loop | If set to true, replication will be restarted based on the restart.event.loop.timeout.period.secs parameter (defined in seconds). |
| restart.event.loop.timeout.period.secs | If the last change record (CDC) received from the source database exceeds this threshold period (in seconds), replication is restarted. |
| batch.max.records | Size of the batch that is persisted to ClickHouse. (Default: 100000) |
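As an illustration, these options might be set in `config.yml` as follows (the timeout value is an arbitrary example):
```
restart.event.loop: "true"
restart.event.loop.timeout.period.secs: "3000"
batch.max.records: "100000"
```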



## Kafka Sink Connector for ClickHouse

### Recommended Memory Limits

**Production Usage**
In the `docker-compose.yml` file, it is recommended to set Xmx to at least 5G (`-Xmx5G`) for both **Debezium** and **Sink** when running in production, or if you encounter an `Out of memory/Heap exception` error.

```
- KAFKA_HEAP_OPTS=-Xms2G -Xmx5G
```

### Kafka Sink Connector Configuration

| Property                         | Default | Description |
|----------------------------------|---------|-------------|
| tasks.max                        | None    | Number of SinkConnector tasks (essentially threads); ideally this should match the number of Kafka partitions. |
| topics.regex                     | None    | Regex of matching topics. Example: "SERVER5432.test.(.*)" matches SERVER5432.test.employees and SERVER5432.test.products. |
| topics                           | None    | The list of topics. Either topics or topics.regex has to be provided. |
| clickhouse.server.url            |         | ClickHouse Server URL |
| clickhouse.server.user           |         | ClickHouse Server username |
| clickhouse.server.password       |         | ClickHouse Server password |
| clickhouse.server.database       |         | ClickHouse Database name |
| clickhouse.server.port           | 8123    | ClickHouse Server port |
| clickhouse.topic2table.map       | None    | Map of Kafka topics to table names, <topic_name1>:<table_name1>,<topic_name2>:<table_name2>. This variable overrides the default mapping of topics to table names. |
| store.kafka.metadata             | false   | If set to true, Kafka metadata columns will be added to ClickHouse. |
| store.raw.data                   | false   | If set to true, the entire row is converted to JSON and stored in the column defined by the `store.raw.data.column` field. |
| store.raw.data.column            | None    | ClickHouse table column to store the raw data in JSON form (String ClickHouse DataType). |
| metrics.enable                   | true    | Enable Prometheus scraping. |
| metrics.port                     | 8084    | Metrics port. |
| buffer.flush.time.ms             | 30      | Buffer (batch of records) flush time in milliseconds. |
| thread.pool.size                 | 10      | Number of threads used to connect to ClickHouse. |
| auto.create.tables               | false   | Sink connector will create tables in ClickHouse if they do not exist. |
| snowflake.id                     | true    | Uses Snowflake ID (timestamp + GTID) as the version column for ReplacingMergeTree. |
| replacingmergetree.delete.column | "sign"  | Column used as the sign column for ReplacingMergeTree. |
37 changes: 37 additions & 0 deletions doc/configuration_kafka.md
@@ -0,0 +1,37 @@


## Kafka Sink Connector for ClickHouse

### Recommended Memory Limits

**Production Usage**
In the `docker-compose.yml` file, it is recommended to set Xmx to at least 5G (`-Xmx5G`) for both **Debezium** and **Sink** when running in production, or if you encounter an `Out of memory/Heap exception` error.

```
- KAFKA_HEAP_OPTS=-Xms2G -Xmx5G
```
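A sketch of where this setting might live in `docker-compose.yml` (the service names are illustrative, not necessarily the ones used in this repository's compose files):
```
# Illustrative service names; apply the same env var to both the Debezium and Sink services.
services:
  debezium:
    environment:
      - KAFKA_HEAP_OPTS=-Xms2G -Xmx5G
  sink:
    environment:
      - KAFKA_HEAP_OPTS=-Xms2G -Xmx5G
```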

### Kafka Sink Connector Configuration

| Property                         | Default | Description |
|----------------------------------|---------|-------------|
| tasks.max                        | None    | Number of SinkConnector tasks (essentially threads); ideally this should match the number of Kafka partitions. |
| topics.regex                     | None    | Regex of matching topics. Example: "SERVER5432.test.(.*)" matches SERVER5432.test.employees and SERVER5432.test.products. |
| topics                           | None    | The list of topics. Either topics or topics.regex has to be provided. |
| clickhouse.server.url            |         | ClickHouse Server URL |
| clickhouse.server.user           |         | ClickHouse Server username |
| clickhouse.server.password       |         | ClickHouse Server password |
| clickhouse.server.database       |         | ClickHouse Database name |
| clickhouse.server.port           | 8123    | ClickHouse Server port |
| clickhouse.topic2table.map       | None    | Map of Kafka topics to table names, <topic_name1>:<table_name1>,<topic_name2>:<table_name2>. This variable overrides the default mapping of topics to table names. |
| store.kafka.metadata             | false   | If set to true, Kafka metadata columns will be added to ClickHouse. |
| store.raw.data                   | false   | If set to true, the entire row is converted to JSON and stored in the column defined by the `store.raw.data.column` field. |
| store.raw.data.column            | None    | ClickHouse table column to store the raw data in JSON form (String ClickHouse DataType). |
| metrics.enable                   | true    | Enable Prometheus scraping. |
| metrics.port                     | 8084    | Metrics port. |
| buffer.flush.time.ms             | 30      | Buffer (batch of records) flush time in milliseconds. |
| thread.pool.size                 | 10      | Number of threads used to connect to ClickHouse. |
| auto.create.tables               | false   | Sink connector will create tables in ClickHouse if they do not exist. |
| snowflake.id                     | true    | Uses Snowflake ID (timestamp + GTID) as the version column for ReplacingMergeTree. |
| replacingmergetree.delete.column | "sign"  | Column used as the sign column for ReplacingMergeTree. |
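As a sketch, a connector instance could be registered with the Kafka Connect REST API using a payload along these lines (the connector class name and every value here are illustrative assumptions, not taken from this commit):
```
{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.altinity.clickhouse.sink.connector.ClickHouseSinkConnector",
    "tasks.max": "10",
    "topics.regex": "SERVER5432.test.(.*)",
    "clickhouse.server.url": "clickhouse",
    "clickhouse.server.port": "8123",
    "clickhouse.server.user": "root",
    "clickhouse.server.password": "root",
    "clickhouse.server.database": "test",
    "auto.create.tables": "true",
    "snowflake.id": "true"
  }
}
```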
61 changes: 61 additions & 0 deletions doc/quickstart.md
@@ -79,3 +79,64 @@ Stop all components.
```
docker compose -f docker-compose-mysql.yml down
```

### Connecting to External MySQL/ClickHouse

**Step 1:** Update **MySQL** information in [config.yml](https://github.com/Altinity/clickhouse-sink-connector/blob/develop/sink-connector-lightweight/docker/config.yml):
```
database.hostname: <MySQL Hostname>
database.port: <MySQL Port>
database.user: <MySQL username>
database.password: <MySQL password>
```
**Step 2:** Update **ClickHouse** information in config.yml:
```
clickhouse.server.url: <ClickHouse hostname>
clickhouse.server.user: <ClickHouse username>
clickhouse.server.password: <ClickHouse password>
clickhouse.server.port: <ClickHouse port>
```
**Step 3:** Update the **Offset storage/Schema history** settings so they are stored in **ClickHouse**:
```
offset.storage.jdbc.url: "jdbc:clickhouse://<ClickHouse hostname>:<ClickHouse port>/altinity_sink_connector"
schema.history.internal.jdbc.url: "jdbc:clickhouse://<ClickHouse hostname>:<ClickHouse port>/altinity_sink_connector"
offset.storage.jdbc.user: <ClickHouse username>
offset.storage.jdbc.password: <ClickHouse password>
schema.history.internal.jdbc.user: <ClickHouse username>
schema.history.internal.jdbc.password: <ClickHouse password>
```
**Step 4:** Update **MySQL databases** to be replicated:
```
database.include.list: <Database name>
```

**Step 5:** Add **table filters** to include/exclude tables to be replicated:
```
table.include.list: <Table names>
table.exclude.list: <Table names>
```
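For example, to replicate only a couple of tables, fully qualified `<database>.<table>` names can be listed (the table names below are illustrative):
```
table.include.list: "test.employees,test.products"
```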
**Step 6:** Configure **Snapshot Mode** to define initial load vs CDC replication:
```
# Initial load (transfer all existing data in MySQL)
snapshot.mode: initial

# or

# CDC replication (transfer only new data in MySQL)
snapshot.mode: schema_only
```
**Note: ClickHouse Secure (Altinity Cloud/ClickHouse Cloud)**:
Set the server URL to `https` and add `?ssl=true` to the end of the URL.
```
clickhouse.server.url: "https://cloud_url"
offset.storage.jdbc.url: "jdbc:clickhouse://cloud_url:8443/altinity_sink_connector?ssl=true"
schema.history.internal.jdbc.url: "jdbc:clickhouse://cloud_url:8443/altinity_sink_connector?ssl=true"
```
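Once `config.yml` points at the external MySQL and ClickHouse instances, replication can be started by passing the file to the lightweight connector JAR, as described in the configuration guide (the JAR name follows the snapshot build referenced there):
```
java -jar clickhouse-debezium-embedded-1.0-SNAPSHOT.jar config.yml
```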

## References:
1. [Sink Connector Configuration ](configuration.md)
2. [MySQL Topologies supported](https://debezium.io/documentation/reference/2.5/connectors/mysql.html#setting-up-mysql)
3. [MySQL Setup](https://debezium.io/documentation/reference/2.5/connectors/mysql.html#setting-up-mysql)
File renamed without changes.
