From 8393e6ad4a5db1a84ea9e724d0fd742a84d29471 Mon Sep 17 00:00:00 2001
From: Jim Galasyn
Date: Tue, 19 May 2020 13:19:47 -0700
Subject: [PATCH] docs: copy edit WITH(KEY) doc changes (DOCS-4374)

---
 docs/developer-guide/joins/partition-data.md  | 52 +++++++++------
 .../ksqldb-reference/insert-values.md         | 11 ++++++++++-
 docs/developer-guide/syntax-reference.md      | 57 +++++++++-------
 .../installation/server-config/avro-schema.md | 14 ++--
 .../installation/upgrading.md                 | 70 ++++++++++---------
 docs/tutorials/basics-docker.md               |  4 +-
 docs/tutorials/examples.md                    | 15 ++--
 7 files changed, 135 insertions(+), 88 deletions(-)

diff --git a/docs/developer-guide/joins/partition-data.md b/docs/developer-guide/joins/partition-data.md
index ceffa906c5be..efdb50c7577c 100644
--- a/docs/developer-guide/joins/partition-data.md
+++ b/docs/developer-guide/joins/partition-data.md
@@ -48,34 +48,48 @@ given any name in the schema definition.
 When you create a table by using a CREATE TABLE AS SELECT statement, the key of
 the resulting table is determined as follows:
 
-- If the statement contains neither a JOIN or GROUP BY clause, the key type of the resulting
+- If the statement contains neither a JOIN nor GROUP BY clause, the key type of the resulting
   table matches the key type of the source table, and the name matches the source unless the
   projection defines an alias for the column.
 - If the statement contains a JOIN and no GROUP BY clause, the key type of the resulting table
-  will match the type of the join columns and then key name will:
-  - FULL OUTER joins and joins on expressions other than column references will have a system
-    generated name in the form `KSQL_COL_n`, where `n` is a positive integer, unless the projection
-    defines an alias for the column.
-  - For other joins that contain at least one column reference in their join criteria, the name
-    will match the left most column reference in the join criteria.
+  matches the type of the join columns. The key name is defined by these conditions:
+  - FULL OUTER joins and joins on expressions other than column references have a
+    system-generated name in the form `KSQL_COL_n`, where `n` is a positive integer,
+    unless the projection defines an alias for the column.
+  - For other joins that contain at least one column reference in their join criteria,
+    the name matches the leftmost column reference in the join criteria.
 - If the statement contains a GROUP BY clause, the grouping columns determine the key of the
   resulting table.
-  - When grouping by a single column or STRUCT field: the name of the key column in the
+  - When grouping by a single column or STRUCT field, the name of the key column in the
     resulting table matches the name of the column or field, unless the projection includes
-    an alias for the column or field, and the type of the key column matches the column or field.
-  - When grouping by a single expression that is not a column or STRUCT field: the resulting table
-    will have a system generated key column name in the form `KSQL_COL_n`, where `n` is a positive
-    integer, unless the projection contains an alias for the expression, and the type of the
-    column will match the result of the expression.
-  - When grouping by multiple expressions: the resulting table will have a system generated key
-    name in the form `KSQL_COL_n`, where `n` is a positive integer, and the type of the column will
-    be a [SQL `STRING`](../../concepts/schemas), containing the grouping expressions concatenated
+    an alias for the column or field. The type of the key column matches the column or field.
+  - When grouping by a single expression that is not a column or STRUCT field, the resulting table
+    has a system-generated key column name in the form `KSQL_COL_n`, where `n` is a positive
+    integer, unless the projection contains an alias for the expression. The type of the
+    column matches the result of the expression.
+  - When grouping by multiple expressions, the resulting table has a system-generated key
+    name in the form `KSQL_COL_n`, where `n` is a positive integer. The type of the column is
+    a [SQL `STRING`](../../concepts/schemas), containing the grouping expressions concatenated
     together.
 
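+For example, per the first grouping rule above, grouping by a single column
+keeps that column's name and type as the key of the resulting table. Here's a
+minimal sketch, assuming a hypothetical `orders` stream with an `itemId` column:
+
+```sql
+-- The key column of the resulting table is named `itemId`, and its type
+-- matches `orders.itemId`, because the query groups by a single column.
+CREATE TABLE orders_by_item AS
+  SELECT itemId, COUNT(*) AS total
+  FROM orders
+  GROUP BY itemId
+  EMIT CHANGES;
+```
+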
 The following example shows a `users` table joined with a `clicks` stream
 on the `userId` column. The `users` table has the correct primary key
 `userId` that coincides with the joining column. But the `clicks` stream
-doesn't have a defined key, so ksqlDB must repartition it on the joining
+doesn't have a defined key, so ksqlDB repartitions it on the joining
 column (`userId`) to assign the key before performing the join.
 
 ```sql
@@ -137,8 +151,8 @@ the `INT` side to a `LONG`:
 
 Tables created on top of existing Kafka topics, for example those created with
 a `CREATE TABLE` statement, are keyed on the data held in the key of the records
-in the Kafka topic. ksqlDB presents this data in the `PRIMARY KEY` column and expects
-the data to be in the `KAFKA` format.
+in the {{ site.ak }} topic. ksqlDB presents this data in the `PRIMARY KEY` column
+and requires the data to be in the `KAFKA` format.
 
 Tables created inside ksqlDB from other sources, for example those created with
 a `CREATE TABLE AS SELECT` statement, will copy the key from their source(s)
diff --git a/docs/developer-guide/ksqldb-reference/insert-values.md b/docs/developer-guide/ksqldb-reference/insert-values.md
index ba2bf7415050..1e57c697dcd9 100644
--- a/docs/developer-guide/ksqldb-reference/insert-values.md
+++ b/docs/developer-guide/ksqldb-reference/insert-values.md
@@ -23,7 +23,16 @@ Description
 Produce a row into an existing stream or table and its underlying topic based
 on explicitly specified values.
 
-If column names are specified, then the order of the values must match the
+If column names are specified, the order of the values must match the
 order of the names. Any column not explicitly given a value is set to `null`.
 
 Pseudo columns, for example `ROWTIME`, may be provided.
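+
+Here's a minimal sketch, assuming a hypothetical `users` stream that has
+`userId`, `username`, and `email` columns:
+
+```sql
+-- The values follow the order of the column list. The omitted `email`
+-- column is set to `null`.
+INSERT INTO users (userId, username) VALUES (1, 'alice');
+```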
diff --git a/docs/developer-guide/syntax-reference.md b/docs/developer-guide/syntax-reference.md
index 0504eb12c382..70cf132ee732 100644
--- a/docs/developer-guide/syntax-reference.md
+++ b/docs/developer-guide/syntax-reference.md
@@ -443,44 +443,62 @@ While tables require a `PRIMARY KEY`, the `KEY` column of a stream is optional.
 
 Joins involving tables can be joined to the table on the `PRIMARY KEY` column. Joins involving
 streams have no such limitation. Stream joins on any expression other than the stream's `KEY`
-column will require an internal repartition, joins on the stream's `KEY` column do not.
+column require an internal repartition, but joins on the stream's `KEY` column do not.
 
 !!! important
-    Be aware that Kafka guarantees the relative order of any two messages from
+    {{ site.ak }} guarantees the relative order of any two messages from
     one source partition only if they are also both in the same partition
-    after the repartition. Otherwise, Kafka is likely to interleave messages.
-    The use case will determine if these ordering guarantees are acceptable.
+    *after* the repartition. Otherwise, {{ site.ak }} is likely to interleave
+    messages. The use case will determine if these ordering guarantees are
+    acceptable.
 
 ### What To Do If Your Key Is Not Set or Is In A Different Format
 
 ### Streams
 
 For streams, just leave out the `KEY` column from the column list.
-ksqlDB will take care of repartitioning the stream for you using the
-value(s) from the `GROUP BY` columns for aggregates, and the join
+ksqlDB takes care of repartitioning the stream for you, using the
+value(s) from the `GROUP BY` columns for aggregates and the join
 criteria for joins.
 
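+A minimal sketch, assuming a hypothetical `pageviews` topic with JSON-formatted
+values:
+
+```sql
+-- The stream is declared without a KEY column.
+CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
+  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
+
+-- Grouping by `userid` causes ksqlDB to repartition the stream internally
+-- on `userid` before computing the aggregate.
+CREATE TABLE pageviews_by_user AS
+  SELECT userid, COUNT(*) AS views
+  FROM pageviews
+  GROUP BY userid
+  EMIT CHANGES;
+```
+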
 ### Tables
 
-For tables, you can still use ksqlDB if the message key is not set or if it is
-not in the required format as long as the key can be rebuilt from the value data
-and *one* of the following statements is true:
+For tables, you can still use ksqlDB if the message key isn't set or if
+it isn't in the required format, as long as the key can be rebuilt from
+the value data, and *one* of the following statements is true:
 
 - The message key is a
   [unary function](https://en.wikipedia.org/wiki/Unary_function) of the value
   in the desired key column.
-- It is ok for the messages in the topic to be re-ordered before being
-  inserted into the table.
+- It's acceptable for the messages in the topic to be re-ordered before
+  being inserted into the table.
 
-First create a stream which we'll then use to have ksqlDB write the message key, and then
-declare the table on the output topic of this stream:
+First create a stream which you'll use to have ksqlDB write the message
+key, and then declare the table on the output topic of this stream.
 
 Example:
 
 - Goal: You want to create a table from a topic, which is keyed by userid of
   type INT.
 - Problem: The required key is present as a column (aptly named
-  `userid`) in the message value as is a string containg an integer,
+  `userid`) in the message value, and is a string containing an integer,
   but the actual message key in {{ site.ak }} is not set or has some
   other value or format.
@@ -493,10 +511,9 @@ CREATE STREAM users_with_wrong_key (userid INT, username VARCHAR, email VARCHAR)
 -- 1) The CAST statement converts userId to the required SQL type.
 -- 2) The PARTITION BY clause re-partitions the stream based on the new, converted key.
 -- 3) The SELECT clause selects the required value columns, all in this case.
--- The resulting schema will be: KSQL_COL_0 INT KEY, USERNAME STRING, EMAIL STRING.
--- Note: the system generated KSQL_COL_0 column name can be replaced via an alias in the projection
--- however, this is not necessary in this instance as we do not intend to use this stream as a
--- source for other queries.
+-- The resulting schema is: KSQL_COL_0 INT KEY, USERNAME STRING, EMAIL STRING.
+-- Note: the system-generated KSQL_COL_0 column name can be replaced via an alias in the projection,
+-- but this is not necessary in this case, because this stream is not a source for other queries.
 
 CREATE STREAM users_with_proper_key
   WITH(KAFKA_TOPIC='users-with-proper-key') AS
   SELECT *
@@ -510,8 +527,8 @@ CREATE TABLE users_table (userId INT PRIMARY KEY, username VARCHAR, email VARCHA
         VALUE_FORMAT='JSON');
 
 -- Or, if you prefer, you can keep userId in the value of the repartitioned data
--- by using the AS_VALUE function:
--- The resulting schema will be: userId INT KEY, USERNAME STRING, EMAIL STRING, VUSERID INT
+-- by using the AS_VALUE function.
+-- The resulting schema is: userId INT KEY, USERNAME STRING, EMAIL STRING, VUSERID INT
 CREATE STREAM users_with_proper_key_and_user_id
     WITH(KAFKA_TOPIC='users_with_proper_key_and_user_id') AS
     SELECT userId, username, email, AS_VALUE(userId) as vUserId
diff --git a/docs/operate-and-deploy/installation/server-config/avro-schema.md b/docs/operate-and-deploy/installation/server-config/avro-schema.md
index b170388707ad..48bc2c4f3e3f 100644
--- a/docs/operate-and-deploy/installation/server-config/avro-schema.md
+++ b/docs/operate-and-deploy/installation/server-config/avro-schema.md
@@ -26,13 +26,13 @@ create new nested STRUCT data as the result of a query. For more info, see
 The following functionality is not supported:
 
 - Message *keys* in Avro, Protobuf, or JSON formats are not supported. Message
-  keys in ksqlDB are always interpreted as KAFKA format, which means ksqlDB
+  keys in ksqlDB are always interpreted as `KAFKA` format, which means ksqlDB
   ignores schemas that have been registered for message keys.
 
-While ksqlDB does not support loading the message key's schema from the {{ site.sr }},
+Although ksqlDB doesn't support loading the message key's schema from {{ site.sr }},
 you can provide the key column definition within the `CREATE TABLE` or `CREATE STREAM`
-statement. Where a `CREATE TABLE` or `CREATE STREAM` statement does not provide an
-explicit key columns an implicit `ROWKEY STRING` column will be added.
+statement. Where a `CREATE TABLE` or `CREATE STREAM` statement doesn't provide an
+explicit key column, ksqlDB adds an implicit `ROWKEY STRING` column.
 
 Configure ksqlDB for Avro, Protobuf, and JSON
 =============================================
@@ -96,7 +96,7 @@ substitute `PROTOBUF` or `JSON_SR` for `AVRO` in each statement.
 
 The following statement shows how to create a new `pageviews` stream by
 reading from a {{ site.ak }} topic that has Avro-formatted message values and
-a Kafka-formatted INT message key.
+a `KAFKA`-formatted `INT` message key.
 
 ```sql
 CREATE STREAM pageviews
@@ -105,8 +105,8 @@ CREATE STREAM pageviews
     VALUE_FORMAT='AVRO');
 ```
 
-If the key schema is not provided, the key of the data will be assumed to be
-a single KAFKA serialized `STRING` named `ROWKEY`.
+If the key schema is not provided, the key of the data is assumed to be
+a single `KAFKA`-serialized `STRING` named `ROWKEY`.
 
 ### Create a new table by reading Avro-formatted data
diff --git a/docs/operate-and-deploy/installation/upgrading.md b/docs/operate-and-deploy/installation/upgrading.md
index 9c3c11014fd7..126cddfda2c2 100644
--- a/docs/operate-and-deploy/installation/upgrading.md
+++ b/docs/operate-and-deploy/installation/upgrading.md
@@ -11,32 +11,32 @@ keywords: ksqldb, install, upgrade
 
 ## Why does ksqlDB not currently support in-place upgrades?
 
-Past releases of KSQL were backwards compatible. However, there was a cost to this backwards compatibility:
-progress was slower and the code base incurred increased complexity. ksqlDB is a young product and
-we're wanting to move fast, so have decided to choose speed of development over strong backwards
-compatibility guarantees for a few released.
+Past releases of KSQL were backward compatible. But there was a cost to this backward compatibility:
+progress was slower and the code base incurred increased complexity. ksqlDB is a young product and
+we want to move fast, so we have decided to choose speed of development over strong backward
+compatibility guarantees for a few releases.
 
-Until version 1.0 of ksqlDB, each minor release will potentially have breaking changes in it that
-mean you can not simply update the ksqlDB binaries and restart the server(s).
+Until version 1.0 of ksqlDB, each minor release will potentially have breaking changes in it,
+which means that you can't simply update the ksqlDB binaries and restart the server(s).
 
 The data models and binary formats used within ksqlDB are in flux. This means data local to each
-ksqlDB node and stored centrally within internal Kafka topics may not be compatible with the new
-version you are trying to deploy.
+ksqlDB node and stored centrally within internal {{ site.ak }} topics may not be compatible with
+the new version you're trying to deploy.
 
 ## Should I upgrade?
 
 It's great that you're interested in trying out the new features and fixes that new versions of
-ksqlDB bring. However, before rushing off to upgrade all your ksqlDB clusters ask yourself the
-question "do I need to upgrade _this_ cluster"?
+ksqlDB bring. But before rushing off to upgrade all your ksqlDB clusters, ask yourself,
+"Do I need to upgrade *this* cluster?"
 
-If you're running ksqlDB in production and you don't yet need the features or fixes the new version
-brings, then consider delaying any upgrade until either another release has features or fixes you
-need, or until ksqlDB reaches version 1.0 and therefore promises backwards compatibility.
+If you're running ksqlDB in production, and you don't yet need the features or fixes the new version
+brings, consider delaying any upgrade until either another release has features or fixes you
+need, or until ksqlDB reaches version 1.0 and promises backward compatibility.
 
 ## How to upgrade
 
 Upgrading a cluster involves leaving the old cluster running on the old version, bringing up a new
-cluster on the new version, porting across your database schema and finally thinking about your data.
+cluster on the new version, porting across your database schema, and finally thinking about your data.
 
 ### Port the database schema
 
@@ -111,36 +111,39 @@ This will stop all processing and delete any internal topics in Kafka.
 
 !!! important
     ksqlDB 0.10.0 is not backward compatible. Do not upgrade in-place.
 
-The following changes in SQL syntax and functionality may mean SQL statements that previously ran no longer run:
+The following changes in SQL syntax and functionality may mean SQL statements
+that ran previously no longer run.
 
-### WITH(KEY) syntax removed.
+### WITH(KEY) syntax removed
 
-In previous versions all key columns where called `ROWKEY`. To enable a more user friendly name to be
-used for the key column in queries it was possible to supply an alias for the key column in the WITH
-clause, for example:
+In previous versions, all key columns were called `ROWKEY`. To enable a more
+user-friendly name to be used for the key column in queries, it was possible
+to supply an alias for the key column in the WITH clause, for example:
 
 ```sql
 CREATE TABLE INPUT (ROWKEY INT PRIMARY KEY, ID INT, V0 STRING) WITH (key='ID', ...);
 ```
 
-With the above query the `ID` column can be used as an alias for `ROWKEY`. This approach required
-the Kafka message value to contain an exact copy of the key.
+With the previous query, the `ID` column can be used as an alias for `ROWKEY`.
+This approach required the {{ site.ak }} message value to contain an exact copy
+of the key.
 
 [KLIP-24](https://github.com/confluentinc/ksql/blob/master/design-proposals/klip-24-key-column-semantics-in-queries.md)
 removed the restriction that key columns must be named `ROWKEY`, negating the need for the `WITH(KEY)`
-syntax, which has been removed, also removing the requirement for the Kafka message value to contain
-an exact copy of the key.
+syntax, which has been removed. Also, this change removed the requirement for
+the {{ site.ak }} message value to contain an exact copy of the key.
 
-Update your queries by removing the `KEY` fro the `WITH` clause and appropriately naming your
-`KEY` and `PRIMARY KEY` columns. For example, the above CREATE TABLE statement can now be rewritten
-as:
+Update your queries by removing the `KEY` from the `WITH` clause and naming
+your `KEY` and `PRIMARY KEY` columns appropriately. For example, the previous
+CREATE TABLE statement can now be rewritten as:
 
 ```sql
 CREATE TABLE INPUT (ID INT PRIMARY KEY, V0 STRING) WITH (...);
 ```
 
-Unless the value format is `DELIMITED`, which means the value columns are _order dependant_, so dropping
-the `ID` value column would result in a deserialization error or the wrong values being loaded. If using
+This rewrite doesn't work if the value format is `DELIMITED`, because the value
+columns are *order dependent*, so dropping the `ID` value column would cause a
+deserialization error or load the wrong values. If you're using
 `DELIMITED`, consider rewriting as:
 
 ```sql
 CREATE TABLE INPUT (ID INT PRIMARY KEY, ignoreMe INT, V0 STRING) WITH (...);
 ```
 
@@ -152,18 +155,20 @@ CREATE TABLE INPUT (ID INT PRIMARY KEY, ignoreMe INT, V0 STRING) WITH (...);
 
 !!! important
     ksqlDB 0.9.0 is not backward compatible. Do not upgrade in-place.
 
-The following changes in SQL syntax and functionality may mean SQL statements that previously ran no longer run:
+The following changes in SQL syntax and functionality may mean SQL statements
+that ran previously no longer run.
 
 ### Table PRIMARY KEYs
 
 Tables now use `PRIMARY KEY` to define their primary key column rather than `KEY`.
-Update your `CREATE TABLE` statements as required.
+Update your `CREATE TABLE` statements as required. For example, statements like
+this:
 
 ```sql
 CREATE TABLE OUTPUT (ROWKEY INT KEY, V0 STRING, V1 DOUBLE) WITH (...);
 ```
 
-Will need to be updated to:
+Must be updated to:
 
 ```sql
 CREATE TABLE OUTPUT (ROWKEY INT PRIMARY KEY, V0 STRING, V1 DOUBLE) WITH (...);
 ```
@@ -174,7 +179,8 @@ CREATE TABLE OUTPUT (ROWKEY INT PRIMARY KEY, V0 STRING, V1 DOUBLE) WITH (...);
 
 !!! important
     ksqlDB 0.7.0 is not backward compatible. Do not upgrade in-place.
 
-The following changes in SQL syntax and functionality may mean SQL statements that previously ran no longer run:
+The following changes in SQL syntax and functionality may mean SQL statements
+that ran previously no longer run.
 
 ### `PARTITION BY` and `GROUP BY` result schema changes:
diff --git a/docs/tutorials/basics-docker.md b/docs/tutorials/basics-docker.md
index 2237a9751744..5308514c1dad 100644
--- a/docs/tutorials/basics-docker.md
+++ b/docs/tutorials/basics-docker.md
@@ -597,8 +597,8 @@ Query terminated
 
 !!! note
     The grouping columns `gender` and `regionid` are currently stored combined
     in a single key column. This will be rectified once multiple key columns
-    are supported. Until then, then columns are combined and the resulting column
-    if given a system generated column name.
+    are supported. Until then, these columns are combined and the resulting column
+    is given a system-generated column name.
 
 ### 8. View query results using a pull query
diff --git a/docs/tutorials/examples.md b/docs/tutorials/examples.md
index c88cff16d044..72960d5df890 100644
--- a/docs/tutorials/examples.md
+++ b/docs/tutorials/examples.md
@@ -42,14 +42,15 @@ CREATE STREAM pageviews
 
 ### Define Kafka message key
 
-The previous SQL statement does not define a column to represent the data in the
-{{ site.ak }} message key in the underlying {{ site.ak }} topic. The system therefore added a
-`ROWKEY` column with type `STRING`. If your data does not contain a {{ site.ak }} serialized
-`STRING` in the {{ site.ak }} message key you should not use `ROWKEY` in your SQL statements,
-as the behaviour will be undefined.
+The previous SQL statement doesn't define a column to represent the data in the
+{{ site.ak }} message key in the underlying {{ site.ak }} topic, so the system added a
+`ROWKEY` column with type `STRING`. If your data doesn't contain a {{ site.ak }}-serialized
+`STRING` in the {{ site.ak }} message key, don't use `ROWKEY` in your SQL statements,
+or undefined behavior will result.
 
-Where the {{ site.ak }} message key is serialized in a key format ksqlDB supports,
-(currently only `KAFKA`), you can specify the key in the column list of the CREATE STREAM statement.
+Where the {{ site.ak }} message key is serialized in a key format that ksqlDB
+supports (currently only `KAFKA`), you can specify the key in the column list
+of the CREATE STREAM statement.
 
 ```sql
 CREATE STREAM pageviews