docs: copy edit WITH(KEY) doc changes (DOCS-4374) #5417

Merged · 1 commit · May 20, 2020
docs/developer-guide/joins/partition-data.md (19 additions, 19 deletions)

@@ -48,34 +48,34 @@ given any name in the schema definition.
When you create a table by using a CREATE TABLE AS SELECT statement, the key of
the resulting table is determined as follows:

-- If the statement contains neither a JOIN or GROUP BY clause, the key type of the resulting
+- If the statement contains neither a JOIN nor GROUP BY clause, the key type of the resulting
table matches the key type of the source table, and the name matches the source unless the
projection defines an alias for the column.
- If the statement contains a JOIN and no GROUP BY clause, the key type of the resulting table
-will match the type of the join columns and then key name will:
-- FULL OUTER joins and joins on expressions other than column references will have a system
-generated name in the form `KSQL_COL_n`, where `n` is a positive integer, unless the projection
-defines an alias for the column.
-- For other joins that contain at least one column reference in their join criteria, the name
-will match the left most column reference in the join criteria.
+matches the type of the join columns. The key name is defined by these conditions:
+- FULL OUTER joins and joins on expressions other than column references have a
+system-generated name in the form `KSQL_COL_n`, where `n` is a positive integer,
+unless the projection defines an alias for the column.
+- For other joins that contain at least one column reference in their join criteria,
+the name matches the leftmost column reference in the join criteria.
- If the statement contains a GROUP BY clause, the grouping columns determine the key
of the resulting table.
-- When grouping by a single column or STRUCT field: the name of the key column in the
+- When grouping by a single column or STRUCT field, the name of the key column in the
resulting table matches the name of the column or field, unless the projection includes
-an alias for the column or field, and the type of the key column matches the column or field.
-- When grouping by a single expression that is not a column or STRUCT field: the resulting table
-will have a system generated key column name in the form `KSQL_COL_n`, where `n` is a positive
-integer, unless the projection contains an alias for the expression, and the type of the
-column will match the result of the expression.
-- When grouping by multiple expressions: the resulting table will have a system generated key
-name in the form `KSQL_COL_n`, where `n` is a positive integer, and the type of the column will
-be a [SQL `STRING`](../../concepts/schemas), containing the grouping expressions concatenated
+an alias for the column or field. The type of the key column matches the column or field.
+- When grouping by a single expression that is not a column or STRUCT field, the resulting table
+has a system-generated key column name in the form `KSQL_COL_n`, where `n` is a positive
+integer, unless the projection contains an alias for the expression. The type of the
+column matches the result of the expression.
+- When grouping by multiple expressions, the resulting table has a system-generated key
+name in the form `KSQL_COL_n`, where `n` is a positive integer. The type of the column is
+a [SQL `STRING`](../../concepts/schemas), containing the grouping expressions concatenated
together.
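
As a sketch of how these naming rules play out, the following hypothetical
statements illustrate the GROUP BY cases; the `orders` stream, its columns,
and its topic are assumptions made for the example:

```sql
-- Assumed source stream, for illustration only:
CREATE STREAM orders (orderId INT KEY, product VARCHAR, total DOUBLE)
    WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

-- Grouping by a single column: the key column of the resulting table
-- keeps the name PRODUCT.
CREATE TABLE sales_by_product AS
    SELECT product, SUM(total) AS total_sales
    FROM orders
    GROUP BY product;

-- Grouping by an expression: the alias names the key column PRODUCT_NAME;
-- without it, a system-generated name like KSQL_COL_0 would be used.
CREATE TABLE sales_by_product_name AS
    SELECT UCASE(product) AS product_name, SUM(total) AS total_sales
    FROM orders
    GROUP BY UCASE(product);
```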

The following example shows a `users` table joined with a `clicks` stream
on the `userId` column. The `users` table has the correct primary key
`userId` that coincides with the joining column. But the `clicks` stream
-doesn't have a defined key, so ksqlDB must repartition it on the joining
+doesn't have a defined key, so ksqlDB repartitions it on the joining
column (`userId`) to assign the key before performing the join.

```sql
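-- The original example is collapsed in this diff view. The statement below
-- is a hedged reconstruction of the join described above; column names
-- other than userId are illustrative.
CREATE STREAM clicks_enriched AS
    SELECT c.userId, c.url, u.username
    FROM clicks c
        JOIN users u ON c.userId = u.userId
    EMIT CHANGES;
```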
@@ -137,8 +137,8 @@ the `INT` side to a `LONG`:

Tables created on top of existing Kafka topics, for example those created with
a `CREATE TABLE` statement, are keyed on the data held in the key of the records
-in the Kafka topic. ksqlDB presents this data in the `PRIMARY KEY` column and expects
-the data to be in the `KAFKA` format.
+in the {{ site.ak }} topic. ksqlDB presents this data in the `PRIMARY KEY` column
+and requires the data to be in the `KAFKA` format.

Tables created inside ksqlDB from other sources, for example those created with
a `CREATE TABLE AS SELECT` statement, will copy the key from their source(s)
docs/developer-guide/ksqldb-reference/insert-values.md (1 addition, 1 deletion)

@@ -23,7 +23,7 @@ Description
Produce a row into an existing stream or table and its underlying topic
based on explicitly specified values.

-If column names are specified, then the order of the values must match the
+If column names are specified, the order of the values must match the
order of the names. Any column not explicitly given a value is set to `null`.
Pseudo columns, for example `ROWTIME`, may be provided.
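
For example, with a hypothetical `users` stream that has `userId`, `name`,
and `email` columns:

```sql
-- Values follow the listed column order; the omitted email column is set
-- to null.
INSERT INTO users (userId, name) VALUES (10, 'alice');
```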

docs/developer-guide/syntax-reference.md (20 additions, 20 deletions)

@@ -443,44 +443,45 @@ While tables require a `PRIMARY KEY`, the `KEY` column of a stream is optional.

Joins involving tables can be joined to the table on the `PRIMARY KEY` column. Joins involving
streams have no such limitation. Stream joins on any expression other than the stream's `KEY`
-column will require an internal repartition, joins on the stream's `KEY` column do not.
+column require an internal repartition, but joins on the stream's `KEY` column do not.

!!! important
-Be aware that Kafka guarantees the relative order of any two messages from
+{{ site.ak }} guarantees the relative order of any two messages from
one source partition only if they are also both in the same partition
-after the repartition. Otherwise, Kafka is likely to interleave messages.
-The use case will determine if these ordering guarantees are acceptable.
+*after* the repartition. Otherwise, {{ site.ak }} is likely to interleave
+messages. The use case will determine if these ordering guarantees are
+acceptable.
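
For instance, assuming a `clicks` stream keyed on `userId`, plus `users` and
`pages` tables keyed on the respective join columns, the first hypothetical
join below needs no repartition, while the second joins on a value column
and triggers one:

```sql
-- Join on the stream's KEY column: no internal repartition.
SELECT c.userId, u.username
FROM clicks c
    JOIN users u ON c.userId = u.userId
EMIT CHANGES;

-- Join on a value column of the stream: ksqlDB repartitions clicks on
-- pageId first, so the ordering caveat above applies.
SELECT c.pageId, p.title
FROM clicks c
    JOIN pages p ON c.pageId = p.pageId
EMIT CHANGES;
```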

### What To Do If Your Key Is Not Set or Is In A Different Format

### Streams

For streams, just leave out the `KEY` column from the column list.
-ksqlDB will take care of repartitioning the stream for you using the
-value(s) from the `GROUP BY` columns for aggregates, and the join
+ksqlDB takes care of repartitioning the stream for you, using the
+value(s) from the `GROUP BY` columns for aggregates and the join
criteria for joins.
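
A minimal sketch, with the stream name, columns, and topic assumed:

```sql
-- No KEY column is declared, so the existing message keys are ignored.
CREATE STREAM pageviews (userId INT, pageId VARCHAR)
    WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- ksqlDB repartitions pageviews on userId internally before aggregating.
CREATE TABLE views_per_user AS
    SELECT userId, COUNT(*) AS views
    FROM pageviews
    GROUP BY userId;
```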

### Tables

-For tables, you can still use ksqlDB if the message key is not set or if it is
-not in the required format as long as the key can be rebuilt from the value data
-and *one* of the following statements is true:
+For tables, you can still use ksqlDB if the message key isn't set or if
+it isn't in the required format, as long as the key can be rebuilt from
+the value data, and *one* of the following statements is true:

- The message key is a [unary
function](https://en.wikipedia.org/wiki/Unary_function) of the value
in the desired key column.
-- It is ok for the messages in the topic to be re-ordered before being
-inserted into the table.
+- It's acceptable for the messages in the topic to be re-ordered before
+being inserted into the table.

-First create a stream which we'll then use to have ksqlDB write the message key, and then
-declare the table on the output topic of this stream:
+First create a stream which you'll use to have ksqlDB write the message
+key, and then declare the table on the output topic of this stream.

Example:

- Goal: You want to create a table from a topic, which is keyed by
userid of type INT.
- Problem: The required key is present as a column (aptly named
-`userid`) in the message value as is a string containg an integer,
+`userid`) in the message value, and is a string containing an integer,
but the actual message key in {{ site.ak }} is not set or has some
other value or format.

@@ -493,10 +494,9 @@ CREATE STREAM users_with_wrong_key (userid INT, username VARCHAR, email VARCHAR)
-- 1) The CAST statement converts userId to the required SQL type.
-- 2) The PARTITION BY clause re-partitions the stream based on the new, converted key.
-- 3) The SELECT clause selects the required value columns, all in this case.
--- The resulting schema will be: KSQL_COL_0 INT KEY, USERNAME STRING, EMAIL STRING.
--- Note: the system generated KSQL_COL_0 column name can be replaced via an alias in the projection
--- however, this is not necessary in this instance as we do not intend to use this stream as a
--- source for other queries.
+-- The resulting schema is: KSQL_COL_0 INT KEY, USERNAME STRING, EMAIL STRING.
+-- Note: the system generated KSQL_COL_0 column name can be replaced via an alias in the projection,
+-- but this is not necessary in this case, because this stream is not a source for other queries.
CREATE STREAM users_with_proper_key
WITH(KAFKA_TOPIC='users-with-proper-key') AS
SELECT *
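-- Hedged completion: the rest of the original statement is collapsed in
-- this view; per the notes above, it partitions by a CAST expression.
FROM users_with_wrong_key
PARTITION BY CAST(userid AS INT)
EMIT CHANGES;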
Expand All @@ -510,8 +510,8 @@ CREATE TABLE users_table (userId INT PRIMARY KEY, username VARCHAR, email VARCHA
VALUE_FORMAT='JSON');

-- Or, if you prefer, you can keep userId in the value of the repartitioned data
--- by using the AS_VALUE function:
--- The resulting schema will be: userId INT KEY, USERNAME STRING, EMAIL STRING, VUSERID INT
+-- by using the AS_VALUE function.
+-- The resulting schema is: userId INT KEY, USERNAME STRING, EMAIL STRING, VUSERID INT
CREATE STREAM users_with_proper_key_and_user_id
WITH(KAFKA_TOPIC='users_with_proper_key_and_user_id') AS
SELECT userId, username, email, AS_VALUE(userId) as vUserId
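-- Hedged completion: the rest of the original statement is collapsed in
-- this view.
FROM users_with_wrong_key
PARTITION BY userId
EMIT CHANGES;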
@@ -26,13 +26,13 @@ create new nested STRUCT data as the result of a query. For more info, see
The following functionality is not supported:

- Message *keys* in Avro, Protobuf, or JSON formats are not supported. Message
-keys in ksqlDB are always interpreted as KAFKA format, which means ksqlDB
+keys in ksqlDB are always interpreted as `KAFKA` format, which means ksqlDB
ignores schemas that have been registered for message keys.

-While ksqlDB does not support loading the message key's schema from the {{ site.sr }},
+Although ksqlDB doesn't support loading the message key's schema from {{ site.sr }},
you can provide the key column definition within the `CREATE TABLE` or `CREATE STREAM`
-statement. Where a `CREATE TABLE` or `CREATE STREAM` statement does not provide an
-explicit key columns an implicit `ROWKEY STRING` column will be added.
+statement. Where a `CREATE TABLE` or `CREATE STREAM` statement doesn't provide an
+explicit key column, ksqlDB adds an implicit `ROWKEY STRING` column.

Configure ksqlDB for Avro, Protobuf, and JSON
=============================================
@@ -96,7 +96,7 @@ substitute `PROTOBUF` or `JSON_SR` for `AVRO` in each statement.

The following statement shows how to create a new `pageviews` stream by
reading from a {{ site.ak }} topic that has Avro-formatted message values and
-a Kafka-formatted INT message key.
+a `KAFKA`-formatted `INT` message key.

```sql
CREATE STREAM pageviews
@@ -105,8 +105,8 @@ CREATE STREAM pageviews
VALUE_FORMAT='AVRO');
```

-If the key schema is not provided, the key of the data will be assumed to be
-a single KAFKA serialized `STRING` named `ROWKEY`.
+If the key schema is not provided, the key of the data is assumed to be
+a single `KAFKA` serialized `STRING` named `ROWKEY`.
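
In other words, a hypothetical statement like the following, which declares
no key column, behaves as if `ROWKEY STRING KEY` had been written explicitly
(the topic name is assumed):

```sql
-- Value columns are inferred from the registered Avro schema; the key is
-- the implicit ROWKEY STRING column.
CREATE STREAM pageviews_implicit_key
    WITH (KAFKA_TOPIC='pageviews-avro-topic', VALUE_FORMAT='AVRO');
```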

### Create a new table by reading Avro-formatted data

docs/operate-and-deploy/installation/upgrading.md (38 additions, 32 deletions)

@@ -11,32 +11,32 @@ keywords: ksqldb, install, upgrade

## Why does ksqlDB not currently support in-place upgrades?

-Past releases of KSQL were backwards compatible. However, there was a cost to this backwards compatibility:
-progress was slower and the code base incurred increased complexity. ksqlDB is a young product and
-we're wanting to move fast, so have decided to choose speed of development over strong backwards
-compatibility guarantees for a few released.
+Past releases of KSQL were backward compatible. But there was a cost to this backward compatibility:
+progress was slower and the code base incurred increased complexity. ksqlDB is a young product and
+we want to move fast, so we have decided to choose speed of development over strong backward
+compatibility guarantees for a few releases.

-Until version 1.0 of ksqlDB, each minor release will potentially have breaking changes in it that
-mean you can not simply update the ksqlDB binaries and restart the server(s).
+Until version 1.0 of ksqlDB, each minor release will potentially have breaking changes in it,
+which means that you can't simply update the ksqlDB binaries and restart the server(s).

The data models and binary formats used within ksqlDB are in flux. This means data local to each
-ksqlDB node and stored centrally within internal Kafka topics may not be compatible with the new
-version you are trying to deploy.
+ksqlDB node and stored centrally within internal {{ site.ak }} topics may not be compatible with
+the new version you're trying to deploy.

## Should I upgrade?

It's great that you're interested in trying out the new features and fixes that new versions of
-ksqlDB bring. However, before rushing off to upgrade all your ksqlDB clusters ask yourself the
-question "do I need to upgrade _this_ cluster"?
+ksqlDB bring. But before rushing off to upgrade all your ksqlDB clusters, ask yourself,
+"Do I need to upgrade *this* cluster"?

-If you're running ksqlDB in production and you don't yet need the features or fixes the new version
-brings, then consider delaying any upgrade until either another release has features or fixes you
-need, or until ksqlDB reaches version 1.0 and therefore promises backwards compatibility.
+If you're running ksqlDB in production, and you don't yet need the features or fixes the new version
+brings, consider delaying any upgrade until either another release has features or fixes you
+need, or until ksqlDB reaches version 1.0 and promises backward compatibility.

## How to upgrade

Upgrading a cluster involves leaving the old cluster running on the old version, bringing up a new
-cluster on the new version, porting across your database schema and finally thinking about your data.
+cluster on the new version, porting across your database schema, and finally thinking about your data.

### Port the database schema

@@ -111,36 +111,39 @@ This will stop all processing and delete any internal topics in Kafka.
!!! important
ksqlDB 0.10.0 is not backward compatible. Do not upgrade in-place.

-The following changes in SQL syntax and functionality may mean SQL statements that previously ran no longer run:
+The following changes in SQL syntax and functionality may mean SQL statements
+that ran previously no longer run.

-### WITH(KEY) syntax removed.
+### WITH(KEY) syntax removed

-In previous versions all key columns where called `ROWKEY`. To enable a more user friendly name to be
-used for the key column in queries it was possible to supply an alias for the key column in the WITH
-clause, for example:
+In previous versions, all key columns were called `ROWKEY`. To enable a more
+user-friendly name to be used for the key column in queries, it was possible
+to supply an alias for the key column in the WITH clause, for example:

```sql
CREATE TABLE INPUT (ROWKEY INT PRIMARY KEY, ID INT, V0 STRING) WITH (key='ID', ...);
```

-With the above query the `ID` column can be used as an alias for `ROWKEY`. This approach required
-the Kafka message value to contain an exact copy of the key.
+With the previous query, the `ID` column can be used as an alias for `ROWKEY`.
+This approach required the {{ site.ak }} message value to contain an exact copy
+of the key.

[KLIP-24](https://github.com/confluentinc/ksql/blob/master/design-proposals/klip-24-key-column-semantics-in-queries.md)
removed the restriction that key columns must be named `ROWKEY`, negating the need for the `WITH(KEY)`
-syntax, which has been removed, also removing the requirement for the Kafka message value to contain
-an exact copy of the key.
+syntax, which has been removed. Also, this change removed the requirement for
+the {{ site.ak }} message value to contain an exact copy of the key.

-Update your queries by removing the `KEY` fro the `WITH` clause and appropriately naming your
-`KEY` and `PRIMARY KEY` columns. For example, the above CREATE TABLE statement can now be rewritten
-as:
+Update your queries by removing the `KEY` from the `WITH` clause and naming
+your `KEY` and `PRIMARY KEY` columns appropriately. For example, the previous
+CREATE TABLE statement can now be rewritten as:

```sql
CREATE TABLE INPUT (ID INT PRIMARY KEY, V0 STRING) WITH (...);
```

-Unless the value format is `DELIMITED`, which means the value columns are _order dependant_, so dropping
-the `ID` value column would result in a deserialization error or the wrong values being loaded. If using
+The exception is the `DELIMITED` value format, whose value columns are
+*order dependent*: dropping the `ID` value column would result in a
+deserialization error or the wrong values being loaded. If you're using
`DELIMITED`, consider rewriting as:

```sql
Expand All @@ -152,18 +155,20 @@ CREATE TABLE INPUT (ID INT PRIMARY KEY, ignoreMe INT, V0 STRING) WITH (...);
!!! important
ksqlDB 0.9.0 is not backward compatible. Do not upgrade in-place.

-The following changes in SQL syntax and functionality may mean SQL statements that previously ran no longer run:
+The following changes in SQL syntax and functionality may mean SQL statements
+that ran previously no longer run.

### Table PRIMARY KEYs

Tables now use `PRIMARY KEY` to define their primary key column rather than `KEY`.
-Update your `CREATE TABLE` statements as required.
+Update your `CREATE TABLE` statements as required. For example, statements like
+this:

```sql
CREATE TABLE OUTPUT (ROWKEY INT KEY, V0 STRING, V1 DOUBLE) WITH (...);
```

-Will need to be updated to:
+Must be updated to:

```sql
CREATE TABLE OUTPUT (ROWKEY INT PRIMARY KEY, V0 STRING, V1 DOUBLE) WITH (...);
```

@@ -174,7 +179,8 @@ CREATE TABLE OUTPUT (ROWKEY INT PRIMARY KEY, V0 STRING, V1 DOUBLE) WITH (...);
!!! important
ksqlDB 0.7.0 is not backward compatible. Do not upgrade in-place.

-The following changes in SQL syntax and functionality may mean SQL statements that previously ran no longer run:
+The following changes in SQL syntax and functionality may mean SQL statements
+that ran previously no longer run.

### `PARTITION BY` and `GROUP BY` result schema changes:

docs/tutorials/basics-docker.md (2 additions, 2 deletions)

@@ -597,8 +597,8 @@ Query terminated
!!! note
The grouping columns `gender` and `regionid` are currently stored combined
in a single key column. This will be rectified once multiple key columns
-are supported. Until then, then columns are combined and the resulting column
-if given a system generated column name.
+are supported. Until then, these columns are combined and the resulting column
+is given a system-generated column name.

### 8. View query results using a pull query

docs/tutorials/examples.md (7 additions, 6 deletions)

@@ -42,14 +42,15 @@ CREATE STREAM pageviews

### Define Kafka message key

-The previous SQL statement does not define a column to represent the data in the
-{{ site.ak }} message key in the underlying {{ site.ak }} topic. The system therefore added a
-`ROWKEY` column with type `STRING`. If your data does not contain a {{ site.ak }} serialized
-`STRING` in the {{ site.ak }} message key you should not use `ROWKEY` in your SQL statements,
-as the behaviour will be undefined.
+The previous SQL statement doesn't define a column to represent the data in the
+{{ site.ak }} message key in the underlying {{ site.ak }} topic, so the system added a
+`ROWKEY` column with type `STRING`. If your data doesn't contain a {{ site.ak }} serialized
+`STRING` in the {{ site.ak }} message key, don't use `ROWKEY` in your SQL statements,
+or undefined behavior will result.

Where the {{ site.ak }} message key is serialized in a key format ksqlDB supports,
-(currently only `KAFKA`), you can specify the key in the column list of the CREATE STREAM statement.
+(currently only `KAFKA`), you can specify the key in the column list of the
+CREATE STREAM statement.

```sql
CREATE STREAM pageviews
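    -- Hedged completion; the rest of the original statement is collapsed
    -- in this view, and these columns and settings are assumptions.
    (pageviewId INT KEY, userId VARCHAR, pageId VARCHAR)
    WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
```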