Docs for primitive key support #4478
Conversation
ksqlDB now supports the following primitive key types: `INT`, `BIGINT`, and `DOUBLE`, in addition to the existing `STRING` type.

The key type can be defined in the CREATE TABLE or CREATE STREAM statement by including a column definition for `ROWKEY` in the form `ROWKEY <primitive-key-type> KEY,`, for example:

```sql
CREATE TABLE USERS (ROWKEY BIGINT KEY, NAME STRING, RATING DOUBLE) WITH (kafka_topic='users', VALUE_FORMAT='json');
```

ksqlDB currently requires the name of the key column to be `ROWKEY`. Support for arbitrary key names is tracked by confluentinc#3536.

ksqlDB currently requires keys to use the `KAFKA` format. Support for additional formats is tracked by https://github.com/confluentinc/ksql/projects/3.

Schema inference currently works only with `STRING` keys. Support for additional key types is tracked by confluentinc#4462. (Schema inference is where ksqlDB infers the columns of a CREATE TABLE or CREATE STREAM statement from the schema registered in the Schema Registry, as opposed to the user supplying them in the statement.)

Apache Kafka Connect can be configured to output keys in the `KAFKA` format by using a converter, e.g. `"key.converter": "org.apache.kafka.connect.converters.IntegerConverter"`. Details of which converter to use for which key type can be found in the `Connect Converter` column at https://docs.confluent.io/current/ksql/docs/developer-guide/serialization.html#kafka.

@rmoff has written an introductory blog about primitive keys: https://rmoff.net/2020/02/07/primitive-keys-in-ksqldb/

BREAKING CHANGE: existing queries that perform a PARTITION BY or GROUP BY on a single column of one of the above primitive key types will now set the key to the appropriate type, not `STRING` as previously.
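To make the breaking change concrete, here is a minimal sketch (the `clicks` stream and `userId` column mirror the example later in this PR; the derived stream name is hypothetical):

```sql
-- Value-only columns; the key defaults to a STRING ROWKEY.
CREATE STREAM clicks (userId INT, url STRING)
  WITH (kafka_topic='clickstream', value_format='json');

-- Repartition on a single INT column: the key of the derived stream
-- is now INT. Before this change it would have been a STRING.
CREATE STREAM clicks_by_user AS
  SELECT * FROM clicks
  PARTITION BY userId;
```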
Some reviewing notes....
Co-Authored-By: Jim Galasyn <jim.galasyn@confluent.io>
LGTM
```sql
CREATE STREAM clicks (userId INT, url STRING) WITH (kafka_topic='clickstream', value_format='json');

-- table with BIGINT userId stored in the key:
CREATE TABLE users (ROWKEY BIGINT KEY, fullName STRING) WITH (kafka_topic='users', value_format='json');
```
I think it would be good to have the example above explicitly name `userId` as the `KEY` - I know this isn't necessary for the example, but I think it's good to have an example of a stream with a key declared using the `KEY` syntax.
I've deliberately not added the `WITH KEY` bit as it's not required. The example only has what's needed. Adding additional stuff is just noise IMHO and can lead to confusion.
Plus, I intend to drop the whole `WITH KEY` thing soon.
So do you mind if we leave this as it is?
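For context, the `KEY` property under discussion looks roughly like this; a sketch only, reusing the `clicks` example above (the property declares that a value column duplicates the message key):

```sql
-- 'key' is the optional WITH-clause property being discussed: it tells
-- ksqlDB the userId value column is a copy of the Kafka message key.
CREATE STREAM clicks (userId INT, url STRING)
  WITH (kafka_topic='clickstream', value_format='json', key='userId');
```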
````diff
@@ -398,14 +398,16 @@ message key by setting the `KEY` property of the `WITH` clause.
 Example:
 
 ```sql
-CREATE TABLE users (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR)
+CREATE TABLE users (rowkey INT KEY, registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid INT)
````
All the examples have added `ROWKEY <TYPE> KEY` - is this now a requirement or is it just illustrative?
It's not currently required. If you don't add one, KSQL currently defaults to `ROWKEY STRING KEY`. However, we should encourage people to be explicit, rather than rely on the implicit default.
I've not documented this yet as it's a bit meh. Ideally, not supplying the key column should mean there is no key column. We can update the docs when this is the case.
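A quick sketch of the default described above (table names are hypothetical; the behaviour shown is the implicit `ROWKEY STRING KEY` just mentioned):

```sql
-- These two declarations are currently equivalent: omitting the key
-- column makes ksqlDB default to ROWKEY STRING KEY.
CREATE TABLE users_implicit (registertime BIGINT, userid VARCHAR)
  WITH (kafka_topic='users', value_format='json');

CREATE TABLE users_explicit (ROWKEY STRING KEY, registertime BIGINT, userid VARCHAR)
  WITH (kafka_topic='users', value_format='json');
```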
Comments above addressed in #4551
Description
Fixes: #4143
This PR updates the docs to reflect the new primitive key functionality and some code changes to update DataGen.
Testing done
Ran through the docker and non-docker examples, ensuring statements work, updating statements and expected output where necessary.
Reviewer checklist