
Docs for primitive key support #4478

Merged
merged 80 commits into from
Feb 7, 2020

Conversation

big-andy-coates
Contributor

@big-andy-coates big-andy-coates commented Feb 7, 2020

Description

Fixes: #4143

This PR updates the docs to reflect the new primitive key functionality and some code changes to update DataGen.

  • ksqlDB now supports the following primitive key types: `INT`, `BIGINT`, and `DOUBLE`, as well as the existing `STRING` type.

    The key type can be defined in the CREATE TABLE or CREATE STREAM statement by including a column definition for `ROWKEY` in the form `ROWKEY <primitive-key-type> KEY,`, for example:

    `CREATE TABLE USERS (ROWKEY BIGINT KEY, NAME STRING, RATING DOUBLE) WITH (kafka_topic='users', VALUE_FORMAT='json');`
  • ksqlDB currently requires the name of the key column to be `ROWKEY`. Support for arbitrary key names is tracked by #3536 (Primitive Keys: allow key names other than ROWKEY).

  • ksqlDB currently requires keys to use the `KAFKA` format. Support for additional formats is tracked by https://github.com/confluentinc/ksql/projects/3.

  • Schema inference currently only works with `STRING` keys. Support for additional key types is tracked by #4462 (Retrieve key schemas from the schema registry). (Schema inference is where ksqlDB infers the schema of a CREATE TABLE or CREATE STREAM statement from the schema registered in the Schema Registry, as opposed to the user supplying the set of columns in the statement.)

  • Apache Kafka Connect can be configured to output keys in the `KAFKA` format by using a converter, e.g. `"key.converter": "org.apache.kafka.connect.converters.IntegerConverter"`. Details of which converter to use for which key type can be found in the `Connect Converter` column here: https://docs.confluent.io/current/ksql/docs/developer-guide/serialization.html#kafka.

  • @rmoff has written an introductory blog about primitive keys: https://rmoff.net/2020/02/07/primitive-keys-in-ksqldb/
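The Connect converter setting described in the bullets above can be sketched as a worker configuration fragment. Only the `key.converter` line comes from the description; the value-converter lines are illustrative assumptions:

```properties
# Serialize record keys as 32-bit integers, matching ksqlDB's KAFKA key format.
key.converter=org.apache.kafka.connect.converters.IntegerConverter
# Values serialized as JSON (illustrative; any value converter works here).
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```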

BREAKING CHANGE: existing queries that perform a PARTITION BY or GROUP BY on a single column of one of the above supported primitive key types will now set the key to the appropriate type, rather than to a `STRING` as previously.
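To illustrate the breaking change above, a minimal sketch (stream and column names hypothetical): repartitioning by an `INT` column now yields an `INT` key where it previously yielded a `STRING` key:

```sql
-- clicks has an INT userId column in its value.
-- Before this change the resulting ROWKEY was a STRING;
-- it is now an INT, serialized in the KAFKA format.
CREATE STREAM clicks_by_user AS
  SELECT * FROM clicks PARTITION BY userId;
```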

Testing done

Ran through the docker and non-docker examples, ensuring statements work, updating statements and expected output where necessary.

Reviewer checklist

  • Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@big-andy-coates big-andy-coates requested review from JimGalasyn and a team as code owners February 7, 2020 16:06
Contributor Author

@big-andy-coates big-andy-coates left a comment


Some reviewing notes....

big-andy-coates and others added 2 commits February 7, 2020 17:10
Co-Authored-By: Jim Galasyn <jim.galasyn@confluent.io>
big-andy-coates and others added 24 commits February 7, 2020 18:11
@big-andy-coates big-andy-coates merged commit ddf09d7 into confluentinc:master Feb 7, 2020
@big-andy-coates big-andy-coates deleted the prim_key_docs branch February 7, 2020 18:25
Contributor

@agavra agavra left a comment


LGTM

-- stream with INT userId stored in the value:
CREATE STREAM clicks (userId INT, url STRING) WITH(kafka_topic='clickstream', value_format='json');

-- table with BIGINT userId stored in the key:
CREATE TABLE users (ROWKEY BIGINT KEY, fullName STRING) WITH(kafka_topic='users', value_format='json');
Contributor


I think it would be good to have the example above explicitly name userId as the KEY - I know this isn't necessary for the example, but I think it's good to have an example of a stream with a key declared using the KEY syntax.

Contributor Author


I've deliberately not added the WITH KEY bit as it's not required. The example only has what's needed. Adding additional stuff is just noise, IMHO, and can lead to confusion.

Plus I intend to drop the whole WITH KEY thing soon.

So do you mind if we leave this as it is?
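For context, the legacy `WITH KEY` form being discussed would look roughly like the following (a sketch based on the earlier clickstream example; this syntax is slated for removal per the comment above):

```sql
-- Legacy hint that the value column userId duplicates the message key:
CREATE STREAM clicks (userId INT, url STRING)
  WITH (kafka_topic='clickstream', value_format='json', key='userId');
```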

@@ -398,14 +398,16 @@ message key by setting the `KEY` property of the `WITH` clause.
Example:

```sql
-CREATE TABLE users (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR)
+CREATE TABLE users (rowkey INT KEY, registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid INT)
Contributor


All the examples have added `ROWKEY <TYPE> KEY` - is this now a requirement, or is it just illustrative?

Contributor Author


It's not currently required. If you don't add one, KSQL currently defaults to `ROWKEY STRING KEY`. However, we should encourage people to be explicit rather than relying on the implicit default.

I've not documented this yet as it's a bit meh. Ideally, if you don't supply the key column, it should mean there is no key column. We can update the docs when that's the case.
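A sketch of the default described above (table and topic names hypothetical): omitting the key column currently behaves as if a `STRING` key had been declared explicitly:

```sql
-- These two statements are currently equivalent:
CREATE TABLE t1 (NAME STRING) WITH (kafka_topic='t', value_format='json');
CREATE TABLE t2 (ROWKEY STRING KEY, NAME STRING) WITH (kafka_topic='t', value_format='json');
```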

@big-andy-coates
Contributor Author

Comments above addressed in #4551

Successfully merging this pull request may close these issues.

Primitive Keys: Update docs for primitive keys
3 participants