Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial support for arbitrary key column names. #4701

Merged
merged 3 commits into from
Mar 5, 2020

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Mar 4, 2020

Description

Partial fix for: #3536

First part of supporting key column names other than ROWKEY.

With this initial pass you can now name your key columns anything you want in your CREATE TABLE and CREATE STREAM statements, e.g.

CREATE STREAM S (ID INT KEY, NAME STRING) WITH (...);

Any GROUP BY, PARTITION BY or JOIN on the key column results any created data source having a key column with a matching name, e.g.

-- schema of T: ID INT KEY, COUNT BIGINT
CREATE TABLE T AS SELECT COUNT() AS COUNT FROM S GROUP BY ID;

Pull and push queries work as expected and quoted identifiers work too.

However, this functionality is not complete yet. Hence it is guarded by the ksql.any.key.name.enabled feature flag, which defaults to off. The following big ticket items are remaining:

  • PARTITION BY a single value column should result in a stream with the key column that matches the value column name.
  • GROUP BY a single value column should result in a table with the key column that matches the value column name.
  • JOIN on a single value column should result in a stream/table with the key column that matches the value column name.

This additional work will be tracked under the same ticket, e.g. #3536

Reviewing notes:

Documentation changes will be done in a separate PR: #4686 (WIP)

Some changes are just renaming vars/funcs so that they no longer contain 'rowkey' in their names. This may seem unnecessary. However, use of ROWKEY is deeply ingrained in the code base. I'm having to search for 'rowkey' as part of this work. So these renames are mainly just so that they don't show up in my next search.

PR is broken down into the following commits to help with reviewing:

  1. Production code changes.
  2. Test code, including JSON based tests, changes.
    • Note: some of the QTT / RQTT tests now have duplicate versions: 1 version using ROWKEY and a new version using a custom key column name. The old ROWKEY version will be removed once this feature is complete.
  3. Historical test plans committed.

Testing done

Extensive (R)QTT testing

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

Partial fix for: confluentinc#3536

First part of supporting key column names other than `ROWKEY`.

With this initial pass you can now name your key columns anything you want in your `CREATE TABLE` and `CREATE STREAM` statements, e.g.

```sql
CREATE STREAM S (ID INT KEY, NAME STRING) WITH (...);
```

Any GROUP BY, PARTITION BY or JOIN on the key column results any created data source having a key column with a matching name, e.g.

```sql
-- schema of T: ID INT KEY, COUNT BIGINT
CREATE TABLE T AS SELECT COUNT() AS COUNT FROM S GROUP BY ID;
```

Pull and push queries work as expected and quoted identifiers work too.

However, this functionality is not complete yet.
Hence it is guarded by the `ksql.any.key.name.enabled` feature flag, which defaults to off.
The following big ticket items are remaining:

* PARTITION BY a single value column should result in a stream with the key column that matches the value column name.
* GROUP BY a single value column should result in a table with the key column that matches the value column name.
* JOIN on a single value column should  result in a stream/table with the key column that matches the value column name.

This additional work will be tracked under the same ticket, e.g. confluentinc#3536
@big-andy-coates big-andy-coates requested a review from a team as a code owner March 4, 2020 13:44
@big-andy-coates big-andy-coates changed the title Rowkey Partial support for arbitrary key column names. Mar 4, 2020
Copy link
Contributor

@purplefox purplefox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I went through the main changes in more detail then just a quick skim of the rest.

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -123,4 +129,19 @@ public KeyField getKeyField() {
getTimestampColumn()
);
}

private void validate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have this, do we need #4697?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. But #4697 makes things explicit. I'm not 100% convinced its an improvement or just noise though...

@big-andy-coates
Copy link
Contributor Author

Merging to avoid merge hell. @agavra : reach out if you still think we need any changes and I'll pick them up in the next PR.

Thanks for the reviews!

@big-andy-coates big-andy-coates merged commit eaa8f23 into confluentinc:master Mar 5, 2020
@big-andy-coates big-andy-coates deleted the rowkey branch March 5, 2020 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants