Fix GROUP BY semantics for keys with any names #4898
Comments
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue Mar 26, 2020

fixes: confluentinc#4898

This commit sees the result of a GROUP BY on a single column reference have a schema with a key column matching the name of the column, e.g.

```sql
-- source schema: A -> B, C
CREATE STREAM OUTPUT AS SELECT COUNT(1) AS COUNT FROM INPUT GROUP BY B;
-- output schema: B -> COUNT
```

If the GROUP BY is on anything other than a single column reference then the key column will be a unique generated column name, e.g.

```sql
-- source schema: A -> B, C
CREATE STREAM OUTPUT AS SELECT COUNT(1) FROM INPUT GROUP BY B+1;
-- output schema: KSQL_COL_1 -> KSQL_COL_0 (Both names are generated)
```

BREAKING CHANGE: Existing queries that reference a single GROUP BY column in the projection would fail if they were resubmitted, due to a duplicate column. The same existing queries will continue to run if already running, i.e. this is only a change for newly submitted queries. Existing queries will use the old query semantics.
agavra pushed a commit to big-andy-coates/ksql that referenced this issue Mar 26, 2020
big-andy-coates added a commit that referenced this issue Mar 27, 2020

* chore: add GROUP BY support for any key names

fixes: #4898

This commit sees the result of a GROUP BY on a single column reference have a schema with a key column matching the name of the column, e.g.

```sql
-- source schema: A -> B, C
CREATE STREAM OUTPUT AS SELECT COUNT(1) AS COUNT FROM INPUT GROUP BY B;
-- output schema: B -> COUNT
```

If the GROUP BY is on anything other than a single column reference then the key column will be a unique generated column name, e.g.

```sql
-- source schema: A -> B, C
CREATE STREAM OUTPUT AS SELECT COUNT(1) FROM INPUT GROUP BY B+1;
-- output schema: KSQL_COL_1 -> KSQL_COL_0 (Both names are generated)
```

BREAKING CHANGE: Existing queries that reference a single GROUP BY column in the projection would fail if they were resubmitted, due to a duplicate column. The same existing queries will continue to run if already running, i.e. this is only a change for newly submitted queries. Existing queries will use the old query semantics.

Co-authored-by: Big Andy Coates <andy@confluent.io>
If grouping by a single column, e.g. `GROUP BY B`, then the schema of the result should have a column named `B`, not `ROWKEY`.

If grouping by something other than a single column, then we should generate a unique column name, e.g. `KSQL_COL_0`.

Also, note we'll need a slight change in semantics:

An old-style GROUP BY on a single column might also select that same column in the projection.

Moving that same persistent query to the new world, where the key can have any name, we run into a problem: the key column and a value column would both be named `B`.

Hence, in the new world, such a statement will be rejected. This seems fine to me, as the data for column `B` is already in the key! If the user wants the data in the value, they can just add an alias.
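
The change in semantics can be sketched with the same `INPUT` source used in the commit messages (source schema `A -> B, C`). These statements are an illustration, not the examples originally attached to the issue: the old-style output schema, the schema after aliasing, and the alias name `B_COPY` are assumptions. The sketch uses `CREATE TABLE` because an aggregation in ksqlDB produces a table, and the three statements are alternatives rather than a script to run in sequence.

```sql
-- source schema: A -> B, C

-- Old style: the grouping column B also appears in the projection.
-- Assumed old output schema: ROWKEY -> B, COUNT
CREATE TABLE OUTPUT AS
  SELECT B, COUNT(1) AS COUNT FROM INPUT GROUP BY B;

-- New semantics: the key column is itself named B, so the statement above
-- would produce B -> B, COUNT (a duplicate column name) and is rejected.

-- Alternative 1: drop B from the projection; the data lives in the key.
-- output schema: B -> COUNT
CREATE TABLE OUTPUT AS
  SELECT COUNT(1) AS COUNT FROM INPUT GROUP BY B;

-- Alternative 2: keep a copy in the value under an alias.
-- B_COPY is a hypothetical name chosen for this sketch.
-- Assumed output schema: B -> B_COPY, COUNT
CREATE TABLE OUTPUT AS
  SELECT B AS B_COPY, COUNT(1) AS COUNT FROM INPUT GROUP BY B;
```

Only the first alternative's output schema (`B -> COUNT`) is stated in the commit message above; the aliased form follows the issue's suggestion that the user can add an alias if the data is wanted in the value.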