For example (taken from key-field.json and extended to check for a repartition topic):
{
  "name": "table | initially set | group by (same) | key in value | no aliasing",
  "statements": [
    "CREATE TABLE INPUT (foo INT, bar INT) WITH (kafka_topic='input_topic', key='foo', value_format='JSON');",
    "CREATE TABLE OUTPUT AS SELECT foo, COUNT(*) FROM INPUT GROUP BY foo;"
  ],
  "inputs": [
    {"topic": "input_topic", "key": "1", "value": {"foo": 1, "bar": 2}}
  ],
  "outputs": [
    {"topic": "OUTPUT", "key": "1", "value": {"FOO": 1, "KSQL_COL_1": 1}}
  ],
  "post": {
    "sources": [
      {"name": "OUTPUT", "type": "table", "keyField": {"name": "FOO", "legacyName": "KSQL_INTERNAL_COL_0", "legacySchema": "STRING"}}
    ],
    "topics": {
      "blacklist": ".*-repartition"
    }
  }
}
The above creates a table with a key field 'foo' and then does a group by on 'foo'. The topic/table is already keyed off 'foo', so no repartition should be required.
Admittedly, it's a bit of a strange GROUP BY, given there will be exactly one row per key.
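To make the intent of the `"blacklist": ".*-repartition"` post-condition concrete, here is a minimal Python sketch of the kind of check it implies: no topic created by the query should match the pattern. This is not the test framework's code. The helper name `blacklisted_topics`, the sample topic names, and the choice of full-match semantics are all assumptions made for illustration.

```python
import re


def blacklisted_topics(topic_names, blacklist_pattern):
    """Return the topic names that fully match the blacklist regex.

    Hypothetical helper: full-match semantics are an assumption, not
    something confirmed from the test framework.
    """
    pattern = re.compile(blacklist_pattern)
    return [name for name in topic_names if pattern.fullmatch(name)]


# Hypothetical topic names, for illustration only.
topics_without_repartition = ["input_topic", "OUTPUT"]
topics_with_repartition = ["input_topic", "OUTPUT", "some-internal-name-repartition"]

# An empty result means the post-condition would hold.
print(blacklisted_topics(topics_without_repartition, r".*-repartition"))
# A non-empty result means an unwanted repartition topic exists.
print(blacklisted_topics(topics_with_repartition, r".*-repartition"))
```

With the behaviour described in this issue, the second case is what happens today: a repartition topic is created even though the table is already keyed on 'foo'.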
So... we should either fix this so it doesn't repartition or just throw an error, given it's also possible to achieve the same output with the following:
Marked for release 6.0, as this would be a breaking change unless we added explicit handling (i.e. complexity) to make it backwards compatible.
We could do it earlier and make it backwards compatible by effectively ignoring GROUP BY ROWKEY or GROUP BY keyField on tables for newer queries.
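As a rough illustration of that backwards-compatible option, the decision could look something like the Python sketch below. It is not KSQL's planner logic. The function name `needs_repartition`, its parameters, and the case-insensitive comparison are hypothetical, and it only models the simple case of a single GROUP BY column.

```python
def needs_repartition(group_by_columns, key_field):
    """Decide whether a table GROUP BY would need a repartition step.

    Hypothetical sketch: a single GROUP BY column that is either ROWKEY
    or the table's key field means the data is already keyed correctly.
    """
    normalized = [column.strip().upper() for column in group_by_columns]

    # Grouping by several columns produces a new composite key,
    # so this sketch always repartitions in that case.
    if len(normalized) != 1:
        return True

    already_keyed_by = {"ROWKEY"}
    if key_field is not None:
        already_keyed_by.add(key_field.strip().upper())

    return normalized[0] not in already_keyed_by


# The table from the example is keyed on 'foo'.
print(needs_repartition(["foo"], key_field="FOO"))     # grouping by the key field
print(needs_repartition(["ROWKEY"], key_field="FOO"))  # grouping by ROWKEY
print(needs_repartition(["bar"], key_field="FOO"))     # grouping by a non-key column
```

Only the last call would require a repartition under these assumptions.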
big-andy-coates changed the title from "Table GROUP BY on keyfield or ROWTIME does unnecessary repartition step" to "Table GROUP BY on keyfield or ROWKEY does unnecessary repartition step" on Sep 17, 2019