Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Window bounds such be accessible in projection for aggregated windowed joins #5931

Open
big-andy-coates opened this issue Aug 3, 2020 · 4 comments
Labels
bug names-and-aliases P1 Slightly lower priority to P0 ;) sql-syntax streaming-engine Tickets owned by the ksqlDB Streaming Team user-experience

Comments

@big-andy-coates
Copy link
Contributor

A query that includes a join and a windowed aggregation should allow users to access the window bounds columns in the projection, e.g.

SELECT 
     A.ID, 
     WINDOWSTART as WSTART, 
     WINDOWEND AS WEND, 
     COUNT(*)
  FROM A JOIN B on A.ID = B.ID 
  WINDOW TUMBLING (SIZE 10 MILLISECONDS) 
  GROUP BY A.ID;

Unfortunately, this currently fails with an unknown column error on WINDOWSTART.

The following QTT test case highlights this:

{
      "name": "windowed join with window bounds",
      "statements": [
        "CREATE STREAM A (ID VARCHAR KEY, col1 VARCHAR) WITH (kafka_topic='a', value_format='JSON');",
        "CREATE TABLE B (ID VARCHAR PRIMARY KEY, col1 VARCHAR) WITH (kafka_topic='b', value_format='JSON');",
        "CREATE TABLE C AS SELECT A.ID, collect_list(A.COL1), collect_list(B.COL1), WINDOWSTART as WSTART, WINDOWEND AS WEND FROM A JOIN B on A.ID = B.ID WINDOW TUMBLING (SIZE 10 MILLISECONDS) GROUP BY a.ID;"
      ],
      "inputs": [
        {"topic": "b", "key": "1", "value": {"col1": "B1"}},
        {"topic": "a", "key": "1", "value": {"col1": "A1"}},
        {"topic": "a", "key": "1", "value": {"col1": "A2"}},
        {"topic": "b", "key": "1", "value": {"col1": "B2"}},
        {"topic": "a", "key": "1", "value": {"col1": "A3"}},
        {"topic": "a", "key": "1", "value": {"col1": "A4"}, "timestamp": 12}
      ],
      "outputs": [
        {"topic": "C", "key": "1", "value": {"A_ID": "1", "KSQL_COL_1": ["A1"], "KSQL_COL_2": ["B1"]}, "window": {"start": 0, "end": 10, "type": "time"}},
        {"topic": "C", "key": "1", "value": {"A_ID": "1", "KSQL_COL_1": ["A1", "A2"], "KSQL_COL_2": ["B1", "B1"]}, "window": {"start": 0, "end": 10, "type": "time"}},
        {"topic": "C", "key": "1", "value": {"A_ID": "1", "KSQL_COL_1": ["A1", "A2", "A3"], "KSQL_COL_2": ["B1", "B1", "B2"]}, "window": {"start": 0, "end": 10, "type": "time"}},
        {"topic": "C", "key": "1", "value": {"A_ID": "1", "KSQL_COL_1": ["A4"], "KSQL_COL_2": ["B2"]}, "window": {"start": 10, "end": 20, "type": "time"}}
      ]
    }
@big-andy-coates
Copy link
Contributor Author

Note, it is the result of the join AB that is windowed, not the sources A and B.

The current implementation, which attempts to first analysis the query and then build a logical plan, fails to correctly handle this combination of JOIN + Windowed Aggregation. Enhancing it to support this may be tricky as later code expects all columns to come from either A or B. As the window bounds columns do not come from A or B, this will fail.

It may be possible to extend the current model, or it may be better to move the validation of columns into the logical model. The query analyser requires lots of conditional logic to handle different query types. This is painful to maintain and enhance. Moving the validation to the logical model would allow us to leverage the power of polymorphism.

big-andy-coates added a commit that referenced this issue Aug 3, 2020
fixes: #5898

This change fixes a regression introduced in v5.5.0 that meant any windowed aggregation with a join would fail with an `IllegalArgumentException`.

This change fixes the regression, how ever follow on work is required to allow access to the window bounds columns `WINDOWSTART` and `WINDOWEND` in such queries.  Access to these columns was not possible in v5.4, i.e. this is not a regression. The follow on work will be tracked under #5931.
@big-andy-coates
Copy link
Contributor Author

Added to the structured keys project so that its picked up eventually if the triage doesn't rate this as an important thing to fix asap.

@big-andy-coates
Copy link
Contributor Author

Duplicate of #4397?

@vcrfxia
Copy link
Contributor

vcrfxia commented Apr 20, 2021

As mentioned in #7369, at minimum the error message here stands to be improved.

@vcrfxia vcrfxia added streaming-engine Tickets owned by the ksqlDB Streaming Team user-experience P1 Slightly lower priority to P0 ;) labels Apr 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug names-and-aliases P1 Slightly lower priority to P0 ;) sql-syntax streaming-engine Tickets owned by the ksqlDB Streaming Team user-experience
Projects
None yet
Development

No branches or pull requests

4 participants