Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAST(col, VARCHAR) null values become the string "null" #4892

Closed
deuscapturus opened this issue Mar 25, 2020 · 1 comment · Fixed by #5769
Closed

CAST(col, VARCHAR) null values become the string "null" #4892

deuscapturus opened this issue Mar 25, 2020 · 1 comment · Fixed by #5769
Assignees
Labels
bug P0 Denotes must-have for a given milestone
Milestone

Comments

@deuscapturus
Copy link

deuscapturus commented Mar 25, 2020

Describe the bug

null columns that are CAST(col, VARCHAR) become the string "null", but should stay as null.

Example: INT columns that are CAST(col, VARCHAR) when null are mistakenly counted by the COUNT() aggregate function.

This may also effect other field type and aggregation functions. I've only tested this situation.

To Reproduce

KSQL 5.4.0

-- Create a stream and insert some data into it with null values
CREATE STREAM HAS_NULL_VALUES (USER VARCHAR, VIEWTIME INT) WITH (VALUE_FORMAT='Avro', PARTITIONS=1);

INSERT INTO HAS_NULL_VALUES (USER) VALUES ('amy');
INSERT INTO HAS_NULL_VALUES (USER, VIEWTIME) VALUES ('amy', 123);


-- Aggregate without CAST.  Result should be 1
CREATE TABLE COUNT_WITH_NULL WITH (VALUE_FORMAT='Avro') AS
SELECT USER, COUNT(VIEWTIME) FROM HAS_NULL_VALUES GROUP BY USER;

SELECT * FROM COUNT_WITH_NULL EMIT CHANGES LIMIT 1;


-- Aggregate with CAST.  Result should still be 1, but is isntead 2.
CREATE TABLE COUNT_WITH_CAST_NULL WITH (VALUE_FORMAT='Avro') AS
SELECT USER, COUNT(CAST(VIEWTIME AS VARCHAR)) FROM HAS_NULL_VALUES GROUP BY USER;

SELECT * FROM COUNT_WITH_CAST_NULL EMIT CHANGES LIMIT 1;

Expected behavior

The CAST null should not be counted, as it's value should still be null.

Actual behaviour

ksql> SELECT * FROM COUNT_WITH_NULL EMIT CHANGES LIMIT 1;
+---------------------+---------------------+---------------------+---------------------+
|ROWTIME              |ROWKEY               |USER                 |KSQL_COL_1           |
+---------------------+---------------------+---------------------+---------------------+
|1585140737589        |amy                  |amy                  |1                    |
Limit Reached
Query terminated
ksql> SELECT * FROM COUNT_WITH_CAST_NULL EMIT CHANGES LIMIT 1;
+---------------------+---------------------+---------------------+---------------------+
|ROWTIME              |ROWKEY               |USER                 |KSQL_COL_1           |
+---------------------+---------------------+---------------------+---------------------+
|1585140737589        |amy                  |amy                  |2                    |
Limit Reached
Query terminated
@deuscapturus deuscapturus changed the title null values are counted with COUNT() after CAST() CAST(col, VARCHAR) null values become the string "null" Mar 28, 2020
@derekjn derekjn added this to the 0.11.0 milestone Jun 12, 2020
@stevenpyzhang stevenpyzhang self-assigned this Jun 23, 2020
@vcrfxia vcrfxia added the P0 Denotes must-have for a given milestone label Jun 24, 2020
@stevenpyzhang
Copy link
Member

Confirmed this is still an issue in master as of today, looking into it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug P0 Denotes must-have for a given milestone
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants