refactor: do not allow duplicates in schemas by default #4697

big-andy-coates · 2020-03-03T21:02:03Z

Description

KsqlDB does not support data sources with duplicate column names in the key and value, mainly because it copies the key columns into the value while processing with Kafka Streams.

It therefore makes sense, to me at least, that our `LogicalSchema` rejects duplicates by default but allows them when explicitly told to do so. This doesn't change anything functionally, but it does make the code explicit about when it does and does not allow duplicates.
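To make the proposal concrete, here is a minimal sketch of a builder that rejects duplicate column names across key and value by default and only accepts them after an explicit opt-in. It assumes a heavily simplified schema in which columns are bare names; `SimpleSchema`, `keyColumn`, `valueColumn` and `build` are hypothetical and are not ksqlDB's actual `LogicalSchema` API. Only the `allowDuplicates()` name is taken from the PR's diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a schema with key and value columns.
// Not ksqlDB's real LogicalSchema: types are omitted and columns are just names.
final class SimpleSchema {
  private final List<String> key;
  private final List<String> value;

  private SimpleSchema(List<String> key, List<String> value) {
    this.key = Collections.unmodifiableList(new ArrayList<>(key));
    this.value = Collections.unmodifiableList(new ArrayList<>(value));
  }

  static Builder builder() {
    return new Builder();
  }

  List<String> key() { return key; }

  List<String> value() { return value; }

  static final class Builder {
    private final List<String> key = new ArrayList<>();
    private final List<String> value = new ArrayList<>();
    // Duplicates are rejected unless the caller explicitly opts in.
    private boolean allowDuplicates = false;

    Builder keyColumn(String name) {
      key.add(name);
      return this;
    }

    Builder valueColumn(String name) {
      value.add(name);
      return this;
    }

    // Explicit opt-in, named after the allowDuplicates() method in the PR's diff.
    Builder allowDuplicates() {
      allowDuplicates = true;
      return this;
    }

    SimpleSchema build() {
      if (!allowDuplicates) {
        // Key and value columns are checked together: a name may appear
        // only once across the whole schema.
        List<String> all = new ArrayList<>(key);
        all.addAll(value);
        Set<String> seen = new HashSet<>();
        for (String name : all) {
          if (!seen.add(name)) {
            throw new IllegalArgumentException("Duplicate column name: " + name);
          }
        }
      }
      return new SimpleSchema(key, value);
    }
  }
}
```

Under this sketch, code that really does need duplicates would have to call `allowDuplicates()` explicitly, which is the extra explicitness the description argues for.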

Thoughts? Is this an improvement or just noise?

Testing done

Usual.

Reviewer checklist

Ensure docs are updated if necessary (e.g., if a user-visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")


agavra · 2020-03-04T00:12:50Z

> KsqlDB does not support data sources with duplicate column names in the key and value.

How does it fail currently?

big-andy-coates · 2020-03-04T10:56:41Z

> > KsqlDB does not support data sources with duplicate column names in the key and value.

> How does it fail currently?

Because key columns must be named ROWKEY, ROWKEY is a system column name, and you're not allowed to have a persistent query with a system column name in the value schema.
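A hypothetical illustration of the failure mode described above, assuming that key columns are appended to the value when a query is processed and that the value schema of a persistent query is then checked for system column names. `SystemColumnCheck`, `copyKeyIntoValue` and `validatePersistentValue` are invented names, not ksqlDB methods, and including ROWTIME in the reserved set is an assumption beyond this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the failure described above, not ksqlDB code.
final class SystemColumnCheck {
  // Assumed set of reserved names: ROWKEY comes from the discussion,
  // ROWTIME is assumed here as the other system column.
  static final Set<String> SYSTEM_COLUMNS = Set.of("ROWKEY", "ROWTIME");

  // Models copying the key columns into the value during processing.
  static List<String> copyKeyIntoValue(List<String> key, List<String> value) {
    List<String> result = new ArrayList<>(value);
    result.addAll(key);
    return result;
  }

  // Rejects a persistent query whose value schema contains a system column name.
  static void validatePersistentValue(List<String> valueColumns) {
    for (String name : valueColumns) {
      if (SYSTEM_COLUMNS.contains(name)) {
        throw new IllegalStateException("System column name in value schema: " + name);
      }
    }
  }
}
```

Because the only legal key column name is itself a system column name, the copied key always trips the check, so a duplicate key/value column can never reach a persistent query in the first place.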

However, I'm dropping the requirement that key columns must be called ROWKEY ...

agavra · 2020-03-04T17:15:45Z

Ah yes, that makes sense. In that case I think this PR is good; I'll take a look at the code itself in a bit.

agavra · 2020-03-10T22:20:36Z

ksql-common/src/main/java/io/confluent/ksql/schema/ksql/LogicalSchema.java

```diff
+     *
+     * @return self.
+     */
+    public Builder allowDuplicates() {
```


As discussed offline, I think having this makes things a little more confusing. If we wanted to make it strongly typed, we should have a different class that allows duplicates, but I think that's overkill.

big-andy-coates · 2020-03-11

Yep, cool, my thinking was going down the same route.
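For comparison, a rough sketch of the strongly typed alternative agavra mentions, in which tolerance for duplicates is carried by a separate class rather than a builder flag. `StrictSchema` and `LenientSchema` are hypothetical names, and columns are again reduced to bare names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch of the "different class" alternative: whether duplicates
// are allowed is encoded in the type rather than in a builder flag. Not ksqlDB code.
final class StrictSchema {
  private final List<String> columns;

  StrictSchema(List<String> columns) {
    // Construction fails if any column name repeats.
    if (new HashSet<>(columns).size() != columns.size()) {
      throw new IllegalArgumentException("Duplicate column names: " + columns);
    }
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
  }

  List<String> columns() { return columns; }
}

final class LenientSchema {
  private final List<String> columns;

  LenientSchema(List<String> columns) {
    // No duplicate check: this type is only used where duplicates are expected.
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
  }

  List<String> columns() { return columns; }

  // Converting back to the strict type re-runs the duplicate check.
  StrictSchema toStrict() {
    return new StrictSchema(columns);
  }
}
```

Even in this toy version, every method that accepts a schema has to pick one of the two types or convert between them, which gives a sense of the extra surface area the reviewer considered overkill.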

big-andy-coates requested a review from a team as a code owner March 3, 2020 21:02

agavra mentioned this pull request Mar 4, 2020: Partial support for arbitrary key column names. #4701 (Merged, 2 tasks)

agavra reviewed Mar 10, 2020


big-andy-coates closed this Mar 11, 2020

big-andy-coates deleted the schema_duplicates branch March 11, 2020 09:09
