feat: Allows lots of table scans cases where keys cannot easily be extracted #7155

AlanConfluent · 2021-03-04T02:08:08Z

Description

This allows many more filter shapes in pull queries when table scans are enabled. Previously, a comparison had to have the form KEY=12345 (or possibly KEY=1 + 2), with a column on one side and a resolvable expression on the other. With table scans enabled, the following now work:

Both sides are column references (e.g. KEY = COL)
Neither side is a column reference (e.g. 5 > 3, KEY + 1 = COL + 1)
One side is a column reference and the other is an unresolvable expression (e.g. KEY = COL + 10)
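The rule above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Expr`, `ColumnRef`, `Literal`, `Add`, `Comparison`, `Plan`, and `PullFilterSketch` are hypothetical stand-ins for ksqlDB's real expression classes, every comparison is treated as `=`, and `IllegalArgumentException` stands in for whatever error the engine raises when a scan is needed but table scans are disabled.

```java
// Hypothetical, heavily simplified model; ksqlDB's real expression classes differ.
interface Expr {}
record ColumnRef(String name) implements Expr {}
record Literal(long value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Comparison(Expr left, Expr right) implements Expr {} // treated as "="

enum Plan { KEY_LOOKUP, TABLE_SCAN }

final class PullFilterSketch {
  // True if any column reference appears anywhere in the expression tree.
  static boolean hasColumnRef(Expr e) {
    if (e instanceof ColumnRef) {
      return true;
    }
    if (e instanceof Add a) {
      return hasColumnRef(a.left()) || hasColumnRef(a.right());
    }
    if (e instanceof Comparison c) {
      return hasColumnRef(c.left()) || hasColumnRef(c.right());
    }
    return false;
  }

  // The shape supported before this change: a bare column on one side and an
  // expression with no column references (so it can be resolved) on the other.
  static boolean isKeyExtractable(Comparison c) {
    boolean leftIsCol = c.left() instanceof ColumnRef;
    boolean rightIsCol = c.right() instanceof ColumnRef;
    if (leftIsCol == rightIsCol) {
      return false; // both sides (KEY = COL) or neither side (KEY + 1 = COL + 1)
    }
    Expr other = leftIsCol ? c.right() : c.left();
    return !hasColumnRef(other); // KEY = 1 + 2 passes, KEY = COL + 10 does not
  }

  static Plan plan(Comparison c, boolean tableScansEnabled) {
    if (isKeyExtractable(c)) {
      return Plan.KEY_LOOKUP;
    }
    if (tableScansEnabled) {
      return Plan.TABLE_SCAN; // the fallback this PR allows
    }
    throw new IllegalArgumentException("Filter requires a table scan: " + c);
  }
}
```

Under this model, `KEY = 1 + 2` still yields a key lookup, while `KEY = COL`, `KEY + 1 = COL + 1`, and `KEY = COL + 10` are accepted only when table scans are enabled.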

Resolves #4484

Testing done

RQTT tests and unit tests

Reviewer checklist

Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

agavra

LGTM! Nice quick win in usability

ksqldb-engine/src/main/java/io/confluent/ksql/engine/generic/GenericRecordFactory.java

agavra · 2021-03-04T02:30:55Z

ksqldb-engine/src/main/java/io/confluent/ksql/planner/plan/PullFilterNode.java

@@ -345,6 +363,28 @@ private void setTableScanOrElseThrow(final Supplier<KsqlException> exceptionSupp
    }
  }

+  private final class NonColumnRefValidator extends TraversalExpressionVisitor<Object> {


Suggested change

private final class NonColumnRefValidator extends TraversalExpressionVisitor<Object> {

private final class HasColumnRef extends TraversalExpressionVisitor<Object> {

It was really tough to grok nonColumnRefUnresolvable. It looks like all it does is answer the question "did we visit a column reference?"

Yeah, you're right that this is a more descriptive name. I had originally thought it might do other checks, but I'll go with HasColumnRef.
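The renamed visitor only needs to record whether a column reference was reached during traversal. A minimal sketch, assuming a hypothetical miniature traversal visitor (`TraversalVisitor` here is not ksqlDB's `TraversalExpressionVisitor`, whose API is much larger, and the expression records are invented for illustration):

```java
// Hypothetical miniature expression tree with double dispatch.
interface Expr {
  <C> void accept(TraversalVisitor<C> v, C ctx);
}

record ColumnRef(String name) implements Expr {
  public <C> void accept(TraversalVisitor<C> v, C ctx) { v.visitColumnRef(this, ctx); }
}

record Literal(long value) implements Expr {
  public <C> void accept(TraversalVisitor<C> v, C ctx) { v.visitLiteral(this, ctx); }
}

record Add(Expr left, Expr right) implements Expr {
  public <C> void accept(TraversalVisitor<C> v, C ctx) { v.visitAdd(this, ctx); }
}

// Default behaviour: walk every child and do nothing at the leaves.
abstract class TraversalVisitor<C> {
  void process(Expr e, C ctx) {
    e.accept(this, ctx);
  }

  void visitColumnRef(ColumnRef node, C ctx) {}

  void visitLiteral(Literal node, C ctx) {}

  void visitAdd(Add node, C ctx) {
    process(node.left(), ctx);
    process(node.right(), ctx);
  }
}

// Answers a single question: "did we visit a column reference?"
final class HasColumnRef extends TraversalVisitor<Object> {
  private boolean hasColumnRef = false;

  @Override
  void visitColumnRef(ColumnRef node, Object ctx) {
    hasColumnRef = true;
  }

  boolean hasColumnRef() {
    return hasColumnRef;
  }
}
```

Overriding only the column-reference hook keeps the traversal logic in the base class, which is what makes the name `HasColumnRef` an accurate description of the whole class.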

agavra · 2021-03-04T02:33:49Z

...test/resources/rest-query-validation-tests/pull-queries-against-materialized-aggregates.json

+      "statements": [
+        "CREATE STREAM INPUT (ID INTEGER KEY, IGNORED INT) WITH (kafka_topic='test_topic', value_format='JSON');",
+        "CREATE TABLE AGGREGATE AS SELECT ID, COUNT(1) AS COUNT FROM INPUT GROUP BY ID;",
+        "SELECT * FROM AGGREGATE WHERE ID = 20 - 10;",


SELECT * FROM AGGREGATE WHERE ID = 20 - 10;

At some point this should result in a key lookup, right? What's the status of #4484 after we merge this PR? We might want to consider renaming that one (or opening a new one). I think it's important to consider that alongside this PR, because we'd probably want a single-pass strategy (naively, we could use the GenericExpressionResolver to try to resolve the non-key side of the equation and, if it throws, assume it's not resolvable).

Yes, this should result in a key lookup. It falls into the case where the non-column-ref side needs to be resolved.

I think #4484 should be resolved after this.

I was debating whether to do it that way, and I'm not opposed to it. The current code follows a common pattern with a separate validation pass and extraction pass. Here that makes the consequences easier to handle: we can declare that a table scan is required alongside the other similar checks and skip extraction entirely. Doing some of this during extraction would complicate that code slightly. Or did you mean merging validation and extraction more broadly into a single pass?

Ah, ignore that original comment; I hadn't properly understood the code on my first pass. I originally thought we would table scan for ID = 20 - 10, but now I see that we don't. I think the current implementation is fine.
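The validation-then-extraction pattern described above can be sketched like this for `ID = 20 - 10`. The types and method names (`TwoPassFilter`, `requiresTableScan`, `keyFor`, `Sub`, `Equals`) are hypothetical, and the sketch does not check that the column is actually a key column, which the real planner must do.

```java
import java.util.Optional;

// Hypothetical simplified expression model.
interface Expr {}
record ColumnRef(String name) implements Expr {}
record Literal(long value) implements Expr {}
record Sub(Expr left, Expr right) implements Expr {}
record Equals(Expr left, Expr right) implements Expr {}

final class TwoPassFilter {
  static boolean hasColumnRef(Expr e) {
    if (e instanceof ColumnRef) {
      return true;
    }
    if (e instanceof Sub s) {
      return hasColumnRef(s.left()) || hasColumnRef(s.right());
    }
    return false;
  }

  // Pass 1 (validation): inspects structure only and decides whether a
  // table scan is required; nothing is evaluated here.
  static boolean requiresTableScan(Equals eq) {
    boolean leftIsCol = eq.left() instanceof ColumnRef;
    boolean rightIsCol = eq.right() instanceof ColumnRef;
    if (leftIsCol == rightIsCol) {
      return true; // both or neither side is a bare column reference
    }
    return hasColumnRef(leftIsCol ? eq.right() : eq.left());
  }

  // Pass 2 (extraction): only reached after validation passed, so the
  // non-column side is known to contain no column references.
  static long resolve(Expr e) {
    if (e instanceof Literal lit) {
      return lit.value();
    }
    if (e instanceof Sub s) {
      return resolve(s.left()) - resolve(s.right());
    }
    throw new IllegalArgumentException("Not resolvable: " + e);
  }

  // Empty means "fall back to a table scan"; otherwise the key to look up.
  static Optional<Long> keyFor(Equals eq) {
    if (requiresTableScan(eq)) {
      return Optional.empty();
    }
    Expr other = eq.left() instanceof ColumnRef ? eq.right() : eq.left();
    return Optional.of(resolve(other));
  }
}
```

The naive single-pass alternative raised earlier would instead call `resolve` directly and treat the exception as "not resolvable"; keeping a separate validation pass lets the table-scan decision sit next to the other similar checks.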

guozhangwang

Made a pass, overall LGTM!

guozhangwang · 2021-03-04T23:42:40Z

ksqldb-engine/src/main/java/io/confluent/ksql/planner/plan/PullFilterNode.java

+          ksqlConfig.getBoolean(KsqlConfig.KSQL_QUERY_PULL_INTERPRETER_ENABLED)
+      ).resolve(other);
+
+      if (obj instanceof Integer || obj instanceof Long) {


Should we allow float/double to be cast to INT as well?

The GenericExpressionResolver runs in strict mode, which I don't believe allows this. We don't do many automatic conversions elsewhere, such as for key values (and that would be harder to generalize), so I think this is fine for consistency.
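To illustrate the strict behaviour, a check that accepts only `Integer` or `Long` and refuses to coerce a `Double` might look like the following. The helper name `asIntegralKey`, the widening to `Long`, and the empty `Optional` for other types are all assumptions; the diff excerpt does not show what the real code does in the non-integral case.

```java
import java.util.Optional;

final class StrictKeyCheck {
  // Hypothetical helper: a resolved value is usable as an integral key only if
  // it already is an Integer or Long. Floating-point values are not coerced,
  // mirroring the "no implicit conversion" behaviour described above.
  static Optional<Long> asIntegralKey(Object resolved) {
    if (resolved instanceof Integer i) {
      return Optional.of(i.longValue());
    }
    if (resolved instanceof Long l) {
      return Optional.of(l);
    }
    return Optional.empty(); // e.g. 10.0 (a Double) is rejected, not truncated
  }
}
```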

guozhangwang · 2021-03-04T23:53:46Z

ksqldb-engine/src/main/java/io/confluent/ksql/planner/plan/PullFilterNode.java

      final UnqualifiedColumnReferenceExp column = getColumnRefSide(node);
+      final Expression other = getNonColumnRefSide(node);


nit: I'm wondering if we can merge these two conditions into a single one, like:

first = getFirstColumnRefSide(); // if both sides are column refs, return the left; if neither is, return null
if (first != null) {
  second = getOtherSide();
  secondHasColumnRef = new ... ;
  secondHasColumnRef.process(second);
  if (second isColumnRef || secondHasColumnRef.hasColumnRef) setTableScan();
} else {
  setTableScan(); // i.e. for `WHERE 100 = 100` or `100 = 101`, we would always do a table scan
}

Sure, I reoriented the logic similar to this and allowed getFirstColumnRefSide to return null so that it can be checked for, allowing me to get rid of the other method.
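The reoriented logic, with `getFirstColumnRefSide` returning `null` when neither side is a bare column reference, might look roughly like this. It uses a hypothetical simplified expression model (`ColumnRef`, `Literal`, `Add`, `Comparison`); only the name `getFirstColumnRefSide` comes from the discussion, while `MergedCheck`, `hasColumnRef`, and `requiresTableScan` are illustrative.

```java
// Hypothetical simplified expression model.
interface Expr {}
record ColumnRef(String name) implements Expr {}
record Literal(long value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Comparison(Expr left, Expr right) implements Expr {}

final class MergedCheck {
  // Left side if it is a bare column reference, else the right side if it
  // is, else null (neither side is a bare column reference).
  static ColumnRef getFirstColumnRefSide(Comparison c) {
    if (c.left() instanceof ColumnRef l) {
      return l;
    }
    if (c.right() instanceof ColumnRef r) {
      return r;
    }
    return null;
  }

  static boolean hasColumnRef(Expr e) {
    if (e instanceof ColumnRef) {
      return true;
    }
    if (e instanceof Add a) {
      return hasColumnRef(a.left()) || hasColumnRef(a.right());
    }
    return false;
  }

  static boolean requiresTableScan(Comparison c) {
    ColumnRef first = getFirstColumnRefSide(c);
    if (first == null) {
      return true; // e.g. WHERE 100 = 101, or KEY + 1 = COL + 1
    }
    Expr second = (c.left() == first) ? c.right() : c.left();
    // hasColumnRef is also true when `second` is itself a column ref, so this
    // single check covers both KEY = COL and KEY = COL + 10.
    return hasColumnRef(second);
  }
}
```

Returning `null` from the first helper removes the need for a separate "get the non-column side" method, which matches the simplification described in the reply above.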

guozhangwang · 2021-03-04T23:55:28Z

ksqldb-engine/src/main/java/io/confluent/ksql/planner/plan/PullFilterNode.java

@@ -295,7 +297,23 @@ public Void visitComparisonExpression(
        final ComparisonExpression node,
        final Object context
    ) {
+      if (!isSingleColumnReference(node)) {


Just occurred to me: if the WHERE clause is just 100 = 101, would we still do a table scan instead of immediately returning empty results?

That's unfortunately the case today. I have a ticket, #6973, that generally covers this kind of thing: you can simplify key ranges, detect that none exist at all, or see that 100 = 101 can be simplified without any row information. Feel free to add some cases to that ticket or create another one, so that we don't forget all of the optimization ideas we have.
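The `100 = 101` idea could be handled by folding column-free comparisons before planning. This is a hypothetical sketch of the kind of optimization discussed for #6973, not something this PR does; `ConstantFilterFolding`, `fold`, `eval`, and the expression types are invented for illustration.

```java
import java.util.Optional;

// Hypothetical simplified expression model.
interface Expr {}
record ColumnRef(String name) implements Expr {}
record Literal(long value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Equals(Expr left, Expr right) implements Expr {}

final class ConstantFilterFolding {
  static boolean hasColumnRef(Expr e) {
    if (e instanceof ColumnRef) {
      return true;
    }
    if (e instanceof Add a) {
      return hasColumnRef(a.left()) || hasColumnRef(a.right());
    }
    if (e instanceof Equals eq) {
      return hasColumnRef(eq.left()) || hasColumnRef(eq.right());
    }
    return false;
  }

  // Only called on column-free expressions, so literals and sums suffice here.
  static long eval(Expr e) {
    if (e instanceof Literal lit) {
      return lit.value();
    }
    if (e instanceof Add a) {
      return eval(a.left()) + eval(a.right());
    }
    throw new IllegalArgumentException("Cannot evaluate: " + e);
  }

  // Optional.of(false): no row can match, so return empty results without a scan.
  // Optional.of(true): the predicate is always true and can be dropped.
  // Optional.empty(): row data is needed; plan as usual.
  static Optional<Boolean> fold(Equals eq) {
    if (hasColumnRef(eq)) {
      return Optional.empty();
    }
    return Optional.of(eval(eq.left()) == eval(eq.right()));
  }
}
```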

AlanConfluent added 2 commits March 3, 2021 17:00

feat: Allows pull query filters that aren't of the form col op literal

f0eaf15

Adds more cases

ff09ea0

AlanConfluent requested a review from a team as a code owner March 4, 2021 02:08

agavra approved these changes Mar 4, 2021


More test cases

02153be

guozhangwang mentioned this pull request Mar 4, 2021

feat: Adds Lambda functionality to the interpreter #7152

Merged


Feedback

4ace270

AlanConfluent changed the title ~~feat: Allows lots of non simple key=literal cases when table scans are enabled~~ feat: Allows lots of table scans cases where keys cannot easily be extracted Mar 4, 2021

guozhangwang reviewed Mar 4, 2021


Feedback

d84c56f

guozhangwang approved these changes Mar 5, 2021


AlanConfluent added 4 commits March 8, 2021 09:38

Adds a new comment

a533927

Fix lint, making class static

5c9fb9a

Fixes test break -- error message

72646ab

Fixes RQTT

ed3dc63

AlanConfluent merged commit 71becea into master Mar 9, 2021

AlanConfluent deleted the allow_non_col_comparison branch March 9, 2021 00:55




