Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove any aliases in Filter::try_new rather than erroring #11307

Merged
merged 4 commits into from
Jul 8, 2024

Conversation

samuelcolvin
Copy link
Contributor

@samuelcolvin samuelcolvin commented Jul 6, 2024

Which issue does this PR close?

fix #11306.

Rationale for this change

See datafusion-contrib/datafusion-functions-json#26 and #11306

What changes are included in this PR?

Remove the block on aliases as prediates, add test.

Are these changes tested?

Yes, the only way I could get a failure without this change.

Are there any user-facing changes?

AFAIK, we just allow something that should be allowed.

NOTE: Aliases were already allowed in binary expressions, just not as full predicates, so this restriction seemed unusually strict. Also the SQL parser wouldn't allow me to build such an expression directly in SQL, e.g. from x where (y = z) as the_alias was invalid SQL.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate labels Jul 6, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @samuelcolvin

I think the original rationale as I remember for this check was to avoid optimizer passes creating ill formed plans.

I think @andygrove added this check in #3796 and while we can remove it I think there are other options.

You can't trigger this bug in the current sql implementation because it is not syntactically valid to do something like SELECT ... FROM foo WHERE my_user_function(x) as nice_name

However, the user defined planning doesn't have enough context to know if the expr is in the context of a select item (where aliases are allowed) or a filter clause (where aliases are not allowed)

Some other potential options:

  1. Strip aliases in the sql planner where they don't make sense (as in if an extension planner returns an Alias in WHERE, just remove the alias before creating the filter)
  2. Add another parameter to plan_binary_op that says if aliases are allowed or not (or maybe if it was in a SELECT list)

@andygrove , @jonahgao or @jackwener I wonder if you have any thoughts / opinions?

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@jonahgao
Copy link
Member

jonahgao commented Jul 8, 2024

I think aliases in a predicate will not affect functionality. They are just useless because they do not contribute to the output names of the filter plan, nor will any other expressions reference them(with the exception of subqueries).

If this is true, we can move the unalias predicate logic to Filter::try_new and convert this check into a test.

@samuelcolvin
Copy link
Contributor Author

I've updated this to un-alias the predicate.

I'm not sure if this was your suggestion, but I don't think it's necessarily to un-alias recursively within Filter::try_new — nested aliases were already accepted and are presumably elided by the optimiser already.

@@ -2130,16 +2130,10 @@ impl Filter {
}
}

// filter predicates should not be aliased
if let Expr::Alias(Alias { expr, name, .. }) = predicate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this check; it is not aligned with unalias_nested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a very old check and substantially predates unalias_nested so it may have become unaligned over time


Ok(Self { predicate, input })
Ok(Self {
predicate: predicate.unalias_nested().data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is to execute unalias_nested when creating the initial logical plan.

But the current implementation is also acceptable to me. Thank you @samuelcolvin . If we follow the current implementation, we also need to remove the redundant unalias_nested calls outside of try_new.

cc @alamb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #11339 to track

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR to fix: #11340


Ok(Self { predicate, input })
Ok(Self {
predicate: predicate.unalias_nested().data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unaliasing rather than throwing an exception here makes sense to me.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this PR and then address @jonahgao 's concerns / #11339 in a follow up

PR to do this is: #11340

@alamb
Copy link
Contributor

alamb commented Jul 8, 2024

Thanks everyone

@alamb alamb changed the title allow alias in predicate Remove any aliases in Filter::try_new rather than erroring Jul 8, 2024
@alamb alamb merged commit 9991144 into apache:main Jul 8, 2024
24 checks passed
@samuelcolvin samuelcolvin deleted the allow-predicate-alias branch July 8, 2024 20:04
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…11307)

* allow alias in predicate, fix apache#11306

* Fix typo.

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* unalise predicate

* use unalias_nested

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
…11307)

* allow alias in predicate, fix apache#11306

* Fix typo.

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* unalise predicate

* use unalias_nested

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Filter predicates should not be aliased." seems too strict
4 participants