[SPARK-20854][SQL] Extend hint syntax to support expressions #18086
Test build #77300 has finished for PR 18086 at commit
@@ -533,13 +533,16 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
}

/**
* Add a [[UnresolvedHint]] to a logical plan.
* Add a [[UnresolvedHint]]s to a logical plan.
Remove the "a".
*/
private def withHints(
ctx: HintContext,
query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
val stmt = ctx.hintStatement
UnresolvedHint(stmt.hintName.getText, stmt.parameters.asScala.map(_.getText), query)
var plan = query
Use foldLeft instead of having a var?
Honestly I think foldLeft is almost always a bad idea ...
I used foldRight somewhere too. Why is it a bad idea?
I always find a loop simpler to reason about...
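To make the foldLeft-versus-var question concrete, here is a minimal sketch with hypothetical stand-in types (Plan, Relation and Hint are not Catalyst classes). It assumes each hint simply wraps the plan built so far; under that assumption the loop and the foldLeft produce the same nesting, with the last hint in the sequence on top.

```scala
// Hypothetical stand-ins for LogicalPlan and UnresolvedHint (not Catalyst classes).
sealed trait Plan
case class Relation(name: String) extends Plan
case class Hint(name: String, child: Plan) extends Plan

// Loop with a var: each hint wraps the plan built so far,
// so the last hint in the sequence ends up on top.
def withHintsLoop(names: Seq[String], query: Plan): Plan = {
  var plan = query
  names.foreach { name =>
    plan = Hint(name, plan)
  }
  plan
}

// The same computation with foldLeft: the accumulator is the plan built so far.
def withHintsFoldLeft(names: Seq[String], query: Plan): Plan =
  names.foldLeft(query)((plan, name) => Hint(name, plan))

val hintNames = Seq("HINT1", "hint2")
val viaLoop = withHintsLoop(hintNames, Relation("t"))
val viaFoldLeft = withHintsFoldLeft(hintNames, Relation("t"))
```

Both forms are equivalent here; the choice is about readability, and the direction of iteration (not the loop style) decides which hint ends up outermost.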
@@ -25,7 +25,7 @@ import org.apache.spark.sql.internal.SQLConf
* should be removed This node will be eliminated post analysis.
* A pair of (name, parameters).
*/
case class UnresolvedHint(name: String, parameters: Seq[String], child: LogicalPlan)
case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan)
Shall we use Expression as the type?
If we use Expression, then either:
- Dataset.hint parameters should be Expression too, in which case you can't write df.hint("hint", 1, 2, "c") and would have to write df.hint("hint", Literal(1), Literal(2), Literal("c")), or use a shortcut if there is one; or
- Dataset.hint accepts Any but then has to convert Any to Expressions. One problem here is that Seq(1,2,3) can't be converted to a Literal, so you have to use df.hint("hint", Array(1,2,3)).
The disadvantage of having Any in UnresolvedHint is that, to resolve the hint, you have to check for both String and Literal(String), but the API is easier to use.
We can keep Any in the API (df.hint(xxx)) but use Expression in UnresolvedHint. What do you think?
One useful hint parameter is a list of columns, something like df.hint("hint", $"table", Seq($"col1", $"col2", $"col3")).
In this case UnresolvedHint could be called like this:
UnresolvedHint(name: String, parameters: Seq(Expression, Seq[Expression]), child)
But if UnresolvedHint.parameters is Seq[Expression], then it's not possible to have this kind of hint.
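The trade-off in this thread can be sketched with a toy expression type. Everything below is hypothetical (Expr, Lit, Col, HintAny, HintExpr and toExpr are illustrations, not Catalyst classes): storing parameters as Seq[Any] keeps a nested list of columns as-is, while storing Seq[Expr] forces a conversion at the API boundary, and in this toy ADT a nested Seq has no Expr form, so it is rejected.

```scala
// Hypothetical expression ADT (not Catalyst).
sealed trait Expr
case class Lit(value: Any) extends Expr
case class Col(name: String) extends Expr

// Option 1 (what this PR does): parameters stored as Seq[Any].
case class HintAny(name: String, parameters: Seq[Any])

// Option 2 (the suggestion): parameters stored as Seq[Expr].
case class HintExpr(name: String, parameters: Seq[Expr])

// Converts one Any parameter. Expressions pass through, plain values become
// literals, and a nested Seq has no representation in this toy ADT.
def toExpr(p: Any): Either[String, Expr] = p match {
  case e: Expr   => Right(e)
  case _: Seq[_] => Left(s"cannot convert nested sequence to an expression: $p")
  case other     => Right(Lit(other))
}

// API that accepts Any but stores Expr: fails on the first unconvertible parameter.
def hintExpr(name: String, params: Any*): Either[String, HintExpr] = {
  val converted = params.map(toExpr)
  converted.collectFirst { case Left(err) => err } match {
    case Some(err) => Left(err)
    case None      => Right(HintExpr(name, converted.collect { case Right(e) => e }))
  }
}

val anyHint = HintAny("hint", Seq(Col("table"), Seq(Col("col1"), Col("col2"))))
val nestedExprHint = hintExpr("hint", Col("table"), Seq(Col("col1"), Col("col2")))
val flatExprHint = hintExpr("hint", 1, 2, "c")
```

A real design could add a list-valued expression node to get around the rejection; the sketch only shows why the nested case needs extra handling when the stored type is Expr.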
Test build #77421 has finished for PR 18086 at commit
Test build #77424 has finished for PR 18086 at commit
@@ -371,7 +371,7 @@ querySpecification
(RECORDREADER recordReader=STRING)?
fromClause?
(WHERE where=booleanExpression)?)
| ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
| ((kind=SELECT (hints+=hint)* setQuantifier? namedExpressionSeq fromClause?
In Hive and Oracle, multiple hints are put in the same /*+ */ block.
This patch supports both.
@gatorsmile, does Hive support multiple /*+ */ blocks?
Nope. Hive does not support multiple /*+ */ blocks.
It does not hurt anything if we support more hint styles, as long as they are user-friendly.
@@ -381,12 +381,12 @@ querySpecification
;

hint
: '/*+' hintStatement '*/'
: '/*+' hintStatements+=hintStatement (hintStatements+=hintStatement)* '*/'
Within a single /*+ */ block, multiple hints are separated by commas in Hive. In Oracle, however, they are separated by spaces.
I added support for an optional comma.
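The ANTLR rule in the diff is what the parser actually uses. Purely to illustrate how whitespace-separated and optional-comma-separated hint statements can be told apart inside one block, here is a hypothetical hand-written splitter (splitHintStatements is not a Spark function). It assumes the /*+ and */ delimiters are already stripped, parentheses are balanced, and no quoted string contains a parenthesis or comma.

```scala
import scala.collection.mutable.ArrayBuffer

// Splits the body of one hint block into statements. Commas inside parentheses
// belong to a statement's parameter list; a comma at depth 0 is an optional
// separator, and whitespace at depth 0 followed by a new name also separates.
def splitHintStatements(body: String): Seq[String] = {
  val statements = ArrayBuffer.empty[String]
  val current = new StringBuilder
  var depth = 0
  var sawSpace = false

  def flush(): Unit = {
    if (current.nonEmpty) statements += current.toString
    current.clear()
    sawSpace = false
  }

  body.foreach {
    case '(' =>
      depth += 1
      sawSpace = false // allows "HINT1 (a)" as well as "HINT1(a)"
      current += '('
    case ')' =>
      depth -= 1
      current += ')'
      if (depth == 0) flush() // a statement ends after its closing parenthesis
    case ',' if depth == 0 =>
      flush() // optional comma between statements
    case c if c.isWhitespace && depth == 0 =>
      sawSpace = true // may separate two parameterless hints
    case c =>
      // Whitespace followed by a new name at depth 0 starts a new statement.
      if (depth == 0 && sawSpace && current.nonEmpty) flush()
      sawSpace = false
      current += c
  }
  flush()
  statements.toSeq
}
```

For example, both "HINT1(a, 1) hint2(b, 2)" and "HINT1(a, 1), hint2(b, 2)" split into the same two statements, so Hive-style and Oracle-style separators are both accepted.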
@@ -25,7 +25,7 @@ import org.apache.spark.sql.internal.SQLConf
* should be removed This node will be eliminated post analysis.
* A pair of (name, parameters).
This needs an update.
case tableName: String => tableName
case tableId: UnresolvedAttribute => tableId.name
case unsupported => throw new AnalysisException("Broadcast hint parameter should be " +
s" identifier or string but was $unsupported (${unsupported.getClass}")
Nit: s" identifier or string -> s"an identifier or string
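A self-contained sketch of the parameter match above with the nit applied. UnresolvedAttr is a hypothetical stand-in for Catalyst's UnresolvedAttribute, broadcastTableName is an illustrative name, and IllegalArgumentException replaces AnalysisException so the block runs without Spark. The sketch also closes the parenthesis that the message in the diff leaves open after the class name.

```scala
// Hypothetical stand-in for Catalyst's UnresolvedAttribute (not the Spark class).
case class UnresolvedAttr(name: String)

// Follows the shape of the match in the diff: a parameter may be a raw String
// (e.g. from df.hint("broadcast", "t")) or an identifier (e.g. from a SQL hint).
def broadcastTableName(param: Any): String = param match {
  case tableName: String => tableName
  case tableId: UnresolvedAttr => tableId.name
  case unsupported =>
    // "should be " already ends with a space, so the next part starts with "an".
    throw new IllegalArgumentException("Broadcast hint parameter should be " +
      s"an identifier or string but was $unsupported (${unsupported.getClass})")
}
```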
* limitations under the License.
*/

package org.apache.spark.sql
Normally, we move such a test suite to org.apache.spark.sql.catalyst. We just need to add hint to org.apache.spark.sql.catalyst.dsl.
I added a new test for the DSL. I also want a test that calls df.hint.
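The tests in this PR compare r1.hint(...) against UnresolvedHint(...), which suggests a DSL extension method that just wraps the plan. A minimal sketch of that idea with hypothetical types (LPlan, LocalRel, UHint and the hintDsl object are assumptions for illustration, not the contents of org.apache.spark.sql.catalyst.dsl):

```scala
// Hypothetical stand-ins for LogicalPlan, LocalRelation and UnresolvedHint.
sealed trait LPlan
case class LocalRel(name: String) extends LPlan
case class UHint(name: String, parameters: Seq[Any], child: LPlan) extends LPlan

// DSL-style extension: any plan gains a hint(...) method that wraps it
// in an unresolved hint node carrying the raw parameters.
object hintDsl {
  implicit class HintOps(plan: LPlan) {
    def hint(name: String, parameters: Any*): LPlan =
      UHint(name, parameters.toList, plan)
  }
}
import hintDsl._

val r1: LPlan = LocalRel("r1")
val noParams = r1.hint("hint1")
val withParams = r1.hint("hint1", 1, "a")
```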
parsePlan("SELECT /*+ HINT1(a, 1) hint2(b, 2) */ * from t"),
UnresolvedHint("hint2", Seq($"b", Literal(2)),
UnresolvedHint("HINT1", Seq($"a", Literal(1)),
table("t").select(star())
Nit: Indent
@@ -25,7 +25,7 @@ import org.apache.spark.sql.internal.SQLConf
* should be removed This node will be eliminated post analysis.
* A pair of (name, parameters).
*/
case class UnresolvedHint(name: String, parameters: Seq[String], child: LogicalPlan)
case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan)
To support multiple parameters in hint, does it make sense to do it like df.hint("hint", "1, 2, c")? We can use our Parser to parse this parameter string.
I think that could be an extra. The DataFrame API should accept Scala expressions too, such as function calls: df.hint("hint", getInterestingValues())
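A small sketch of the difference between the two API shapes discussed here. Query is a hypothetical stand-in for a Dataset-like builder whose hint method takes Any* (as this PR's Dataset.hint does), and getInterestingValues is the hypothetical helper from the comment. The re-parsing step uses a plain split, not Spark's SQL parser, only to show that a single string parameter loses the original types.

```scala
// Hypothetical stand-in for a Dataset-like builder (not Spark's Dataset).
case class Query(hints: List[(String, Seq[Any])]) {
  def hint(name: String, parameters: Any*): Query =
    copy(hints = hints :+ (name -> parameters.toList))
}

// Hypothetical helper: any Scala call can supply hint parameters.
def getInterestingValues(): Seq[Int] = Seq(1, 2, 3)

// Scala values passed directly: the Seq[Int] is kept as one typed parameter.
val direct = Query(Nil).hint("hint", getInterestingValues())

// One string parameter instead: it has to be re-parsed, and every piece
// comes back as a String ("1" and "c" are indistinguishable by type).
val asString = Query(Nil).hint("hint", "1, 2, c")
val reparsed = asString.hints.head._2.head.toString.split(",").map(_.trim).toList
```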
Test build #77531 has finished for PR 18086 at commit
Why rename?
LGTM pending Jenkins
Test build #77536 has finished for PR 18086 at commit
r1.hint("hint1"),
UnresolvedHint("hint1", Seq(),
r1
)
nit: can we collapse it to the previous line?
r1.hint("hint1", 1, $"a"),
UnresolvedHint("hint1", Seq(1, $"a"),
r1
)
ditto
r1.hint("hint1", Seq(1, 2, 3), Seq($"a", $"b", $"c")),
UnresolvedHint("hint1", Seq(Seq(1, 2, 3), Seq($"a", $"b", $"c")),
r1
)
ditto
@@ -407,7 +407,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
val withWindow = withDistinct.optionalMap(windows)(withWindows)

// Hint
withWindow.optionalMap(hint)(withHints)
hints.asScala.foldRight(withWindow)(withHints)
Why do we construct the hints from right to left?
So that select /*+ hint1() */ /*+ hint2() */ produces Hint1(Hint2(plan)) and not Hint2(Hint1(plan)). withHints adds a hint on top, so the last one folded is the topmost.
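The ordering argument can be checked with a toy model. The types below are hypothetical stand-ins (Node, Table, HintNode, HintBlock are not Catalyst classes), and each HintBlock carries a single hint, as in the /*+ hint1() */ /*+ hint2() */ example. withHints has the same (block, plan) => plan shape as in the diff, so it can be passed straight to foldRight.

```scala
// Hypothetical stand-ins (not Catalyst classes): a plan, a hint node, and one
// /*+ ... */ block carrying a single hint name.
sealed trait Node
case class Table(name: String) extends Node
case class HintNode(name: String, child: Node) extends Node
case class HintBlock(name: String)

// Same shape as withHints(ctx, query): puts the block's hint on top of the plan.
def withHints(block: HintBlock, query: Node): Node = HintNode(block.name, query)

val blocks = Seq(HintBlock("hint1"), HintBlock("hint2"))
val base: Node = Table("t")

// foldRight applies hint2 first and hint1 last, so hint1 ends up outermost.
val viaFoldRight = blocks.foldRight(base)(withHints)

// foldLeft applies hint1 first, so hint2 would end up outermost instead.
val viaFoldLeft = blocks.foldLeft(base)((plan, block) => withHints(block, plan))
```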

private def check(df: Dataset[_], expected: LogicalPlan) = {
comparePlans(
EliminateBarriers(df.queryExecution.logical),
That PR has been reverted; can you rebase?
Test build #77636 has finished for PR 18086 at commit
Thanks, merging to master/2.2!
SQL hint syntax:
* support expressions such as strings, numbers, etc. instead of only identifiers, as is currently the case.
* support multiple hints, which was missing compared to the DataFrame syntax.

DataFrame API:
* support any parameters in DataFrame.hint instead of just strings.

Existing tests. New tests in PlanParserSuite. New suite DataFrameHintSuite.

Author: Bogdan Raducanu <bogdan@databricks.com>

Closes #18086 from bogdanrdc/SPARK-20854.

(cherry picked from commit 2134196)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
)

comparePlans(
parsePlan("SELECT /*+ HINT1(a, array(1, 2, 3)) */ * from t"),
Is this test case redundant?
Yeah. @bogdanrdc, can you send a follow-up PR to clean it up?
What changes were proposed in this pull request?
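SQL hint syntax:
* support expressions such as strings, numbers, etc. instead of only identifiers, as is currently the case.
* support multiple hints, which was missing compared to the DataFrame syntax.

DataFrame API:
* support any parameters in DataFrame.hint instead of just strings.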
How was this patch tested?
Existing tests. New tests in PlanParserSuite. New suite DataFrameHintSuite.