[SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame #17839

Closed
wants to merge 3 commits into from


rxin
Contributor

@rxin rxin commented May 3, 2017

What changes were proposed in this pull request?

We allow users to specify hints (currently only "broadcast" is supported) in both SQL and the DataFrame API. However, while SQL has a standard hint format (/*+ ... */), the DataFrame API has no equivalent, and users are sometimes confused because they cannot find out how to apply a broadcast hint. This pull request adds a generic hint function on Dataset/DataFrame so that the same hints can be used on DataFrames as well as in SQL.

As an example, after this patch, the following will apply a broadcast hint on a DataFrame using the new hint function:

df1.join(df2.hint("broadcast"))
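For comparison, the sketch below places the new hint function next to the two ways of requesting a broadcast join that the description alludes to: the `broadcast` function in `org.apache.spark.sql.functions` and the SQL `/*+ ... */` comment. It is only an illustration, not code from this patch: the object name, the helper `buildJoins`, the column names, the view names `t1` and `t2`, and the sample rows are all hypothetical, and it assumes a Spark version that includes this change (2.2 or later).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

// Hypothetical sketch: names and data are made up for illustration.
object BroadcastHintSketch {

  /** Builds the same inner join three ways, each asking Spark to broadcast the right side. */
  def buildJoins(spark: SparkSession): Seq[DataFrame] = {
    import spark.implicits._

    val df1 = Seq((1, "a"), (2, "b")).toDF("id", "left_value")
    val df2 = Seq((1, "x"), (2, "y")).toDF("id", "right_value")

    // 1. The generic hint function proposed in this pull request.
    val viaHint = df1.join(df2.hint("broadcast"), "id")

    // 2. The dedicated broadcast function from org.apache.spark.sql.functions.
    val viaFunction = df1.join(broadcast(df2), "id")

    // 3. The SQL hint comment, applied to temporary views over the same data.
    df1.createOrReplaceTempView("t1")
    df2.createOrReplaceTempView("t2")
    val viaSql = spark.sql(
      "SELECT /*+ BROADCAST(t2) */ * FROM t1 JOIN t2 ON t1.id = t2.id")

    Seq(viaHint, viaFunction, viaSql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("broadcast-hint-sketch")
      // Disable size-based automatic broadcasting so that any broadcast join
      // in the printed plans comes from the hint rather than the tiny inputs.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    // Print the physical plan of each variant for inspection.
    buildJoins(spark).foreach(_.explain())

    spark.stop()
  }
}
```

With automatic broadcasting disabled, the printed plans should show whether each variant was planned as a broadcast join; the exact plan text depends on the Spark version.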

How was this patch tested?

Added a test case in DataFrameJoinSuite.

@SparkQA

SparkQA commented May 3, 2017

Test build #76410 has started for PR 17839 at commit b84badc.

@gatorsmile
Member

LGTM pending Jenkins

@rxin
Contributor Author

rxin commented May 3, 2017

Actually somebody should add the Python / R wrapper.

cc @felixcheung and @zero323

@cloud-fan
Contributor

LGTM

@zero323
Member

zero323 commented May 3, 2017

> Actually somebody should add the Python / R wrapper.

I can add both, once it is merged.

@SparkQA

SparkQA commented May 3, 2017

Test build #3683 has finished for PR 17839 at commit b84badc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

just a thought - hint sounds fairly generic, especially in R as hint(df, ...)

@rxin
Contributor Author

rxin commented May 3, 2017

Merging in master/branch-2.2.

@rxin
Contributor Author

rxin commented May 3, 2017

@felixcheung do you worry about conflicts?

asfgit pushed a commit that referenced this pull request May 3, 2017
Author: Reynold Xin <rxin@databricks.com>

Closes #17839 from rxin/SPARK-20576.

(cherry picked from commit 527fc5d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@rxin
Contributor Author

rxin commented May 3, 2017

BTW I filed follow-up tickets for Python/R at https://issues.apache.org/jira/browse/SPARK-20576

@asfgit asfgit closed this in 527fc5d May 3, 2017