[SPARK-30589][DOC] Document DISTRIBUTE BY Clause of SELECT statement in SQL Reference #27298
Test build #117140 has finished for PR 27298 at commit
limitations under the License.
---
The <code>DISTRIBUTE BY</code> clause is used to repartition the data based
on the input expressions. Unlike the `CLUSTER BY` clause, this does not
Link to `CLUSTER BY`?
@huaxingao Will do it in the finalization PR when the links are available.
('John A', 18),
('Jack N', 16);
-- Reduce the number of shuffle partitions to 2 to illustrate the behaviour of `DISTRIBUTE BY`.
-- Its easier to see the clustering and sorting behaviour with less number of partitions.
Its -> it's?
SET spark.sql.shuffle.partitions = 2;

-- Select the rows with no ordering. Please note that without any sort directive, the results
-- of the query is not deterministic. Its included here to just contrast it with the
Its -> it's?
-- Its easier to see the clustering and sorting behaviour with less number of partitions.
SET spark.sql.shuffle.partitions = 2;

-- Select the rows with no ordering. Please note that without any sort directive, the results
the results of the query is... -> the result of the query is...?

-- Select the rows with no ordering. Please note that without any sort directive, the results
-- of the query is not deterministic. Its included here to just contrast it with the
-- behaviour of `DISTRIBUTE BY`. The query below produces rows where age column are not
age column are not... -> age columns are not...?
Test build #117148 has finished for PR 27298 at commit

### Syntax
{% highlight sql %}
DISTRIBUTE BY { expression [ , ...] }
[ , ...] -> [ , ... ]?
Test build #117278 has finished for PR 27298 at commit
Test build #117300 has finished for PR 27298 at commit
Merged to master. @dilipbiswal did you want to make a follow up to link several pages?
@srowen Thanks a lot Sean. Yeah.. I will do it today.
What changes were proposed in this pull request?
Document DISTRIBUTE BY clause of SELECT statement in SQL Reference Guide.
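For readers of this PR, here is a minimal sketch of the behaviour the new page describes. It assumes a hypothetical temporary view named `person` with columns `name` and `age`. The rows `('John A', 18)` and `('Jack N', 16)` appear in the diff under review; the third row is made up for illustration. This is not the exact example text from the patch.

```sql
-- Hypothetical view for illustration; the view name, column names,
-- and the 'Anil B' row are assumptions, not taken from the patch.
CREATE OR REPLACE TEMPORARY VIEW person AS
SELECT * FROM VALUES
    ('John A', 18),
    ('Jack N', 16),
    ('Anil B', 18)
AS t(name, age);

-- Fewer shuffle partitions make the repartitioning easier to observe.
SET spark.sql.shuffle.partitions = 2;

-- Rows with the same age are sent to the same partition.
-- Unlike CLUSTER BY, rows are not sorted within each partition.
SELECT age, name FROM person DISTRIBUTE BY age;
```

The row order of the result is not guaranteed, so no expected output is shown.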
Why are the changes needed?
Currently, Spark lacks documentation on the supported SQL constructs, which causes
confusion among users, who sometimes have to read the code to understand the
usage. This PR aims to address that gap.
Does this PR introduce any user-facing change?
Yes.
Before:
There was no documentation for this.
After:
[Screenshot: Screen Shot 2020-01-20 at 3 08 24 PM]
[Screenshot: Screen Shot 2020-01-20 at 3 08 34 PM]
How was this patch tested?
Tested using jekyll build --serve