[SPARK-30581][DOC] Document SORT BY Clause of SELECT statement in SQLReference #27289
Test build #117108 has finished for PR 27289 at commit
<dl>
<dt><code><em>SORT BY</em></code></dt>
<dd>
Specifies a comma separated list of expression along with optional parameters sort_direction
comma-separated
expression -> expressions
Back-tick things like sort_direction, etc.
<dt><code><em>nulls_sort_order</em></code></dt>
<dd>
Optionally specifies whether NULL values are returned before/after non-NULL values, based on the
sort direction. In spark, NULL values are considered to be lower than any non-NULL values. Therefore
spark -> Spark. Are you describing a default here? Make it explicit if so, because this option lets you control that behavior, right?
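For reference, the default in question can be modeled in a few lines of plain Python. This is only a sketch under stated assumptions: when no nulls_sort_order is given, Spark places NULLs first for an ascending sort and last for a descending sort, which is consistent with NULL being treated as lower than any non-NULL value. The helper `sort_with_nulls` and its parameters are hypothetical names for illustration, not Spark API.

```python
from typing import List, Optional


def sort_with_nulls(
    values: List[Optional[int]],
    descending: bool = False,
    nulls_first: Optional[bool] = None,
) -> List[Optional[int]]:
    """Hypothetical model of sort_direction plus nulls_sort_order (not Spark API)."""
    if nulls_first is None:
        # Assumed default: NULLS FIRST for ASC, NULLS LAST for DESC.
        nulls_first = not descending
    nulls = [v for v in values if v is None]
    non_nulls = sorted((v for v in values if v is not None), reverse=descending)
    return nulls + non_nulls if nulls_first else non_nulls + nulls


ages = [30, None, 18, 25]
asc_default = sort_with_nulls(ages)                       # NULL comes first
desc_default = sort_with_nulls(ages, descending=True)     # NULL comes last
asc_nulls_last = sort_with_nulls(ages, nulls_first=False)  # explicit override
```

An explicit `nulls_first` value plays the role of writing NULLS FIRST or NULLS LAST in the clause, which is the control the comment refers to.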
Test build #117136 has finished for PR 27289 at commit
limitations under the License.
---
The <code>SORT BY</code> clause is used to return the result rows sorted
within each partition in the user specified order. When there are more than one partition
"When there are more than one partition" -> "When there is more than one partition"?
Test build #117159 has finished for PR 27289 at commit
### Syntax
{% highlight sql %}
SORT BY { expression [ sort_direction | nulls_sort_order ] [ , ...] }
Is this actually different from ORDER BY, or are they aliases? Do we need to copy the docs if so?
@srowen Yeah, they are a bit different. ORDER BY performs a total sort, whereas SORT BY sorts only within each partition. I kept them separate only because, to the best of my understanding, SORT BY is not a frequently used clause; it's a Hive compatibility thing. Also, after discussing with @gatorsmile, we think we should have top-level links for "simple select" and "full select". The "simple select" page will include only the most used clauses (i.e., not things like SORT BY, DISTRIBUTE BY, and CLUSTER BY).
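To make the distinction concrete, here is a minimal sketch in plain Python that models partitions as lists. It is not Spark code: the function names `sort_by` and `order_by` and the sample rows are hypothetical, chosen only to mirror the two clauses.

```python
from typing import List


def sort_by(partitions: List[List[int]]) -> List[List[int]]:
    """Model of SORT BY: each partition is sorted independently."""
    return [sorted(p) for p in partitions]


def order_by(partitions: List[List[int]]) -> List[int]:
    """Model of ORDER BY: a single total order over all rows."""
    return sorted(row for p in partitions for row in p)


# Two hypothetical partitions of an integer column.
partitions = [[42, 7, 19], [3, 88, 11]]

per_partition = sort_by(partitions)
# Reading the partitions back to back shows the result is only partially ordered.
concatenated = [row for p in per_partition for row in p]
total = order_by(partitions)

print(per_partition)
print(concatenated)
print(total)
```

Each partition comes back sorted, but the concatenated rows are not in a total order, which is why the two clauses can return different results whenever there is more than one partition.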
OK same comments here apply to the ORDER BY PR too then, I guess
If most parts are duplicated, how about working on ORDER BY first, then copying the committed (final) version of ORDER BY here to reduce review overhead?
Sounds good @maropu.
Test build #117265 has finished for PR 27289 at commit
Test build #117279 has finished for PR 27289 at commit
Test build #117293 has finished for PR 27289 at commit
Test build #117299 has finished for PR 27289 at commit
Test build #117302 has finished for PR 27289 at commit
ea68204 to 6177310
Test build #117414 has finished for PR 27289 at commit
How about explicitly leaving some comments about the difference between ORDER BY and SORT BY behaviours somewhere in this doc? The other parts look fine to me.
@maropu I tried to document this in the main description section like this:
What do you think?
@dilipbiswal Yea, that looks ok.
cc: @srowen
Merged to master
What changes were proposed in this pull request?
Document SORT BY clause of SELECT statement in SQL Reference Guide.
Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs, causing
confusion among users, who sometimes have to look at the code to understand the
usage. This PR is aimed at addressing that issue.
Does this PR introduce any user-facing change?
Yes.
Before:
There was no documentation for this.
After:
[Five screenshots of the rendered documentation page, "Screen Shot 2020-01-20 at 1 25 57 AM" through "Screen Shot 2020-01-20 at 1 27 02 AM"]
How was this patch tested?
Tested using jekyll build --serve