Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30579][DOC] Document ORDER BY Clause of SELECT statement in SQL Reference #27288

Closed
wants to merge 7 commits into from

Conversation

dilipbiswal
Copy link
Contributor

What changes were proposed in this pull request?

Document ORDER BY clause of SELECT statement in SQL Reference Guide.

Why are the changes needed?

Currently Spark lacks documentation on the supported SQL constructs causing
confusion among users who sometimes have to look at the code to understand the
usage. This is aimed at addressing this issue.

Does this PR introduce any user-facing change?

Yes.

Before:
There was no documentation for this.

After.
Screen Shot 2020-01-19 at 11 50 57 PM
Screen Shot 2020-01-19 at 11 51 14 PM
Screen Shot 2020-01-19 at 11 51 33 PM

How was this patch tested?

Tested using jykyll build --serve

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117094 has finished for PR 27288 at commit 581a1e9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117101 has finished for PR 27288 at commit 581a1e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117137 has finished for PR 27288 at commit cb78d8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -18,5 +18,121 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
The <code>ORDER BY</code> clause is used to return the result rows in a sorted manner
in the user specified order. Unlike the <code>SORT BY</code> clause, this clause guarantees
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to SORT BY?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao will do it in the finalization pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK if you want to take care of that at the end

<dl>
<dt><code><em>ORDER BY</em></code></dt>
<dd>
Specifies a comma separated list of expressions along with optional parameters <code>sort_direction</code>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comma separated -> comma-separated?
I caught this because I saw Sean's comment in another PR :)

Actually in many places we have comma separated, for example,

An optional parameter that specifies a comma separated list of key and value pairs for partitions.

Not sure if I should open a MINOR PR to fix all of them.

@huaxingao
Copy link
Contributor

I happen to see this:
image
Could you please fix this when you update the menu-sql.yaml?

@dilipbiswal
Copy link
Contributor Author

Could you please fix this when you update the menu-sql.yaml?

Sure.. will do.

@SparkQA
Copy link

SparkQA commented Jan 21, 2020

Test build #117156 has finished for PR 27288 at commit d1fb7bd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -18,5 +18,121 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
The <code>ORDER BY</code> clause is used to return the result rows in a sorted manner
in the user specified order. Unlike the <code>SORT BY</code> clause, this clause guarantees
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK if you want to take care of that at the end

Optionally specifies whether to sort the rows in ascending (lowest to highest) or descending
(highest to lowest) order. The valid values for sort direction are <code>ASC</code> for ascending
and <code>DESC</code> for descending. If sort direction is not explicitly specified then by default
rows are sorted in ascending manner. <br><br>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just "sorted ascending"

sort direction. In Spark, NULL values are considered to be lower than any non-NULL values by default.
Therefore the ordering of NULL values depend on the sort direction.<br><br>
<ol>
<li>If the sort order is ASC, NULLS are returned first; to force NULLS to be last, use NULLS LAST</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might turn this around, to emphasize the null sort order: "If NULLS FIRST (the default), then NULLs are returned first if sort order is ASC, and last if sort order is DESC" and likewise for the second items.

Copy link
Contributor Author

@dilipbiswal dilipbiswal Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Hmmn.. trying to think if we are conveying properly with the above re-wording. So

  • If if NULLS FIRST is explicitly specified, you would ALWAYS see NULLs at the top of your resultset no matter what the sort order is.
  • if NULLS LAST is explicitly specified , you would ALWAYS see NULLS at the end no matter what the sort order is

.What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my mistake, is that how it works? so NULLS FIRST puts nulls first no matter what the sort order? Maybe we can take the focus off of the sort order then. For example NULLS LAST does the same thing no matter what the sort order.

Maybe like: "NULLS FIRST forces NULLs to sort before all non-NULL values, regardless of the sort order" and likewise for NULLS LAST, and then explain that, if not specified, NULLs sort first if ASC and last if DESC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Got it.. Will make the change.

@SparkQA
Copy link

SparkQA commented Jan 22, 2020

Test build #117263 has finished for PR 27288 at commit a9dfbb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -18,5 +18,125 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
The <code>ORDER BY</code> clause is used to return the result rows in a sorted manner
in the user specified order. Unlike the <code>SORT BY</code> clause, this clause guarantees
total order in the output.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total order -> a total order?

</dd>
<dt><code><em>sort_direction</em></code></dt>
<dd>
Optionally specifies whether to sort the rows in ascending (lowest to highest) or descending
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need the statement (lowest to highest)? ascending or descending is enough?

<dt><code><em>sort_direction</em></code></dt>
<dd>
Optionally specifies whether to sort the rows in ascending (lowest to highest) or descending
(highest to lowest) order. The valid values for sort direction are <code>ASC</code> for ascending
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort direction -> the sort direction?

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117280 has finished for PR 27288 at commit ff61fa8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117292 has finished for PR 27288 at commit 7e46a5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117297 has finished for PR 27288 at commit c409609.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu are you OK with this? looks OK to me. If so we can move on to ORDER BY.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks fine. Thanks, @dilipbiswal !

@maropu maropu closed this in d5b92b2 Jan 26, 2020
@maropu
Copy link
Member

maropu commented Jan 26, 2020

Thanks! Merged to master.

@maropu
Copy link
Member

maropu commented Jan 26, 2020

@dilipbiswal ok, lets move on to SORT BY.

@dilipbiswal
Copy link
Contributor Author

Thanks a lot @srowen @huaxingao @maropu

@maropu Sure.. Let me double check on SORT by.

@maropu
Copy link
Member

maropu commented Jan 26, 2020

Thanks alot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants