
[SPARK-30589][DOC] Document DISTRIBUTE BY Clause of SELECT statement in SQL Reference #27298

Closed

Conversation

dilipbiswal
Contributor

What changes were proposed in this pull request?

Document DISTRIBUTE BY clause of SELECT statement in SQL Reference Guide.

Why are the changes needed?

Currently, Spark lacks documentation for its supported SQL constructs, causing
confusion among users, who sometimes have to read the code to understand the
usage. This PR aims to address that issue.

Does this PR introduce any user-facing change?

Yes.

Before:
There was no documentation for this.

After:
(two screenshots of the rendered DISTRIBUTE BY documentation page, taken Jan 20, 2020)

How was this patch tested?

Tested using `jekyll build --serve`

@SparkQA

SparkQA commented Jan 20, 2020

Test build #117140 has finished for PR 27298 at commit e5dc12e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

limitations under the License.
---
The <code>DISTRIBUTE BY</code> clause is used to repartition the data based
on the input expressions. Unlike the `CLUSTER BY` clause, this does not
sort the data within each partition.
Contributor

link to CLUSTER BY?

Contributor Author

@huaxingao will do it in the finalization PR when the links are available.
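To illustrate the contrast discussed in this thread (a sketch using the `person` table from the doc's own examples; not part of the patch itself):

```sql
-- DISTRIBUTE BY only repartitions the rows by age; rows within a
-- partition stay in an arbitrary order.
SELECT age, name FROM person DISTRIBUTE BY age;

-- CLUSTER BY repartitions by age AND sorts by age within each partition,
-- i.e. it behaves like DISTRIBUTE BY followed by SORT BY.
SELECT age, name FROM person CLUSTER BY age;
```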

('John A', 18),
('Jack N', 16);
-- Reduce the number of shuffle partitions to 2 to illustrate the behaviour of `DISTRIBUTE BY`.
-- Its easier to see the clustering and sorting behaviour with less number of partitions.
Contributor

Its -> it's?

SET spark.sql.shuffle.partitions = 2;

-- Select the rows with no ordering. Please note that without any sort directive, the results
-- of the query is not deterministic. Its included here to just contrast it with the
Contributor

Its -> it's?

-- Its easier to see the clustering and sorting behaviour with less number of partitions.
SET spark.sql.shuffle.partitions = 2;

-- Select the rows with no ordering. Please note that without any sort directive, the results
Contributor

the results of the query is... -> the result of the query is...?


-- Select the rows with no ordering. Please note that without any sort directive, the results
-- of the query is not deterministic. Its included here to just contrast it with the
-- behaviour of `DISTRIBUTE BY`. The query below produces rows where age column are not
Contributor

age column are not... -> age columns are not...?
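Putting the quoted fragments together, the doc's example flow looks roughly like this (only the last two `INSERT` rows and the `SET` statement are visible in the excerpts; the rest is an illustrative sketch):

```sql
CREATE TABLE person (name STRING, age INT);
INSERT INTO person VALUES
    ('John A', 18),
    ('Jack N', 16);

-- Reduce the number of shuffle partitions to 2 so the effect of
-- `DISTRIBUTE BY` is easy to see.
SET spark.sql.shuffle.partitions = 2;

-- Without any distribution or sort directive, the row order is
-- not deterministic.
SELECT age, name FROM person;

-- With DISTRIBUTE BY, rows with the same age end up in the same
-- partition (but are not sorted within it).
SELECT age, name FROM person DISTRIBUTE BY age;
```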

@SparkQA

SparkQA commented Jan 21, 2020

Test build #117148 has finished for PR 27298 at commit fbd4096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


### Syntax
{% highlight sql %}
DISTRIBUTE BY { expression [ , ...] }
Contributor

[ , ...] -> [ , ... ]?
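For reference, the bracketed `[ , ... ]` in the grammar means the clause accepts one or more comma-separated expressions, e.g. (a sketch using the doc's `person` table):

```sql
-- Single partitioning expression
SELECT age, name FROM person DISTRIBUTE BY age;

-- Multiple expressions: repartition by the combination (age, name)
SELECT age, name FROM person DISTRIBUTE BY age, name;
```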

@SparkQA

SparkQA commented Jan 23, 2020

Test build #117278 has finished for PR 27298 at commit 7e40347.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 23, 2020

Test build #117300 has finished for PR 27298 at commit 96b5628.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jan 27, 2020

Merged to master. @dilipbiswal did you want to make a follow up to link several pages?

@dilipbiswal
Contributor Author

@srowen Thanks a lot, Sean. Yeah, I will do it today.
