[SPARK-4410][SQL] Add support for external sort #3268

Closed
marmbrus wants to merge 3 commits from marmbrus/externalSort


marmbrus
Contributor

Adds a new operator that uses Spark's `ExternalSorter` class. It is off by default for now, but we might consider making it the default if benchmarks show that it does not regress performance.
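
For readers unfamiliar with the technique, external sorting can be sketched roughly as follows. This is a minimal illustration in plain Scala, not the code in this PR and not a description of how Spark's sorter is implemented: the object name `ExternalSortSketch`, the `maxInMemory` parameter, and the use of text temp files are all assumptions made for the example. It sorts bounded chunks in memory, spills each chunk to disk as a sorted run, and then merges the runs.

```scala
import java.io.{File, PrintWriter}
import scala.collection.mutable
import scala.io.Source

// Illustrative only: names and file format are assumptions, not Spark APIs.
object ExternalSortSketch {

  /** Sorts `input`, buffering at most `maxInMemory` input elements at a time
    * during the sort phase, and passes the values to `emit` in ascending order. */
  def externalSort(input: Iterator[Int], maxInMemory: Int)(emit: Int => Unit): Unit = {
    require(maxInMemory > 0, "maxInMemory must be positive")

    // Phase 1: sort each bounded chunk in memory and spill it to a temp file.
    val runs: List[File] = input.grouped(maxInMemory).map { chunk =>
      val file = File.createTempFile("sort-run-", ".txt")
      val out = new PrintWriter(file)
      try chunk.sorted.foreach(x => out.println(x)) finally out.close()
      file
    }.toList // force evaluation so every run is written before merging

    // Phase 2: k-way merge of the sorted runs using a min-heap of (value, runIndex).
    val sources = runs.map(f => Source.fromFile(f))
    try {
      val iters = sources.map(_.getLines().map(_.toInt)).toIndexedSeq
      val minFirst = Ordering.by[(Int, Int), Int](_._1).reverse
      val heap = mutable.PriorityQueue.empty[(Int, Int)](minFirst)
      for ((it, i) <- iters.zipWithIndex if it.hasNext) heap.enqueue((it.next(), i))
      while (heap.nonEmpty) {
        val (value, i) = heap.dequeue()
        emit(value)
        // Refill the heap from the run that just produced the smallest value.
        if (iters(i).hasNext) heap.enqueue((iters(i).next(), i))
      }
    } finally {
      sources.foreach(_.close())
      runs.foreach(_.delete())
    }
  }
}
```

For example, `ExternalSortSketch.externalSort(Iterator(5, 3, 9, 1), maxInMemory = 2)(x => println(x))` spills two sorted runs and prints the values in ascending order.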

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23381 has started for PR 3268 at commit 82b787a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23381 has finished for PR 3268 at commit 82b787a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExternalSort(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23381/

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23396 has started for PR 3268 at commit b98799d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 15, 2014

Test build #23396 has finished for PR 3268 at commit b98799d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExternalSort(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23396/

@@ -189,6 +191,7 @@ case class TakeOrdered(limit: Int, sortOrder: Seq[SortOrder], child: SparkPlan)

/**
 * :: DeveloperApi ::
 * Performs a sort on-heap.

Can we document the parameters (e.g. `global`) for both `Sort` and `ExternalSort`?

@rxin
Contributor

rxin commented Nov 15, 2014

LGTM other than the minor comment.

One thing I noticed is that we'd want to control the closure size at some point. Right now the entire query plan is being captured by every stage.
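
A common way to shrink a closure is to copy the fields it needs into local values so that it no longer references the enclosing object. The sketch below is a generic illustration of that pattern, not code from this PR: `Operator`, `planBytes`, and `serializedSize` are hypothetical names, and plain Java serialization stands in for whatever serializer actually ships the task.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Illustrative only: every name here is hypothetical, not a Spark API.
object ClosureSizeSketch {

  // Stand-in for an operator that holds a large plan (modelled as a byte array).
  class Operator(val ascending: Boolean, val planBytes: Array[Byte]) extends Serializable {

    // Reads `ascending` through `this`, so the closure captures the whole
    // operator and serializing it also serializes `planBytes`.
    def comparatorCapturingThis: (Int, Int) => Boolean =
      (a, b) => if (ascending) a < b else a > b

    // Copies the one needed field into a local val first, so the closure
    // captures only a Boolean.
    def comparatorCapturingField: (Int, Int) => Boolean = {
      val asc = ascending
      (a, b) => if (asc) a < b else a > b
    }
  }

  // Number of bytes produced by Java-serializing `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }
}
```

Comparing `serializedSize(op.comparatorCapturingThis)` with `serializedSize(op.comparatorCapturingField)` for an `op` holding a large `planBytes` array should show the first growing with the array while the second stays small.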

@@ -17,6 +17,8 @@

package org.apache.spark.sql.execution

import org.apache.spark.util.collection.ExternalSorter

import order here

@SparkQA

SparkQA commented Nov 17, 2014

Test build #23449 has started for PR 3268 at commit 48b9726.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 17, 2014

Test build #23449 has finished for PR 3268 at commit 48b9726.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExternalSort(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23449/

@rxin
Contributor

rxin commented Nov 17, 2014

Merging in master & branch-1.2. Thanks!

asfgit pushed a commit that referenced this pull request Nov 17, 2014
Adds a new operator that uses Spark's `ExternalSorter` class. It is off by default for now, but we might consider making it the default if benchmarks show that it does not regress performance.

Author: Michael Armbrust <michael@databricks.com>

Closes #3268 from marmbrus/externalSort and squashes the following commits:

48b9726 [Michael Armbrust] comments
b98799d [Michael Armbrust] Add test
afd7562 [Michael Armbrust] Add support for external sort.

(cherry picked from commit 64c6b9b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
asfgit closed this in 64c6b9b on Nov 17, 2014
marmbrus deleted the externalSort branch on November 19, 2014 02:45