
[SPARK-3463] [PySpark] aggregate and show spilled bytes in Python #2336

Closed
wants to merge 5 commits

Conversation

davies (Contributor) commented Sep 9, 2014

Aggregate the number of bytes spilled to disk during aggregation or sorting in PySpark, and show the totals in the Web UI.

[Screenshot: spilled bytes shown in the Web UI]

This patch is blocked by SPARK-3465 (it includes a fix for that issue).
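
(Not part of the original PR text: a minimal sketch of the kind of accounting this change introduces on the Python side, assuming a module-level counter that the worker later reports back to the JVM. The names below, such as DiskBytesSpilled and SimpleSpiller, are illustrative only and are not the patch's actual identifiers.)

import os
import tempfile

DiskBytesSpilled = 0  # hypothetical process-wide counter reported by the worker

class SimpleSpiller(object):
    """Toy external aggregator that records how many bytes it spills to disk."""

    def __init__(self):
        self.data = {}

    def merge(self, key, value):
        # Combine values in memory until a spill is triggered.
        self.data[key] = self.data.get(key, 0) + value

    def spill(self):
        global DiskBytesSpilled
        # Dump the in-memory partial aggregates to a temp file and add the
        # resulting file size to the running spill counter.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            for item in self.data.items():
                f.write((repr(item) + "\n").encode("utf-8"))
        DiskBytesSpilled += os.path.getsize(path)
        self.data.clear()
        return path

The real change reports such totals from the Python worker back so they end up in the task metrics the Web UI displays (see the JobProgressListener hunk further down).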

SparkQA commented Sep 10, 2014

QA tests have started for PR 2336 at commit fbe9029.

  • This patch merges cleanly.

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2336 at commit fbe9029.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 10, 2014

QA tests have started for PR 2336 at commit fbe9029.

  • This patch merges cleanly.

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2336 at commit fbe9029.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 11, 2014

QA tests have started for PR 2336 at commit fbe9029.

  • This patch merges cleanly.

SparkQA commented Sep 11, 2014

QA tests have started for PR 2336 at commit fbe9029.

  • This patch merges cleanly.

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2336 at commit fbe9029.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2336 at commit fbe9029.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 11, 2014

QA tests have started for PR 2336 at commit fbe9029.

  • This patch merges cleanly.

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2336 at commit fbe9029.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 11, 2014

QA tests have started for PR 2336 at commit 7e4ad04.

  • This patch merges cleanly.

SparkQA commented Sep 11, 2014

QA tests have finished for PR 2336 at commit 7e4ad04.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -242,7 +242,8 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
t.taskMetrics)

// Overwrite task metrics
-t.taskMetrics = Some(taskMetrics)
+// FIXME: deepcopy the metrics, or they will be the same object in local mode
+t.taskMetrics = Some(scala.util.Marshal.load[TaskMetrics](scala.util.Marshal.dump(taskMetrics)))
Contributor commented:

Do we want to do something similar to what you did in #2338 here, i.e. do it only if this is local mode?

Contributor Author (davies) replied:

I will rebase it after #2338 is merged.
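
For readers skimming past the Scala: the FIXME above is about aliasing. In local mode the driver and the executor run in the same JVM, so without a copy the listener ends up storing the very object the executor keeps mutating. A toy Python analogue of the problem and of what the serialize/deserialize round-trip fixes (a plain dict stands in for TaskMetrics):

import copy

live_metrics = {"diskBytesSpilled": 0}        # stands in for the executor's TaskMetrics

kept_by_reference = live_metrics              # what happens without the fix
kept_by_copy = copy.deepcopy(live_metrics)    # what the Marshal round-trip achieves

live_metrics["diskBytesSpilled"] += 4096      # the executor keeps updating in place

assert kept_by_reference["diskBytesSpilled"] == 4096  # the stored "snapshot" drifted
assert kept_by_copy["diskBytesSpilled"] == 0          # the copied snapshot stayed fixed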

SparkQA commented Sep 12, 2014

QA tests have started for PR 2336 at commit 1245eb7.

  • This patch merges cleanly.

SparkQA commented Sep 12, 2014

QA tests have finished for PR 2336 at commit 1245eb7.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateTableAsSelect(
    • case class CreateTableAsSelect(

@@ -27,12 +27,11 @@
# copy_reg module.
from pyspark.accumulators import _accumulatorRegistry
from pyspark.broadcast import Broadcast, _broadcastRegistry
-from pyspark.cloudpickle import CloudPickler
Contributor commented:

A few lines prior to this, there was a comment:

# CloudPickler needs to be imported so that depicklers are registered using the
# copy_reg module.

If this import is no longer necessary (was it ever?), then we should delete that comment, too.

Contributor Author (davies) replied:

cloudpickle is imported by serializers, so it's not needed here. The comments have been removed.
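
The reasoning here: the copy_reg (copyreg on Python 3) registrations are a process-wide side effect of importing cloudpickle, so a transitive import via pyspark.serializers is enough and worker.py does not need its own import. A self-contained sketch of that mechanism, with a made-up Handle class and reducer standing in for what CloudPickler actually registers:

import copyreg  # spelled copy_reg on the Python 2 interpreters of this era
import pickle

class Handle(object):
    def __init__(self, name):
        self.name = name

def _reduce_handle(h):
    # Tell pickle how to rebuild a Handle; CloudPickler registers far richer
    # reducers (e.g. for code objects) through the same mechanism.
    return Handle, (h.name,)

# Registration is global to the process: whichever module runs it, directly
# or via its own imports, makes Handle picklable everywhere.
copyreg.pickle(Handle, _reduce_handle)

restored = pickle.loads(pickle.dumps(Handle("worker")))
assert restored.name == "worker"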

JoshRosen (Contributor) commented:

This looks good to me.

SparkQA commented Sep 14, 2014

QA tests have started for PR 2336 at commit e37df38.

  • This patch merges cleanly.

SparkQA commented Sep 14, 2014

QA tests have finished for PR 2336 at commit e37df38.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in 4e3fbe8 on Sep 14, 2014
davies deleted the metrics branch on September 15, 2014 at 22:18