[SPARK-3463] [PySpark] aggregate and show spilled bytes in Python #2336
Conversation
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
QA tests have started for PR 2336 at commit
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
QA tests have finished for PR 2336 at commit
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
@@ -242,7 +242,8 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
t.taskMetrics)
// Overwrite task metrics
t.taskMetrics = Some(taskMetrics)
// FIXME: deepcopy the metrics, or they will be the same object in local mode
t.taskMetrics = Some(scala.util.Marshal.load[TaskMetrics](scala.util.Marshal.dump(taskMetrics)))
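The aliasing problem this FIXME works around can be sketched in Python (the class and field names below are hypothetical stand-ins, not Spark's actual API): in local mode the executor and the listener share one process, so storing the metrics object directly stores a live reference that later task updates keep mutating, while a deep copy preserves a snapshot.

```python
import copy

class TaskMetrics:
    """Hypothetical stand-in for Spark's TaskMetrics."""
    def __init__(self):
        self.memory_bytes_spilled = 0
        self.disk_bytes_spilled = 0

# In local mode the "executor" and the listener live in the same process,
# so keeping the metrics object directly keeps a live reference to it.
live_metrics = TaskMetrics()

stored_by_reference = live_metrics            # what the original code did
stored_by_copy = copy.deepcopy(live_metrics)  # what the FIXME works around

# The executor keeps mutating the same object after the task-end event.
live_metrics.disk_bytes_spilled = 1024

print(stored_by_reference.disk_bytes_spilled)  # 1024 -- snapshot corrupted
print(stored_by_copy.disk_bytes_spilled)       # 0    -- snapshot preserved
```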
Do we want to do something similar to what you did in #2338 here, i.e. do it only if this is local mode?
I will rebase it after #2338 is merged.
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
@@ -27,12 +27,11 @@
# copy_reg module.
from pyspark.accumulators import _accumulatorRegistry
from pyspark.broadcast import Broadcast, _broadcastRegistry
from pyspark.cloudpickle import CloudPickler
A few lines prior to this, there was a comment
# CloudPickler needs to be imported so that depicklers are registered using the
# copy_reg module.
If this import is no longer necessary (was it ever?), then we should delete that comment, too.
cloudpickle is imported by serializers, so it's not needed here. The comments have been removed.
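The pattern being discussed is a side-effect import: merely importing a module that calls `copyreg.pickle` (Python 2's `copy_reg`) registers picklers process-wide, so the import matters even if nothing from the module is referenced. A minimal illustration with a made-up class, not PySpark's actual registration code:

```python
import copyreg
import pickle

class Point:
    """Hypothetical class with a custom pickle reducer."""
    def __init__(self, x, y):
        self.x, self.y = x, y

def _reduce_point(p):
    # Tell pickle how to reconstruct a Point: callable plus its arguments.
    return (Point, (p.x, p.y))

# Executing this line (e.g. at import time of some module) registers the
# reducer for the whole process; the importing code need not use the module.
copyreg.pickle(Point, _reduce_point)

restored = pickle.loads(pickle.dumps(Point(1, 2)))
print(restored.x, restored.y)  # 1 2
```

This is why dropping such an import is only safe once another module (here, `pyspark.serializers`) is guaranteed to perform the same registration.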
This looks good to me.
QA tests have started for PR 2336 at commit
QA tests have finished for PR 2336 at commit
Aggregate the number of bytes spilled to disk during aggregation or sorting, and show them in the Web UI.
This patch is blocked by SPARK-3465 (it includes a fix for that).
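The aggregation the patch describes amounts to summing per-task spill counters into job-level totals for display. A minimal sketch with invented sample numbers and field names (Spark's real metrics objects differ):

```python
# Hypothetical per-task metrics as reported at task end.
task_metrics = [
    {"memory_bytes_spilled": 512,  "disk_bytes_spilled": 2048},
    {"memory_bytes_spilled": 0,    "disk_bytes_spilled": 0},
    {"memory_bytes_spilled": 1024, "disk_bytes_spilled": 4096},
]

# Sum each counter across all tasks to get the values shown in the UI.
totals = {"memory_bytes_spilled": 0, "disk_bytes_spilled": 0}
for m in task_metrics:
    for key in totals:
        totals[key] += m[key]

print(totals)  # {'memory_bytes_spilled': 1536, 'disk_bytes_spilled': 6144}
```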