-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3577] Report Spill size on disk for UnsafeExternalSorter #17471
Conversation
Test build #75363 has finished for PR 17471 at commit
|
cc - @rxin, @kayousterhout, @squito |
cc @cloud-fan / @ueshin / @sameeragarwal can you review this? |
SPARK-3577 is about reporting spill time, seems not related to this PR? |
@cloud-fan - There is discussion in the JIRA to report the disk spill size as well. I can create a new JIRA if you prefer. |
this is basically the same as this old pr, right? #15616 the last comment on there is a request for a test, which I think still applies here. I also think a separate jira would be clearer. |
I did not see we already have an open PR for this. Sure, I will add test to this PR and also file a separate JIRA. |
@sitalkedia any updates here? |
@sameeragarwal - Thanks for taking a look. I will update the PR adding test case soon. |
I was just looking though PRs for my curiosity. Please let me leave a gentle ping @sitalkedia. |
4546d43
to
da1f384
Compare
Test build #78348 has started for PR 17471 at commit |
Thanks for the ping @HyukjinKwon. Updated the PR with test case. |
test this please |
Test build #78363 has started for PR 17471 at commit |
Jenkins retest this please. |
@shaneknapp, @sameeragarwal - Not able to access the jenkins build link- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78405/testReport, something wrong with the jenkins sever? |
I think the power is still out in the CS building at Berkeley because of the earthquake, so I'm guessing Jenkins is down as a result (note that even the vanilla AMP website doesn't work: http://amplab.cs.berkeley.edu/) |
it was actually a cut power line due to street construction. apparently
power is back online, and we're waiting for machines and switches to finish
booting back up.
i'm WFH today, so i'm monitoring things remotely.
…On Wed, Jun 21, 2017 at 2:52 PM, Kay Ousterhout ***@***.***> wrote:
I think the power is still out in the CS building at Berkeley because of
the earthquake, so I'm guessing Jenkins is down as a result (note that even
the vanilla AMP website doesn't work: http://amplab.cs.berkeley.edu/)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17471 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABiDrCpWyketc1oi9h0i7qKbvgEEVIbnks5sGZCtgaJpZM4MtZcI>
.
|
also, see my message to the dev@ list. jenkins is up, but not reachable
via the amplab link.
…On Wed, Jun 21, 2017 at 3:23 PM, shane knapp ☠ ***@***.***> wrote:
it was actually a cut power line due to street construction. apparently
power is back online, and we're waiting for machines and switches to finish
booting back up.
i'm WFH today, so i'm monitoring things remotely.
On Wed, Jun 21, 2017 at 2:52 PM, Kay Ousterhout ***@***.***>
wrote:
> I think the power is still out in the CS building at Berkeley because of
> the earthquake, so I'm guessing Jenkins is down as a result (note that even
> the vanilla AMP website doesn't work: http://amplab.cs.berkeley.edu/)
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#17471 (comment)>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/ABiDrCpWyketc1oi9h0i7qKbvgEEVIbnks5sGZCtgaJpZM4MtZcI>
> .
>
|
Test build #78405 has finished for PR 17471 at commit
|
record[0] = (long) i; | ||
sorter.insertRecord(record, Platform.LONG_ARRAY_OFFSET, recordSize, 0, false); | ||
} | ||
assertTrue(sorter.getNumberOfAllocatedPages() >= 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments to explain the math here? I don't quite understand why it should be >= 2
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment to make it clear.
LGTM, shall we fix it for other sorters? e.g. |
@@ -540,6 +538,7 @@ public long spill() throws IOException { | |||
inMemSorter.free(); | |||
inMemSorter = null; | |||
taskContext.taskMetrics().incMemoryBytesSpilled(released); | |||
taskContext.taskMetrics().incDiskBytesSpilled(writeMetrics.bytesWritten()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we create a new writeMetrics here, instead of report this.writeMetrics.bytesWritten
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're spilling, so bytes written should be counted towards spill rather than write.
Approach similar to - https://github.com/sitalkedia/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java#L138
Actually other sorters are using UnsafeExternalSorter internally(https://github.com/sitalkedia/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L67), so this change should fix others as well. |
Test build #78821 has finished for PR 17471 at commit
|
Jenkins retest this please. |
Test build #78830 has finished for PR 17471 at commit
|
retest this please |
Test build #78851 has finished for PR 17471 at commit
|
thanks, merging to master! |
## What changes were proposed in this pull request? Report Spill size on disk for UnsafeExternalSorter ## How was this patch tested? Tested by running a job on cluster and verify the spill size on disk. Author: Sital Kedia <skedia@fb.com> Closes apache#17471 from sitalkedia/fix_disk_spill_size.
What changes were proposed in this pull request?
Report Spill size on disk for UnsafeExternalSorter
How was this patch tested?
Tested by running a job on cluster and verify the spill size on disk.