[SPARK-3577] Report Spill size on disk for UnsafeExternalSorter #17471

Closed
wants to merge 3 commits into from


sitalkedia

What changes were proposed in this pull request?

Report Spill size on disk for UnsafeExternalSorter

How was this patch tested?

Tested by running a job on a cluster and verifying the spill size on disk.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75363 has finished for PR 17471 at commit 4546d43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Author

cc - @rxin, @kayousterhout, @squito

@rxin
Contributor

rxin commented Apr 5, 2017

cc @cloud-fan / @ueshin / @sameeragarwal can you review this?

@cloud-fan
Contributor

SPARK-3577 is about reporting spill time, which does not seem related to this PR?

@sitalkedia
Author

@cloud-fan - There is discussion in the JIRA about reporting the disk spill size as well. I can create a new JIRA if you prefer.

@squito
Contributor

squito commented Apr 5, 2017

this is basically the same as this old pr, right? #15616

the last comment on there is a request for a test, which I think still applies here. I also think a separate jira would be clearer.

@sitalkedia
Author

I did not see that we already have an open PR for this. Sure, I will add a test to this PR and also file a separate JIRA.

@sameeragarwal
Member

@sitalkedia any updates here?

@sitalkedia
Author

@sameeragarwal - Thanks for taking a look. I will update the PR with a test case soon.

@HyukjinKwon
Member

I was just looking through PRs out of curiosity. Please let me leave a gentle ping, @sitalkedia.

@sitalkedia force-pushed the fix_disk_spill_size branch from 4546d43 to da1f384 on June 21, 2017 01:11
@SparkQA

SparkQA commented Jun 21, 2017

Test build #78348 has started for PR 17471 at commit da1f384.

@sitalkedia
Author

Thanks for the ping, @HyukjinKwon. Updated the PR with a test case.
cc - @sameeragarwal

@shaneknapp
Contributor

test this please

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78363 has started for PR 17471 at commit da1f384.

@sitalkedia
Author

Jenkins retest this please.

@sitalkedia
Author

@shaneknapp, @sameeragarwal - I am not able to access the Jenkins build link: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78405/testReport. Is something wrong with the Jenkins server?

@kayousterhout
Contributor

I think the power is still out in the CS building at Berkeley because of the earthquake, so I'm guessing Jenkins is down as a result (note that even the vanilla AMP website doesn't work: http://amplab.cs.berkeley.edu/)

@shaneknapp
Contributor

shaneknapp commented Jun 21, 2017 via email

@shaneknapp
Contributor

shaneknapp commented Jun 21, 2017 via email

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78405 has finished for PR 17471 at commit da1f384.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

record[0] = (long) i;
sorter.insertRecord(record, Platform.LONG_ARRAY_OFFSET, recordSize, 0, false);
}
assertTrue(sorter.getNumberOfAllocatedPages() >= 2);
Contributor

can you add some comments to explain the math here? I don't quite understand why it should be >= 2...

Author

Added a comment to make it clear.

@cloud-fan
Contributor

LGTM, shall we fix it for other sorters? e.g. UnsafeExternalRowSorter, UnsafeKVExternalSorter, etc.

@@ -540,6 +538,7 @@ public long spill() throws IOException {
inMemSorter.free();
inMemSorter = null;
taskContext.taskMetrics().incMemoryBytesSpilled(released);
taskContext.taskMetrics().incDiskBytesSpilled(writeMetrics.bytesWritten());
Contributor

Why do we create a new writeMetrics here, instead of reporting this.writeMetrics.bytesWritten?

Author

We're spilling, so the bytes written should be counted towards spill rather than write.
The approach is similar to https://github.com/sitalkedia/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java#L138
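
To illustrate the point about counting spill bytes separately, here is a minimal sketch. It is not Spark's code: `WriteMetrics`, `TaskMetrics`, and `Sorter` below are hypothetical stand-ins, and the byte counts are made up. It only shows the pattern of giving each spill its own fresh counter and adding that counter's total to the disk-spill metric, so the long-lived output counter is never touched.

```java
// Hypothetical sketch: none of these classes are Spark's real ones.
final class SpillAccountingSketch {

    /** Counts bytes written by one writer (stand-in for a write-metrics object). */
    static final class WriteMetrics {
        private long bytesWritten;
        void incBytesWritten(long n) { bytesWritten += n; }
        long bytesWritten() { return bytesWritten; }
    }

    /** Task-level counters (stand-in for task metrics). */
    static final class TaskMetrics {
        private long memoryBytesSpilled;
        private long diskBytesSpilled;
        void incMemoryBytesSpilled(long n) { memoryBytesSpilled += n; }
        void incDiskBytesSpilled(long n) { diskBytesSpilled += n; }
        long memoryBytesSpilled() { return memoryBytesSpilled; }
        long diskBytesSpilled() { return diskBytesSpilled; }
    }

    static final class Sorter {
        private final TaskMetrics taskMetrics;
        // Long-lived counter for the task's real output; spills must not touch it.
        private final WriteMetrics outputMetrics = new WriteMetrics();

        Sorter(TaskMetrics taskMetrics) { this.taskMetrics = taskMetrics; }

        /** Simulates one spill: frees inMemoryBytes and writes bytesOnDisk to a spill file. */
        long spill(long inMemoryBytes, long bytesOnDisk) {
            WriteMetrics spillMetrics = new WriteMetrics(); // fresh counter per spill
            spillMetrics.incBytesWritten(bytesOnDisk);      // what a spill writer would record
            taskMetrics.incMemoryBytesSpilled(inMemoryBytes);
            taskMetrics.incDiskBytesSpilled(spillMetrics.bytesWritten());
            return inMemoryBytes;
        }

        long outputBytesWritten() { return outputMetrics.bytesWritten(); }
    }

    public static void main(String[] args) {
        TaskMetrics metrics = new TaskMetrics();
        Sorter sorter = new Sorter(metrics);
        sorter.spill(1024, 300);
        sorter.spill(2048, 500);
        System.out.println("memory bytes spilled: " + metrics.memoryBytesSpilled());
        System.out.println("disk bytes spilled:   " + metrics.diskBytesSpilled());
        System.out.println("output bytes written: " + sorter.outputBytesWritten());
    }
}
```

The in-memory size and the on-disk size are tracked independently because the serialized spill file is usually not the same size as the pages it replaces.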

@sitalkedia
Author

> shall we fix it for other sorters? e.g. UnsafeExternalRowSorter, UnsafeKVExternalSorter, etc.

Actually, the other sorters use UnsafeExternalSorter internally (https://github.com/sitalkedia/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L67), so this change should fix them as well.
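
A rough sketch of why a fix in the inner sorter carries over to its wrappers. The class names here (`CoreSorter`, `RowSorter`, `Metrics`) are hypothetical and do not mirror the real Spark classes; the only assumption is that the wrapper forwards spills to the sorter it wraps.

```java
// Hypothetical sketch of delegation; not the actual Spark sorter classes.
final class DelegatingSorterSketch {

    /** Shared task-level counter (stand-in for task metrics). */
    static final class Metrics {
        private long diskBytesSpilled;
        void incDiskBytesSpilled(long n) { diskBytesSpilled += n; }
        long diskBytesSpilled() { return diskBytesSpilled; }
    }

    /** Inner sorter: the only place where spill accounting happens. */
    static final class CoreSorter {
        private final Metrics metrics;
        CoreSorter(Metrics metrics) { this.metrics = metrics; }
        void spill(long bytesOnDisk) { metrics.incDiskBytesSpilled(bytesOnDisk); }
    }

    /** Wrapper: holds a CoreSorter and forwards spills to it. */
    static final class RowSorter {
        private final CoreSorter delegate;
        RowSorter(Metrics metrics) { this.delegate = new CoreSorter(metrics); }
        void spill(long bytesOnDisk) { delegate.spill(bytesOnDisk); }
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        RowSorter rowSorter = new RowSorter(metrics);
        rowSorter.spill(4096);
        // The wrapper has no metric code of its own, yet the counter moved.
        System.out.println("disk bytes spilled: " + metrics.diskBytesSpilled());
    }
}
```

Because the wrapper contains no accounting logic, any correction to how the inner sorter reports spills is picked up by every wrapper without further changes.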

@SparkQA

SparkQA commented Jun 28, 2017

Test build #78821 has finished for PR 17471 at commit 6b94c2b.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Author

Jenkins retest this please.

@SparkQA

SparkQA commented Jun 29, 2017

Test build #78830 has finished for PR 17471 at commit 6b94c2b.

  • This patch fails due to an unknown error code, -10.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 29, 2017

Test build #78851 has finished for PR 17471 at commit 6b94c2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit closed this in a946be3 on Jun 29, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
## What changes were proposed in this pull request?

Report Spill size on disk for UnsafeExternalSorter

## How was this patch tested?

Tested by running a job on cluster and verify the spill size on disk.

Author: Sital Kedia <skedia@fb.com>

Closes apache#17471 from sitalkedia/fix_disk_spill_size.