[SPARK-4480] Avoid many small spills in external data structures #3353
Test build #23577 has started for PR 3353 at commit
Test build #23577 has finished for PR 3353 at commit
Test FAILed.
Test build #23591 has started for PR 3353 at commit
23f2a2e to f4736e3
retest this please
Test build #23594 has started for PR 3353 at commit
Test build #23595 has started for PR 3353 at commit
Test build #23591 has finished for PR 3353 at commit
Test FAILed.
Test build #23594 has finished for PR 3353 at commit
Test PASSed.
Test build #23595 has finished for PR 3353 at commit
Test PASSed.
LGTM. Feel free to merge it.
Conflicts: core/src/main/scala/org/apache/spark/util/collection/Spillable.scala
Test build #23614 has started for PR 3353 at commit
This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

Author: Andrew Or <andrew@databricks.com>

Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:

f2e552c [Andrew Or] Fix tests
7012595 [Andrew Or] Avoid many small spills

Was this not going into master?
Whoops I accidentally closed this without merging into master. I'll re-open it.
Test build #23614 has finished for PR 3353 at commit
Test FAILed.
Test build #23621 has started for PR 3353 at commit
Argh, tests won't pass because MIMA checks are broken in master. I'll send a hot fix.
Test build #23621 has finished for PR 3353 at commit
Test FAILed.
…into avoid-small-spills
Test build #23633 has started for PR 3353 at commit
Test build #23633 timed out for PR 3353 at commit
Test FAILed.
retest this please
Test build #23645 has started for PR 3353 at commit
Test build #23645 has finished for PR 3353 at commit
Test PASSed.
Finally. I'm merging this into master and 1.2.
**Summary.** Currently, we may spill many small files in `ExternalAppendOnlyMap` and `ExternalSorter`. The underlying root cause of this is summarized in [SPARK-4452](https://issues.apache.org/jira/browse/SPARK-4452). This PR does not address this root cause, but simply provides the guarantee that we never spill the in-memory data structure if its size is less than a configurable threshold of 5MB. This config is not documented because we don't want users to set it themselves, and it is not hard-coded because we need to change it in tests.

**Symptom.** Each spill is orders of magnitude smaller than 1MB, and there are many spills. In environments where a ulimit on open files is set, this frequently causes the "too many open files" exceptions observed in [SPARK-3633](https://issues.apache.org/jira/browse/SPARK-3633).

```
14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4792 B to disk (292769 spills so far)
14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4760 B to disk (292770 spills so far)
14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4520 B to disk (292771 spills so far)
14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4560 B to disk (292772 spills so far)
14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4792 B to disk (292773 spills so far)
14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4784 B to disk (292774 spills so far)
```

**Reproduction.** I ran the following on a small 4-node cluster with 512MB executors. Note that the back-to-back shuffle here is necessary for reasons described in [SPARK-4452](https://issues.apache.org/jira/browse/SPARK-4452). The second shuffle is a `reduceByKey` because it performs a map-side combine.

```
// Run in spark-shell, where `sc` is the predefined SparkContext.
sc.parallelize(1 to 100000000, 100)
  .map { i => (i, i) }
  .groupByKey()          // first shuffle
  .reduceByKey(_ ++ _)   // second shuffle, with map-side combine
  .count()
```

Before the change, I noticed that each thread may spill up to 1000 times, and the size of each spill is on the order of 10KB. After the change, each thread spills only up to 20 times in the worst case, and the size of each spill is on the order of 1MB.

Author: Andrew Or <andrew@databricks.com>

Closes #3353 from andrewor14/avoid-small-spills and squashes the following commits:

49f380f [Andrew Or] Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/spark into avoid-small-spills
27d6966 [Andrew Or] Merge branch 'master' of github.com:apache/spark into avoid-small-spills
f4736e3 [Andrew Or] Fix tests
a919776 [Andrew Or] Avoid many small spills

(cherry picked from commit 0eb4a7f)
Signed-off-by: Andrew Or <andrew@databricks.com>
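For readers who want the guarantee from the summary in code form, here is a minimal sketch of a "never spill below an initial threshold" policy. It is not the code from this patch: `SpillPolicy`, `shouldSpill`, and the `tryAcquire` callback are hypothetical names, and the rule for growing the threshold when more memory is granted is an assumption made for illustration.

```scala
// Illustrative sketch only: these names are hypothetical, not Spark's internals.
final class SpillPolicy(initialThresholdBytes: Long) {
  // Size the collection may reach before a spill is even considered.
  // It starts at the initial threshold (5MB in the summary above) and grows
  // whenever the hypothetical memory pool grants more bytes.
  private var thresholdBytes: Long = initialThresholdBytes

  /**
   * Decide whether a collection estimated at `currentBytes` should spill.
   * `tryAcquire` stands in for a request to a shared memory pool; it returns
   * the number of bytes actually granted, which may be 0.
   */
  def shouldSpill(currentBytes: Long, tryAcquire: Long => Long): Boolean = {
    if (currentBytes < thresholdBytes) {
      // Below the threshold: never spill, and do not ask the pool for anything.
      false
    } else {
      // Assumption: ask for enough memory to let the collection double in size.
      val requested = 2 * currentBytes - thresholdBytes
      val granted = tryAcquire(requested)
      thresholdBytes += granted
      // Spill only if the grant was too small to stay ahead of the data.
      currentBytes >= thresholdBytes
    }
  }
}
```

With an initial threshold of `5L * 1024 * 1024`, a 4792-byte batch like the ones in the log above would not be spilled under this sketch, even when the pool grants nothing, because the first branch returns before the pool is consulted.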
This is blocking apache#3353 and other patches.

Author: Andrew Or <andrew@databricks.com>

Closes apache#3371 from andrewor14/mima-hot-fix and squashes the following commits:

842d059 [Andrew Or] Move excludes to the right section
c4d4f4e [Andrew Or] MIMA hot fix