[SPARK-21501] Change CacheLoader to limit entries based on memory footprint #18940
Conversation
ok to test
LGTM, btw, no unit tests for the change?
Test build #80636 has finished for PR 18940 at commit
Can you fix the title? Doesn't look like this is related to SPARK-734.
-      .maximumSize(indexCacheEntries).build(indexCacheLoader);
+    shuffleIndexCache =
+        CacheBuilder.newBuilder()
+            .maximumWeight(JavaUtils.byteStringAsBytes(indexCacheSize))
nit: these lines are indented way too far (the previous code was, too).
See block above this one for example.
Yeah, the previous code actually led me to follow that convention; ok, will revert to 2-space indentation, thanks.
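For context on the diff above: the change swaps Guava's entry-count cap (maximumSize) for a byte-weighted cap (maximumWeight plus a Weigher). A minimal sketch of that pattern, assuming a ShuffleIndexInformation value type whose getSize() reports an entry's in-memory footprint; the wrapper class and method here are illustrative, not the PR's exact code:

```java
import java.io.File;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.Weigher;

class IndexCacheSketch {
  // Build a cache capped by total bytes rather than by entry count.
  static LoadingCache<File, ShuffleIndexInformation> build(
      CacheLoader<File, ShuffleIndexInformation> indexCacheLoader, long maxBytes) {
    return CacheBuilder.newBuilder()
        .maximumWeight(maxBytes)  // e.g. JavaUtils.byteStringAsBytes("100m")
        .weigher(new Weigher<File, ShuffleIndexInformation>() {
          @Override
          public int weigh(File file, ShuffleIndexInformation info) {
            return info.getSize();  // assumed: this entry's footprint in bytes
          }
        })
        .build(indexCacheLoader);
  }
}
```

With a weigher installed, Guava evicts entries once the summed weights exceed maximumWeight, so the cap tracks actual memory use instead of a fixed entry count.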
@dbolshak there were no unit tests for the Google cache implementation here before. I could add a simple test to check the cache behavior if necessary, but ideally a scale test is needed to understand the shuffleIndexCache behavior.
I like this feature.
nit: title should be "
@kiszk I don't think that would be ideal; it is better to backport the feature itself to a desired version or branch. Having two conflicting configs for the same task is not ideal, if that is what you mean. Thanks.
@redsanket I am thinking about the case where the same configuration file, which explicitly sets a value (e.g. 4096) into spark.shuffle.service.index.cache.entries, is reused with a newer Spark version. What do you think?
@kiszk wouldn't the updated release notes/docs take care of that, i.e. which configs can no longer be used and which replace them? I don't mind adding a warning message saying to please use cache.size instead of cache.entries, or providing two alternate implementations based on entries/size. I would like to see what other PMCs think about this @tgravescs @vanzin
If you're removing a public config, you should at least add it to the list of deprecated configs in SparkConf.
Test build #80691 has finished for PR 18940 at commit
Test build #80696 has finished for PR 18940 at commit
@@ -104,15 +105,21 @@ public ExternalShuffleBlockResolver(TransportConf conf, File registeredExecutorFile,
       Executor directoryCleaner) throws IOException {
     this.conf = conf;
     this.registeredExecutorFile = registeredExecutorFile;
-    int indexCacheEntries = conf.getInt("spark.shuffle.service.index.cache.entries", 1024);
+    String indexCacheSize = conf.get("spark.shuffle.service.index.cache.size", "100m");
Let's create a new config in internal/config, then we can use it by package$.MODULE$.SHUFFLE_INDEX_CACHE_SIZE().
internal/config is in spark core, not in the common/network code. We shouldn't be using the core version here, as it would add an extra dependency that isn't needed and would then have to be shipped in things like the external shuffle jar.
I prefer to leave it as is; if we want a config builder in the common network code, we can do that as a separate JIRA.
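As background on the "100m" default in the diff above: JavaUtils.byteStringAsBytes lives in common/network-common (hence no spark-core dependency is needed, per the comment above) and converts human-readable size strings to bytes using binary units. A quick sketch:

```java
import org.apache.spark.network.util.JavaUtils;

public class SizeStringSketch {
  public static void main(String[] args) {
    // Spark size suffixes are binary: "100m" -> 100 * 1024 * 1024 bytes.
    System.out.println(JavaUtils.byteStringAsBytes("100m"));  // 104857600
    System.out.println(JavaUtils.byteStringAsBytes("1g"));    // 1073741824
  }
}
```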
+1. Any further comments? @vanzin @jiangxb1987
@@ -597,7 +597,8 @@ private[spark] object SparkConf extends Logging {
     DeprecatedConfig("spark.scheduler.executorTaskBlacklistTime", "2.1.0",
       "Please use the new blacklisting options, spark.blacklist.*"),
     DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used any more"),
-    DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more")
+    DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more"),
+    DeprecatedConfig("spark.shuffle.service.index.cache.entries", "2.3.0", "Not used any more")
It would be good to mention the new config that's replacing it in the warning message.
+1. We'd better let the user know what the alternative config is.
LGTM, also cc @cloud-fan
@vanzin addressed the config comment, thanks
Test build #80931 has finished for PR 18940 at commit
+1, I'm going to merge as it appears all comments are addressed.
Right now the Spark shuffle service has a cache for index files that is limited by the number of files cached (spark.shuffle.service.index.cache.entries). This can cause issues for jobs with a lot of reducers, because the size of each entry fluctuates with the number of reducers.
We saw an issue with a job that had 170,000 reducers: it caused the NM running the Spark shuffle service to use 700-800 MB of memory by itself.
We should change this cache to be memory based and only allow it a certain amount of memory; that is, the cache should have a limit of, say, 100 MB.
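To put numbers on that: a shuffle index file stores one 8-byte offset per partition (plus one), so each cache entry's footprint grows linearly with the reducer count. A back-of-the-envelope sketch, assuming that (numPartitions + 1) * 8-byte layout:

```java
public class IndexSizeEstimate {
  public static void main(String[] args) {
    long reducers = 170_000;
    long bytesPerIndex = (reducers + 1) * 8;          // one offset per partition, plus one
    long oldCapBytes = 1024 * bytesPerIndex;          // old cap: 1024 entries regardless of size
    System.out.println(bytesPerIndex);                // 1360008 (~1.3 MB per entry)
    System.out.println(oldCapBytes / (1024 * 1024));  // 1328 (MB the old cap could retain)
  }
}
```

At ~1.3 MB per entry, the old 1024-entry cap could in principle retain well over a gigabyte, which lines up with the 700-800 MB observed above.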
https://issues.apache.org/jira/browse/SPARK-21501
Manual testing with 170,000 reducers has been performed, with the cache loaded up to the default 100 MB limit and each shuffle index file about 1.3 MB in size. Eviction takes place as soon as the total cache size reaches the 100 MB limit, and the evicted objects become eligible for garbage collection, thereby preventing the NM from crashing. No notable difference in runtime has been observed.
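As a self-contained illustration of that eviction behavior (toy sizes, not the actual shuffle-service code):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.Weigher;

public class EvictionDemo {
  public static void main(String[] args) {
    // Toy cache capped at 10 bytes of total value weight.
    Cache<String, byte[]> cache = CacheBuilder.newBuilder()
        .maximumWeight(10)
        .weigher(new Weigher<String, byte[]>() {
          @Override
          public int weigh(String key, byte[] value) {
            return value.length;  // each entry weighs its payload size
          }
        })
        .build();

    cache.put("a", new byte[4]);
    cache.put("b", new byte[4]);
    cache.put("c", new byte[4]);  // total weight 12 > 10, so one entry is evicted

    // The evicted value becomes unreferenced and eligible for GC.
    System.out.println(cache.size());  // 2
  }
}
```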