Added a fast and low-memory append-only map for shuffle and parallel reduce operations #823

Conversation
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/575/

```scala
 */
private[spark]
class AppendOnlyMap[K, V](initialCapacity: Int = 64) extends Iterable[(K, V)] with Serializable {
  if (!isPowerOf2(initialCapacity)) {
```
Maybe change this to

```scala
require(isPowerOf2(initialCapacity), "Initial capacity must be a power of 2")
```

and ditto for the `initialCapacity >= 1 << 30` check.
Actually, it'd be better to get the user to pass in an arbitrary initial size, and then we just round it up to the next power of 2.
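For illustration, the round-up suggested here can be sketched as follows; the helper name is assumed, not taken from the patch:

```scala
// Hypothetical helper: round an arbitrary requested size up to the next
// power of 2, so callers can pass any initial size. Not the PR's code.
def nextPowerOf2(n: Int): Int = {
  require(n >= 1 && n <= (1 << 30), "requested capacity out of range")
  val highBit = Integer.highestOneBit(n) // largest power of 2 <= n
  if (highBit == n) n else highBit << 1
}
```

`Integer.highestOneBit` keeps this branch-light; the `require` preserves the `1 << 30` upper bound discussed above.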
FYI, for anyone interested, my benchmark vs fastutil and Java/Scala maps is at http://www.cs.berkeley.edu/~matei/maptest.tgz. You can run it with, for example, …

```scala
val mask = (data.length / 2) - 1
var pos = rehash(key.hashCode) & mask
var i = 1
while (true) {
```
Are we sure we'll always break out of this `while (true)`?
Yes, this is what the quadratic probing method and the power-of-2 table size guarantee (see http://en.wikipedia.org/wiki/Quadratic_probing).
My concern was that `incrementSize()`, and consequently `growTable()`, is only called after this attempts to add an entry, meaning that there must be at least one free slot on method entry, else the loop never terminates. `initialCapacity >= 1` is `require`d, so a free slot exists on the first call; none would exist on subsequent calls only if we could increment `curSize` to `capacity` without growing the table. This could happen (modulo multiplication rounding errors) only if `LOAD_FACTOR >= 1`.

One could iterate only `capacity` times, then fall through to an error message, like in CLRS, but this would add a check per iteration.
I was thinking that this search logic could be extracted and shared with `apply()` and `changeValue()`. I don't know whether the Scala or JIT compiler would inline it for the same performance. But then I realized that the most likely case, which should be tested first, differs between `putInto()` and `apply()`/`changeValue()`: the former expects to find an empty slot, the latter expects to find the given key.
Last week you were talking of using double hashing...? I notice that fastutil seemed to have a performance issue with that and replaced it with linear probing.
Exciting stuff! Looking forward to any shuffle benchmarks/comparisons in a real cluster with this change.

```scala
 * TODO: Cache the hash values of each key? java.util.HashMap does that.
 */
private[spark]
class AppendOnlyMap[K, V](initialCapacity: Int = 64) extends Iterable[(K, V)] with Serializable {
```
Might as well use `[K <: AnyRef, V <: AnyRef]` and avoid all of the downcasting to `AnyRef`.
Unfortunately that won't make it work for primitive types, which we allow in some clients that use this class (e.g. Aggregator and CoGroupedRDD). The casting is ugly but it seems necessary until we build some specialized classes for those types, which would be a later project.
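To make the trade-off concrete, here is a minimal sketch (class and method names are made up for illustration) of the cast pattern being defended: keys and values live in a flat `Array[AnyRef]`, so primitives box on the way in and cast back on the way out, while an `AnyRef` bound would reject `Int` keys at compile time:

```scala
// Illustrative only, not the PR's class: unbounded type parameters plus
// asInstanceOf casts let primitive keys/values (boxed) share one array.
class TinyMap[K, V] {
  private val data = new Array[AnyRef](16) // [key0, value0, key1, value1, ...]

  def put(slot: Int, k: K, v: V): Unit = {
    data(2 * slot) = k.asInstanceOf[AnyRef]     // boxes an Int key
    data(2 * slot + 1) = v.asInstanceOf[AnyRef] // boxes an Int value
  }

  def value(slot: Int): V = data(2 * slot + 1).asInstanceOf[V]
}
```

With `class TinyMap[K <: AnyRef, V <: AnyRef]`, `new TinyMap[Int, Int]` would not compile, since `Int` extends `AnyVal`; the unbounded version accepts it, at the cost of the casts.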
```scala
data = newData
capacity = newCapacity
}
```
Just dealt with this by rounding capacity up to the next power of 2, as Reynold suggested
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/580/

```scala
var i = 1
while (true) {
  val curKey = data(2 * pos)
  if (curKey.eq(k) || curKey.eq(null) || curKey == k) {
```
When `curKey.eq(null)`, the key isn't present, and `data(2 * pos + 1)` is returned, which should be `null`. Clearer to say

```scala
else if (curKey.eq(null)) { return null.asInstanceOf[V] }
```

in parallel with the similar logic in `changeValue()`.
Are these the only two use cases for …

```scala
if (k.eq(null)) {
  return nullValue
}
val mask = capacity - 1
```
Shouldn't `mask` be an instance variable that you update in conjunction with `capacity`?
It should be noted that the Scala library also has an …
Ah, interesting, thanks for looking at it. K-means is actually fairly CPU-heavy due to code in the application itself, so it might be better to try another thing (e.g. the group-by test in spark-perf). But at least the good news is that it doesn't hurt performance either. Anyway, I'm not going to merge this in 0.8.0 until we have more tests.

Closing this since it's now here: https://github.com/apache/incubator-spark/pull/44
This is an attempt to reduce the CPU cost and memory usage of shuffles by taking advantage of the properties their hash maps need. In particular, the hash maps there are append-only, and a common operation is updating a key's value based on the old value. The included AppendOnlyMap class uses open hashing to use less space than Java's (by not having a linked list per bucket), does not support deletes, and has a changeValue operation to update a key in place without following the hash chain twice.

This is just an experiment for now, because it remains to be tested in real Spark apps, but in micro-benchmarks against java.util.HashMap and scala.collection.mutable.HashMap, it is 20-30% smaller and 10-40% faster depending on the number and type of keys. It's also noticeably faster than fastutil's Object2ObjectOpenHashMap.
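A sketch of how an aggregation would use the `changeValue` operation described above. The `(Boolean, V) => V` update-function shape is inferred from the description; a `mutable.HashMap` stands in for `AppendOnlyMap` so the sketch runs without the PR's class:

```scala
import scala.collection.mutable

// Stand-in for AppendOnlyMap: changeValue updates a key in place, so the
// caller touches the map once per record instead of a get followed by a put.
class CountingMap {
  private val counts = mutable.HashMap.empty[String, Int]

  def changeValue(key: String, updateFunc: (Boolean, Int) => Int): Int = {
    val newValue = counts.get(key) match {
      case Some(old) => updateFunc(true, old) // key present: combine with old
      case None      => updateFunc(false, 0)  // key absent: initialize
    }
    counts(key) = newValue
    newValue
  }

  def apply(key: String): Int = counts(key)
}
```

A shuffle-side word count then becomes a single call per record: `m.changeValue(word, (had, old) => if (had) old + 1 else 1)`. In the real AppendOnlyMap, this is where the "without following the hash chain twice" saving comes from.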