[SPARK-11271][SPARK-11016][Core] Use Spark BitSet instead of RoaringBitmap to reduce memory usage #9243

Closed
viirya wants to merge 7 commits

Conversation

viirya
Member

viirya commented Oct 23, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-11271

As reported in the JIRA ticket, when there are too many tasks, the memory usage of MapStatus can become a problem. Using BitSet instead of RoaringBitmap should be more memory-efficient.
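
To make the footprint concrete, here is a toy stand-in for the uncompressed bitset (Spark's real org.apache.spark.util.collection.BitSet is private[spark], so this sketch only mirrors the relevant behavior): for N partitions the structure always costs about N/8 bytes, no matter how many blocks are empty.

```scala
// Toy stand-in for the fixed-size bitset this patch switches to (not
// Spark's actual class): N bits are stored as ceil(N/64) Longs, so the
// footprint is fixed regardless of contents.
class SimpleBitSet(numBits: Int) {
  private val words = new Array[Long]((numBits + 63) / 64)
  def set(i: Int): Unit = words(i >> 6) |= 1L << (i & 0x3f)
  def get(i: Int): Boolean = (words(i >> 6) & (1L << (i & 0x3f))) != 0
  def sizeInBytes: Int = words.length * 8
}

// Marking zero-size blocks for a 2000-partition shuffle:
val emptyBlocks = new SimpleBitSet(2000)
Seq(3, 17, 1999).foreach(i => emptyBlocks.set(i))
assert(emptyBlocks.get(17) && !emptyBlocks.get(18))
println(emptyBlocks.sizeInBytes) // 256 bytes (2000/8, rounded up to whole Longs)
```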

@rxin
Contributor

rxin commented Oct 23, 2015

Why is an uncompressed bitset using less space than a compressed one?

@yaooqinn
Member

@rxin According to the comment on HighlyCompressedMapStatus ("During serialization, this bitmap is compressed."), compression only happens during serialization with RoaringBitmap. In the driver's JVM, a BitSet uses less space as an in-memory object than the Roaring one.

@srowen
Member

srowen commented Oct 23, 2015

@viirya we have an outstanding issue about RoaringBitmap not being serialized by Kryo at the moment: https://issues.apache.org/jira/browse/SPARK-11016 It sounds like we have two bitset implementations (three if you count the JDK). What's your view on replacing RoaringBitmap everywhere? It's kind of unfortunate Spark reinvented the wheel here, but given that this is done (and maybe has some marginal advantage), what about removing this dependency entirely? That is, do you see an argument for keeping RoaringBitmap in Spark?

@viirya
Member Author

viirya commented Oct 23, 2015

@srowen Actually, I can't find anywhere in Spark that RoaringBitmap is used other than MapStatus. Correct me if I am wrong. If so, it seems we can safely remove it now?

@srowen
Member

srowen commented Oct 23, 2015

@viirya oh, well, that would certainly kill two birds with one stone. It can come out of the poms, and the Kryo serializer too. I personally favor this, even if only on the grounds of simplification.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44213 has finished for PR 9243 at commit 392975d.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #1946 has started for PR 9243 at commit 392975d.

@drcrallen
Contributor

Regarding SPARK-11016, can references to Roaring also be removed from core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala?

@srowen
Member

srowen commented Oct 23, 2015

@drcrallen yes they would have to be.

@drcrallen
Contributor

@srowen I was clumsily asking whether that change would be appropriate to include in this PR :)

@viirya
Member Author

viirya commented Oct 23, 2015

@srowen @drcrallen I've updated this patch to remove RoaringBitmap from KryoSerializer and pom.

viirya changed the title from [SPARK-11271][Core] Use Spark BitSet instead of RoaringBitmap to reduce memory usage to [SPARK-11271][SPARK-11016][Core] Use Spark BitSet instead of RoaringBitmap to reduce memory usage on Oct 23, 2015
@drcrallen
Contributor

General FYI, we did a lot of analysis on various bitmap implementations for the Druid project. We were more interested in the direct-memory (mmap'd file) capabilities. The general discussion thread can be found in the dev group

The general finding was that Roaring (at the time) took up more space. This has been much improved in the 0.5.x line, so an alternative solution to this PR and the issue may be to upgrade to 0.5.x.

On the 0.4.x line, Roaring was generally comparable to or slightly faster than Concise in many respects on our particular datasets, in benchmarks against production-like workloads (judged by query time). Roaring was far faster than Concise at unions and intersections. The full analysis has not been re-run since the 0.5.x release.

@rxin
Contributor

rxin commented Oct 23, 2015

@lemire any comment on this thread? Looks like we are having some trouble with the roaring bitmap.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44225 has finished for PR 9243 at commit d30ec97.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 23, 2015

retest this please.

@lemire
Contributor

lemire commented Oct 23, 2015

@rxin

There are definitely cases where attempting to use compressed bitmaps is wasteful. For example, if you have a small universe size. E.g., your bitmaps represent sets of integers from [0, n) where n is small (e.g., n = 64 or n = 128).

It is just generally true that compression is not always a good idea.

The fact that you are able to use uncompressed BitSet and it does not blow up memory usage tells me that you might be in a scenario where compression is not useful.

Techniques like Roaring or Concise do not make uncompressed BitSet obsolete. Rather, they are there to help when regular BitSets would fail you due to excessive memory usage.

How can this happen? Well. Suppose that you are trying to index a column containing 1000 distinct integer values. If you try to do it with a BitSet, each row will use 125 bytes... just to index this column... if you have 10,000 distinct values, then you use over 1kB per row just to index this one column. And so forth.

But if your BitSets are tiny, then compressing them could definitely be wasteful.

@rxin
Contributor

rxin commented Oct 23, 2015

If I understand the problem here correctly, we are using a bitmap to track non-zero-size blocks (or zero-size blocks) when the number of partitions is large (> 2000). If we use an uncompressed bitset, the tracking structure is fixed size. The compression is, at a high level, essentially run-length encoding. If we use a compressed one, then when there are lots of zeros it should compress to a very small size? The worst case for the compressed bitmap is alternating 0s and 1s. Am I understanding the situation incorrectly?
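
To see both regimes concretely, here is a toy run-length encoder; it only illustrates the principle, not Roaring's actual hybrid format:

```scala
// Toy run-length view of the trade-off: mostly-zero inputs collapse to a
// few runs, while alternating bits yield one run per bit (the RLE worst
// case, where a plain fixed-size bitset wins).
def runs(bits: Seq[Boolean]): List[(Boolean, Int)] =
  bits.foldLeft(List.empty[(Boolean, Int)]) {
    case ((b, n) :: rest, x) if x == b => (b, n + 1) :: rest
    case (acc, x)                      => (x, 1) :: acc
  }.reverse

val mostlyEmpty = Seq.fill(2000)(false).updated(7, true)
val alternating = Seq.tabulate(2000)(_ % 2 == 0)
println(runs(mostlyEmpty).length) // 3 runs: compresses very well
println(runs(alternating).length) // 2000 runs: worse than the raw bitset
```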

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44247 has finished for PR 9243 at commit d30ec97.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lemire
Contributor

lemire commented Oct 23, 2015

@rxin

Right.

If you have only 2000 bits or so in your bitmaps, then that's about 250 bytes per bitmap, maybe less in practice. Considering that a single entry in a HashMap can use 150 bytes (http://lemire.me/blog/2015/10/15/on-the-memory-usage-of-maps-in-java/), that's not something that I'd worry about compressing normally. If that's your use case and you do want to compress them, I'd be glad to have a look at the problem, but it is not a usual scenario for compressed bitmaps. In our work, we often consider bitmaps that range over hundreds of thousands or millions of bits, corresponding to large tables. And we have lots of them. Then it makes sense to worry about compressing each bitmap.

Thankfully, I can explain the bitmap formats in a few lines. Maybe this can be helpful.

First, Java's BitSet is not a mere array of bits. Java's BitSet class does have a form of compression: it does not store trailing zeroes. This can help in some cases (to speed up processing and reduce memory usage). When it is applicable, it is pretty good and does not waste memory. However, take a BitSet and set the bit at position 1,000,000 to true and you have just over 100kB. That's over 100kB to store the position of one bit. But even if you do not care about memory... speed becomes a factor... when you want to compute the intersection between this BitSet and the BitSet that has the bit at position 1,000,001 set to true, you need to go through all these zeroes, whether you like it or not. That can become very wasteful.
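
The 100kB figure is easy to verify; java.util.BitSet.size() reports how many bits of backing storage are in use:

```scala
// java.util.BitSet allocates Long words up to the highest set bit, so a
// single bit at position 1,000,000 forces ~125kB of backing storage.
val bs = new java.util.BitSet()
bs.set(1000000)
println(bs.size() / 8) // 125008 bytes to record one set bit
```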

For BBC (used by Oracle), WAH, EWAH (used by Apache Hive), Concise (used by Druid in addition to Roaring)... you identify long runs of 1s or 0s and you represent them with a marker word. So it is, essentially, run-length encoding, except that if you have a mix of 1s and 0s, you use an uncompressed word. So it is best described as a hybrid between RLE and bitmaps. Oracle's BBC is pretty much obsolete. Concise compresses better than most alternatives in specific cases, but is otherwise a slower version of WAH. EWAH has lower compression ratios but faster processing speed. EWAH is faster because it allows some form of "skipping".

There is a big problem with these formats, however, that can hurt you badly in some cases: there is no random access. If you want to check whether a given value is present in the set, you have to start from the beginning and "uncompress" the whole thing. This means that if you want to intersect a tiny set with a big set, you still have to uncompress the whole big set in the worst case... That's bad news. This is not a property of run-length encoding per se, since you can have run-length encoding and fast random access, but here we have a hybrid, so random access is a problem.

Roaring solves this problem. It works in the following manner. It divides the data into chunks of 2^16 integers (e.g., [0, 2^16), [2^16, 2 × 2^16), ...). Within a chunk, it can use an uncompressed bitmap, a simple list of integers, or a list of runs. Whatever format it uses, they all allow you to check for the presence of any one value quickly (e.g., with a binary search). The net result is that Roaring can compute many operations much faster than run-length-encoded formats (WAH, EWAH, Concise...). Maybe surprisingly, it also generally offers better compression ratios.
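
A sketch of that partitioning (an illustration of the scheme, not RoaringBitmap's API):

```scala
// Roaring-style partitioning: the high 16 bits of a value select a
// container, the low 16 bits are the position inside it. Each container
// (bitmap, sorted array, or runs) supports random access, e.g. by binary
// search, so there is no need to scan from the start.
def containerKey(x: Int): Int = x >>> 16   // which 2^16-wide chunk
def lowBits(x: Int): Int      = x & 0xFFFF // offset within that chunk

println((containerKey(70000), lowBits(70000))) // (1,4464): chunk 1, offset 4464
```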

All these formats (Roaring, WAH, Concise...) basically "look" very close to an uncompressed bitmap when the data is incompressible. (Roaring will use uncompressed bitmap containers.) However, they all add a bit of overhead. The storage overhead is not what should worry you, however; what should worry you is that by storing incompressible data in a compressed format, you are slowing things down.

Think about zipping a binary file that is not compressible (e.g., a JPEG file). Not only would the resulting file be slightly larger, but, also, accessing the data would take quite a bit longer... needlessly.

Let me conclude with this: all these bitmap formats are quite simple conceptually. They behave in rather predictable ways. So given the data, it is not very difficult to benchmark them. If you do have the data, I can setup a benchmark in no time, if you tell me what you care about.

@SparkQA

SparkQA commented Oct 24, 2015

Test build #44294 has finished for PR 9243 at commit e981fc9.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 24, 2015

Hmm, I don't see any failed tests, but it still times out.

@viirya
Member Author

viirya commented Oct 24, 2015

retest this please.

@SparkQA

SparkQA commented Oct 24, 2015

Test build #44297 has finished for PR 9243 at commit e981fc9.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 25, 2015

The motivation here is not just memory, but (secondarily) avoiding the outstanding Kryo serialization problem and simplifying the build.

The explanations here all make sense, but the question is: does this change reduce memory consumption or not? The assertion is that it does, by about 20%, and I'm not clear whether we're arguing that that doesn't make sense. I'm not sure I'd expect the set of non-empty blocks to be sparse, so maybe it does.

If it doesn't obviously change memory consumption either way, then it's still a win.

I still don't get why the test is timing out though.

@SparkQA

SparkQA commented Oct 25, 2015

Test build #1954 has started for PR 9243 at commit e981fc9.

@lemire
Contributor

lemire commented Oct 25, 2015

@srowen

Trying to compress something that is tiny and otherwise incompressible can indeed lead to a storage increase of the order of 20%. That's credible. But even if the 20% was in the other direction... that is, the compressed bitmaps were 20% smaller, it would still not be worth it to compress in general. RAM and storage are not that expensive in 2015. We use compressed bitmaps when uncompressed ones are many times larger (at least 2x). Then you get both less memory usage and faster processing speed.

@rxin
Contributor

rxin commented Oct 25, 2015

Note that this can go up to 200k partitions (2k is the lower bound). Would you say it is still not worth it to compress something that has 200k bits?
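
For scale, the uncompressed cost across that range is simple arithmetic:

```scala
// An uncompressed bitset costs partitions/8 bytes per MapStatus, so the
// stakes grow linearly with partition count (and with how many map
// statuses the driver holds at once).
for (partitions <- Seq(2000, 200000))
  println(s"$partitions partitions -> ${partitions / 8} bytes per MapStatus")
// 2000 partitions -> 250 bytes per MapStatus
// 200000 partitions -> 25000 bytes per MapStatus
```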

@lemire
Contributor

lemire commented Oct 26, 2015

@rxin

If one were to share a dump, say one text file per bitmap, as a list of set bits (one per row), we could quantify the problem easily.

@viirya
Member Author

viirya commented Oct 26, 2015

Seems it passed all tests, but in the end it always times out.

@viirya
Member Author

viirya commented Oct 26, 2015

I can't reproduce the timeout locally. @srowen @rxin any ideas?

@SparkQA

SparkQA commented Oct 27, 2015

Test build #1959 has started for PR 9243 at commit e981fc9.

@drcrallen
Contributor

This is still getting pretty stuck. Any clue whether it's an actual problem or just a quirk in the test system? I'm very interested in cherry-picking this to a local branch whenever it is ready to be accepted.

@srowen
Member

srowen commented Oct 30, 2015

@viirya @drcrallen this is a real failure and it's easy to reproduce. It really stumped me until I figured out why it was just hanging: it's the call to `runApproximateJob` with such a short timeout (1ms). It must be some other subtle problem that causes it to fail to notice the job has already failed before it begins waiting.

Anyway, if you increase that in the test you can see the problem: `BitSet` just needs a no-arg constructor, `def this() = this(0)`, and you can use this instead of the call to `new BitSet(0)`.
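
For illustration, a self-contained toy (not Spark's actual BitSet) showing why Externalizable deserialization needs that no-arg constructor:

```scala
import java.io._

// Java's Externalizable contract creates the object via its public no-arg
// constructor and then calls readExternal to restore the contents.
class ToyBitSet(var numBits: Int) extends Externalizable {
  def this() = this(0) // the no-arg constructor suggested above
  override def writeExternal(out: ObjectOutput): Unit = out.writeInt(numBits)
  override def readExternal(in: ObjectInput): Unit = { numBits = in.readInt() }
}

val buf = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(buf)
oos.writeObject(new ToyBitSet(2000))
oos.close()
val ois = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
println(ois.readObject().asInstanceOf[ToyBitSet].numBits) // 2000
```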

Also there is still some mention of roaring bitmaps in HighlyCompressedMapStatus comments.

@SparkQA

SparkQA commented Oct 31, 2015

Test build #44718 has finished for PR 9243 at commit 37f85ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 31, 2015

retest this please.

@SparkQA

SparkQA commented Oct 31, 2015

Test build #44725 has finished for PR 9243 at commit 37f85ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -161,7 +160,7 @@ private[spark] class HighlyCompressedMapStatus private (

   override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
     loc = BlockManagerId(in)
-    emptyBlocks = new RoaringBitmap()
+    emptyBlocks = new BitSet(0)
Member

LGTM, except that I'd make this a call to the no-arg constructor. It's almost a plus, in that it verifies the constructor exists for exactly the purpose it's needed.

Any objections? I'll merge soon anyway.

Member Author

I've updated it.

@SparkQA

SparkQA commented Nov 2, 2015

Test build #44791 has finished for PR 9243 at commit a6ea5fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Corr(left: Expression, right: Expression)

@srowen
Member

srowen commented Nov 2, 2015

Merged to master
