[SPARK-11271][SPARK-11016][Core] Use Spark BitSet instead of RoaringBitmap to reduce memory usage #9243

Closed
viirya wants to merge 7 commits

Conversation

viirya
Member

viirya commented Oct 23, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-11271

As reported in the JIRA ticket, when there are too many tasks, the memory usage of MapStatus can become a problem. Using BitSet instead of RoaringBitmap should be more memory-efficient.
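
To make the footprint concrete, here is a toy stand-in for the uncompressed bitset (Spark's real org.apache.spark.util.collection.BitSet is private[spark], so this sketch only mirrors the relevant behavior): for N partitions the structure always costs about N/8 bytes, no matter how many blocks are empty.

```scala
// Toy stand-in for the fixed-size bitset this patch switches to (not
// Spark's actual class): N bits are stored as ceil(N/64) Longs, so the
// footprint is fixed regardless of contents.
class SimpleBitSet(numBits: Int) {
  private val words = new Array[Long]((numBits + 63) / 64)
  def set(i: Int): Unit = words(i >> 6) |= 1L << (i & 0x3f)
  def get(i: Int): Boolean = (words(i >> 6) & (1L << (i & 0x3f))) != 0
  def sizeInBytes: Int = words.length * 8
}

// Marking zero-size blocks for a 2000-partition shuffle:
val emptyBlocks = new SimpleBitSet(2000)
Seq(3, 17, 1999).foreach(i => emptyBlocks.set(i))
assert(emptyBlocks.get(17) && !emptyBlocks.get(18))
println(emptyBlocks.sizeInBytes) // 256 bytes (2000/8, rounded up to whole Longs)
```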

@rxin
Contributor

rxin commented Oct 23, 2015

Why is an uncompressed bitset using less space than a compressed one?

@yaooqinn
Member

@rxin According to the comment on HighlyCompressedMapStatus ("During serialization, this bitmap is compressed."), compression only happens during serialization with RoaringBitmap. In the driver's JVM, a BitSet uses less space as an in-memory object than the Roaring one.

@srowen
Member

srowen commented Oct 23, 2015

@viirya we have an outstanding issue about RoaringBitmap not being serialized by Kryo at the moment: https://issues.apache.org/jira/browse/SPARK-11016 It sounds like we have two bitset implementations (three if you count the JDK). What's your view on replacing RoaringBitmap everywhere? It's kind of unfortunate Spark reinvented the wheel here, but given that this is done (and maybe has some marginal advantage), what about removing this dependency entirely? That is, do you see an argument for keeping RoaringBitmap in Spark?

@viirya
Member Author

viirya commented Oct 23, 2015

@srowen Actually, I can't find anywhere in Spark that RoaringBitmap is used other than MapStatus. Correct me if I am wrong. If so, it seems we can safely remove it now?

@srowen
Member

srowen commented Oct 23, 2015

@viirya oh, well, that would certainly kill two birds with one stone. It can come out of the poms, and the Kryo serializer too. I personally favor this, even if only on the grounds of simplification.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44213 has finished for PR 9243 at commit 392975d.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #1946 has started for PR 9243 at commit 392975d.

@drcrallen
Contributor

Regarding SPARK-11016, can references to Roaring also be removed from core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala?

@srowen
Member

srowen commented Oct 23, 2015

@drcrallen yes they would have to be.

@drcrallen
Contributor

@srowen I was clumsily asking whether that change would be appropriate to include in this PR :)

@viirya
Member Author

viirya commented Oct 23, 2015

@srowen @drcrallen I've updated this patch to remove RoaringBitmap from KryoSerializer and pom.

viirya changed the title from [SPARK-11271][Core] Use Spark BitSet instead of RoaringBitmap to reduce memory usage to [SPARK-11271][SPARK-11016][Core] Use Spark BitSet instead of RoaringBitmap to reduce memory usage on Oct 23, 2015
@drcrallen
Contributor

General FYI, we did a lot of analysis on various bitmap implementations for the Druid project. We were more interested in the direct-memory (mmap'd file) capabilities. The general discussion thread can be found in the dev group

The general finding was that Roaring (at the time) took up more space. This has been much improved in the 0.5.x line, so an alternative solution to this PR and the issue may be to upgrade to 0.5.x.

On the 0.4.x line, Roaring was generally comparable to or slightly faster than Concise in many respects on our particular datasets, in benchmarks against production-like workloads (judged by query time). Roaring was far faster than Concise at unions and intersections. The full analysis has not been re-run since the 0.5.x release.

@rxin
Contributor

rxin commented Oct 23, 2015

@lemire any comment on this thread? Looks like we are having some trouble with the roaring bitmap.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44225 has finished for PR 9243 at commit d30ec97.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 23, 2015

retest this please.

@lemire
Contributor

lemire commented Oct 23, 2015

@rxin

There are definitely cases where attempting to use compressed bitmaps is wasteful. For example, if you have a small universe size. E.g., your bitmaps represent sets of integers from [0, n) where n is small (e.g., n = 64 or n = 128).

It is just generally true that compression is not always a good idea.

The fact that you are able to use uncompressed BitSet and it does not blow up memory usage tells me that you might be in a scenario where compression is not useful.

Techniques like Roaring or Concise do not make uncompressed BitSet obsolete. Rather, they are there to help when regular BitSets would fail you due to excessive memory usage.

How can this happen? Well. Suppose that you are trying to index a column containing 1000 distinct integer values. If you try to do it with a BitSet, each row will use 125 bytes... just to index this column... if you have 10,000 distinct values, then you use over 1kB per row just to index this one column. And so forth.

But if your BitSets are tiny, then compressing them could definitely be wasteful.

@rxin
Contributor

rxin commented Oct 23, 2015

If I understand the problem here correctly, we are using a bitmap to track non-zero-size blocks (or zero-size blocks) when the number of partitions is large (> 2000). If we use an uncompressed bitset, the tracking structure is fixed size. The compression is, at a high level, essentially run-length encoding. If we use a compressed one, then when there are lots of zeros it should compress to a very small size? The worst case for the compressed bitmap is alternating 0s and 1s. Am I understanding the situation incorrectly?
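
To see both regimes concretely, here is a toy run-length encoder; it only illustrates the principle, not Roaring's actual hybrid format:

```scala
// Toy run-length view of the trade-off: mostly-zero inputs collapse to a
// few runs, while alternating bits yield one run per bit (the RLE worst
// case, where a plain fixed-size bitset wins).
def runs(bits: Seq[Boolean]): List[(Boolean, Int)] =
  bits.foldLeft(List.empty[(Boolean, Int)]) {
    case ((b, n) :: rest, x) if x == b => (b, n + 1) :: rest
    case (acc, x)                      => (x, 1) :: acc
  }.reverse

val mostlyEmpty = Seq.fill(2000)(false).updated(7, true)
val alternating = Seq.tabulate(2000)(_ % 2 == 0)
println(runs(mostlyEmpty).length) // 3 runs: compresses very well
println(runs(alternating).length) // 2000 runs: worse than the raw bitset
```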

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44247 has finished for PR 9243 at commit d30ec97.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lemire
Contributor

lemire commented Oct 23, 2015

@rxin

Right.

If you have only 2000 bits or so in your bitmaps, then that's about 250 bytes per bitmap, maybe less in practice. Considering that a single entry in a HashMap can use 150 bytes (http://lemire.me/blog/2015/10/15/on-the-memory-usage-of-maps-in-java/), that's not something that I'd worry about compressing normally. If that's your use case and you do want to compress them, I'd be glad to have a look at the problem, but it is not a usual scenario for compressed bitmaps. In our work, we often consider bitmaps that range over hundreds of thousands or millions of bits, corresponding to large tables. And we have lots of them. Then it makes sense to worry about compressing each bitmap.

Thankfully, I can explain the bitmap formats in a few lines. Maybe this can be helpful.

First, Java's BitSet is not a mere array of bits. Java's BitSet class does have a form of compression: it does not store trailing zeroes. This can help in some cases (to speed up processing and reduce memory usage). When it is applicable, it is pretty good and does not waste memory. However, take a BitSet and set the bit at position 1,000,000 to true and you have just over 100kB. That's over 100kB to store the position of one bit. But even if you do not care about memory... speed becomes a factor... when you want to compute the intersection between this BitSet and the BitSet that has the bit at position 1,000,001 set to true, you need to go through all these zeroes, whether you like it or not. That can become very wasteful.
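
The 100kB figure is easy to verify; java.util.BitSet.size() reports how many bits of backing storage are in use:

```scala
// java.util.BitSet allocates Long words up to the highest set bit, so a
// single bit at position 1,000,000 forces ~125kB of backing storage.
val bs = new java.util.BitSet()
bs.set(1000000)
println(bs.size() / 8) // 125008 bytes to record one set bit
```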

For BBC (used by Oracle), WAH, EWAH (used by Apache Hive), Concise (used by Druid in addition to Roaring)... you identify long runs of 1s or 0s and you represent them with a marker word. So it is, essentially, run-length encoding, except that if you have a mix of 1s and 0s, you use an uncompressed word. So it is best described as a hybrid between RLE and bitmaps. Oracle's BBC is pretty much obsolete. Concise compresses better than most alternatives in specific cases, but is otherwise a slower version of WAH. EWAH has lower compression ratios but faster processing speed. EWAH is faster because it allows some form of "skipping".

There is a big problem with these formats, however, that can hurt you badly in some cases: there is no random access. If you want to check whether a given value is present in the set, you have to start from the beginning and "uncompress" the whole thing. This means that if you want to intersect a tiny set with a big set, you still have to uncompress the whole big set in the worst case... That's bad news. This is not a property of run-length encoding per se, since you can have run-length encoding and fast random access, but here we have a hybrid, so random access is a problem.

Roaring solves this problem. It works in the following manner. It divides the data into chunks of 2^16 integers (e.g., [0, 2^16), [2^16, 2 × 2^16), ...). Within a chunk, it can use an uncompressed bitmap, a simple list of integers, or a list of runs. Whatever format it uses, they all allow you to check for the presence of any one value quickly (e.g., with a binary search). The net result is that Roaring can compute many operations much faster than run-length-encoded formats (WAH, EWAH, Concise...). Maybe surprisingly, it also generally offers better compression ratios.
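
A sketch of that partitioning (an illustration of the scheme, not RoaringBitmap's API):

```scala
// Roaring-style partitioning: the high 16 bits of a value select a
// container, the low 16 bits are the position inside it. Each container
// (bitmap, sorted array, or runs) supports random access, e.g. by binary
// search, so there is no need to scan from the start.
def containerKey(x: Int): Int = x >>> 16   // which 2^16-wide chunk
def lowBits(x: Int): Int      = x & 0xFFFF // offset within that chunk

println((containerKey(70000), lowBits(70000))) // (1,4464): chunk 1, offset 4464
```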

All these formats (Roaring, WAH, Concise...) basically "look" very close to an uncompressed bitmap when the data is incompressible. (Roaring will use uncompressed bitmap containers.) However, they all add a bit of overhead. The storage overhead is not what should worry you, however; what should worry you is that by storing incompressible data in a compressed format, you are slowing things down.

Think about zipping a binary file that is not compressible (e.g., a JPEG file). Not only would the resulting file be slightly larger, but, also, accessing the data would take quite a bit longer... needlessly.

Let me conclude with this: all these bitmap formats are quite simple conceptually. They behave in rather predictable ways. So given the data, it is not very difficult to benchmark them. If you do have the data, I can setup a benchmark in no time, if you tell me what you care about.

@SparkQA

SparkQA commented Oct 24, 2015

Test build #44294 has finished for PR 9243 at commit e981fc9.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 24, 2015

Hmm, I don't see any failed tests, but it still times out.

@viirya
Member Author

viirya commented Oct 24, 2015

retest this please.

@SparkQA

SparkQA commented Oct 24, 2015

Test build #44297 has finished for PR 9243 at commit e981fc9.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 25, 2015

The motivation here is not just memory, but (secondarily) avoiding the outstanding Kryo serialization problem and simplifying the build.

The explanations here all make sense, but the question is: does this change reduce memory consumption or not? The assertion is that it does, by about 20%, and I'm not clear whether we're arguing that that doesn't make sense. I'm not sure I'd expect the set of non-empty blocks to be sparse, so maybe it does.

If it doesn't obviously change memory consumption either way, then it's still a win.

I still don't get why the test is timing out though.

@SparkQA

SparkQA commented Oct 25, 2015

Test build #1954 has started for PR 9243 at commit e981fc9.

@lemire
Contributor

lemire commented Oct 25, 2015

@srowen

Trying to compress something that is tiny and otherwise incompressible can indeed lead to a storage increase of the order of 20%. That's credible. But even if the 20% was in the other direction... that is, the compressed bitmaps were 20% smaller, it would still not be worth it to compress in general. RAM and storage are not that expensive in 2015. We use compressed bitmaps when uncompressed ones are many times larger (at least 2x). Then you get both less memory usage and faster processing speed.

@rxin
Contributor

rxin commented Oct 25, 2015

Note that this can go up to 200k partitions (2k is the lower bound). Would you say it is still not worth it to compress something that has 200k bits?
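
For scale, the uncompressed cost across that range is simple arithmetic:

```scala
// An uncompressed bitset costs partitions/8 bytes per MapStatus, so the
// stakes grow linearly with partition count (and with how many map
// statuses the driver holds at once).
for (partitions <- Seq(2000, 200000))
  println(s"$partitions partitions -> ${partitions / 8} bytes per MapStatus")
// 2000 partitions -> 250 bytes per MapStatus
// 200000 partitions -> 25000 bytes per MapStatus
```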

@lemire
Contributor

lemire commented Oct 26, 2015

@rxin

If one were to share a dump, say one text file per bitmap, as a list of set bits (one per row), we could quantify the problem easily.

@viirya
Member Author

viirya commented Oct 26, 2015

Seems it passed all tests, but in the end it always times out.

@viirya
Member Author

viirya commented Oct 26, 2015

I can't reproduce the timeout locally. @srowen @rxin any ideas?

@SparkQA

SparkQA commented Oct 27, 2015

Test build #1959 has started for PR 9243 at commit e981fc9.

@drcrallen
Contributor

This is still getting pretty stuck. Any clue whether it's an actual problem or just a quirk in the test system? I'm very interested in cherry-picking this to a local branch whenever it is ready to be accepted.

@srowen
Member

srowen commented Oct 30, 2015

@viirya @drcrallen this is a real failure and it's easy to reproduce. It really stumped me until I figured out why it was just hanging: it's the call to `runApproximateJob` with such a short timeout (1ms). It must be some other subtle problem that causes it to fail to notice the job has already failed before it begins waiting.

Anyway, if you increase that in the test you can see the problem: `BitSet` just needs a no-arg constructor, `def this() = this(0)`, and you can use this instead of the call to `new BitSet(0)`.
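
For illustration, a self-contained toy (not Spark's actual BitSet) showing why Externalizable deserialization needs that no-arg constructor:

```scala
import java.io._

// Java's Externalizable contract creates the object via its public no-arg
// constructor and then calls readExternal to restore the contents.
class ToyBitSet(var numBits: Int) extends Externalizable {
  def this() = this(0) // the no-arg constructor suggested above
  override def writeExternal(out: ObjectOutput): Unit = out.writeInt(numBits)
  override def readExternal(in: ObjectInput): Unit = { numBits = in.readInt() }
}

val buf = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(buf)
oos.writeObject(new ToyBitSet(2000))
oos.close()
val ois = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
println(ois.readObject().asInstanceOf[ToyBitSet].numBits) // 2000
```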

Also there is still some mention of roaring bitmaps in HighlyCompressedMapStatus comments.

@SparkQA

SparkQA commented Oct 31, 2015

Test build #44718 has finished for PR 9243 at commit 37f85ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 31, 2015

retest this please.

@SparkQA

SparkQA commented Oct 31, 2015

Test build #44725 has finished for PR 9243 at commit 37f85ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -161,7 +160,7 @@ private[spark] class HighlyCompressedMapStatus private (

   override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
     loc = BlockManagerId(in)
-    emptyBlocks = new RoaringBitmap()
+    emptyBlocks = new BitSet(0)
Member

LGTM, except that I'd make this a call to the no-arg constructor. It's almost a plus, in that it verifies the constructor exists for exactly the purpose it's needed.

Any objections? I'll merge soon anyway.

Member Author

I've updated it.

@SparkQA

SparkQA commented Nov 2, 2015

Test build #44791 has finished for PR 9243 at commit a6ea5fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Corr(left: Expression, right: Expression)

@srowen
Member

srowen commented Nov 2, 2015

Merged to master
