[SPARK-11583] [Core] MapStatus Using RoaringBitmap More Properly #9661
Conversation
Can one of the admins verify this patch?
The problem is that this brings back RoaringBitmap, and one of the reasons we removed it was, separately, that Kryo wasn't serializing it properly. You'd have to address that too. What is your conclusion, though? Why is this better than your alternative?
@@ -174,6 +174,11 @@
       <artifactId>lz4</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.roaringbitmap</groupId>
+      <artifactId>RoaringBitmap</artifactId>
+      <version>0.4.5</version>
I'd recommend using version 0.5.10 or better (http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.10/).
In fact, the version should be removed here -- it should be inherited from the parent pom.
@srowen There is still some analysis left to be done, but I think there is a growing case that roaring is actually the right way to go, with some minor tweaks. So I'll do more reviewing here, assuming the analysis works out the way we expect, but this is still pending some final details from the discussion on the jira.
if (numNonEmptyBlocks < totalNumBlocks / 2) {
  new HighlyCompressedMapStatus(loc, numNonEmptyBlocks, nonEmptyBlocks, avgSize, isSparse = true)
} else {
  new HighlyCompressedMapStatus(loc, numNonEmptyBlocks, emptyBlocks, avgSize, isSparse = false)
(a) isSparse is not the right name, and (b) we might not even need to bother with tracking empty vs. non-empty after we see the results of calling runOptimize. In fact, if they are the same, I'd suggest we change this code to consistently track nonEmpty blocks, just for clarity.
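The tweak being discussed -- store whichever side, empty or non-empty blocks, is smaller, and remember which side was stored so lookups can be inverted -- can be sketched with a plain java.util.BitSet. BlockTracker and its members are hypothetical names for illustration, not Spark's actual MapStatus code:

```java
import java.util.BitSet;

class BlockTracker {
    final BitSet stored;          // the smaller of the two sets
    final boolean storesNonEmpty; // which side `stored` represents
    final int totalBlocks;

    BlockTracker(long[] blockSizes) {
        totalBlocks = blockSizes.length;
        BitSet nonEmpty = new BitSet(totalBlocks);
        for (int i = 0; i < totalBlocks; i++) {
            if (blockSizes[i] > 0) nonEmpty.set(i);
        }
        int numNonEmpty = nonEmpty.cardinality();
        if (numNonEmpty < totalBlocks - numNonEmpty) {
            // sparse output: tracking the non-empty blocks is cheaper
            stored = nonEmpty;
            storesNonEmpty = true;
        } else {
            // dense output: flip and track the empty blocks instead
            nonEmpty.flip(0, totalBlocks);
            stored = nonEmpty;
            storesNonEmpty = false;
        }
    }

    boolean isEmpty(int blockId) {
        return storesNonEmpty ? !stored.get(blockId) : stored.get(blockId);
    }
}
```

Because the lookup negates its answer when the stored side is the non-empty one, callers never need to know which side was chosen.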
test cases
sparse case: 4085 empty --- S
sc.makeRDD(1 to 40950, 4095).groupBy(x=>x).top(5)
dense case: 95 empty ---- D
sc.makeRDD(1 to 16380000, 4095).groupBy(x=>x).top(5)

test results

1 without runOptimize or trim

case | Class Name | Objects | Shallow Heap | Retained Heap |
---|---|---|---|---|
S | HighlyCompressedMapStatus | 4,095 | 131,040 | >= 34,135,920 |
D | HighlyCompressedMapStatus | 4,095 | 131,040 | >= 1,441,440 |

2 with runOptimize and trim

case | Class Name | Objects | Shallow Heap | Retained Heap |
---|---|---|---|---|
S | HighlyCompressedMapStatus | 4,095 | 131,040 | >= 687,960 |
D | HighlyCompressedMapStatus | 4,095 | 131,040 | >= 687,960 |

3 with runOptimize and trim, also isSparse

case | Class Name | Objects | Shallow Heap | Retained Heap |
---|---|---|---|---|
S | HighlyCompressedMapStatus | 4,095 | 131,040 | >= 687,960 |
D | HighlyCompressedMapStatus | 4,095 | 131,040 | >= 687,960 |

conclusion
- with runOptimize, roaring saves a lot, especially in S
- after runOptimize, separating the sparse case seems not necessary in this 4095-bit test
- what if 100k or 200k bits? Is runOptimize OK? Or use my "separating sparse case" idea? ---- I don't have an env for such large tasks (HELP!)
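As a rough illustration of why runOptimize pays off so differently in the sparse and dense cases: a run-length container costs per run of consecutive set bits, not per bit, so a continuous range collapses to a single run while scattered bits (e.g. every 10th) yield one run per set bit. RunCounter is a hypothetical helper built on java.util.BitSet, not RoaringBitmap's real container logic:

```java
import java.util.BitSet;

class RunCounter {
    // Count the runs of consecutive set bits among the given block ids over
    // [0, n). Run-length encodings store roughly one (start, length) pair per
    // run, so this is a proxy for the optimized size.
    static int countRuns(int[] setBits, int n) {
        BitSet bits = new BitSet(n);
        for (int b : setBits) bits.set(b);
        int runs = 0;
        boolean inRun = false;
        for (int i = 0; i < n; i++) {
            boolean set = bits.get(i);
            if (set && !inRun) runs++;
            inRun = set;
        }
        return runs;
    }
}
```

A continuous range of 2000 bits is a single run, while 200 bits spaced 10 apart are 200 runs -- which matches runOptimize returning true for the continuous REPL session and false for the scattered one.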
For my questions in my last comment:

continuous

scala> import org.roaringbitmap._
import org.roaringbitmap._

scala> val r = new RoaringBitmap()
r: org.roaringbitmap.RoaringBitmap = {}

scala> for (i <- 1 to 2000000) r.add(i)

scala> r.runOptimize
res1: Boolean = true

scala> r.contains(2000000)
res2: Boolean = true

uncontinuous

scala> import org.roaringbitmap._
import org.roaringbitmap._

scala> val r = new RoaringBitmap()
r: org.roaringbitmap.RoaringBitmap = {}

scala> for (i <- 1 to 2000000) if (i % 10 == 0) r.add(i)

scala> r.runOptimize
res1: Boolean = false

scala> r.trim

scala> r.contains(2000000)
res2: Boolean = true

conclusion
In the uncontinuous case, runOptimize failed, and the size after r.trim did not become smaller as in the continuous case. Notice the sizes here:

bits | original size | optimized size |
---|---|---|
200K(3) | 256 | 256 |

Although runOptimize failed, the total size is still very small.
@yaooqinn thanks for the additional analysis, but I would really prefer this is written up and attached to the jira for reference -- extended discussions in the PR are harder to keep track of later on. Also include the code and the additional cases, in particular the worst case analysis, from alternating bits. If you would prefer that someone else puts those details together (maybe me or @lemire), we can help, but it would be great if you could go that last extra mile :)

I don't think the analysis of the "uncontinuous" case and whether to track empty / non-empty blocks is correct:

import org.roaringbitmap._

val n = 2e5.toInt
val orig = new RoaringBitmap()
val flip1 = new RoaringBitmap()
(0 until n).foreach { idx =>
  if (idx % 1000 == 0) {
    flip1.add(idx)
  } else {
    orig.add(idx)
  }
}
val flip2 = orig.clone()
flip2.flip(0, n)
val origOptimized = orig.clone()
origOptimized.runOptimize()

scala> orig.getSizeInBytes
res1: Int = 31374

scala> flip1.getSizeInBytes
res2: Int = 432

scala> flip2.getSizeInBytes
res4: Int = 432

scala> origOptimized.getSizeInBytes
res6: Int = 844

This shows that the flipped version is a bit better than the version with just runOptimize. There are two other reasons we might want to use the flipped bits: (1) memory used while building the bitset and (2) time taken to build the bitset. However, both of those reasons are much less important. And in either case, we could only improve things in all cases by making two loops through the array -- one to count the non-zeros and decide which version is sparser, and then a second pass to create the bitset. This is way less important, so I wouldn't really worry about it now. Certainly if we just stick to always tracking the empty blocks it's no worse than what we have now or what was in 1.5 and before.
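The flip2 construction above can be mimicked with a plain java.util.BitSet to show what flipping buys: the complement of a dense set over [0, n) is exactly the small list of exceptions, from which membership in the original set is fully recoverable. flipToEmpty is a hypothetical helper name, not RoaringBitmap's API:

```java
import java.util.BitSet;

class FlipDemo {
    // Given the total block count n and the non-empty block ids, return the
    // ids of the empty blocks by flipping the bitset over [0, n).
    static int[] flipToEmpty(int[] nonEmpty, int n) {
        BitSet bits = new BitSet(n);
        for (int b : nonEmpty) bits.set(b);
        bits.flip(0, n);  // complement: set bits now mark the empty blocks
        int[] empty = new int[bits.cardinality()];
        int j = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            empty[j++] = i;
        }
        return empty;
    }
}
```

For a mostly-full set the flipped result is tiny, which is why flip1 and flip2 above both come out at 432 bytes while the run-optimized dense set still needs 844.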
import org.roaringbitmap.RoaringBitmap

import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer
Could you update your import rules to place the scala package before third-party ones (as before this PR)?
OK
@yaooqinn This PR looks good to me overall. Since your patch basically reverts #9243, I'd like to revert that first, then rebase this PR, to make the history clearer. We still need some changes to address SPARK-11016. @srowen Does this sound good to you? @lemire @squito @srowen Thanks for all your valuable discussion. cc @rxin @JoshRosen
SGTM.
@davies,
@davies I don't necessarily think we have to revert first. The original change did fix SPARK-11016 and, eh, sort of fixed one type of issue described by SPARK-11271 but really introduced a different related potential problem. This is a further improvement that happens to walk back part of the change. Keep in mind the original PR is in branch 1.6 so I want to make sure we leave 1.6 in a good state if it's going to be cut soon. This PR does not include a fix for SPARK-11016. Adding it would technically cause that problem to occur again. I think it would have to include some of the changes proposed by Charles Allen in the JIRA. Not hard.
@srowen If we are going to use RoaringBitmap, then #9243 does not really fix SPARK-11016. It introduces a performance regression for common workloads (when only a few blocks are empty, the memory consumption of MapStatus is bigger than before), so I don't think branch 1.6 is in a healthy state now. Since SPARK-11016 is a bug fix, it's still OK to fix it even after the 1.6 preview is published.
It did fix the problem, but only by making it a moot point. If RoaringBitmap comes back then it just needs a real fix. I'm not arguing against this change, of course; just making sure it doesn't take a step backwards for two forwards.
I talked to @davies offline just now -- he thinks the diff of this patch would be clearer if it doesn't need to include the "revert" of the last patch. I don't have a super strong opinion, but given he is willing to spend the time, I'd say he should make the call since nobody else feels super strongly about it.
And I think we should merge it into 1.6 too. I was busy when we were reviewing the previous patch but I did have some concerns about changing the data structure of a fundamental part of the shuffle code path. And that's why I never explicitly LGTMed the old patch.
Sounds good on all counts as long as the serialization fix can be worked in too.
$ git push https://github.com/yaooqinn/spark.git mapstatus-roaring:test
After git rebase, I can't push it to my own repo; it failed with 403.
@yaooqinn I rebased and did not have any problem; could you try git@github.com: ?
@yaooqinn I have fixed the conflict for you (you still get the credit when it's merged), and also upgraded RoaringBitmap to 0.5.11.
@davies thanks.
Fix RoaringBitmap serialization with Kryo
This PR upgrades the version of RoaringBitmap to 0.5.10 to optimize the memory layout; it will be much smaller when most of the blocks are empty. This PR is based on #9661 (fix conflicts); see all of the comments at #9661. Author: Kent Yao <yaooqinn@hotmail.com> Author: Davies Liu <davies@databricks.com> Author: Charles Allen <charles@allen-net.com> Closes #9746 from davies/roaring_mapstatus. (cherry picked from commit e33053e) Signed-off-by: Davies Liu <davies.liu@gmail.com>
---------------Part I----------------
test cases: BitSet vs. RoaringBitmap
sparse case: for each task, 10 blocks contain data, the others don't
dense case: for each task, most blocks contain data, a few don't
test tool
test branches:
branch-1.5, master
memory usage
2.1 RoaringBitmap--sparse
Class Name | Objects | Shallow Heap | Retained Heap
org.apache.spark.scheduler.HighlyCompressedMapStatus | 4,095 | 131,040 | >= 34,135,920
my explanation: 4095 * short[4095-10] = 4095 * 16 * 4085 / 8 ≈ 34,135,920
2.2.1 RoaringBitmap--full
org.apache.spark.scheduler.HighlyCompressedMapStatus | 4,095 | 131,040 | >= 360,360
my explanation: RoaringBitmap(0)
2.2.2 RoaringBitmap--very dense
org.apache.spark.scheduler.HighlyCompressedMapStatus | 4,095 | 131,040 | >= 1,441,440
my explanation: 4095 * short[95] = 4095 * 16 * 95 / 8 = 778,050 (+ others = 1,441,440)
2.3 BitSet--sparse
org.apache.spark.scheduler.HighlyCompressedMapStatus | 4,095 | 131,040 | >= 2,391,480
my explanation: 4095 * 4095 bits = 16,769,025 bits ≈ 2,096,128 bytes (+ others = 2,391,480)
2.4 BitSet--full
org.apache.spark.scheduler.HighlyCompressedMapStatus | 4,095 | 131,040 | >= 2,391,480
my explanation: same as above
conclusion
memory usage (Retained Heap):
RoaringBitmap--full < RoaringBitmap--very dense < BitSet--full = BitSet--sparse < RoaringBitmap--sparse
In this specific case, for RoaringBitmap--sparse, use RoaringBitmap to mark non-empty blocks instead of empty ones; that way, memory usage can stay low in all cases.
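A quick mechanical check of the back-of-envelope arithmetic above, assuming a short is 16 bits (2 bytes) and ignoring per-object overhead (MemEstimate is a hypothetical helper, not Spark code): 4,095 statuses of 4,085 shorts account for 33,456,150 of the >= 34,135,920 retained bytes in the sparse case, and 4,095 statuses of 95 shorts give exactly the 778,050 bytes cited for the very dense case.

```java
class MemEstimate {
    // Bytes used by `statuses` map statuses each holding `shortsEach` 16-bit
    // shorts (2 bytes apiece), ignoring object headers and other overhead.
    static long bytesForShorts(long statuses, long shortsEach) {
        return statuses * shortsEach * 2;
    }
}
```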
---------------Part II----------------
test cases
sparse case: 4085 empty --- S
dense case: 95 empty ---- D
test results
1 without runOptimize or trim
2 with runOptimize and trim
3 with runOptimize and trim, also isSparse
(measurements as in the tables in the earlier comment)
conclusion
- with runOptimize, roaring saves a lot, especially in S
- after runOptimize, separating the sparse case seems not necessary in this 4095-bit test
- what if 100k or 200k bits? Is runOptimize OK? Or use my "separating sparse case" idea? ---- I don't have an env for such large tasks (HELP!)
---------------Part III----------------
For my questions in my last comment: continuous and uncontinuous REPL sessions (as in the earlier comment).
conclusion
In the uncontinuous case, runOptimize failed, and the size after r.trim did not become smaller as in the continuous case.