-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up SparseBitMatrix
use in RegionValues
.
#52250
Conversation
Here are the instruction count improvements exceeding 1%:
Here are the max-rss changes exceeding 1%:
|
Here are some measurement of how full the BitMatrix instances get, for
Note the very large one for (17) which dominates.
|
I just measured |
6556f64
to
63abbed
Compare
The max-rss increase for
This is 77.5 MiB, and it gets doubled because it gets cloned here:
I tried getting rid of that clone -- which would greatly reduce the max-rss increase -- by transferring ownership of the Anyway, even if the clone remains, some benchmarks take more memory but some take less, so it's basically a wash on that front, and the speed improvements are large enough to make this compelling. |
63abbed
to
7786a74
Compare
I was able to speed up |
Hmm, this change will interact poorly with @davidtwco's changes in #52190, because I think that in that context we don't know the number of region variables when we allocate the We might want a kind of hybrid -- maybe we want to modify (The family of bitset types also needs a bit of cleanup... this change though might allow us to remove the "buf vs slice" distinction which would simplify things.) |
@nnethercote note that the final values will probably be affected also by rebasing over #51987, which .. modifies that clone sort of. (The clone is removed, but a variant of it remains.) We could probably free the liveness matrix at some point, though it wouldn't affect peak memory usage. It would potentially require a bit of work on the diagnostic side. |
Probably worth testing, in any case. |
I just got "try" privileges, so I'm doing to test them in this PR. @bors try |
@nnethercote: 🔑 Insufficient privileges: not in try users |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (presumably #51987) made this pull request unmergeable. Please resolve the merge conflicts. |
OK, so #51987 has landed -- @nnethercote do you have thoughts on whether it makes sense to continue with this PR? |
BitMatrix
instead of SparseBitMatrix
in RegionValues
.SparseBitMatrix
use in RegionValues
.
Hmm, those results look great! One concern though: in the branch I'm working on, I'm growing the number of elements in the matrix on the fly, which I guess wouldn't be compatible with this change. I'm thinking about how to solve this -- one way might be to split up the matrices into pieces. So for example we could store one matrix for points and then a separate matrix for regions. |
If the number of rows is growing, it should be fine as is. If the number of columns is growing, then that's different... it should be possible to make the number of columns in |
It's the number of columns that changes, yes, but I suspect I may be able to finesse it by breaking things into two matrices -- one for "points" and one for "regions" -- and allocating them at separate times. (In other words, we'd wait to allocate the region matrix until we know its proper size.) To that end, I think we should probably land this PR, and I can try to rebase over it. |
@bors r+ |
📌 Commit 9bfd1c17620a88e8d24f3bcd7710976522c202ee has been approved by |
⌛ Testing commit 9bfd1c17620a88e8d24f3bcd7710976522c202ee with merge 4ebbaa1809e39d08dfc02e86b3c59612e61a7eed... |
💔 Test failed - status-appveyor |
Using a `BTreeMap` to represent rows in the bit matrix is really slow. This patch changes things so that each row is represented by a `BitVector`. This is a less sparse representation, but a much faster one. As a result, `SparseBitSet` and `SparseChunk` can be removed. Other minor changes in this patch. - It renames `BitVector::insert()` as `merge()`, which matches the terminology in the other classes in bitvec.rs. - It removes `SparseBitMatrix::is_subset()`, which is unused. - It reinstates `RegionValueElements::num_elements()`, which rust-lang#52190 had removed. - It removes a low-value `debug!` call in `SparseBitMatrix::add()`.
9bfd1c1
to
798209e
Compare
@bors retry |
Apparently I have "try" permissions but not "retry" permissions. @nikomatsakis, can you reapprove this? Thanks. |
@nnethercote retry works only if you haven't pushed anything new. Pushing a new commit does require re-r+. |
@bors delegate=nnethercote |
✌️ @nnethercote can now approve this pull request |
@bors r+ |
📌 Commit 798209e has been approved by |
Speed up `SparseBitMatrix` use in `RegionValues`. In practice, these matrices range from 10% to 90%+ full once they are filled in, so the dense representation is better. This reduces the runtime of Check Nll builds of `inflate` by 32%, and several other benchmarks by 1--5%. It also increases max-rss of `clap-rs` by 30% and a couple of others by up to 5%, while decreasing max-rss of `coercions` by 14%. I think the speed-ups justify the max-rss increases. r? @nikomatsakis
☀️ Test successful - status-appveyor, status-travis |
In practice, these matrices range from 10% to 90%+ full once they are
filled in, so the dense representation is better.
This reduces the runtime of Check Nll builds of
inflate
by 32%, andseveral other benchmarks by 1--5%.
It also increases max-rss of
clap-rs
by 30% and a couple of others byup to 5%, while decreasing max-rss of
coercions
by 14%. I think thespeed-ups justify the max-rss increases.
r? @nikomatsakis