BTreeMap/BTreeSet drain & retain #66747
r? @rkruppe (rust_highfive has picked a reviewer for you, use r? to override)
I don't know the BTree code at all and the stuff added here seems somewhat subtle, so I don't feel confident reviewing this PR. Perhaps @Gankra can review or suggest another reviewer?
This 2nd commit doesn't change drain or retain, but increases code reuse and (just a wee bit) iterator performance. Instead of only retain sharing the guts of RangeMut::next_unchecked, all 7 iterators share the same 2 functions defined by 1 macro. This changes a detail in the iterators over immutable maps (i.e. Range::next_unchecked and Range::next_back_unchecked): they use ptr::read to copy the source handle, like all the rest do, instead of an implicit copy. This is some 5% faster (PS: the 5% is probably due to the order swapping or coincidence, because I now think ptr::read is exactly the same as copy). There's no change in Secondly, this swaps the order in which the two immutable and owned iterators perform Comparing all benchmarks that compiled on the original code, with the same nightly build:
As you see, there's a serious and reproducible boost to the build_and_* tests, for which I have no explanation at all.
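The postscript's guess that ptr::read on a Copy value behaves like an implicit copy can be checked in isolation. This is a minimal sketch, assuming a hypothetical Handle struct as a stand-in for the B-tree's internal handle type (the real type is not reproduced here):

```rust
use std::ptr;

// Hypothetical stand-in for a node handle; not the actual liballoc type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    node: usize,
    idx: usize,
}

// Implicit copy through a shared reference; allowed because Handle is Copy.
fn copy_implicit(h: &Handle) -> Handle {
    *h
}

// Bitwise copy through a raw pointer.
fn copy_via_read(h: &Handle) -> Handle {
    // SAFETY: `h` is a valid, aligned reference, and Handle is Copy, so
    // duplicating its bits cannot lead to a double drop.
    unsafe { ptr::read(h) }
}
```

Both functions return the same value and leave the source untouched, so any measured difference would have to come from something else, such as the order swapping mentioned above.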
By the way, the original, more modest commit is at https://github.com/ssomers/rust/tree/%2342849_v1
Do we need to do anything special to check this against miri? BTreeMap has historically been very good at running afoul of it.
ok yeah sorry i don't have the spare bandwidth for this review
r? @scottmcm
drain is relatively easy and therefore reliable. I'm not quite sure that retain is trustworthy, given how noob I am, how much unsafe code there is all around, and that I don't quite understand why #58431 happened, the code comments there, or how it can be tested, although in theory this code respects the order introduced there.
For this reason I would suggest splitting up this PR -- the easy parts can be easily merged, while the hard parts can get a more focused review.
/// assert!(a.is_empty());
/// ```
#[unstable(feature = "btree_drain_retain", issue = "42849")]
pub fn drain(&mut self) -> IntoIter<K, V> {
This should return a new Drain type, just like HashMap::drain does, even if that's still just a newtype wrapping IntoIter.
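One way to picture the suggested newtype is the sketch below. It is not the PR's code: Drain and drain_all are hypothetical names, and the free function stands in for a method on BTreeMap. Unlike HashMap's Drain, this version does not borrow the map, because it takes ownership of the whole tree up front.

```rust
use std::collections::btree_map::IntoIter;
use std::collections::BTreeMap;
use std::mem;

/// Hypothetical draining iterator: a newtype over the owning iterator, so
/// the public signature doesn't commit to `IntoIter` itself.
pub struct Drain<K, V> {
    inner: IntoIter<K, V>,
}

impl<K, V> Iterator for Drain<K, V> {
    type Item = (K, V);

    fn next(&mut self) -> Option<(K, V)> {
        self.inner.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}

/// Hypothetical stand-in for `BTreeMap::drain`: swaps an empty map into
/// place and hands out the old contents in key order.
pub fn drain_all<K: Ord, V>(map: &mut BTreeMap<K, V>) -> Drain<K, V> {
    Drain { inner: mem::replace(map, BTreeMap::new()).into_iter() }
}
```

Keeping the wrapper opaque would leave room to change the internals later without breaking callers.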
I would expect BTreeSet::drain to remove a range of keys, much like Vec::drain removes a range of indices. There are ways to implement this that are more efficient than repeated calls to remove, but they are pretty complex. Nevertheless, I think it would be a mistake to commit to this signature for drain, since range deletion is quite useful.
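For reference, the simple approach mentioned here (repeated calls to remove) could look roughly like the following. drain_range is a hypothetical helper, not a proposed API, and it requires Clone only so that the keys can be collected before the set is mutated.

```rust
use std::collections::BTreeSet;
use std::ops::RangeBounds;

/// Hypothetical helper: removes every key inside `range` and returns the
/// removed keys in ascending order. This is the naive version, doing one
/// `remove` per key.
fn drain_range<T, R>(set: &mut BTreeSet<T>, range: R) -> Vec<T>
where
    T: Ord + Clone,
    R: RangeBounds<T>,
{
    // Collect first: the set can't be mutated while `range` borrows it.
    let keys: Vec<T> = set.range(range).cloned().collect();
    for key in &keys {
        set.remove(key);
    }
    keys
}
```

Each remove walks down from the root again, which is the inefficiency a dedicated range deletion would avoid.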
I had the same expectation about drain and asked in #42849. I guess the signature should be discussed first in an RFC for drain specifically?
/// assert!(a.is_empty());
/// ```
#[unstable(feature = "btree_drain_retain", issue = "42849")]
pub fn drain(&mut self) -> IntoIter<T> {
Also needs a new Drain type.
let root2 = unsafe { ptr::read(&root).into_ref() };
let front = first_leaf_edge(root1);
let back = last_leaf_edge(root2);
IntoIter { front, back, length }
Even simpler: mem::replace(self, BTreeMap::new()).into_iter()
All right for the BTreeMap::new(), but isn't the mem::forget(self); in into_iter going to wreck this?
No, into_iter is intentionally deconstructing the map, just as it always would. It will only affect the one you pull out of self, and the new() one will remain untouched.
However, the fact that this is so simple reduces its value, as anyone could easily write that replace(...).into_iter() already. A drain(range) or drain(start, end) would be more compelling, and not so trivial either.
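The ownership point can be seen in a small sketch using only stable std calls. take_all is a hypothetical name for the one-liner discussed above.

```rust
use std::collections::btree_map::IntoIter;
use std::collections::BTreeMap;
use std::mem;

/// Hypothetical helper wrapping the one-liner from the review.
/// `mem::replace` moves the old map out and leaves a fresh empty one behind;
/// `into_iter` then consumes only the map that was moved out.
fn take_all<K: Ord, V>(map: &mut BTreeMap<K, V>) -> IntoIter<K, V> {
    mem::replace(map, BTreeMap::new()).into_iter()
}
```

After the call, the caller's map is empty but fully usable, for example for new insertions, regardless of what into_iter did to the old tree.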
drain(start, end) seems an odd alternative to me - even Vec and VecDeque have a RangeBounds argument, while they don't have a range member.
I do have my own odd alternative though: as a member of RangeMut, but that simply cannot adjust length. So rather, a member of IterMut, but that can't be initialized with bounds. Oh boy, this is hard...
Musical chairs 🥁 r? @cuviper
Turns out that (probably) none of the 3 parts are easy. In anticipation of a choice over the signature of
or instead:
I have no idea what's best.
It's not an easy judgement call, but the goal is to reduce the review footprint, especially when it involves so much unsafe code. If "sharing/opening up" existing code would show any benefit on its own, especially if it yields an overall simplification, then that sounds like a good first step.
Here's my plan:
Attempt at providing and testing drain and retain members on BTreeMap and BTreeSet as requested in rust-lang/rfcs#1338.
drain is relatively easy and therefore reliable. I'm not quite sure that retain is trustworthy, given how noob I am, how much unsafe code there is all around, and that I don't quite understand why #58431 happened, the code comments there, or how it can be tested, although in theory this code respects the order introduced there.
According to this new benchmark, retain is worth the trouble even if you end up retaining nothing:
Most of the time in these is spent building the set (I don't know how to separate that off) but _retain_nothing is faster than _pop_all (which is also unstable), and way faster than _remove_all (a stable way to naively implement retain). That's because it's optimized to remove from leaf nodes as long as they remain sufficiently full. It's probably easy to extend that to leaf nodes that are the root node and are therefore allowed to underflow, and it's probably possible when stealing, but I tried to minimize changes in existing code.