-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify BTree range_search
comments
#81312
Clarify BTree range_search
comments
#81312
Conversation
r? @shepmaster (rust-highfive has picked a reviewer for you, use r? to override) |
@@ -88,6 +87,7 @@ where | |||
(_, Unbounded) => max_node.last_edge(), | |||
}; | |||
|
|||
// This can't be used for safety. The `PartialOrd` impl can change between calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is btree's own PartialOrd
(defined in node.rs, until #81094 reorganizes it away) and it is about the only real safety we have. If the condition fails, the iterator would never hit the other end and run off the parent of the root node (which is None) without a safe unwrap - that last safety has been "optimized" away).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This is a merge conflict waiting to happen.
@shepmaster This PR is ready for review when you have time. |
Actually, if you're waiting on #81094, that might be better. |
☔ The latest upstream changes (presumably #81094) made this pull request unmergeable. Please resolve the merge conflicts. |
r? @m-ou-se |
@@ -94,7 +94,7 @@ impl<BorrowType: marker::BorrowType, K, V> NodeRef<BorrowType, K, V, marker::Lea | |||
K: Borrow<Q>, | |||
R: RangeBounds<Q>, | |||
{ | |||
// WARNING: Inlining these variables would be unsound (#81138) | |||
// It might be unsound to inline these variables if this logic changes (#81138). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to tone down the warning, but I don't know what you mean with "if this logic changes". What logic? It might be bad to inline these variables without any other change, because successfully passing the match statement might fool you to believe the start and end actually used were fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What logic?
I meant the code in the rest of the method. This might be better:
// Inlining these variables should be avoided (#81138).
The rest of the comment already explains why inlining can cause a safety issue.
// SAFETY: This panic is used for safety, so external impls can't be called here. The | ||
// comparison is done with integers for that reason. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite confused here. If with "external impls" you mean the K: Ord
implementation, sure we can call it, we have to call it, we just had the find functions call it profusely. But we can't blindly trust it to implement a total order , and even if it does, it may not correspond to the .Q: Ord
implementation that we used to check the range bounds
The comparison is done with integers because those are what we end up with, what we need to identify positions in the data structure. For me, the question is: why do we compare and panic at all? Because if we didn't, and if the Ord
implementations are invalid or inconsistent, or until #81138 if the range
implementation was adversarial, we could easily and silently return a result that would lead callers to loop beyond the end of the tree, which they do unsafely with unwrap_unchecked
, because they rely on a promise delivered here but documented nowhere.
I thought "SAFETY:" was for documenting why calling something marked unsafe is in fact safe, and here it's kind of the opposite.
So I would add API documentation about the promise here and elsewhere (not sure if I could or should squeeze that into this PR), and change this comment to something like:
We've called an external `Ord` implementation and can't trust it implements a total order
nor (if type K != Q) that it corresponds to the Ord
implementation we used to check the range bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out I'm pretty wrong, after noticing there is no K: Ord
declared here. An inconsistency between K: Ord
and Q: Ord
doesn't play up here, only Q: Ord
is ever used. We assume that for various types of Q
used over time, the same order among keys prevails. We only use K: Ord
directly in some places like insert
, because we actually move the key into the map so there is no room to play for a Q
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure we can call it, we have to call it, we just had the find functions call it profusely.
We can call it, but the result can't be used for a safety check. The user impl could return anything, which makes it useless.
The comparison is done with integers because those are what we end up with
Yes, but it's really because the Ord
impl can't be assumed to be consistent or performant. Otherwise, the integers shouldn't be necessary IIUC.
I thought "SAFETY:" was for documenting why calling something marked unsafe is in fact safe, and here it's kind of the opposite.
I debated putting it here. However, it does document that the panic allows using unsafe code later with user impls involved.
We've called an external
Ord
implementation and can't trust it implements a total order
This would be okay with me, but it should say something about the panic providing safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can call it, but the result can't be used for a safety check.
Again, I don't understand why you write that. We have called it, we needed to call it, it's the purpose of the function we're discussing. Yes, the user impl could return anything, but that doesn't make it useless. We need to know what it says in case it's not trying to pull our leg.
Otherwise, the integers shouldn't be necessary IIUC.
The purpose of this function is to translate a pair of bounds (let's say a pair of keys) to (a node and) a pair of integers, edge indices, indices into various arrays. To translate from the problem domain to the solution domain, in terms of the btree data structure.
I debated putting it here. However, it does document that the panic allows using unsafe code later with user impls involved.
In my view, the fact that later code is calling unsafe code under some assumption about the result is not an unsafety problem in this function and not the only reason to have the panic. Even if there was no unsafe code outside, without a panic countermeasures, the outside code could erroneously access key-value pairs outside the range. Well, that's up for debate, because if your Ord
stinks, there is no inside and outside a range. In any case, I think this function should define (which I tried in #83147) and honour its contract, and then callers can exploit those guarantees with any unsafe shenanigans they like.
We've called an external
Ord
implementation and can't trust it implements a total orderThis would be okay with me, but it should say something about the panic providing safety.
Actually, come to think of it, that's not even true. AFAIK the panic used to provide safety (to code outside the function) in the old code, but no longer after #81094. The line below the main panic (if lower_edge_idx < upper_edge_idx {
) is what provides adherence to the contract / external safety. If we'd leave out all the panics from this function, the whole btree code would be just as safe, it would merely return results as weird as the Ord
implementation, instead of barking at users.
All the panics in this function are only there to fulfill an additional service this function delivers, diagnosing user error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to know what it says in case it's not trying to pull our leg.
I agree. All I'm saying is that we can't make a safety decision based on it. For example, you can't decide if a range can be passed to slice::get_unchecked
based on a user impl.
I think this function should define (which I tried in #83147) and honour its contract, and then callers can exploit those guarantees with any unsafe shenanigans they like.
You wrote the new code, so whatever you prefer is good with me. The purpose of this comment was to document what part of the implementation of this function makes it safe, since that wasn't obvious when we talked about the function before your PR.
If we'd leave out all the panics from this function, the whole btree code would be just as safe, it would merely return results as weird as the
Ord
implementation, instead of barking at users.
I didn't realize this. In this case, I see no reason to have this comment. I'll remove it and fix the other one as we discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also didn't realize this, so it was a fruitful discussion.
@m-ou-se This PR is now ready for review. |
Thanks! @bors r+ rollup |
📌 Commit 35a2096 has been approved by |
☀️ Test successful - checks-actions |
These comments were added by #81169. However, the soundness issue might not be exploitable here, so the comments should be updated.
cc @ssomers