-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add iterator constructors and accessors from/to raw pointers #37921
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
Ok.. Request for comment! Does this need a bigger design to be accepted? Big topic: The ZST (zero-sized type) special case. It's not actually an important use case, but it's an aspect of the slice iterator anyway. I imagine most users won't have to deal with this case, but when they do, it needs more detailed docs so that it is possible to write correct code. |
cc @rust-lang/libs |
This seems reasonable to me. Could you elaborate more on the ZST special case? I don't think I quite grok it. |
Oh yeah, I didn't intend to that ZST would be understandable from the PR alone. If you look at the implementation of the slice iterator, basically every part of it has different behaviour for ZST vs others. Regular elements
ZST:
|
Looks nice! I mentioned on IRC that I can imagine setters for the pointers being useful, but I just realized that you can just a well reconstruct the whole iterator instead. |
What low level code wants to do this? The standard library doesn't seem to need it (it's not using it in this patch). It seems like that needs slice iterators should just create a slice. |
The patch for PR #30740 first was using the slice iterator for one part, and raw pointers for the ascii fast path. Then @dotdash came along and realized that the constant conversion back and forth between slice and iterator lead to some losses, and rewrote the function to only use slices (current formulation), which was an improvement. So that's one example where this would be useful. Second use case is https://docs.rs/odds/0.2.25/odds/slice/unalign/struct.UnalignedIter.html It right now uses a replacement custom iterator ( What I like here is the abstraction UnalignedIter and that it allows some blocked algorithms in safe code. It has a neat thing with the lossless hand-off from the main iterator to the tail end (.tail() method). I think I've found that using the same pointer through the whole iteration is more efficient than splitting the slice up front, and doing two or three separate iterations. This idea can be used both for the unaligned load iterator, or for a three-split for an aligned block as well. The motivation for UnalignedIter is a safe abstraction for this pseudo code let mut ptr: *const u8;
let end_block; // A whole number of blocks from ptr
let real_end;
while ptr != end_block {
ptr = ptr.offset(block_size);
}
while ptr != real_end {
ptr = ptr.offset(1);
} The example usage of UnalignedIter is: fn unalign_count_ones(data: &[u8]) -> u32 {
let mut total = 0;
let mut iter = UnalignedIter::<u64>::from_slice(data);
total += (&mut iter).map(|x| x.count_ones()).sum();
total += iter.tail().map(|x| x.count_ones()).sum();
total
} It can also be plugged in into the current memchr fallback code to reproduce about the same algorithm, but with a high level and safe interface (:wink: burntsushi). |
Here’s a question: consider a Going through slice avoids this problem altogether, but allowing the end pointer to be specificed may potentially lead to unintended behaviour IMHO. |
@nagisa It's certainly possible to provide bad input for this Slices have |
You cannot provide
That’s not true. As a counterexample alignment of Perhaps the documentation for function ought to mention this case; something along the lines of this function only accepting pointers such that |
Yep, the alignment part wasn't true. |
For slices's from_raw_parts we put in a length as an integer instead. So a specific criticism against it would be that it's vulnerable to arithmetic wraparound bugs if you subtract things the wrong way, for example. |
My gut here would be to push on It sounds like LLVM doesn't optimize Is there extra functionality we want here though like unaligned accesses and such? @nagisa does have a good point that we currently have a static guarantee that the start/end pointers are aligned, which we could have in theory forgotten to document (although now we explicitly could as part of the |
I've made a test case that can be a “benchmark” for I don't think that's very realistic, but at least it's good to formulate the target. I would prefer to have access to the raw parts of something as fundamental as the slice iterator, because it is useful for a lot of low level code. I've got a reimplementation of the slice iterator in an external crate with some extra features (which is just as well, third party crates are our experimentation & innovation space). This is also why I can say that it has a useful effect to have the raw parts constructor. |
@bluss hm I'm not sure I understand the play link? The optimized LLVM IR looks optimal in all cases there? |
|
Ah yes, they do differ in what looks like an instruction or two. So this still isn't answering my previous question though. So far we've confirmed that LLVM doesn't optimize this well (which may just be an LLVM bug). What I'm curious about is the motivation of this API beyond "nice to have", e.g.:
I'm assuming that any performance difference is an LLVM bug that's fixable. Those sorts of problems shouldn't force us to expose internals of the standard library. I'm also still very worried about the ZST problem, |
It does answer the llvm question that brson and you asked, that this fills a niche that is not reached through optimized slice::from_raw_parts. The slice iterator shouldn't deal with unaligned pointers. Std::slice::from_raw_parts has the same precondition that the raw pointer is well aligned. Take any raw pointer API and this will be a precondition unless otherwise stated IMO. Rust in general does make this assumption itself (the |
I don't think this is just nice to have, I've motivated that it's needed to lower overhead of some low level abstractions. |
Ok so it sounds like we're basically at a point where:
@bluss I'm curious about your thoughts on the ZST issue? Do you think we can paper over it here and the perf improvements are worth it? I'm also curious, do you think the perf differences here are simply LLVM bugs to fix, or are they perhaps more fundamental? |
I don't know anything definite if this optimization can be improved or not, I'm hoping that a contributor with better llvm knowledge can help answer that. My thought is to not expect too much; it would be about computing the same end pointer in two or multiple different ways, and expecting the compiler to see that they are equal. Reliance on the optimizer is reliance on llvm, impacts debug performance, and makes code vulnerable to optimization regressions. This API handles ZST correctly in the way that ZST iterators round trip correctly if you get the start/end pointer and make a new iterator with that. So what can go wrong: Creating a zst iterator manually (use a slice), stepping through raw pointer to ZST manually (already need special handling). I think that it is close to a non-issue, because these won't be used with ZST elements, but most often with specific types.
To fulfill Rust's promise of Zero cost abstractions 😄 I think we can make awesome things with this. As shown in a previous comment, I looked at the two consecutive raw pointer loops and tried to figure out, how can we express this in safe Rust? And this PR is part of the result, it's a needed part. |
cc @ranma42 What do you think about the optimization question? It is a question of |
I think it should be possible to optimise the computation. Apparently LLVM does not recognise that it is computing the same value twice. It might be possible to improve this on the LLVM side or to find out a different way to express the same values so that the equivalence is proved by the current optimisation pipeline. I will try to experiment with this. |
Gasp, I had a pretty long comment explaining what was going on, but a misclick erased it (I should stop writing long comments in the browser). Long story short, I worked on a slightly simpler case ( Additionally the version of LLVM currently used by Rust does not include yet the patch from https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264, which would remove the bound checking (and panic handling) in the |
Thanks for the investigation @ranma42! To clarify, does "LLVM tip" optimize this the same way? Or is it still missing a "hypothetical optimization" to get these two examples to have equivalent codegen? |
@alexcrichton rust-llvm + https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264 optimises away the bound check from In order to get all of the functions in the simplified example to optimise completely, another patch like the one here is needed. I will try to improve it a little (for example it definitely needs a testcase) before proposing it upstream, but it already passes the LLVM test suite :D I think that in order to fully optimise the code by @bluss other patches might be needed. It looks like LLVM does not infer the alignment of the slice pointers (or Rust does not provide this information?) and manually realigns the length by adding an |
I just want to emphasize that the bounds check wasn't even present in the code I originally asked about, so it's a separate thing. |
@bluss Yes, sorry, I added it in the code I was testing because |
Interesting! So the GEP optimization still doesn't bring the two |
No, for the original example ( It should be possible to optimise away the |
LLVM patch for GEP is here |
llvm improvements are awesome! |
Awesome, thanks for that patch @ranma42! Do you know if those extra optimizations can be implemented in LLVM, or do they rely on the source? |
Sorry for the late reply. Apparently LLVM has some of the needed optimisations, but they need to be extended. Specifically, it looks like some GEP patterns only match on signed operations. There might be some additional interaction with |
oh the alignment question is a bit more clear when comparing to UnalignedIter I realized (see /pull/38352). The latter can't give out elements by reference, only copies. Since the slice iterators yield references, we know (follow up: do we?) they can only deal with aligned data. |
Would it be possible to cover these with some unit tests or write some compilable examples, or is it covered by some existing tests possibly ? |
Sure, they should get the regular kind of unit tests |
Oh sorry this accidentally fell off my radar. I would still be quite uncomfortable stabilizing functions like this, but it seems fine that if we want to test out the performance we land unstable versions and see how things go either by updating LLVM in the meantime or by pushing on APIs like this. Does that sounds ok? |
@bluss thoughts on my most recent comment? |
I'm somewhat sympathetic with making these functions available because it makes the intended optimization clearer, especially if getting llvm to optimize it is tricky. But landing them now as unstable seems like a reasonable starting place to me. |
@alexcrichton I'm sorry! I was meaning to answer twice, not sure what thoughts derailed my answers. I have accepted that the two cases makes it an API that's hard to use correctly with generic code, and that's not good. We have So before this derails again, how do we design an I'm ok with skipping these for now due to the complications. It is a shame that the slice iterator is still implemented with unstable-only fairy dust (specialization and |
Yeah it's a good point in general, and one that I'm not entirely sure how to solve. Given your thoughts though I'll close this for now (or at least that's my interpretation, feel free to reopen!) but do you want to move discussion over to an issue perhaps? |
Add a constructor for creating a slice iterator from a pair of start/end pointers. Also add accessors for getting the start/end pointers.
The motivation is that low level code wants to be able to create iterators losslessly from raw pointers, without computing an intermediate slice (computing length from the pointers and then back to pointer pair in
.iter()
).