Add core::intrinsics::simd #118853
Cc @rust-lang/opsem
🤨 I forgot about that "fun" detail. Never mind, I guess.
actually, while we're here, does anyone know if these are actually incompatible signatures, or is that lint just overly strict...?
doesn't that mean the rename should also happen in I can't imagine that generic parameter names would matter here, but it still seems like a good idea to ensure the names are the same for clarity.
Yeah that looks like an overeager lint to me. So it can be allowed temporarily until the names are back in sync.
library/core/src/intrinsics/simd.rs
/// The bitmask is always packed into the smallest/first bits, but the order is LSB-first for
/// little endian and MSB-first for big endian.
/// In other words, the LSB corresponds to the first vector element for little endian,
/// and the last vector element for big endian.
I have no idea what this means.^^ There are too many different notions of "first" here. Also, there are two cases to consider (output a packed int vs. output an array).
What about something like this:
No matter whether the output is an array or an unsigned integer, it is treated as a single contiguous list of bits. The bitmask is always packed on the least-significant side of the output, and padded with 0s in the most-significant bits. The order of the bits depends on endianness:
- On little endian, the least significant bit corresponds to the first vector element.
- On big endian, the least significant bit corresponds to the last vector element.
I think this also needs examples to have any chance of being comprehensible.
As always, please first state the types and then start discussing the details; without knowing what U is, the rest of this is even harder to understand.
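For illustration, the proposed wording could be modelled in scalar code roughly as follows. This is only a sketch: `bitmask_model` is a hypothetical helper written for this comment, not the intrinsic itself, and it assumes a 4-lane `i8` mask packed into a `u8`, with the target endianness passed as a flag instead of being detected.

```rust
/// Scalar model of the bit ordering described above.
/// Hypothetical helper for illustration only; not the real intrinsic.
/// Assumes a 4-lane mask whose lanes are each `0` or `!0`, packed into a `u8`.
fn bitmask_model(mask: [i8; 4], little_endian: bool) -> u8 {
    let mut bits = 0u8;
    for (i, &lane) in mask.iter().enumerate() {
        // Little endian: lane 0 maps to the least significant bit.
        // Big endian: the last lane maps to the least significant bit.
        let pos = if little_endian { i } else { mask.len() - 1 - i };
        if lane != 0 {
            bits |= 1 << pos;
        }
    }
    // The upper four bits are never set, matching the "padded with 0s
    // in the most-significant bits" wording.
    bits
}
```

Under this model, the mask `[!0, 0, !0, !0]` packs to `0b1101` with `little_endian = true` and to `0b1011` with `little_endian = false`.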
This is much better. I took your text and added a simple example.
446b0ce to 0d1e39d
Co-authored-by: Ralf Jung <post@ralfj.de>
c8c07a5 to d655dd6
I thought those intrinsics were added? Or is this something bootstrap compiler related?
Yeah this probably needs
Looking good. :) We can always refine this later if more things come up. @bors r+
Yep! And one more feature gate to remove
☀️ Test successful - checks-actions
Finished benchmarking commit (558ac1c): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 672.757s -> 673.009s (0.04%)
///
/// `mask` must only contain `0` or `!0` values.
#[cfg(not(bootstrap))]
pub fn simd_masked_load<V, U, T>(mask: V, ptr: U, val: T) -> T;
In implementing this, I am confused. Isn't this equivalent to simd_gather?
gather accepts a vector of N pointers, where each element will be loaded from its corresponding pointer.
masked_load accepts a single pointer, and all elements of the resulting vector (when unmasked) are loaded from consecutive offsets from that pointer, i.e. the first element will be loaded from ptr, the second from ptr.offset(1), and so on.
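To make the difference concrete, here is a rough scalar sketch of the two behaviours. The names `gather_model` and `masked_load_model` are hypothetical helpers invented for this comment, fixed to 4 lanes of `i32`; the sketch assumes that lanes whose mask value is `0` take their value from the `val` argument instead of being read from memory.

```rust
/// Scalar model of a gather: one pointer per lane.
/// Hypothetical helper for illustration only; not the real intrinsic.
///
/// Safety: every pointer whose mask lane is non-zero must be valid for reads.
unsafe fn gather_model(mask: [i32; 4], ptrs: [*const i32; 4], val: [i32; 4]) -> [i32; 4] {
    let mut out = val; // masked-off lanes keep the fallback value (assumption)
    for i in 0..4 {
        if mask[i] != 0 {
            // Lane i reads from its own, independent pointer.
            out[i] = unsafe { *ptrs[i] };
        }
    }
    out
}

/// Scalar model of a masked load: a single base pointer.
/// Hypothetical helper for illustration only; not the real intrinsic.
///
/// Safety: `ptr.add(i)` must be valid for reads for every non-zero mask lane `i`.
unsafe fn masked_load_model(mask: [i32; 4], ptr: *const i32, val: [i32; 4]) -> [i32; 4] {
    let mut out = val; // masked-off lanes keep the fallback value (assumption)
    for i in 0..4 {
        if mask[i] != 0 {
            // Lane i reads from the base pointer plus i elements.
            out[i] = unsafe { *ptr.add(i) };
        }
    }
    out
}
```

In this model a masked load behaves like a gather whose pointer vector is `[ptr, ptr.add(1), ptr.add(2), ptr.add(3)]`, which is why the two look similar, but the masked load only takes the one base pointer.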
Then the documentation here is wrong. For masked_load it says
/// `U` must be a vector of pointers to the element type of `T`, with the same length as `T`.
Yeah, looks like we missed this. I'll create a follow-up PR soon.
Intended to close rust-lang/portable-simd#381.
r? ralfjung