-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct target features RFC #3525
base: master
Are you sure you want to change the base?
Struct target features RFC #3525
Conversation
#[target_feature(enable = "avx")] | ||
#[derive(Clone, Copy, Debug)] | ||
pub struct Avx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that this must have language support to work, but it is not clear to me why this must be a language feature exposed to users? Maybe the standard library should be the one defining the interface to these structs, and hide the language feature that makes it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought. When I asked on the Zulip thread for this pre-RFC/RFC I was told that such a design would essentially be fine if the target feature standard library structs could still compose into larger structs (in user code):
struct AvxFma(Avx, Fma);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, the composability idea was mentioned in the "future possibilities" of the RFC. i didn't know if it would be better to prioritize that or visibility to users
As a distraction, clang and gcc support multi versioning for functions: |
An advantage of the current proposal is that the indirection/specialization can be hoisted higher without an exclusive entry point, with inlining freely supported beneath. |
I love the concept of integrating target features with the type system. It's a creative way to solve the problem of statically guaranteeing you have already detected a given feature. How do you anticipate this scaling for code that has a half-dozen or more different optimized versions, for a few different targets, and wants to handle both compile-time and runtime detection? Could you give a sketch of how you think such code could look? Using a generic for the feature type avoids duplication at the call site and the called function. What I'm wondering about is how the detection scales. Would it potentially make sense for the standard library to have a single magic That wouldn't preclude us from also having individual types for code that wants to statically guarantee a particular feature. |
i don't know if the way i envision it is scaling up in user code is something like this (simplified), which is based on the design i chose for my simd project https://docs.rs/pulp/0.18.8/pulp/trait.Simd.html with a bit more work, the code can be made even generic over the data type, and this scales up quite well and forms the basis of this is only an example, and alternative designs may be possible. so im just sharing what already works for me use core::arch::{Scalar, x86_64};
use core::mem::transmute;
use bytemuck::Pod;
trait F64Simd {
type f64s: Copy + Pod; // plus any other traits a user might want
pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s;
}
impl F64Simd for Scalar {
type f64s = f64;
#[inline]
pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s {
a + b
}
}
impl F64Simd for x86_64::Avx {
type f64s = [f64; 4];
#[inline]
pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s {
unsafe { x86_64::_mm256_add_pd(transmute(a), transmute(b)) }
}
}
impl F64Simd for x86_64::Avx512f {
type f64s = [f64; 8];
#[inline]
pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s {
unsafe { x86_64::_mm512_add_pd(transmute(a), transmute(b)) }
}
}
pub fn add_comptime<S: F64Simd>(simd: S, dst: &mut [f64], a: &[f64], b: &[f64]) {
// assume the slice length is a multiple of the register size for simplicity
let dst = bytemuck::cast_slice_mut::<f64, S::f64s>(dst);
let a = bytemuck::cast_slice::<f64, S::f64s>(a);
let b = bytemuck::cast_slice::<f64, S::f64s>(b);
for ((dst, &a), &b) in dst.iter_mut().zip(a).zip(b) {
*dst = simd.add_f64s(a, b);
}
}
pub fn add_runtime(dst: &mut [f64], a: &[f64], b: &[f64]) {
if let Some(simd) = Avx512f::try_new() {
return add_comptime(simd, dst, a, b);
}
if let Some(simd) = Avx::try_new() {
return add_comptime(simd, dst, a, b);
}
return add_comptime(Scalar, dst, a, b);
} |
This RFC is based on assumption, that |
i am not making that assumption. In fact note that the dispatch happens outside the loop, which means that we only check the availability once before using a vectorized impl on the whole slice if needed, the dispatch can be moved further from the inner loop if there are multiple layers and you want everything to get inlined |
FWIW I really like this proposal, and I think it has a lot of potential for safe low-level SIMD. I am taking a stab at implementing it to see how the resulting code would look like, as well as possible issues (for example: how does this interact with the ABI of the function? What about function pointers?). Will update when I have something somewhat more close to being mergeable :-) |
This RFC is exactly what I want, and I've even created a lot of the same structure myself by hand to get something similar in today's Rust. I have types that correspond to SIMD implementations of my algorithm's primitives that are unsafe to construct and then parameterize my algorithm with those types. As pseudocode, it looks like trait Primitives {
fn operation_1(&self);
fn operation_2(&self);
}
mod scalar {
pub struct Scalar;
impl Primitives for Scalar {
fn operation_1(&self) {}
fn operation_2(&self) {}
}
}
mod neon {
pub struct Neon(());
impl Neon {
unsafe fn new_unchecked() -> Self { Self(()) }
}
impl Primitives for Neon {
fn operation_1(&self) { unsafe { operation_1_neon() } }
fn operation_2(&self) { unsafe { operation_2_neon() } }
}
// Annoying
#[target_feature(enable = "neon")]
unsafe fn operation_1_neon(&self) {}
// Annoying
#[target_feature(enable = "neon")]
unsafe fn operation_2_neon() {}
}
struct MyAwesomeHasher;
impl std::hash::Hasher for MyAwesomeHasher {
fn write(&mut self, bytes: &[u8]) {
// Annoying
#[target_feature(enable = "neon")]
unsafe fn do_neon(primitives: impl Primitives, this: &mut MyAwesomeHasher, bytes: &[u8]) {
write_common(primitives, this, bytes)
}
// Annoying
fn do_scalar(primitives: impl Primitives, this: &mut MyAwesomeHasher, bytes: &[u8]) {
write_common(primitives, this, bytes)
}
if is_aarch64_feature_detected("neon") {
unsafe { do_neon(neon::Neon::new_unchecked(), self, bytes) }
} else {
do_scalar(scalar::Scalar::new(), self, bytes)
}
}
// Ditto all that for `Hasher::finish`
}
fn write_common(primitives: impl Primitives, bytes: &[u8]) {}
// Assume everything has an inline on it, some are `inline(always)`. I've annotated a few spots with "annoying" where I have to step out of my normal Rust flow and do something janky just to be able to use trait Primitives {
fn operation_1(&self);
fn operation_2(&self);
}
mod scalar {
pub struct Scalar;
impl Primitives for Scalar {
fn operation_1(&self) {}
fn operation_2(&self) {}
}
}
mod neon {
#[target_feature(enable = "neon")]
pub struct Neon(());
// Yay! Get the unsafe constructor for free
// Yay! No longer have to pull bodies out to new functions
impl Primitives for Neon {
fn operation_1(&self) { }
fn operation_2(&self) { }
}
}
// Yay! Now I know that my SIMD usage in my code will always be NEON or scalar or ...
enum MyAwesomeHasher {
Neon(MyAwesomeHasherRaw<neon::Neon>),
Scalar(MyAwesomeHasherRaw<scalar::Scalar>),
}
impl MyAwesomeHasher {
fn new() -> Self {
if is_aarch64_feature_detected("neon") {
unsafe { Self::Neon(MyAwesomeHasherRaw(neon::Neon::new_unchecked())) }
} else {
Self::Scalar(MyAwesomeHasherRaw(scalar::Scalar::new()))
}
}
}
impl std::hash::Hasher for MyAwesomeHasher {
// Assume a delegating call in here, enum-dispatch style
}
// Yay! I can now give callers an easy way to force a specific SIMD
// implementation.
struct MyAwesomeHasherRaw<P>(P);
impl<P: Primitive> std::hash::Hasher for MyAwesomeHasherRaw<P> {
fn write(&mut self, bytes: &[u8]) {
// Yay! No longer have to have the little shim functions
// Yay! No longer have to pull bodies out to new functions
}
// Ditto all that for `Hasher::finish`
} I've very excited to see how this progresses! Thank you for the awesome RFC @sarah-ek ! |
Implement a first version of RFC 3525: struct target features This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`. There's obviously a few tasks that ought to be done before this is merged; in no particular order: - add proper error messages - add tests - create a tracking issue for the RFC - properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-)) That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-) Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined: ```rust #![feature(struct_target_features)] use std::arch::x86_64::*; /* // fails to compile #[target_feature(enable = "avx")] struct Invalid(u32); */ #[target_feature(enable = "avx")] struct Avx {} #[target_feature(enable = "sse")] struct Sse(); /* // fails to compile extern "C" fn bad_fun(_: Avx) {} */ /* // fails to compile #[inline(always)] fn inline_fun(_: Avx) {} */ trait Simd { fn do_something(&self); } impl Simd for Avx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } impl Simd for Sse { fn do_something(&self) { unsafe { println!("{:?}", _mm_setzero_ps()); } } } struct WithAvx { #[allow(dead_code)] avx: Avx, } impl Simd for WithAvx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } #[inline(never)] fn dosomething<S: Simd>(simd: &S) { simd.do_something(); } fn main() { /* // fails to compile Avx {}; */ if is_x86_feature_detected!("avx") { let avx = unsafe { Avx {} }; dosomething(&avx); dosomething(&WithAvx { avx }); } if is_x86_feature_detected!("sse") { dosomething(&unsafe { Sse {} }) } } ``` Tracking: - rust-lang#129107
Implement a first version of RFC 3525: struct target features This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`. There's obviously a few tasks that ought to be done before this is merged; in no particular order: - add proper error messages - add tests - create a tracking issue for the RFC - properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-)) That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-) Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined: ```rust #![feature(struct_target_features)] use std::arch::x86_64::*; /* // fails to compile #[target_feature(enable = "avx")] struct Invalid(u32); */ #[target_feature(enable = "avx")] struct Avx {} #[target_feature(enable = "sse")] struct Sse(); /* // fails to compile extern "C" fn bad_fun(_: Avx) {} */ /* // fails to compile #[inline(always)] fn inline_fun(_: Avx) {} */ trait Simd { fn do_something(&self); } impl Simd for Avx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } impl Simd for Sse { fn do_something(&self) { unsafe { println!("{:?}", _mm_setzero_ps()); } } } struct WithAvx { #[allow(dead_code)] avx: Avx, } impl Simd for WithAvx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } #[inline(never)] fn dosomething<S: Simd>(simd: &S) { simd.do_something(); } fn main() { /* // fails to compile Avx {}; */ if is_x86_feature_detected!("avx") { let avx = unsafe { Avx {} }; dosomething(&avx); dosomething(&WithAvx { avx }); } if is_x86_feature_detected!("sse") { dosomething(&unsafe { Sse {} }) } } ``` Tracking: - rust-lang#129107
Implement a first version of RFC 3525: struct target features This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`. There's obviously a few tasks that ought to be done before this is merged; in no particular order: - add proper error messages - add tests - create a tracking issue for the RFC - properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-)) That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-) Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined: ```rust #![feature(struct_target_features)] use std::arch::x86_64::*; /* // fails to compile #[target_feature(enable = "avx")] struct Invalid(u32); */ #[target_feature(enable = "avx")] struct Avx {} #[target_feature(enable = "sse")] struct Sse(); /* // fails to compile extern "C" fn bad_fun(_: Avx) {} */ /* // fails to compile #[inline(always)] fn inline_fun(_: Avx) {} */ trait Simd { fn do_something(&self); } impl Simd for Avx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } impl Simd for Sse { fn do_something(&self) { unsafe { println!("{:?}", _mm_setzero_ps()); } } } struct WithAvx { #[allow(dead_code)] avx: Avx, } impl Simd for WithAvx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } #[inline(never)] fn dosomething<S: Simd>(simd: &S) { simd.do_something(); } fn main() { /* // fails to compile Avx {}; */ if is_x86_feature_detected!("avx") { let avx = unsafe { Avx {} }; dosomething(&avx); dosomething(&WithAvx { avx }); } if is_x86_feature_detected!("sse") { dosomething(&unsafe { Sse {} }) } } ``` Tracking: - rust-lang#129107
Implement a first version of RFC 3525: struct target features This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`. There's obviously a few tasks that ought to be done before this is merged; in no particular order: - add proper error messages - add tests - create a tracking issue for the RFC - properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-)) That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-) Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined: ```rust #![feature(struct_target_features)] use std::arch::x86_64::*; /* // fails to compile #[target_feature(enable = "avx")] struct Invalid(u32); */ #[target_feature(enable = "avx")] struct Avx {} #[target_feature(enable = "sse")] struct Sse(); /* // fails to compile extern "C" fn bad_fun(_: Avx) {} */ /* // fails to compile #[inline(always)] fn inline_fun(_: Avx) {} */ trait Simd { fn do_something(&self); } impl Simd for Avx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } impl Simd for Sse { fn do_something(&self) { unsafe { println!("{:?}", _mm_setzero_ps()); } } } struct WithAvx { #[allow(dead_code)] avx: Avx, } impl Simd for WithAvx { fn do_something(&self) { unsafe { println!("{:?}", _mm256_setzero_ps()); } } } #[inline(never)] fn dosomething<S: Simd>(simd: &S) { simd.do_something(); } fn main() { /* // fails to compile Avx {}; */ if is_x86_feature_detected!("avx") { let avx = unsafe { Avx {} }; dosomething(&avx); dosomething(&WithAvx { avx }); } if is_x86_feature_detected!("sse") { dosomething(&unsafe { Sse {} }) } } ``` Tracking: - rust-lang/rust#129107
Adding some comments / proposed changes to the RFC text coming from implementation experience (see rust-lang/rust#129881 (comment)):
I'd be willing to update the RFC to reflect these changes, but I am not sure what's the best way to do so. |
im fine with updating the rfc in the next few days. allowing annotations on non-unit structs and compatibility with target feature 1.1 sound good to me if recursing into arbitrary types isn't required, then i think we can also skip |
I don't have a super strong opinion, but allowing Another thing that came up in implementation: it makes life a lot easier to require that functions that get features from their arguments be annotated with |
sounds good! i'll update the wording with that in mind |
imo it would have much nicer ergonomics to not require a // the user shouldn't need to care that `FastVec` enables a target feature using `OpaqueTag`
// to enable running the arithmetic quickly and still have decent inlining
fn do_some_math<OpaqueTag>(v: FastVec<f32, OpaqueTag>, v2: FastVec<f32, OpaqueTag>) -> f32 {
(v * v2).reduce_sum()
} |
I agree that not requiring an attribute might be more ergonomic. However, implementation-wise, I believe there's some consensus that doing so would make MIR inlining of generics pretty much unfeasible, so that seems like a hard sell. Moreover, there are some concerns about how "magically" adding target features might break inlining of the feature'd function into other functions, causing surprising performance issues, which seem to me a reasonable argument towards being more explicit even without the MIR inlining issues. |
…eriting target features
changes: changed implicit feature inheritance to opt-in with this can also be used on structs and tuple structs to inherit the target features of their members it's not clear to me if references should be implicitly inheriting the features, as they may potentially point to uninit. |
Because of issues like rust-lang/rust#116558, I think it's dangerous to implicitly add target features to a function based on user-defined structs without an attribute making it clear that the target features are changing. I think the indirection involved in creating compositions of these structs is a bit risky too. Maybe this RFC should explore what it would take to have those structs defined by std, or justify why that isn't possible. |
Note that those issues are getting fixed -- see rust-lang/rust#127731. |
It may in fact be fixed in rust-calling-rust, but you may still care about what target features are being used for any variety of reasons. The attribute seems to be presented as an alternative currently, not as part of the main proposal:
Either way, I think having the types part of std is generally more approachable, and this RFC should at least address that |
having the base types be part of std is possible, but that would still require composability to be possible on the user side if we want people to be able to mix and match, which is needed for non trivial simd stuff the target feature inheritance is opt-in in the current version. the proposed alternative is to further narrow that down by specifying which parameters the features are inherited from |
the composition is also opt-in, for what it's worth |
ah damn, i forgot to update the link one sec |
should be good now |
I pulled down the in-progress PR and tried it against my code. Things worked out nicely and I was able to remove some boilerplate / levels of indirection by inlining what were previously free functions back into trait methods. I had a few points of feedback. I'm not sure if these are more relevant to the RFC or the PR, but I'll put them here to be on the safe side:
Thanks for the good work! |
I think we should expect references to point to a valid instance of the pointee type; just because that may not be a validity requirement shouldn't mean it can't be a safety requirement -- aka. if your function accepts references to invalid instances of a type (assuming those references aren't instantly UB to even create), your function needs to be |
Small detail, this makes the function
You don't really get UB in safe code, because safe code can't create an unsafe value (here a reference pointing to an invalid value), even though calling the function is safe, you still call it with an unsafe value. And whoever produced this unsafe value had to make sure all safe code manipulating this value, does so without UB.
Yes, that's one option: it is UB to pass an unsafe value to a function that enables target features based on that value. Another option could be to simply only enable target features if the struct is passed by value. No nesting allowed (including below references or as a field of a bigger type). Do we have motivating examples why this would be restrictive? |
The main example would be to let an hypothetical AvxU32Vector implement std::ops::PartialCmp. |
The reason I brought it up was to be able to write code along these lines: fn dispatch<R>(f: impl FnOnce(&dyn Vector) -> R) -> R {
#[cfg(all(target_arch = "aarch64", feature = "std"))]
{
if std::arch::is_aarch64_feature_detected!("neon") {
// Safety: We just ensured we have the NEON feature
return unsafe { f(&neon::Impl) };
}
}
// ...other platforms...
f(&scalar::Impl)
}
impl<S> hash::Hasher for RawHasher<S>
where
S: FixedBuffer,
{
#[inline]
fn write(&mut self, mut input: &[u8]) {
dispatch_f(|vector| { /* code that is parameterized over the SIMD implementation */ })
}
} If non-lifetime binders stabilizes and allows writing something like |
I expect that It could be done using: trait Callback {
type Output;
fn callback<V: Vector>(self, v: V) -> Self::Output;
}
fn dispatch<R>(c: impl Callback<Output = R>) -> R {
#[cfg(all(target_arch = "aarch64", feature = "std"))]
{
if std::arch::is_aarch64_feature_detected!("neon") {
// Safety: We just ensured we have the NEON feature
return unsafe { c.callback(neon::Impl) };
}
}
// ...other platforms...
c.callback(scalar::Impl)
}
impl<S> hash::Hasher for RawHasher<S>
where
S: FixedBuffer,
{
#[inline]
fn write(&mut self, input: &[u8]) {
struct Cb<S, I>(S, I);
impl<S: FixedBuffer> Callback for Cb<&'_ mut RawHasher<S>, &'_ [u8]> {
type Output = ();
fn callback<V: Vector>(self, v: V) {
// code that is parameterized over the SIMD implementation
}
}
dispatch(Cb(self, input))
}
} |
Thanks! So this really leaves us with it being unsound to implement a And somehow I believe that this doesn't need any saying. This is a consequence of the semantics of |
An interesting case might be to allow dispatch-agnostic code to specialize an implementation for a specific target feature set, without having a runtime branch. For example, a library could allow users to pass in a struct as discussed here, getting the benefits of better code generation, but it may want to specialize the implementation if, say, AVX is available. If that detail is hidden from user code, I'm not sure it would be possible to do that specialization with the RFC as is it now without an additional runtime branch. If the standard library provides base structs like |
Rendered