Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct target features RFC #3525

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

sarah-quinones
Copy link

@sarah-quinones sarah-quinones commented Nov 12, 2023

@sarah-quinones sarah-quinones changed the title Struct target features Struct target features RFC Nov 12, 2023
@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Nov 12, 2023
Comment on lines +91 to +93
#[target_feature(enable = "avx")]
#[derive(Clone, Copy, Debug)]
pub struct Avx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that this must have language support to work, but it is not clear to me why this must be a language feature exposed to users? Maybe the standard library should be the one defining the interface to these structs, and hide the language feature that makes it work?

Copy link
Contributor

@Lokathor Lokathor Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought. When I asked on the Zulip thread for this pre-RFC/RFC I was told that such a design would essentially be fine if the target feature standard library structs could still compose into larger structs (in user code):

struct AvxFma(Avx, Fma);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, the composability idea was mentioned in the "future possibilities" of the RFC. i didn't know if it would be better to prioritize that or visibility to users

@tschuett
Copy link

As a distraction, clang and gcc support multi versioning for functions:
https://maskray.me/blog/2023-02-05-function-multi-versioning

@jedbrown
Copy link

Compilers largely don't know the semantics of ifunc and are very conservative. Ifunc defeats most interprocedural optimizations. We can see that the target_clones function foo is not inlined into foo_plus_1. Fortunately, functions called by a target_clones function are still inlinable.

An advantage of the current proposal is that the indirection/specialization can be hoisted higher without an exclusive entry point, with inlining freely supported beneath.

@joshtriplett
Copy link
Member

joshtriplett commented Mar 3, 2024

I love the concept of integrating target features with the type system. It's a creative way to solve the problem of statically guaranteeing you have already detected a given feature.

How do you anticipate this scaling for code that has a half-dozen or more different optimized versions, for a few different targets, and wants to handle both compile-time and runtime detection? Could you give a sketch of how you think such code could look?

Using a generic for the feature type avoids duplication at the call site and the called function. What I'm wondering about is how the detection scales.

Would it potentially make sense for the standard library to have a single magic fn detect_features -> impl Simd that does all the detection and returns a static but opaque type for the program to pass around?

That wouldn't preclude us from also having individual types for code that wants to statically guarantee a particular feature.

@sarah-quinones
Copy link
Author

i don't know if fn detect_features -> impl Simd would be that great of an option for the runtime detection use-case. it would have to return a type known at compile-time, so it can only enable the features that were already enabled at comptime, not those detected at runtime.

the way i envision it is scaling up in user code is something like this (simplified), which is based on the design i chose for my simd project https://docs.rs/pulp/0.18.8/pulp/trait.Simd.html

with a bit more work, the code can be made even generic over the data type, and this scales up quite well and forms the basis of faer, which is a high perf linear algebra library https://docs.rs/faer-core/0.17.1/faer_core/group_helpers/struct.SimdFor.html

this is only an example, and alternative designs may be possible. so im just sharing what already works for me

use core::arch::{Scalar, x86_64};
use core::mem::transmute;
use bytemuck::Pod;

trait F64Simd {
    type f64s: Copy + Pod; // plus any other traits a user might want
    pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s;
}

impl F64Simd for Scalar {
    type f64s = f64;

	#[inline]
	pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s {
		a + b
	}
}

impl F64Simd for x86_64::Avx {
    type f64s = [f64; 4];

	#[inline]
	pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s {
		unsafe { x86_64::_mm256_add_pd(transmute(a), transmute(b)) }
	}
}

impl F64Simd for x86_64::Avx512f {
    type f64s = [f64; 8];

	#[inline]
	pub fn add_f64s(self, a: Self::f64s, b: Self::f64s) -> Self::f64s {
		unsafe { x86_64::_mm512_add_pd(transmute(a), transmute(b)) }
	}
}

pub fn add_comptime<S: F64Simd>(simd: S, dst: &mut [f64], a: &[f64], b: &[f64]) {
	// assume the slice length is a multiple of the register size for simplicity
	let dst = bytemuck::cast_slice_mut::<f64, S::f64s>(dst);
	let a = bytemuck::cast_slice::<f64, S::f64s>(a);
	let b = bytemuck::cast_slice::<f64, S::f64s>(b);

	for ((dst, &a), &b) in dst.iter_mut().zip(a).zip(b) {
		*dst = simd.add_f64s(a, b);
	}
}

pub fn add_runtime(dst: &mut [f64], a: &[f64], b: &[f64]) {
	if let Some(simd) = Avx512f::try_new() {
		return add_comptime(simd, dst, a, b);
	}
	if let Some(simd) = Avx::try_new() {
		return add_comptime(simd, dst, a, b);
	}
	return add_comptime(Scalar, dst, a, b);
}

@VitWW
Copy link

VitWW commented Mar 4, 2024

This RFC is based on assumption, that try_new_avx512f() would be simpler and faster then is_x86_feature_detected!("avx512f") , but would it?

@sarah-quinones
Copy link
Author

sarah-quinones commented Mar 4, 2024

i am not making that assumption. In fact Avx512f::try_new could just be implemented as if is_x86_feature_detected!("avx512f") { Some(Self) } else { None }

note that the dispatch happens outside the loop, which means that we only check the availability once before using a vectorized impl on the whole slice

if needed, the dispatch can be moved further from the inner loop if there are multiple layers and you want everything to get inlined

@veluca93
Copy link

FWIW I really like this proposal, and I think it has a lot of potential for safe low-level SIMD.

I am taking a stab at implementing it to see how the resulting code would look like, as well as possible issues (for example: how does this interact with the ABI of the function? What about function pointers?).

Will update when I have something somewhat more close to being mergeable :-)

@shepmaster
Copy link
Member

This RFC is exactly what I want, and I've even created a lot of the same structure myself by hand to get something similar in today's Rust. I have types that correspond to SIMD implementations of my algorithm's primitives that are unsafe to construct and then parameterize my algorithm with those types. As pseudocode, it looks like

trait Primitives {
    fn operation_1(&self);
    fn operation_2(&self);
}

mod scalar {
    pub struct Scalar;

    impl Primitives for Scalar {
        fn operation_1(&self) {}
        fn operation_2(&self) {}
    }
}

mod neon {
    pub struct Neon(());

    impl Neon {
        unsafe fn new_unchecked() -> Self { Self(()) }
    }

    impl Primitives for Neon {
        fn operation_1(&self) { unsafe { operation_1_neon() } }
        fn operation_2(&self) { unsafe { operation_2_neon() } }        
    }

    // Annoying
    #[target_feature(enable = "neon")]
    unsafe fn operation_1_neon(&self) {}

    // Annoying
    #[target_feature(enable = "neon")]
    unsafe fn operation_2_neon() {}
}

struct MyAwesomeHasher;

impl std::hash::Hasher for MyAwesomeHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Annoying
        #[target_feature(enable = "neon")]
        unsafe fn do_neon(primitives: impl Primitives, this: &mut MyAwesomeHasher, bytes: &[u8]) {
            write_common(primitives, this, bytes)
        }

        // Annoying
        fn do_scalar(primitives: impl Primitives, this: &mut MyAwesomeHasher, bytes: &[u8]) {
            write_common(primitives, this, bytes)
        }

        if is_aarch64_feature_detected("neon") {
            unsafe { do_neon(neon::Neon::new_unchecked(), self, bytes) }
        } else {
            do_scalar(scalar::Scalar::new(), self, bytes)
        }
    }

    // Ditto all that for `Hasher::finish`
}

fn write_common(primitives: impl Primitives, bytes: &[u8]) {}

 // Assume everything has an inline on it, some are `inline(always)`.

I've annotated a few spots with "annoying" where I have to step out of my normal Rust flow and do something janky just to be able to use target_feature that is limited to a function. Having it be able to be attached to a type and then beneficially infect places it is used will be so much nicer. If I understand it well, it would shorten my code dramatically while making it look more like idiomatic Rust:

trait Primitives {
    fn operation_1(&self);
    fn operation_2(&self);
}

mod scalar {
    pub struct Scalar;

    impl Primitives for Scalar {
        fn operation_1(&self) {}
        fn operation_2(&self) {}
    }
}

mod neon {
    #[target_feature(enable = "neon")]
    pub struct Neon(());

    // Yay! Get the unsafe constructor for free

    // Yay! No longer have to pull bodies out to new functions
    impl Primitives for Neon {
        fn operation_1(&self) { }
        fn operation_2(&self) { }        
    }
}

// Yay! Now I know that my SIMD usage in my code will always be NEON or scalar or ... 
enum MyAwesomeHasher {
    Neon(MyAwesomeHasherRaw<neon::Neon>),
    Scalar(MyAwesomeHasherRaw<scalar::Scalar>),
}

impl MyAwesomeHasher {
    fn new() -> Self {
        if is_aarch64_feature_detected("neon") {
            unsafe { Self::Neon(MyAwesomeHasherRaw(neon::Neon::new_unchecked())) }
        } else {
            Self::Scalar(MyAwesomeHasherRaw(scalar::Scalar::new()))
        }
    }
}

impl std::hash::Hasher for MyAwesomeHasher {
    // Assume a delegating call in here, enum-dispatch style    
}

// Yay! I can now give callers an easy way to force a specific SIMD
// implementation.
struct MyAwesomeHasherRaw<P>(P);

impl<P: Primitive> std::hash::Hasher for MyAwesomeHasherRaw<P> {
    fn write(&mut self, bytes: &[u8]) {
        // Yay! No longer have to have the little shim functions
        // Yay! No longer have to pull bodies out to new functions
    }

    // Ditto all that for `Hasher::finish`
}

I've very excited to see how this progresses! Thank you for the awesome RFC @sarah-ek !

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 28, 2024
Implement a first version of RFC 3525: struct target features

This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`.

There's obviously a few tasks that ought to be done before this is merged; in no particular order:
- add proper error messages
- add tests
- create a tracking issue for the RFC
- properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-))

That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-)

Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined:
```rust
#![feature(struct_target_features)]

use std::arch::x86_64::*;

/*
// fails to compile
#[target_feature(enable = "avx")]
struct Invalid(u32);
*/

#[target_feature(enable = "avx")]
struct Avx {}

#[target_feature(enable = "sse")]
struct Sse();

/*
// fails to compile
extern "C" fn bad_fun(_: Avx) {}
*/

/*
// fails to compile
#[inline(always)]
fn inline_fun(_: Avx) {}
*/

trait Simd {
    fn do_something(&self);
}

impl Simd for Avx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

impl Simd for Sse {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm_setzero_ps());
        }
    }
}

struct WithAvx {
    #[allow(dead_code)]
    avx: Avx,
}

impl Simd for WithAvx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

#[inline(never)]
fn dosomething<S: Simd>(simd: &S) {
    simd.do_something();
}

fn main() {
    /*
    // fails to compile
    Avx {};
    */

    if is_x86_feature_detected!("avx") {
        let avx = unsafe { Avx {} };
        dosomething(&avx);
        dosomething(&WithAvx { avx });
    }
    if is_x86_feature_detected!("sse") {
        dosomething(&unsafe { Sse {} })
    }
}
```

Tracking:

- rust-lang#129107
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 28, 2024
Implement a first version of RFC 3525: struct target features

This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`.

There's obviously a few tasks that ought to be done before this is merged; in no particular order:
- add proper error messages
- add tests
- create a tracking issue for the RFC
- properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-))

That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-)

Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined:
```rust
#![feature(struct_target_features)]

use std::arch::x86_64::*;

/*
// fails to compile
#[target_feature(enable = "avx")]
struct Invalid(u32);
*/

#[target_feature(enable = "avx")]
struct Avx {}

#[target_feature(enable = "sse")]
struct Sse();

/*
// fails to compile
extern "C" fn bad_fun(_: Avx) {}
*/

/*
// fails to compile
#[inline(always)]
fn inline_fun(_: Avx) {}
*/

trait Simd {
    fn do_something(&self);
}

impl Simd for Avx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

impl Simd for Sse {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm_setzero_ps());
        }
    }
}

struct WithAvx {
    #[allow(dead_code)]
    avx: Avx,
}

impl Simd for WithAvx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

#[inline(never)]
fn dosomething<S: Simd>(simd: &S) {
    simd.do_something();
}

fn main() {
    /*
    // fails to compile
    Avx {};
    */

    if is_x86_feature_detected!("avx") {
        let avx = unsafe { Avx {} };
        dosomething(&avx);
        dosomething(&WithAvx { avx });
    }
    if is_x86_feature_detected!("sse") {
        dosomething(&unsafe { Sse {} })
    }
}
```

Tracking:

- rust-lang#129107
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 28, 2024
Implement a first version of RFC 3525: struct target features

This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`.

There's obviously a few tasks that ought to be done before this is merged; in no particular order:
- add proper error messages
- add tests
- create a tracking issue for the RFC
- properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-))

That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-)

Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined:
```rust
#![feature(struct_target_features)]

use std::arch::x86_64::*;

/*
// fails to compile
#[target_feature(enable = "avx")]
struct Invalid(u32);
*/

#[target_feature(enable = "avx")]
struct Avx {}

#[target_feature(enable = "sse")]
struct Sse();

/*
// fails to compile
extern "C" fn bad_fun(_: Avx) {}
*/

/*
// fails to compile
#[inline(always)]
fn inline_fun(_: Avx) {}
*/

trait Simd {
    fn do_something(&self);
}

impl Simd for Avx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

impl Simd for Sse {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm_setzero_ps());
        }
    }
}

struct WithAvx {
    #[allow(dead_code)]
    avx: Avx,
}

impl Simd for WithAvx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

#[inline(never)]
fn dosomething<S: Simd>(simd: &S) {
    simd.do_something();
}

fn main() {
    /*
    // fails to compile
    Avx {};
    */

    if is_x86_feature_detected!("avx") {
        let avx = unsafe { Avx {} };
        dosomething(&avx);
        dosomething(&WithAvx { avx });
    }
    if is_x86_feature_detected!("sse") {
        dosomething(&unsafe { Sse {} })
    }
}
```

Tracking:

- rust-lang#129107
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Aug 29, 2024
Implement a first version of RFC 3525: struct target features

This PR is an attempt at implementing rust-lang/rfcs#3525, behind a feature gate `struct_target_features`.

There's obviously a few tasks that ought to be done before this is merged; in no particular order:
- add proper error messages
- add tests
- create a tracking issue for the RFC
- properly serialize/deserialize the new target_features field in `rmeta` (assuming I even understood that correctly :-))

That said, as I am definitely not a `rustc` expert, I'd like to get some early feedback on the overall approach before fixing those things (and perhaps some pointers for `rmeta`...), hence this early PR :-)

Here's an example piece of code that I have been using for testing - with the new code, the calls to intrinsics get correctly inlined:
```rust
#![feature(struct_target_features)]

use std::arch::x86_64::*;

/*
// fails to compile
#[target_feature(enable = "avx")]
struct Invalid(u32);
*/

#[target_feature(enable = "avx")]
struct Avx {}

#[target_feature(enable = "sse")]
struct Sse();

/*
// fails to compile
extern "C" fn bad_fun(_: Avx) {}
*/

/*
// fails to compile
#[inline(always)]
fn inline_fun(_: Avx) {}
*/

trait Simd {
    fn do_something(&self);
}

impl Simd for Avx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

impl Simd for Sse {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm_setzero_ps());
        }
    }
}

struct WithAvx {
    #[allow(dead_code)]
    avx: Avx,
}

impl Simd for WithAvx {
    fn do_something(&self) {
        unsafe {
            println!("{:?}", _mm256_setzero_ps());
        }
    }
}

#[inline(never)]
fn dosomething<S: Simd>(simd: &S) {
    simd.do_something();
}

fn main() {
    /*
    // fails to compile
    Avx {};
    */

    if is_x86_feature_detected!("avx") {
        let avx = unsafe { Avx {} };
        dosomething(&avx);
        dosomething(&WithAvx { avx });
    }
    if is_x86_feature_detected!("sse") {
        dosomething(&unsafe { Sse {} })
    }
}
```

Tracking:

- rust-lang/rust#129107
@veluca93
Copy link

Adding some comments / proposed changes to the RFC text coming from implementation experience (see rust-lang/rust#129881 (comment)):

  • Annotations can be added on non-unit structs: this allows i.e. to make wrapper around vector types that carry with them the required features
  • Similarly to target feature 1.1, constructing a target-feature struct is not unsafe if the function it is being constructed in has already the required features.
  • Not just functions accepting T inherit the corresponding features, but so do functions accepting &T, &mut T, &&T, etc. This is partially what is proposed in the "Future Possibilities" of this RFC; after a few conversations, I convinced myself that recurring into arbitrary types would potentially be confusing, so I propose not to go for this option at the moment.

I'd be willing to update the RFC to reflect these changes, but I am not sure what's the best way to do so.
It may also be helpful to wait until the implementation is (more) complete to see if other changes would be useful...

@sarah-quinones
Copy link
Author

@veluca93

im fine with updating the rfc in the next few days.

allowing annotations on non-unit structs and compatibility with target feature 1.1 sound good to me

if recursing into arbitrary types isn't required, then i think we can also skip &T and &mut T for now. since these types are likely getting passed by value anyway. what do you think?

@veluca93
Copy link

@veluca93

im fine with updating the rfc in the next few days.

allowing annotations on non-unit structs and compatibility with target feature 1.1 sound good to me

if recursing into arbitrary types isn't required, then i think we can also skip &T and &mut T for now. since these types are likely getting passed by value anyway. what do you think?

I don't have a super strong opinion, but allowing &T allows things like using &self as a receiver on such types, which in turn allows them to be non-copy, which might be nice, in particular if the annotated structs are not unit and carry some nontrivial semantic meaning.

Another thing that came up in implementation: it makes life a lot easier to require that functions that get features from their arguments be annotated with #[target_feature(from_args)]. It also makes the code a bit more explicit, which IMO is nice.

@sarah-quinones
Copy link
Author

sounds good! i'll update the wording with that in mind

@programmerjake
Copy link
Member

Another thing that came up in implementation: it makes life a lot easier to require that functions that get features from their arguments be annotated with #[target_feature(from_args)]. It also makes the code a bit more explicit, which IMO is nice.

imo it would have much nicer ergonomics to not require a #[target_feature(from_args)] annotation, since it allows you to just use types without having to worry about how they actually work internally since I expect that in a lot of instances the types enabling target features is more of an implementation detail, e.g.:

// the user shouldn't need to care that `FastVec` enables a target feature using `OpaqueTag`
// to enable running the arithmetic quickly and still have decent inlining
fn do_some_math<OpaqueTag>(v: FastVec<f32, OpaqueTag>, v2: FastVec<f32, OpaqueTag>) -> f32 {
    (v * v2).reduce_sum()
}

@veluca93
Copy link

Another thing that came up in implementation: it makes life a lot easier to require that functions that get features from their arguments be annotated with #[target_feature(from_args)]. It also makes the code a bit more explicit, which IMO is nice.

imo it would have much nicer ergonomics to not require a #[target_feature(from_args)] annotation, since it allows you to just use types without having to worry about how they actually work internally since I expect that in a lot of instances the types enabling target features is more of an implementation detail, e.g.:

// the user shouldn't need to care that `FastVec` enables a target feature using `OpaqueTag`
// to enable running the arithmetic quickly and still have decent inlining
fn do_some_math<OpaqueTag>(v: FastVec<f32, OpaqueTag>, v2: FastVec<f32, OpaqueTag>) -> f32 {
    (v * v2).reduce_sum()
}

I agree that not requiring an attribute might be more ergonomic. However, implementation-wise, I believe there's some consensus that doing so would make MIR inlining of generics pretty much unfeasible, so that seems like a hard sell.

Moreover, there are some concerns about how "magically" adding target features might break inlining of the feature'd function into other functions, causing surprising performance issues, which seem to me a reasonable argument towards being more explicit even without the MIR inlining issues.

@sarah-quinones
Copy link
Author

changes:

changed implicit feature inheritance to opt-in with #[target_feature(inherit)]

this can also be used on structs and tuple structs to inherit the target features of their members

it's not clear to me if references should be implicitly inheriting the features, as they may potentially point to uninit.

@calebzulawski
Copy link
Member

Because of issues like rust-lang/rust#116558, I think it's dangerous to implicitly add target features to a function based on user-defined structs without an attribute making it clear that the target features are changing. I think the indirection involved in creating compositions of these structs is a bit risky too. Maybe this RFC should explore what it would take to have those structs defined by std, or justify why that isn't possible.

@veluca93
Copy link

Because of issues like rust-lang/rust#116558, I think it's dangerous to implicitly add target features to a function based on user-defined structs without an attribute making it clear that the target features are changing. I think the indirection involved in creating compositions of these structs is a bit risky too. Maybe this RFC should explore what it would take to have those structs defined by std, or justify why that isn't possible.

Note that those issues are getting fixed -- see rust-lang/rust#127731.
At the moment this RFC proposes that an attribute always be used anyway, which should make things clear enough.

@calebzulawski
Copy link
Member

calebzulawski commented Oct 21, 2024

Because of issues like rust-lang/rust#116558, I think it's dangerous to implicitly add target features to a function based on user-defined structs without an attribute making it clear that the target features are changing. I think the indirection involved in creating compositions of these structs is a bit risky too. Maybe this RFC should explore what it would take to have those structs defined by std, or justify why that isn't possible.

Note that those issues are getting fixed -- see rust-lang/rust#127731. At the moment this RFC proposes that an attribute always be used anyway, which should make things clear enough.

It may in fact be fixed in rust-calling-rust, but you may still care about what target features are being used for any variety of reasons.

The attribute seems to be presented as an alternative currently, not as part of the main proposal:

This RFC suggests that all of the input parameters of times_two are scanned during monomorphization, and target features are inherited from them appropriately. The alternative is to explicitly mark which parameters times_two is allowed to inherit target features from. Perhaps through the use of a second attribute.
...
It is not clear if there are any advantages to this approach, other than being more explicit.

Either way, I think having the types part of std is generally more approachable, and this RFC should at least address that

@sarah-quinones
Copy link
Author

having the base types be part of std is possible, but that would still require composability to be possible on the user side if we want people to be able to mix and match, which is needed for non trivial simd stuff

the target feature inheritance is opt-in in the current version. the proposed alternative is to further narrow that down by specifying which parameters the features are inherited from

@sarah-quinones
Copy link
Author

the composition is also opt-in, for what it's worth

@sarah-quinones
Copy link
Author

ah damn, i forgot to update the link

one sec

@sarah-quinones
Copy link
Author

should be good now

@shepmaster
Copy link
Member

I pulled down the in-progress PR and tried it against my code. Things worked out nicely and I was able to remove some boilerplate / levels of indirection by inlining what were previously free functions back into trait methods.

I had a few points of feedback. I'm not sure if these are more relevant to the RFC or the PR, but I'll put them here to be on the safe side:

  • I think it's considered good style to include a # Safety section in the documentation for an unsafe function that explains how to use the function safely. When I added #[target_feature(enable = ...)] to a unit struct, I was originally confused about where I should put that documentation as I no longer had a function. I wondered if the compiler would put it somewhere for me, then eventually (while writing this comment) realized I should put it on the struct's documentation myself.
  • The usage of #[target_feature(from_args)] worked well for me as it's a little breadcrumb in the safety chain (and works well with my usages of deny(unsafe_op_in_unsafe_fn)). However, I wonder if it should be even more precise and specify exactly which argument(s) are supposed to participate. For example: #[target_feature(from_args(self))] or #[target_feature(from_args(lhs, rhs))].
  • I'm not clear about if a reference to a target-feature structure counts as opting in, but I would like it to. That is, I hope that #[target_feature(from_args)] fn foo(_: Avx) is treated the same as #[target_feature(from_args)] fn foo(_: &Avx).

Thanks for the good work!

@programmerjake
Copy link
Member

it's not clear to me if references should be implicitly inheriting the features, as they may potentially point to uninit.

I think we should expect references to point to a valid instance of the pointee type; just because that may not be a validity requirement shouldn't mean it can't be a safety requirement -- aka. if your function accepts references to invalid instances of a type (assuming those references aren't instantly UB to even create), your function needs to be unsafe (otherwise you get UB in safe code) and explicitly document that it accepts invalid references.
We can just say it's your fault for unsafely creating a bad reference in the first place if you then pass it to a function that enables target features based on that reference and those target features aren't actually available at runtime.

@ia0
Copy link

ia0 commented Oct 22, 2024

your function needs to be unsafe

Small detail, this makes the function robust not unsafe (see the unsafe mental model).

(otherwise you get UB in safe code)

You don't really get UB in safe code, because safe code can't create an unsafe value (here a reference pointing to an invalid value), even though calling the function is safe, you still call it with an unsafe value. And whoever produced this unsafe value had to make sure all safe code manipulating this value, does so without UB.

We can just say it's your fault for unsafely creating a bad reference in the first place if you then pass it to a function that enables target features based on that reference and those target features aren't actually available at runtime.

Yes, that's one option: it is UB to pass an unsafe value to a function that enables target features based on that value.

Another option could be to simply only enable target features if the struct is passed by value. No nesting allowed (including below references or as a field of a bigger type). Do we have motivating examples why this would be restrictive?

@veluca93
Copy link

The main example would be to let an hypothetical AvxU32Vector implement std::ops::PartialCmp.

@shepmaster
Copy link
Member

Do we have motivating examples why this would be restrictive?

The reason I brought it up was to be able to write code along these lines:

fn dispatch<R>(f: impl FnOnce(&dyn Vector) -> R) -> R {
    #[cfg(all(target_arch = "aarch64", feature = "std"))]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // Safety: We just ensured we have the NEON feature
            return unsafe { f(&neon::Impl) };
        }
    }

    // ...other platforms...

    f(&scalar::Impl)
}

impl<S> hash::Hasher for RawHasher<S>
where
    S: FixedBuffer,
{
    #[inline]
    fn write(&mut self, mut input: &[u8]) {
        dispatch_f(|vector| { /* code that is parameterized over the SIMD implementation */ })
    }
}

If non-lifetime binders stabilizes and allows writing something like fn dispatch(f: impl for<V: Vector> FnOnce(V)), then I wouldn't care as much. Today, I'm restricted to either using a macro (my deployed solution) or write dispatch taking &dyn Vector (the code above).

@programmerjake
Copy link
Member

programmerjake commented Oct 22, 2024

fn dispatch<R>(f: impl FnOnce(&dyn Vector) -> R) -> R {

I expect that dyn will defeat target features based on the concrete type.

It could be done using:

trait Callback {
    type Output;
    fn callback<V: Vector>(self, v: V) -> Self::Output;
}

fn dispatch<R>(c: impl Callback<Output = R>) -> R {
    #[cfg(all(target_arch = "aarch64", feature = "std"))]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            // Safety: We just ensured we have the NEON feature
            return unsafe { c.callback(neon::Impl) };
        }
    }

    // ...other platforms...

    c.callback(scalar::Impl)
}

impl<S> hash::Hasher for RawHasher<S>
where
    S: FixedBuffer,
{
    #[inline]
    fn write(&mut self, input: &[u8]) {
        struct Cb<S, I>(S, I);
        impl<S: FixedBuffer> Callback for Cb<&'_ mut RawHasher<S>, &'_ [u8]> {
            type Output = ();
            fn callback<V: Vector>(self, v: V) {
                // code that is parameterized over the SIMD implementation
            }
        }
        dispatch(Cb(self, input))
    }
}

@ia0
Copy link

ia0 commented Oct 22, 2024

Thanks! So this really leaves us with it being unsound to implement a #[target_feature(inherit)] function accepting unsafe values for its #[target_feature] parameters.

And somehow I believe that this doesn't need any saying. This is a consequence of the semantics of #[target_feature(inherit)] (which could/should have opsem support). We could make it explicit in some documentation that a #[target_feature(inherit)] function must only take safe values for its #[target_feature] argument, but this is theoretically not needed (not a rule, just a deduction).

@tomcur
Copy link

tomcur commented Dec 17, 2024

An interesting case might be to allow dispatch-agnostic code to specialize an implementation for a specific target feature set, without having a runtime branch. For example, a library could allow users to pass in a struct as discussed here, getting the benefits of better code generation, but it may want to specialize the implementation if, say, AVX is available. If that detail is hidden from user code, I'm not sure it would be possible to do that specialization with the RFC as is it now without an additional runtime branch.

If the standard library provides base structs like Sse2, Avx, those could provide safe methods like Avx::try_from_target_features<T>(value: T) -> Option<Self>. After monomorphization and with sufficiently aware optimizing, it is in principle statically known whether that's Some or None, allowing the library to specialize for some subset of features without runtime branching. I think the method could in principle be const.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.