RFC: Add a scalable representation to allow support for scalable vectors #3268
base: master
I think a more general definition of an "opaque" type would be useful. This is a type which can exist in a register but not in memory, specifically:
Other than ARM and RISC-V scalable vectors, this would also be useful to represent reference types in WebAssembly. These are opaque references to objects which can only be used as local variables or function arguments and can't be written to WebAssembly memory.
ARM SVE uses
I noticed that setting the vector length pseudoregister at runtime was considered undefined behavior. For RISC-V, rather than masking out elements that aren't used, the design seems to focus primarily on setting the VL register, which is an actual register that needs to be modified when switching between different vector types. It also lets you change the actual "register size" by grouping together multiple physical registers, which is used either to save instructions or to facilitate type conversions (i.e., casting from a u16 vector to a u32 vector puts the result across two contiguous vector registers, which can then be used as though they're one register).
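The register grouping described above follows the RVV 1.0 relation VLMAX = LMUL × VLEN / SEW. A minimal sketch, assuming only integer LMUL values (the spec also allows fractional LMUL, which is not modelled here); `vlmax` is a hypothetical helper, not an intrinsic:

```rust
/// Maximum number of elements one vector instruction can process,
/// following the RVV 1.0 relation VLMAX = LMUL * VLEN / SEW.
/// `vlen_bits`: hardware vector register width in bits (VLEN).
/// `sew_bits`: selected element width in bits.
/// `lmul`: register-group multiplier (integer values only in this sketch).
fn vlmax(vlen_bits: u32, sew_bits: u32, lmul: u32) -> u32 {
    assert!(matches!(sew_bits, 8 | 16 | 32 | 64), "unsupported SEW");
    assert!(matches!(lmul, 1 | 2 | 4 | 8), "unsupported LMUL");
    lmul * vlen_bits / sew_bits
}

fn main() {
    let vlen = 128; // smallest VLEN the V extension allows
    // Widening u16 -> u32 keeps the element count by doubling LMUL:
    // the u32 result occupies a group of two registers.
    let n_u16 = vlmax(vlen, 16, 1);
    let n_u32 = vlmax(vlen, 32, 2);
    assert_eq!(n_u16, n_u32);
    println!("u16 @ LMUL=1: {n_u16} elements, u32 @ LMUL=2: {n_u32} elements");
}
```

The element count is preserved across the conversion, which is why the widened result naturally spans two contiguous registers.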
@boomshroom "That vscale is constant -- that the number of elements in a scalable vector does not change during program execution -- is baked into the accepted scalable vector type proposal from top to bottom and in fact was one of the conditions for its acceptance" - https://lists.llvm.org/pipermail/llvm-dev/2019-October/135560.html It might just be a case of changing the wording so that it's more clear that causing
@Amanieu Just to be clear though, are you asking me to transform this into a more general RFC for opaque types, or just mention them?
ARM offers ACLE intrinsics, which can read vscale. I have an array of floats, then I read them with ACLE SVE. Do SVE types ever exist in memory or only in registers?
I don't think this needs to be a general RFC on opaque types, but more details on how scalable vectors differ from normal types would be nice to have.
There are SVE registers. The calling convention can probably pass scalable vectors on the stack. Then it will be vscale * 1 bytes. It has to be a fixed size.
If you have too much time, you can actually play with an SVE box:
One selling point of SVE is: if you use ARM ACLE SVE intrinsics and follow the rules, your program will run on both 256-bit and 2048-bit hardware. ARM SVE vectors are plain Cray-style vectors. I believe the RISC-V scalable vectors are more elaborate.
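To illustrate the vector-length-agnostic style mentioned above: SVE loops typically advance by the runtime vector length and use a `whilelt`-style predicate to disable lanes past the end, so one binary runs at any legal vector length (a multiple of 128 bits, up to 2048). The following is only a scalar simulation under those assumptions; `vla_add` and its `vl_bits` parameter are hypothetical stand-ins for real SVE intrinsics:

```rust
/// Scalar simulation of an SVE-style vector-length-agnostic loop computing
/// out[i] = a[i] + b[i]. `vl_bits` models the hardware vector length.
fn vla_add(a: &[f32], b: &[f32], vl_bits: usize) -> Vec<f32> {
    assert!(vl_bits % 128 == 0 && (128..=2048).contains(&vl_bits));
    assert_eq!(a.len(), b.len());
    let lanes = vl_bits / 32; // f32 lanes per vector
    let n = a.len();
    let mut out = vec![0.0; n];
    let mut i = 0;
    while i < n {
        // Models `whilelt`: lane j is active iff i + j < n, so the final
        // partial vector needs no separate scalar tail loop.
        for j in 0..lanes {
            if i + j < n {
                out[i + j] = a[i + j] + b[i + j];
            }
        }
        // Models advancing by the runtime number of 32-bit lanes.
        i += lanes;
    }
    out
}

fn main() {
    let a: Vec<f32> = (0..37).map(|x| x as f32).collect();
    let b = vec![1.0f32; 37];
    // Same "program", different hardware vector lengths, same result.
    let r256 = vla_add(&a, &b, 256);
    let r2048 = vla_add(&a, &b, 2048);
    assert_eq!(r256, r2048);
    println!("{:?}", &r256[..4]);
}
```

Predication is the key design choice here: the code never needs to know the vector length at compile time, only that it is fixed for the whole run.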
I'm honestly a bit confused by this RFC. I understand the benefits of SVE and what it is, but I'm not 100% sure what it's asking. Specifically, it seems like it's suggesting stabilising
Like, I'm sold on the idea of having scalable vectors in stdlib, but unsure about both what the RFC is proposing, and the potential implementation.
> wc -l arm_sve.h
24043 arm_sve.h
@Amanieu Mostly agree with #3268 (comment), just had a couple notes:
@tschuett This is an RFC, not IRC. Please only leave productive comments that advance the state of the conversation instead of non-contributing allusions that have no clear meaning. I can't even tell if your remark is critical or supportive.
Sorry for my misbehaviour. I am supportive of adding scalable vectors to Rust. Because of type inference you cannot see that the
The real question is whether you want to make scalable vectors target-dependent (SVE, RISC-V).
IMHO, scalable vectors should be target-independent; the compiler backend will simply pick a suitable constant for vscale at compile time if scalable vectors are not otherwise supported.
Note that vscale is an LLVM thing and should not be part of the RFC. LLVM assumes vscale is an unknown but constant value during the execution of the program. The real value is hardware-dependent.
I think it should not be dismissed just because it's an LLVM thing: every other compiler will have a similar constant, simply because they need to represent scalable vectors as some multiple of an element count, and that multiple is vscale. Also, there should be variants for vectors like LLVM's https://reviews.llvm.org/D53695
Do you want to expose this in Rust, or should it be an implementation detail of the compiler?
IMHO, @rust-lang/project-portable-simd should expose scalable vector types with vscale, an additional multiplier, and an element type, perhaps by exposing a wrapper struct that also contains the number of valid elements (like
One important thing that, IMHO, this RFC needs in order to be usable by portable-simd is for the element type and the multiplier to be generic:
#[repr(simd, scalable(MUL))]
struct ScalableVector<T, const MUL: usize>([T; 0]);
portable-simd's exposed wrapper type might be:
pub struct ScalableSimd<T, const MUL: usize>
where
    T: ElementType,
    ScalableMul<MUL>: SupportedScalableMul,
{
    len: u32, // exposed as usize, but realistically u32 is big enough
    value: ScalableVector<T, MUL>,
}
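The `ScalableMul<MUL>: SupportedScalableMul` bound mirrors the pattern `std::simd` has used for its lane count (`LaneCount<N>: SupportedLaneCount`). A self-contained, stable-Rust sketch of that pattern, assuming the permitted multipliers are 1, 2, 4 and 8 (RVV's integer LMUL values). The `#[repr(simd, scalable(MUL))]` payload does not exist yet, so it is replaced here by a `PhantomData` purely so the code compiles:

```rust
use std::marker::PhantomData;

/// Marker for element types a scalable vector may hold (illustrative subset).
pub trait ElementType: Copy {}
impl ElementType for f32 {}
impl ElementType for i32 {}

/// Type-level carrier for the multiplier.
pub struct ScalableMul<const MUL: usize>;

/// Implemented only for multipliers the backend is assumed to support.
pub trait SupportedScalableMul {}
impl SupportedScalableMul for ScalableMul<1> {}
impl SupportedScalableMul for ScalableMul<2> {}
impl SupportedScalableMul for ScalableMul<4> {}
impl SupportedScalableMul for ScalableMul<8> {}

/// Stand-in for the proposed wrapper; the real `value` field would be a
/// scalable vector type, which this sketch cannot express.
pub struct ScalableSimd<T, const MUL: usize>
where
    T: ElementType,
    ScalableMul<MUL>: SupportedScalableMul,
{
    len: u32, // number of valid elements
    _value: PhantomData<T>,
}

impl<T, const MUL: usize> ScalableSimd<T, MUL>
where
    T: ElementType,
    ScalableMul<MUL>: SupportedScalableMul,
{
    pub fn new(len: u32) -> Self {
        Self { len, _value: PhantomData }
    }
    pub fn len(&self) -> usize {
        self.len as usize
    }
    pub fn multiplier(&self) -> usize {
        MUL
    }
}

fn main() {
    let v: ScalableSimd<f32, 4> = ScalableSimd::new(10);
    assert_eq!(v.len(), 10);
    assert_eq!(v.multiplier(), 4);
    // `ScalableSimd<f32, 3>` would be rejected at compile time because
    // `ScalableMul<3>: SupportedScalableMul` is not implemented.
}
```

The benefit of this design is that unsupported multipliers are compile errors rather than runtime failures, while `T` and `MUL` stay fully generic.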
How about this notation (without the 4):
#[repr(simd, scalable)]
#[derive(Clone, Copy)]
pub struct svfloat32_t {
    _ty: [f32; 0],
}
It is a target-independent scalable vector of
@tschuett My intention was that the feature proposed by this RFC would be target independent, and the rustc implementation would be target independent.
Honestly, my RISC-V knowledge is limited. If you say that, I agree with your vscale vector examples. Maybe you can query LLVM for information about targets.
For reference, IBM is also working on a scalable vector ISA:
`Sized` (or both). Once returning of unsized is allowed this part of the rule would be superseded by that mechanism. It's worth noting that, if any other types are created that are `Copy` but not `Sized`, this rule would apply to those.
Remember that Rust has generics, so I can e.g. write a function `fn foo<T: Copy>(x: &T) -> T`. The RFC seems to say this is allowed, because the return type is `Copy`. But for most types `T` and most ABIs this can't be implemented.
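For context on why this is a real language change: in today's Rust, `Clone` has `Sized` as a supertrait and `Copy: Clone`, so every `Copy` type is also `Sized`, and a generic `T` is implicitly `Sized` unless it opts out with `?Sized`. That is exactly what lets a function like `foo` return `T` by value. A minimal stable-Rust illustration of the status quo:

```rust
use std::mem::size_of;

// `T: Copy` implies `T: Clone`, and `Clone: Sized`, so `T` has a
// statically known size here and can be returned by value.
fn foo<T: Copy>(x: &T) -> T {
    *x
}

// Because `T` is `Sized`, its size is a compile-time constant for each
// instantiation; a scalable vector type would break this assumption.
fn static_size<T: Copy>() -> usize {
    size_of::<T>()
}

fn main() {
    assert_eq!(foo(&42u32), 42);
    assert_eq!(static_size::<[f32; 4]>(), 16);
    // Rejected today: a return type must have a size known at compile time.
    // fn bar<T: ?Sized>(x: &T) -> T { ... }
}
```

Any rule allowing `Copy` but unsized return values would therefore have to reconcile with generic code like this that already relies on `Copy` implying a static size.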
You can't just say in a sentence that you allow unsized return values. That's a major language feature that needs significant design work on its own.
I think what you actually want is some extremely special cases where specifically these scalable vector types are allowed as return values, but in a non-compositional way. There is no precedent for anything like this in Rust so it needs to be fairly carefully described and discussed.
… r=Amanieu Stabilize Ratified RISC-V Target Features
Stabilization PR for the ratified RISC-V target features. This stabilizes some of the target features tracked by #44839. This is also a part of #114544 and eventually needed for the RISC-V part of rust-lang/rfcs#3268. There is a similar PR for the stdarch crate, which can be found at rust-lang/stdarch#1476. This was briefly discussed on Zulip (https://rust-lang.zulipchat.com/#narrow/stream/250483-t-compiler.2Frisc-v/topic/Stabilization.20of.20RISC-V.20Target.20Features/near/394793704). Specifically, this PR stabilizes the:
* Atomic Instructions (A) on v2.0
* Compressed Instructions (C) on v2.0
* ~Double-Precision Floating-Point (D) on v2.2~
* ~Embedded Base (E) (Given as `RV32E` / `RV64E`) on v2.0~
* ~Single-Precision Floating-Point (F) on v2.2~
* Integer Multiplication and Division (M) on v2.0
* ~Vector Operations (V) on v1.0~
* Bit Manipulations (B) on v1.0 listed as `zba`, `zbc`, `zbs`
* Scalar Cryptography (Zk) v1.0.1 listed as `zk`, `zkn`, `zknd`, `zkne`, `zknh`, `zkr`, `zks`, `zksed`, `zksh`, `zkt`, `zbkb`, `zbkc`, `zbkx`
* ~Double-Precision Floating-Point in Integer Register (Zdinx) on v1.0~
* ~Half-Precision Floating-Point (Zfh) on v1.0~
* ~Minimal Half-Precision Floating-Point (Zfhmin) on v1.0~
* ~Single-Precision Floating-Point in Integer Register (Zfinx) on v1.0~
* ~Half-Precision Floating-Point in Integer Register (Zhinx) on v1.0~
* ~Minimal Half-Precision Floating-Point in Integer Register (Zhinxmin) on v1.0~
r? `@Amanieu`
I wonder if the proposal for "claimable" types with automatic claim can be used to overcome the issue of
The current plan in the implementation PR (rust-lang/rust#118917) is for scalable vector types to not implement either
My understanding is that this RFC is going to be rewritten to match the new implementation plan.
That sounds potentially quite hacky... but in the end it'll be up to @rust-lang/types to decide whether that is acceptable. An interesting part of this will be properly working out the MIR semantics, ideally by implementing them in the interpreter.
Citing myself from a comment on the draft PR (rust-lang/rust#118917 (comment)):
Could someone explain the key difference between ARM SVE and RISC-V Vectors here?
From the language's point of view they can be treated mostly identically. The platform-specific intrinsics expose types whose size is only known at runtime and can be computed by reading a CPU register (
In eddy's comment it sounded like there would be some relevant difference, that's why I was asking.
Also to be clear, this is not a real register, right? It is a per-CPU constant that can be inspected?
On RISC-V there is the read-only
To clarify, on RISC-V the size of vector registers is indicated by
I hope this clarifies some of the confusion around vector lengths in RISC-V.
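To make the role of `vl` concrete: a strip-mined RVV loop asks `vsetvli` for a vector length on every iteration, passing the number of remaining elements (AVL). The RVV 1.0 spec requires `vl = AVL` when `AVL <= VLMAX` and `vl = VLMAX` when `AVL >= 2 * VLMAX`, and leaves the range in between partly implementation-defined; the sketch below assumes the common choice `vl = min(AVL, VLMAX)`. It is a scalar simulation, and `setvl` and `scale` are hypothetical helpers, not intrinsics:

```rust
/// Models `vsetvli` under the assumption vl = min(AVL, VLMAX).
fn setvl(avl: usize, vlmax: usize) -> usize {
    avl.min(vlmax)
}

/// Scalar simulation of a strip-mined RVV loop computing dst[i] = src[i] * k.
/// Returns the `vl` chosen on each iteration so the strip-mining is visible.
fn scale(src: &[i32], k: i32, dst: &mut [i32], vlmax: usize) -> Vec<usize> {
    assert!(vlmax > 0);
    assert_eq!(src.len(), dst.len());
    let mut vls = Vec::new();
    let mut done = 0;
    while done < src.len() {
        let vl = setvl(src.len() - done, vlmax);
        // One "vector instruction": operates on exactly `vl` elements, so the
        // last iteration needs neither masking nor a scalar tail loop.
        for i in done..done + vl {
            dst[i] = src[i] * k;
        }
        vls.push(vl);
        done += vl;
    }
    vls
}

fn main() {
    let src: Vec<i32> = (1..=10).collect();
    let mut dst = vec![0i32; src.len()];
    // VLMAX = 4 corresponds to, e.g., VLEN = 128, SEW = 32, LMUL = 1.
    let vls = scale(&src, 3, &mut dst, 4);
    assert_eq!(vls, vec![4, 4, 2]);
    assert_eq!(dst[9], 30);
}
```

Unlike SVE's predicate-based tail handling, the tail here is handled by shrinking `vl` itself, which is why `vl` behaves like a real, frequently written register rather than a program-wide constant.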
That sounds like ARM and RISC-V are very similar. So I don't understand which concern eddyb was referring to.
Wait, so they exposed types to C whose implicit copies can cause the equivalent of context-save/restore operations? I wasn't even aware there were (from https://github.com/riscvarchive/riscv-v-spec/releases/download/v1.0/riscv-v-spec-1.0.pdf)
On wide enough vector machines, accidentally triggering this functionality seems to be able to easily increase the amount of data transferred by e.g. an order of magnitude. That said, I concede RVV's
To turn this around, you could argue you can do the same thing on x86: run
Anyway, it's a neat trick, and if C intrinsics used it first that removes a lot of non-language-design concerns (like what values are not allowed to change during program execution, or between its threads etc.), I'm just mildly skeptical it's also a good idea without e.g. separate types for "SSA-only
(also just saw rust-lang/rust#46571 (comment) which talks about even allowing
@eddyb, sorry, I am completely confused by this comment. Could you take 5 steps back and explain what this means for Rust with some context? :)
Say you have an RVV implementation with a register file comparable to AVX512 (
The main benefit of the dynamic vector length is that you can do very simple vector loops (without scalar versions of those loops for elements that "don't fit", like SIMD architectures tend to need). However, this is only a clear win (over a scalar loop) if e.g.
The fact (unknown to me before today) that the C types you need to use to refer to the hardware registers (i.e. connecting vector-producing instructions to vector-consuming instructions) can also be stored to memory means that, with a Rust version of those types and intrinsics, you're one implicit borrow away (e.g. a
And this gets worse the larger the implementation choice of vector width, even if appropriate usage (keeping everything in registers) doesn't see a penalty. I don't expect such accidents in C functions like
Arguably this is still an issue even when most/all of the width of the hardware registers is in use, but in a different scenario, e.g. if you have some arithmetic-heavy computation that can remain in registers (for a lot of operations, relative to initial/final memory accesses), that's very different from causing additional memory traffic e.g. for each of those arithmetic operations (OTOH, that's closer to how we optimize local variables to use registers instead of the stack, so it might not be a realistic concern).
If you're familiar with
Then the "the C intrinsics types can also be stored to memory" choice is like implementing
The context-switching aspect I mentioned can be useful on its own, in the sense that the kernel doesn't have to care about how you were using the register file, and can just save & restore it in bulk (at some cost, but context-switching is already not "free", and ideally not something a thread willingly triggers itself often).
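The scaling concern above is simple arithmetic: RVV has 32 architectural vector registers, each VLEN bits wide, and the spec allows VLEN from 128 bits (the minimum for the V extension) up to 65,536 bits. Each accidental spill of a value occupying an LMUL register group moves LMUL × VLEN / 8 bytes. A small sketch of that calculation (the function names are illustrative only):

```rust
/// Number of architectural vector registers in RVV 1.0.
const NUM_VREGS: u64 = 32;

/// Bytes moved when one value occupying an LMUL register group is spilled.
fn spill_bytes(vlen_bits: u64, lmul: u64) -> u64 {
    lmul * vlen_bits / 8
}

/// Bytes needed to save the whole vector register file (e.g. on a context switch).
fn register_file_bytes(vlen_bits: u64) -> u64 {
    NUM_VREGS * vlen_bits / 8
}

fn main() {
    // VLEN = 512 gives the same register file size as AVX-512's 32 x 512-bit registers.
    for vlen in [128u64, 512, 4096, 65536] {
        println!(
            "VLEN={vlen:>5}: spill of an LMUL=8 value = {:>6} B, whole register file = {:>7} B",
            spill_bytes(vlen, 8),
            register_file_bytes(vlen)
        );
    }
}
```

The same source code therefore incurs memory traffic proportional to whatever VLEN the hardware happens to have, which is the "worse the larger the implementation choice of vector width" effect.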
Anyway, I really don't have a strong preference here, and besides being surprised by the RVV C intrinsics deciding to include a footgun (which is the only reason I had to look up the RVV specs and reply at all), I had mostly hoped to end up with some kind of solution that could also cover functionality that can't fall back to some excessive worst-case memory size (such as wasm
If that plan goes forward as described (and if I've understood everything correctly, as a bystander), "load/exec/run-time constant" (or w/e a good name for the concept is) means:
I really didn't plan to write more than a couple of sentences about any of this, and I'm sure there are far more qualified people to discuss ARM SVE and RVV with; feel free to hide my comments if that makes more sense etc.
That was quite helpful, thanks. :) (The RFC sadly lacks a lot of context, but I already left comments about that a while ago.)
As an aside: in RISC-V one can actually choose to combine up to 8 registers into register groups on which the individual instructions operate, so if
I was wondering about LMUL (as one more factor in making the "shape" of the active vector register file dynamic), but didn't mention it since
Ohh, I see, so in your example, the relevant types behave (memory-wise) like
That's arguably another layer of surprise, since my expectation was that LMUL applied to operations, not types, but realistically, at the programming language level, I used to think misuse of such "your non-dependent typesystem cannot express the runtime relationships" intrinsics would result in (unfortunate) late compilation errors (comparable to some aspects of inline
(While you can use something like
But the summary is: "a type whose size is a runtime constant determined when the program starts" (and references to this type can be thin) is sufficient and reasonable for both the ARM and RISC-V variants of scalable vectors?
that's how I understood it. So IMO an MVP could be to not allow them on the stack at all, but require them to be heap allocated for now. This likely defeats their purpose, as every operation first needs to move it into registers from the heap, but it's an incremental way forward that requires few new language features.
#![feature(extern_types)] // nightly-only
extern "C" {
    // Unsized type with thin pointers: values can only live behind a pointer.
    type ScalableVector;
}
impl ScalableVector {
    pub fn len() -> usize {
        todo!() /* sufficiently advanced assembly */
    }
    // `data` is assumed to hold exactly `len()` elements.
    pub fn new(data: Vec<f32>) -> Box<Self> {
        unsafe { Box::from_raw(Box::into_raw(data.into_boxed_slice()) as *mut f32 as *mut Self) }
    }
}
Ideally we'd have a central place that defines
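The "size is a runtime constant fixed at program start" model, combined with the always-heap MVP, can be approximated on stable Rust to show the intended semantics. This is a sketch under loud assumptions: the vector length is read once through a hypothetical `detect_vector_bytes` function (stubbed to return 32 here; a real version would read the hardware value, e.g. SVE's `cntb` or RVV's `vlenb` CSR), cached in a `std::sync::OnceLock`, and every `HeapScalable` value is a boxed slice of exactly that many lanes. None of these names come from std or the RFC:

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for reading the hardware vector length in bytes.
/// Stubbed to 32 bytes for this sketch.
fn detect_vector_bytes() -> usize {
    32
}

/// Runtime constant: computed on first use, never changes afterwards.
fn vector_bytes() -> usize {
    static VBYTES: OnceLock<usize> = OnceLock::new();
    *VBYTES.get_or_init(detect_vector_bytes)
}

/// Heap-only "scalable vector" of f32: its length is unknown at compile
/// time, but every instance has the same length for the whole run.
pub struct HeapScalable(Box<[f32]>);

impl HeapScalable {
    /// Number of f32 lanes, identical for every instance.
    pub fn lanes() -> usize {
        vector_bytes() / std::mem::size_of::<f32>()
    }

    /// Builds a vector from `data`, which must hold exactly `lanes()` elements.
    pub fn new(data: Vec<f32>) -> Option<Self> {
        (data.len() == Self::lanes()).then(|| HeapScalable(data.into_boxed_slice()))
    }

    /// Lane-wise addition.
    pub fn add(&self, other: &Self) -> Self {
        let sum: Vec<f32> = self.0.iter().zip(other.0.iter()).map(|(a, b)| a + b).collect();
        HeapScalable(sum.into_boxed_slice())
    }

    pub fn as_slice(&self) -> &[f32] {
        &self.0
    }
}

fn main() {
    let n = HeapScalable::lanes();
    let a = HeapScalable::new(vec![1.0; n]).unwrap();
    let b = HeapScalable::new(vec![2.0; n]).unwrap();
    assert!(a.add(&b).as_slice().iter().all(|&x| x == 3.0));
    // Wrong length is rejected, since the size is a program-wide constant.
    assert!(HeapScalable::new(vec![0.0; n + 1]).is_none());
}
```

Because the payload only ever lives in a heap allocation, every operation goes through memory, which is exactly the performance trade-off the MVP accepts in exchange for needing no new language features.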
well, as long as no one makes a processor where some cores have a different VLEN than others... like what happened with icache line size, causing lots of pain for Mono
Sufficient? Yes. Whether "necessary" or "reasonable" is debatable (arguably not necessary if most usage can avoid memory altogether, i.e. only appear in expression output and local variable types). If nothing else works, sure, it's an "acceptable compromise" at best.
This feels very strange from the "these are not real data types, just abstract tokens to connect operations and allow the backend to do register allocation etc." point of view I started from, but I could see how one could arrive there starting at "these are weird
In practice, heap allocation doesn't really make sense to bring up, and making
I've brought this up already; TL;DR is that C intrinsics already requiring
well, we can either
This way we can start out either with the always-heap version or the always-externref version and then figure out the other side. The backend can choose to spill registers to stack if it wants, but that's a backend thing we don't have to care about. Having an explicit separation allows us to avoid the weird semantics of "externref up to a point then randomly you can take addresses of it".
We could also say this type cannot instantiate generic parameters, so its odd nature cannot cause trouble in unsuspecting code. For code that only uses it inside computations, that should be enough?
I'm sorry for stirring this back up just days too early; I suggest interested parties wait for @JamieCunliffe and/or @davidtwco to announce the next version of this RFC, or at least something more concrete.
We’ve opened #3729, which would provide a way for scalable vectors to work without special cases in the type system.
Strong +1 to this approach as it pertains to implementation of scalable vectors.
A proposal to add an additional representation to be used with `simd` to allow for scalable vectors to be used.