Add simd_extmul_* support for x64 #3084
Conversation
Force-pushed from ba3d91f to 71574cd, then to 6ccd1d8.
This looks good to me with the minor comments about inlining and reordering the lowerings. @akirilov-arm, the code_translator.rs
part should look OK for the aarch64 backend, right?
        insn: swiden1_high,
        input: 0,
    },
];
Not sure why swiden_input is needed: couldn't the InsnInputs be created inline?
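As a minimal, self-contained sketch of that suggestion (stand-in types; the real InsnInput and instruction handle live in the x64 backend's lower.rs, and swiden1_high is just an illustrative name here), the inputs could be built at the use site instead of through a named swiden_input array:

```rust
// Stand-ins for the backend's instruction handle and InsnInput wrapper.
#[derive(Clone, Copy, Debug)]
struct IRInst(u32);

#[derive(Clone, Copy, Debug)]
struct InsnInput {
    insn: IRInst,
    input: usize,
}

fn main() {
    // Hypothetical instruction that was matched as the swiden_high source.
    let swiden1_high = IRInst(42);

    // Constructed inline where it is consumed; no intermediate array needed.
    let lhs = InsnInput { insn: swiden1_high, input: 0 };
    println!("insn {:?}, input {}", lhs.insn, lhs.input);
}
```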
        dst,
    ));
}
_ => panic!("Unsupported extmul_low_signed type"),
Initially I thought that this matching was too greedy, i.e. that it would match sequences it shouldn't and then crash because the types are wrong. I now think it is OK, because swiden_high only really allows the following types: I8X16, I16X8, I32X4. I think it might be worthwhile to mention this assumption in a comment at the top so that future readers aren't confused.
v0 = swiden.i64
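One way that assumption could be written down and checked, as a hedged sketch rather than the PR's actual code (it assumes a Cargo dependency on cranelift_codegen for the ir::types constants the lowering already uses, and the helper name is made up for illustration):

```rust
use cranelift_codegen::ir::types::{self, Type};

/// swiden_high / swiden_low / uwiden_high / uwiden_low only accept these
/// vector types, so the extmul match arms cover every case that can reach
/// them and the panic! fallbacks are unreachable for valid CLIF.
fn is_widenable_input(ty: Type) -> bool {
    ty == types::I8X16 || ty == types::I16X8 || ty == types::I32X4
}

fn main() {
    assert!(is_widenable_input(types::I16X8));
    assert!(!is_widenable_input(types::I64X2));
    println!("widenable-input assumption holds");
}
```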
@@ -1662,7 +1662,348 @@ fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(

Opcode::Imul => {
    let ty = ty.unwrap();
    if ty == types::I64X2 {
// First check for ext_mul_* instructions. Where possible ext_mul_* lowerings |
I think the order of the lowerings should be reversed: scalar lowerings first, then vector lowerings, then pattern-matching lowerings. As it stands in this PR, every imul.i64 is going to require a bunch of matching calls and type comparisons before it ever gets lowered. I think scalar multiplication is going to be the most common case, then plain vector multiplication, so we would want to make those common paths shorter. (Not sure exactly how much this affects compile time, but hopefully you see what I mean.)
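A stand-in sketch of that proposed check order (illustrative types and return values, not the backend's real lowering code): the cheap, common cases are tested first so a scalar imul never pays for the extmul pattern matching.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Ty {
    I64,
    I64X2,
}

fn lower_imul(ty: Ty, has_widening_source: bool) -> &'static str {
    if ty == Ty::I64 {
        // 1. Scalar multiplication first: assumed to be the most common case.
        "scalar mul"
    } else if !has_widening_source {
        // 2. Plain vector multiplication next.
        "plain vector mul"
    } else {
        // 3. Pattern-matched lowerings (extmul via swiden/uwiden sources) last.
        "extmul"
    }
}

fn main() {
    assert_eq!(lower_imul(Ty::I64, false), "scalar mul");
    assert_eq!(lower_imul(Ty::I64X2, false), "plain vector mul");
    assert_eq!(lower_imul(Ty::I64X2, true), "extmul");
}
```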
Sure, I agree. That was my original thought, but I never reverted the initial implementation. Looking at this again, I see why I naturally put the extmul lowerings first: because we share all these instructions with imul, the current checks such as ty == types::I64X2 and ty.lane_count() > 1 are entered before we have a chance to check for an opcode source. I think the best we can do is have a matches_input_any at the top and branch from there; that is what I will do before merging. I suppose if there is another/better solution we can refactor later.
Also note that the extmul code must remain at the top as the fall-through target, since I am using if let Some(..) to check for opcode sources and there is no "if not let Some(..)". Still, there is just one branch to reach the other lowerings.
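A self-contained sketch of that branch-at-the-top structure (stand-in Opcode enum and a simplified matches_input_any; the real helper in the x64 backend operates on a lowering context and an InsnInput), modelling just the control flow:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Opcode {
    SwidenHigh,
    SwidenLow,
    UwidenHigh,
    UwidenLow,
    Iconst,
}

// Returns the matched opcode if the input's defining instruction is one of `ops`.
fn matches_input_any(source_op: Opcode, ops: &[Opcode]) -> Option<Opcode> {
    ops.contains(&source_op).then(|| source_op)
}

fn lower_imul(input_source: Opcode) -> &'static str {
    const WIDENING: &[Opcode] = &[
        Opcode::SwidenHigh,
        Opcode::SwidenLow,
        Opcode::UwidenHigh,
        Opcode::UwidenLow,
    ];
    // One check at the top: the extmul lowerings stay first as the fall-through
    // target of the `if let Some(..)`, and everything else takes a single branch.
    if let Some(op) = matches_input_any(input_source, WIDENING) {
        match op {
            Opcode::SwidenHigh | Opcode::SwidenLow => "signed extmul lowering",
            _ => "unsigned extmul lowering",
        }
    } else {
        "scalar / plain vector imul lowering"
    }
}

fn main() {
    assert_eq!(lower_imul(Opcode::SwidenHigh), "signed extmul lowering");
    assert_eq!(lower_imul(Opcode::Iconst), "scalar / plain vector imul lowering");
}
```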
@abrown I will let @sparker-arm comment because he has done the same changes in PR #3070.
Force-pushed from 384b69b to 02abbe8.