Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow SIMD-returning calls as arguments #74184

Merged
merged 2 commits into from
Aug 24, 2022

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Aug 18, 2022

As of this change we handle all relevant ABI scenarios.

  1. Windows x64:
    • SIMD8: returned and passed as TYP_LONG, fine.
    • SIMD12 / SIMD16 / SIMD32: returned and passed via a return buffer, fine.
  2. Unix x64:
    • SIMD8: returned and passed in one FP register, fine.
    • SIMD12 / SIMD16, Vector4: returned and passed in two FP registers, fine.
    • SIMD16, Vector128 / SIMD32: returned via a return buffer, passed on stack, fine.
  3. x86:
    • SIMD8: can be returned via two registers or a return buffer (and is always passed on stack), both are fine.
    • SIMD12/SIMD16/SIMD32: returned via a return buffer, passed on stack, fine.
  4. ARM64:
    • SIMD8, Vector2: returned in two FP registers (and passed as such or TYP_LONG under Windows varargs), fine.
    • SIMD8, Vector64: returned in one FP register, can be passed as such or as TYP_LONG under Windows varargs.
      The latter case is now handled correctly in Lowering::LowerArg.
    • SIMD12: returned in three FP registers, passed as such or in two integer registers under Windows varargs, fine.
    • SIMD16, Vector4: returned in four FP registers, passed as such, or in two integer registers under Windows varargs, fine.
    • SIMD16, Vector128: returned in one FP register, passed as such, or in two integer registers under Windows varargs, fine
      (morph will decompose the varargs case into a FIELD_LIST via a temp).

Fixes #74126.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 18, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 18, 2022
@ghost
Copy link

ghost commented Aug 18, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

As of this change we handle all relevant ABI scenarios.

  1. Windows x64:
    • SIMD8: returned and passed as TYP_LONG, fine.
    • SIMD12 / SIMD16 / SIMD32: returned and passed via a return buffer, fine.
  2. Unix x64:
    • SIMD8: returned and passed in one FP register, fine.
    • SIMD12 / SIMD16, Vector4: returned and passed in two FP registers, fine.
    • SIMD16, Vector128 / SIMD32: returned and passed via a return buffer, fine.
  3. x86:
    • SIMD8: can be returned via two registers or a return buffer (and is always passed on stack), both are fine.
    • SIMD12/SIMD16/SIMD32: returned and passed via a return buffer, fine.
  4. ARM64:
    • SIMD8, Vector2: returned in two FP registers (and passed as such or TYP_LONG under Windows varargs), fine.
    • SIMD8, Vector64: returned in one FP register, can be passed as such or as TYP_LONG under Windows varargs.
      The latter case is now handled correctly in Lowering::LowerArg.
    • SIMD12: returned in three FP registers, passed as such or in two integer registers under Windows varargs, fine.
    • SIMD16, Vector4: returned in four FP registers, passed as such, or in two integer registers under Windows varargs, fine.
    • SIMD16, Vector128: returned in one FP register, passed as such, or in two integer registers under Windows varargs, fine
      (morph will decompose the varargs case into a FIELD_LIST via a temp).

Fixes #74126.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

As of this change we handle all relevant ABI scenarios.

1) Windows x64:
   - SIMD8: returned and passed as "TYP_LONG", fine.
   - SIMD12 / SIMD16 / SIMD32: returned and passed via a return buffer, fine.
2) Unix x64:
   - SIMD8: returned and passed in one FP register, fine.
   - SIMD12 / SIMD16, Vector4: returned and passed in two FP registers, fine.
   - SIMD16, Vector128 / SIMD32: returned and passed via a return buffer, fine.
3) x86:
   - SIMD8: can be returned via two registers or a return buffer (and is always passed on stack), both are fine.
   - SIMD12/SIMD16/SIMD32: returned via a return buffer, passed on stack, fine.
4) ARM64:
   - SIMD8, Vector2: returned in two FP registers (and passed as such or "TYP_LONG" under Windows varargs), fine.
   - SIMD8, Vector64: returned in one FP register, can be passed as such or as "TYP_LONG" under Windows varargs.
     The latter case is now handled correctly in "Lowering::LowerArg".
   - SIMD12: returned in three FP registers, passed as such or in two integer registers under Windows varargs, fine.
   - SIMD16, Vector4: returned in four FP registers, passed as such, or in two integer registers under Windows varargs, fine.
   - SIMD16, Vector128: returned in one FP register, passed as such, or in two integer registers under Windows varargs, fine
     (morph will decompose the varargs case into a `FIELD_LIST` via a temp).
@SingleAccretion
Copy link
Contributor Author

SPMI failure is pre-existing (empty OSX ARM64 contexts?), Installer Build and Test coreclr Linux_musl_arm64 Debug failure is infrastructure-related:

git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --prune-tags --progress --no-recurse-submodules origin --depth=20 +2cbdbb3a894090a60d09337496df7943a38a10ab
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: expected flush after ref listing

No diffs.

I am assuming we'll want to backport this to 7.0.

@JulieLeeMSFT
Copy link
Member

@BruceForstall, please triage a milestone.

@BruceForstall BruceForstall added this to the 7.0.0 milestone Aug 22, 2022
@BruceForstall
Copy link
Member

cc @tannergooding @dotnet/jit-contrib

@tannergooding
Copy link
Member

tannergooding commented Aug 22, 2022

Windows x64:
SIMD8: returned and passed as TYP_LONG, fine.
SIMD12 / SIMD16 / SIMD32: returned and passed via a return buffer, fine.

Just noting that SIMD16/SIMD32 being returned via an output buffer is an "it depends" scenario.

While Vector4 is a TYP_SIMD16 and a "user-defined struct" and therefore should be returned via an output buffer, Vector128<T> and Vector256<T> are the __m128 and __m256 types and are actually returned in XMM0/YMM0: https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#return-values (YMM0 isn't explicitly listed, but checking the MSVC output for such a scenario shows its the case).

Vector<T> doesn't have a well-defined ABI and is "managed only". We're likely better off, codegen wise, treating it like __m128/__m256 and also having it return in register.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple question

@@ -1419,7 +1419,7 @@ GenTree* Lowering::LowerFloatArg(GenTree** pArg, CallArg* callArg)
break;
}
GenTree* node = use.GetNode();
if (varTypeIsFloating(node))
if (varTypeUsesFloatReg(node))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be varTypeUsesFloatArgReg()? Effectively it's no difference, except for LoongArch64.

Unrelated, but the code below seems odd:

                if (node->TypeGet() == TYP_DOUBLE)
                {
                    currRegNumber = REG_NEXT(REG_NEXT(currRegNumber));
                    regIndex += 2;
                }
                else
                {
                    currRegNumber = REG_NEXT(currRegNumber);
                    regIndex += 1;
                }

I would expect the TYPE_DOUBLE == 2 registers code to only apply to arm32, but it's not ifdef'ed that way.

Copy link
Contributor Author

@SingleAccretion SingleAccretion Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the TYPE_DOUBLE == 2 registers code to only apply to arm32, but it's not ifdef'ed that way.

Indeed, I thought the same but decided against #ifdefing it in this change to keep the scope down.

It so happens that we will never have DOUBLE here on ARM64 (which is the case of interest) because morph will construct the FIELD_LIST with LONGs. Looking what happens under Linux ARM soft FP, I see the same, so it seems likely that this code is actually dead.

Should this be varTypeUsesFloatArgReg() ?

I'd think it's better with varTypeUsesFloatReg. varTypeUsesFloatArgReg has the meaning of "can this type be used as an argument from an FP register file", while here we're asking the question of "does this node define an FP register".

@@ -1441,7 +1441,7 @@ GenTree* Lowering::LowerFloatArg(GenTree** pArg, CallArg* callArg)
// List fields were replaced in place.
return arg;
}
else if (varTypeIsFloating(arg))
else if (varTypeUsesFloatReg(arg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@BruceForstall
Copy link
Member

Another question: this change is only in armarch (and LoongArch) code. But #74126 is a failure in win-x64. So how does this fix #74126?

Comment on lines +3833 to +3837
case GT_CALL:
// Argument lowering will deal with register file mismatches if needed.
assert(varTypeIsSIMD(origType));
break;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BruceForstall it's this change that fixes the original failure.

@BruceForstall BruceForstall merged commit a07f2e9 into dotnet:main Aug 24, 2022
@SingleAccretion SingleAccretion deleted the Simd-Calls-As-Args branch August 24, 2022 20:24
@BruceForstall
Copy link
Member

/backport to release/7.0

@github-actions
Copy link
Contributor

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/2922063287

@BruceForstall
Copy link
Member

@SingleAccretion Feel free to amend #74520 with additional justification for porting the fix back to 7.0

@ghost ghost locked as resolved and limited conversation to collaborators Sep 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
4 participants