Arm64 vector ABI #23675

CarolEidt · 2019-04-02T23:38:04Z

Extend HFA support to support vectors as well as floating point types.
This requires that the JIT recognize vector types even during crossgen,
so that the ABI is supported consistently.

Fix #16022

CarolEidt · 2019-04-05T04:48:44Z

@dotnet/jit-contrib @tannergooding @sdmaclea @TamarChristinaArm @jkotas @jkoritzinsky
This is not yet ready to merge (in fact, there's a conflict with @jkoritzinsky 's recent commit), but I would like to get feedback at this point.

src/jit/compiler.h

src/jit/vartype.h

mikedn · 2019-04-05T05:48:01Z

src/jit/compiler.h

-    bool _isHfaArg : 1;    // True when the argument is an HFA type.
-    bool _isDoubleHfa : 1; // True when the argument is an HFA, with an element type of DOUBLE.
+    bool          _isHfaArg : 1;   // True when the argument is an HFA type.
+    unsigned char _hfaLogSize : 2; // The base 2 log of the HFA element size in 4-byte units (e.g. 0 for TYP_FLOAT)


Isn't there other terminology that could be used here? "size" most of the type refers to byte size. In call morphing code "size" also refers to the number of slots. And now here it refers to the number of 4-byte units.

How about _hfaLogWidth? Or _hfaLogRelSize for relative size?

Hmm, width is also used in some cases to denote byte size (some block ops code does that). But at least it's not used within LclVarDsc (which has size and exactSize) so it's a better alternative.

Though I'm not sure why this is needed to begin with. Is it just to "compress" the information down to 2 bits in the bitfield?

Yes, it's just to keep the information small as opposed to making it a var_types field, or creating a new enum.

Hmm, then leave it as it is. I have a feeling that this HFA info will end up in the ClassLayout thing I have in #21705 sooner or later.

In looking at my response, I'm thinking a new enum would be better - it could be called something like HfaElemKind and have values HFA_ELEM_FLOAT, HFA_ELEM_DOUBLE and HFA_ELEM_VECTOR128. And then we could rely on the C++ compiler to generate good code rather than trying to pre-optimize by using shifts. (I try to make it a principle to rely on the code generator and not pre-optimize, but I don't always manage to do so!)

Right, an enum should be fine as long as it has underlying type unsigned char to match the other bitfield members. Otherwise AFAIR VC++ does not pack data member of different types into the same byte of the bitfield.

TamarChristinaArm · 2019-04-05T07:34:16Z

Ah great, I'm out of office until Monday but will take a look first thing Monday morning!

src/jit/codegencommon.cpp

CarolEidt · 2019-04-09T23:41:08Z

@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test

CarolEidt · 2019-04-10T17:30:43Z

I believe I've incorporated all the feedback, and this is now green. I'm going to do some outerloop testing, but would appreciate any additional feedback on the changes.

CarolEidt · 2019-04-10T17:35:05Z

/azp run coreclr-outerloop

azure-pipelines · 2019-04-10T17:35:23Z

Azure Pipelines successfully started running 1 pipeline(s).

sdmaclea · 2019-04-10T19:13:39Z

Are there any tests which are testing this is working correctly? Last I knew the tests were missing.
Are we supporting interop with external libraries using HVA? Especially this class of tests.

tannergooding

Overall LGTM and looks to match the spec.

I think it is a bit odd that we aren't fully classifying everything independently as HVAs vs HFAs (it looked like we were sharing the logic a lot, where possible, since they are very similar), but I don't think it is something we need to block this PR on. Ultimately, I think it would be good to have all the logic more clearly defined/classified as it might make the code easier to follow and make it easier to bring new platforms online...

BruceForstall

A few comments/questions

BruceForstall · 2019-04-13T00:20:45Z

src/vm/callingconvention.h

@@ -50,8 +50,7 @@ struct ArgLocDesc
 #endif // UNIX_AMD64_ABI

 #if defined(_TARGET_ARM64_)
-    bool    m_isSinglePrecision;  // For determining if HFA is single or double
-                                  // precision
+    int      m_hfaFieldSize;      // Size of HFA field


It would be useful if the comment stated "... in bytes" (assuming that's what it is)

nit: this is never negative, right, so should this be declared unsigned?

Yes, that makes sense.

BruceForstall · 2019-04-13T00:26:18Z

src/vm/callingconvention.h

+                case ELEMENT_TYPE_R4: hfaFieldSize = 4; break;
+                case ELEMENT_TYPE_R8: hfaFieldSize = 8; break;
+#ifdef _TARGET_ARM64_
+                case ELEMENT_TYPE_VALUETYPE: hfaFieldSize = 16; break;


Seems odd that ELEMENT_TYPE_VALUETYPE implies hfaFieldSize = 16. Don't you need to somehow look up the value type size? I guess you're just overloading this existing type for this particular purpose?

This code (figuring out hfaFieldSize) is also duplicated 3 times; can it be extracted to a func?

I guess you're just overloading this existing type for this particular purpose?

Yes, it's a bit of a hack, but avoids having to create a new VM type.

BruceForstall · 2019-04-13T00:37:07Z

src/vm/class.h

@@ -580,7 +587,13 @@ class EEClassLayoutInfo
        void SetNativeHFAType(CorElementType hfaType)
        {
            LIMITED_METHOD_CONTRACT;
-            m_bFlags |= (hfaType == ELEMENT_TYPE_R4) ? e_R4_HFA : e_R8_HFA;
+            switch (hfaType)


Is this function ever called more than once?

Seems you could add assert(m_bFlags & e_HFATypeFlags == 0) if not.

or, change the assigns to something like m_bFlags = (m_bFlags & ~e_HFATypeFlags) | e_..._HFA

I believe it should never be called more than once. I'll add the assert.

BruceForstall · 2019-04-13T00:41:58Z

src/jit/compiler.h

+    HFA_ELEM_NONE,
+    HFA_ELEM_FLOAT,
+    HFA_ELEM_DOUBLE,
+    HFA_ELEM_SIMD16


Is HFA_ELEM_SIMD16 only arm64? Should it be #ifdef'ed, or is that too much? Maybe asserts someplace that other platforms don't see it?

It didn't seem worth ifdef'ing this out

BruceForstall · 2019-04-13T00:44:54Z

src/jit/compiler.h

+    //
+    // Return Value:
+    //    On Arm64 - Returns 1-4 indicating the number of register slots used by the HFA
+    //    On Arm32 - Returns the total number of single FP register slots used by the HFA, max is 8


This asymmetry is kind of annoying...

Yes, I agree, but it's pre-existing and I'm pretty sure that changing it would be non-trivial.

BruceForstall · 2019-04-13T00:45:53Z

src/jit/compiler.h

+                slots = lvExactSize >> 3;
+                break;
+            case HFA_ELEM_SIMD16:
+                slots = lvExactSize >> 4;


Should you add asserts that lvExactSize is a multiple of the appropriate size, e.g., assert(lvExactSize % 16 == 0)?

That makes sense.

Extend HFA support to support vectors as well as floating point types. This requires that the JIT recognize vector types even during crossgen, so that the ABI is supported consistently. Fix #16022

CarolEidt · 2019-04-14T14:16:34Z

The "Test pri0 Linux arm64 checked" leg failed with a timeout in /JIT/Methodical/tailcall/Desktop/_il_relthread-race/_il_relthread-race.sh
and "There was an I/O error during the build." on the Windows arm64 leg.

CarolEidt · 2019-04-14T14:16:48Z

test Windows_NT arm64 Cross Checked Innerloop Build and Test

CarolEidt · 2019-04-14T14:59:26Z

/azp run coreclr-outerloop

azure-pipelines · 2019-04-14T14:59:40Z

Azure Pipelines successfully started running 1 pipeline(s).

CarolEidt · 2019-04-14T18:23:28Z

test Windows_NT arm64 Cross Checked Innerloop Build and Test

CarolEidt · 2019-04-14T22:42:06Z

@BruceForstall @dotnet/jit-contrib - I've addressed all of the PR feedback, and the only remaining outerloop failures are the recently added test:
JIT/Methodical/largeframes/skip4/skippage4/skippage4.sh

I have the changes from #23881, but this test is a pri1 test and doesn't seem to have been run on that PR (or since). The failure is:

Unhandled Exception: System.TypeLoadException: Could not load type 'BigFrames.LargeStructWithRef' from assembly 'skippage4, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' because it contains an object field at offset 65500 that is incorrectly aligned or overlapped by a non-object field.

Can I get another review?

CarolEidt · 2019-04-16T14:12:34Z

@dotnet/jit-contrib ping

briansull

Looks Good

CarolEidt · 2019-04-16T16:36:06Z

Thanks @briansull !

echesakov · 2019-04-16T22:38:14Z

@CarolEidt I am seeing 8 R2R Linux arm64 and Linux musl_arm64 tests failing during crossgen run on this commit validation https://dev.azure.com/dnceng/public/_build/results?buildId=157586&view=ms.vss-test-web.build-test-results-tab

CarolEidt · 2019-04-16T22:44:18Z

@echesakovMSFT - I'll take a look; are these not run in outerloop testing?

echesakov · 2019-04-16T22:53:01Z

@CarolEidt We don't run R2R tests in coreclr-outerloop, only during commit validations and official builds.

@RussKeldorph We probably should add coreclr-outerloop-r2r and coreclr-outerloop-r2r-jistress/gcstress build definitions for folks who want to trigger those and also have them running on schedule.

CarolEidt · 2019-04-16T23:01:30Z

@echesakovMSFT - can you remind me how to invoke a single test with R2R? Is it just the crossgen command with the appropriate "/Platform_Assemblies_Paths" option? Or is there a better way?

RussKeldorph · 2019-04-16T23:15:11Z

@CarolEidt To crossgen the test, set RunCrossGen=1 in the environment before running the test script. If you need the framework compiled like the R2R jobs, pass the crossgen option to build-test's command line.

* Support for Arm64 Vector ABI Extend HFA support to support vectors as well as floating point types. This requires that the JIT recognize vector types even during crossgen, so that the ABI is supported consistently. Also, fix and re-enable the disabled Arm64 Simd tests. Fix dotnet/coreclr#16022 Commit migrated from dotnet/coreclr@5be6b66

CarolEidt force-pushed the Arm64VectorABI branch 6 times, most recently from 22909c0 to ac77102 Compare April 4, 2019 19:59

CarolEidt closed this Apr 4, 2019

CarolEidt reopened this Apr 4, 2019

CarolEidt force-pushed the Arm64VectorABI branch from ac77102 to 0203bc4 Compare April 5, 2019 04:45