Add support for AVX10.2, Add AVX10.2 API surface and template tests #111209

khushal1996 · 2025-01-08T18:39:45Z

This PR implements the approved AVX10.2 APIs, adds the corresponding template tests, and adds lowering support for them in the JIT.
It also enables YMM embedded rounding, which is introduced with AVX10.2.

Testing overview
We follow a multi-step testing plan to verify both encoding correctness and semantic correctness.

Testing results will be presented below.

Emitter unit tests
In codegenxarch.cpp, similar to genAmd64EmitterUnitTestsSse2, we used the JitLateDisasm feature to insert instructions that are encoded as emitter unit tests. LateDisasm invokes LLVM to disassemble the code stream, which lets us cross-validate the JIT's disassembly against LLVM's. This step verifies that the emit paths generate "correct" code, i.e. code that would not trigger #UD or have the wrong semantics.

Note that we are using a custom coredistools.dll which uses a recent LLVM that supports AVX10.2 decoding.
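The cross-validation idea can be sketched as a line-by-line comparison of the two listings. This is a minimal sketch, assuming both disassemblers print Intel syntax in the same instruction order; the helper names are hypothetical, and real JIT and LLVM output differ in more ways (labels, size specifiers, hex formatting) than this normalization handles.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: canonicalize one line of Intel-syntax disassembly so that
// cosmetic differences (case, commas, runs of spaces/tabs) are not reported.
std::string NormalizeDisasm(const std::string& line)
{
    std::string out;
    bool pendingSpace = false;
    for (char c : line)
    {
        unsigned char uc = static_cast<unsigned char>(c);
        if (c == ',' || std::isspace(uc))
        {
            // Remember a separator, but never emit a leading or trailing one.
            pendingSpace = !out.empty();
            continue;
        }
        if (pendingSpace)
        {
            out.push_back(' ');
            pendingSpace = false;
        }
        out.push_back(static_cast<char>(std::tolower(uc)));
    }
    return out;
}

// Hypothetical helper: positions where the JIT's and LLVM's listings disagree,
// including positions where one listing is shorter than the other.
std::vector<std::size_t> FindMismatches(const std::vector<std::string>& jitLines,
                                        const std::vector<std::string>& llvmLines)
{
    std::vector<std::size_t> mismatches;
    const std::size_t count = std::max(jitLines.size(), llvmLines.size());
    for (std::size_t i = 0; i < count; ++i)
    {
        const bool missing = i >= jitLines.size() || i >= llvmLines.size();
        if (missing || NormalizeDisasm(jitLines[i]) != NormalizeDisasm(llvmLines[i]))
        {
            mismatches.push_back(i);
        }
    }
    return mismatches;
}
```

An empty result means the two disassemblers agree on every emitted instruction, which is the signal this step looks for.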

SuperPMI
In this step, we run the SuperPMI pipeline over all the MCH files to get asmdiffs. This lets us check for any assertion failures or internal errors in the JIT, and since the pipeline also invokes coredistools.dll, it verifies encoding correctness at a larger scale.

To ensure the new changes do not alter existing code paths, we ran asmdiffs with the base JIT built from main (the branch the changes are based on) and the diff JIT built with all the changes.
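Conceptually, an asmdiffs run compiles every recorded method context with both JITs and compares the generated code. The following is a simplified model only: the map-of-bytes representation, the names, and treating a context the diff JIT cannot replay as "missing data" are assumptions for illustration, not SuperPMI's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: each method context id maps to the code bytes a JIT produced.
using CodeBytes = std::vector<std::uint8_t>;

struct AsmDiffSummary
{
    int compared = 0;    // contexts both JITs compiled
    int missingData = 0; // contexts skipped because the diff JIT had no result
    int diffs = 0;       // contexts whose code bytes differ
};

AsmDiffSummary CompareAsm(const std::map<int, CodeBytes>& baseJit,
                          const std::map<int, CodeBytes>& diffJit)
{
    AsmDiffSummary summary;
    for (const auto& [contextId, baseCode] : baseJit)
    {
        auto it = diffJit.find(contextId);
        if (it == diffJit.end())
        {
            ++summary.missingData;
            continue;
        }
        ++summary.compared;
        if (it->second != baseCode)
        {
            ++summary.diffs;
        }
    }
    return summary;
}
```

A run where diffs stays at zero corresponds to the "No asm diffs" outcome: the new code is not reached for any existing context.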

JIT unit tests
The two steps above mainly verify the encoding correctness of the generated binary code. In this step, to check semantic correctness, we ran the existing CoreCLR JIT unit test set in the Intel SDE emulator with AVX10.2 enabled and disabled.

Testing results

Run Emitter tests

Result of emitter tests using the LLVM disassembler (left: JIT-emitted code; right: LLVM disassembler output)

Run SuperPMI with JitLateDisasm to check for decode errors
No decode failures were observed in the SuperPMI log.

Run SuperPMI without JitLateDisasm to check for assertion failures
No assertion failures or asm diffs were observed.

[11:10:47] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\aspnet.run.windows.x64.checked.mch
[11:10:47] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\aspnet.run.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\aspnet.run.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\aspnet.run.windows.x64.checked.mch
[11:11:03] SuperPMI encountered missing data for 33 out of 129205 contexts
[11:11:03] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\benchmarks.run.windows.x64.checked.mch
[11:11:03] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\benchmarks.run.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\benchmarks.run.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\benchmarks.run.windows.x64.checked.mch
[11:11:09] SuperPMI encountered missing data for 13 out of 28757 contexts
[11:11:09] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\benchmarks.run_pgo.windows.x64.checked.mch
[11:11:09] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\benchmarks.run_pgo.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\benchmarks.run_pgo.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\benchmarks.run_pgo.windows.x64.checked.mch
[11:11:22] SuperPMI encountered missing data for 62 out of 105618 contexts
[11:11:22] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\benchmarks.run_tiered.windows.x64.checked.mch
[11:11:22] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\benchmarks.run_tiered.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\benchmarks.run_tiered.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\benchmarks.run_tiered.windows.x64.checked.mch
[11:11:27] SuperPMI encountered missing data for 2 out of 55912 contexts
[11:11:27] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\coreclr_tests.run.windows.x64.checked.mch
[11:11:27] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\coreclr_tests.run.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\coreclr_tests.run.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\coreclr_tests.run.windows.x64.checked.mch
[11:13:16] SuperPMI encountered missing data for 149 out of 582221 contexts
[11:13:16] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries.crossgen2.windows.x64.checked.mch
[11:13:16] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries.crossgen2.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries.crossgen2.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries.crossgen2.windows.x64.checked.mch
[11:13:36] SuperPMI encountered missing data for 3 out of 280377 contexts
[11:13:36] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries.pmi.windows.x64.checked.mch
[11:13:36] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries.pmi.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries.pmi.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries.pmi.windows.x64.checked.mch
[11:14:05] SuperPMI encountered missing data for 68 out of 295086 contexts
[11:14:05] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries_tests.run.windows.x64.Release.mch
[11:14:05] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries_tests.run.windows.x64.Release.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries_tests.run.windows.x64.Release.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries_tests.run.windows.x64.Release.mch
[11:15:22] SuperPMI encountered missing data for 447 out of 751895 contexts
[11:16:23] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch
[11:16:23] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch
[11:17:17] SuperPMI encountered missing data for 208 out of 342818 contexts
[11:17:17] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\realworld.run.windows.x64.checked.mch
[11:17:17] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\realworld.run.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\realworld.run.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\realworld.run.windows.x64.checked.mch
[11:17:23] SuperPMI encountered missing data for 11 out of 24824 contexts
[11:17:23] Running asm diffs of D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\smoke_tests.nativeaot.windows.x64.checked.mch
[11:17:23] Invoking: D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\superpmi.exe -a -v ewi -f C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\smoke_tests.nativeaot.windows.x64.checked.mch_fail.mcl -details C:\Users\kmodi\AppData\Local\Temp\2\tmp98kfiph0\smoke_tests.nativeaot.windows.x64.checked.mch_details.csv -jitoption force JitEnableNoWayAssert=1 -jitoption force JitNoForceFallback=1 -jitoption force JitAlignLoops=0 -jit2option force JitEnableNoWayAssert=1 -jit2option force JitNoForceFallback=1 -jit2option force JitAlignLoops=0 -p D:\Base_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Git_repos\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\clrjit.dll D:\Base_repos\runtime\artifacts\spmi\mch\64146448-11b1-4f94-b1f2-edce91fbcb33.windows.x64\smoke_tests.nativeaot.windows.x64.checked.mch
[11:17:27] SuperPMI encountered missing data for 2 out of 29727 contexts
[11:17:27] Asm diffs summary:
[11:17:27]   Summary Markdown file: D:\Base_repos\runtime\artifacts\spmi\diff_summary.1.md
[11:17:27]   Short Summary Markdown file: D:\Base_repos\runtime\artifacts\spmi\diff_short_summary.1.md
[11:17:27]   No asm diffs
[11:17:27] Finish time: 11:17:27
[11:17:27] Elapsed time: 0:06:39.678574
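Summing the per-collection counts from the log gives the overall coverage of this run. The sketch below only tallies numbers copied from the "missing data" lines above (the struct and function names are illustrative). By this count, 998 of 2,626,440 contexts (roughly 0.04%) were skipped for missing data, so the "No asm diffs" result covers essentially all replayed contexts.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Per-collection numbers copied from the log's
// "SuperPMI encountered missing data for X out of Y contexts" lines.
struct CollectionResult
{
    std::string collection;
    std::int64_t missingContexts;
    std::int64_t totalContexts;
};

const std::vector<CollectionResult> kAsmDiffsRun = {
    {"aspnet.run", 33, 129205},
    {"benchmarks.run", 13, 28757},
    {"benchmarks.run_pgo", 62, 105618},
    {"benchmarks.run_tiered", 2, 55912},
    {"coreclr_tests.run", 149, 582221},
    {"libraries.crossgen2", 3, 280377},
    {"libraries.pmi", 68, 295086},
    {"libraries_tests.run", 447, 751895},
    {"libraries_tests_no_tiered_compilation.run", 208, 342818},
    {"realworld.run", 11, 24824},
    {"smoke_tests.nativeaot", 2, 29727},
};

struct Totals
{
    std::int64_t missing = 0;
    std::int64_t total = 0;
    double MissingPercent() const { return total == 0 ? 0.0 : 100.0 * missing / total; }
};

Totals SumResults(const std::vector<CollectionResult>& results)
{
    Totals totals;
    for (const auto& r : results)
    {
        totals.missing += r.missingContexts;
        totals.total += r.totalContexts;
    }
    return totals;
}
```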

Run JIT subtree with AVX10.2 enabled / disabled

AVX10.2 enabled

AVX10.2 disabled

dotnet-issue-labeler · 2025-01-08T18:39:53Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

dotnet-policy-service · 2025-01-08T18:40:31Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

tannergooding · 2025-01-23T17:07:35Z

src/coreclr/jit/emitxarch.cpp

+//   Reserved for isas Avx10.1 and below
+//   Needs to be set to 0 for AVX10.2 adn above to indicate YMM embedded rounding


nit:

Suggested change:

-//   Reserved for isas Avx10.1 and below
-//   Needs to be set to 0 for AVX10.2 adn above to indicate YMM embedded rounding
+//   Reserved for isas Avx10.1 and below
+//   Set to 0 on AVX10.2 and above for YMM embedded rounding support

Or something along those lines. The current wording makes it sound like it must always be set to 0, rather than only being set to 0 when we're using embedded rounding for YMM sizes.

I believe it is also required to be set to 1 on Avx10.1 and below, so the comment should likely say that. Simply saying "reserved" doesn't imply a required state.

I have pushed these changes in my latest commit
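As a sketch of the encoding rule under discussion, assume the standard EVEX layout in which the second payload byte (P1) holds W, the inverted vvvv, a fixed bit that must be 1 before AVX10.2 (the bit AVX10.2 repurposes, often called EVEX.U), and pp. The helper name and signature are hypothetical, not the JIT's emitter code; the caller is assumed to request YMM embedded rounding only on AVX10.2 hardware, for a register-register form with EVEX.b set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: assemble EVEX P1, laid out as W | vvvv (inverted) | U | pp.
std::uint8_t BuildEvexP1(bool w, std::uint8_t vvvv, std::uint8_t pp, bool ymmEmbeddedRounding)
{
    assert(vvvv <= 0x0F && pp <= 0x03);
    std::uint8_t p1 = 0;
    if (w)
    {
        p1 |= 0x80; // bit 7: EVEX.W
    }
    p1 |= static_cast<std::uint8_t>((~vvvv & 0x0F) << 3); // bits 6:3: inverted vvvv
    // Bit 2 (U): must be 1 on AVX10.1 and below. It is cleared only to request
    // YMM embedded rounding on AVX10.2 and above.
    if (!ymmEmbeddedRounding)
    {
        p1 |= 0x04;
    }
    p1 |= static_cast<std::uint8_t>(pp & 0x03); // bits 1:0: implied SIMD prefix
    return p1;
}
```

Modeling the bit as "set by default, cleared only on request" matches the reviewer's point that the comment should describe a conditional state rather than a value that is always 0.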

tannergooding · 2025-01-23T17:09:07Z

src/coreclr/jit/emitxarch.cpp

-    //      Scalar Double  Scalar Single  Packed Double
-    return ((b == 0xF2) || (b == 0xF3) || (b == 0x66));
+    //      Scalar Double  Scalar Single  Packed Double   No prefix
+    return ((b == 0xF2) || (b == 0xF3) || (b == 0x66) || (b == 0x00));


This change isn't necessary given the assert(b != 0) above.

I have pushed these changes in my latest commit
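To illustrate the reviewer's point, here is a hypothetical stand-alone version of such a check (the function name is illustrative). Because the assert documents, and in checked builds enforces, that 0 is never passed in, a b == 0x00 term in the return expression would be dead code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emitter's prefix check discussed above.
bool IsSimdMandatoryPrefix(std::uint8_t b)
{
    // Callers never pass 0, so no "b == 0x00" case is needed below.
    assert(b != 0);
    //      Scalar Double  Scalar Single  Packed Double
    return (b == 0xF2) || (b == 0xF3) || (b == 0x66);
}
```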

tannergooding · 2025-01-23T17:13:25Z

src/coreclr/jit/emitxarch.cpp

@@ -2314,6 +2346,13 @@ emitter::code_t emitter::emitExtractEvexPrefix(instruction ins, code_t& code) co
            break;
        }

+        case 0x05:


Does this need to handle leadingBytes being [0x00, 0x04] and [0x06, 0x07]?

The assert above seems to indicate it can be any of those, but you've only added 0x05.

We don't have any instructions in those maps yet, but yes, I think it is better to explicitly handle them as invalid for now.

I have pushed these changes in my latest commit
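A hypothetical sketch of the agreed handling: accepted leadingBytes values map to an EVEX map-select value, and every other value is treated as invalid. It assumes leadingBytes numerically equals the EVEX map number (0x01 = 0F, 0x02 = 0F 38, 0x03 = 0F 3A, 0x05 = map 5); which cases the real switch handles besides 0x05 is also an assumption, and this sketch returns an empty optional where a JIT would more likely assert.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: map leadingBytes to the EVEX map-select (mmm) value.
std::optional<std::uint8_t> EvexMapSelect(std::uint8_t leadingBytes)
{
    switch (leadingBytes)
    {
        case 0x01: // 0F
        case 0x02: // 0F 38
        case 0x03: // 0F 3A
        case 0x05: // map 5, the case added in this PR
            return leadingBytes;
        default:
            // 0x00, 0x04, 0x06 and 0x07: no instructions in these maps are emitted yet.
            return std::nullopt;
    }
}
```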

tannergooding · 2025-01-23T17:16:56Z

src/coreclr/jit/hwintrinsiclistxarch.h

+HARDWARE_INTRINSIC(AVX10v2_V512,    ConvertToVectorUInt32WithTruncationSaturation,                   64,               1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_vcvttps2udqs,       INS_vcvttpd2udqs},          HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbBroadcastCompatible|HW_Flag_EmbMaskingCompatible)
+HARDWARE_INTRINSIC(AVX10v2_V512,    ConvertToVectorUInt64WithTruncationSaturation,                   64,               1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_vcvttps2uqqs,       INS_vcvttpd2uqqs},          HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbBroadcastCompatible|HW_Flag_EmbMaskingCompatible)
+HARDWARE_INTRINSIC(AVX10v2_V512,    MinMax,                                                          64,               3,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_vminmaxps,          INS_vminmaxpd},             HW_Category_IMM,                    HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbBroadcastCompatible|HW_Flag_EmbMaskingCompatible)
+HARDWARE_INTRINSIC(AVX10v2_V512,    MultipleSumAbsoluteDifferences,                                  64,               3,     {INS_vmpsadbw,          INS_invalid,            INS_invalid,            INS_vmpsadbw,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid},               HW_Category_IMM,                    HW_Flag_FullRangeIMM|HW_Flag_EmbMaskingCompatible) // TBD where should we put the instruction typ_byte or typ_ushort?


For the TBD, this has to match the tracked simdBaseType that the node has.

The node typically defaults to the base type of the SIMD return as that is generally unambiguous and matches the types used for the inputs. However, if there are conflicts there then we switch to the base type of the first or second argument, depending on which is "unique" and allows disambiguation (in the same way it would for overload resolution).

My bad. I missed the TBD comment here. So what you are saying is the simdBaseType of the node needs to be matched here.

I have pushed these changes in my latest commit

@tannergooding there seems to be a bug here: the template test for MultipleSumAbsoluteDifferences is failing. It hits an assert because the simdBaseType was TYP_USHORT. Is that correct for this API? Should the simdBaseType be TYP_BYTE or TYP_USHORT? From your comment above, I concluded that it needs to be TYP_BYTE since the API is:

public static Vector512<ushort> MultipleSumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask) => MultipleSumAbsoluteDifferences(left, right, mask);
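MultipleSumAbsoluteDifferences maps to (v)mpsadbw, which is why its result and its inputs have different element types. The reference model below covers a single 128-bit lane, following the classic MPSADBW definition: imm8 bit 2 picks a starting offset of 0 or 4 bytes in left, imm8 bits 1:0 pick one of four 4-byte blocks in right, and each of the eight 16-bit results is a sum of four absolute byte differences. The function name is hypothetical, and the wider forms (which apply this per 128-bit lane with lane-specific immediate bits) are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Reference model of one 128-bit lane: 16 bytes per source, 8 ushort sums out.
std::array<std::uint16_t, 8> MpsadbwLane(const std::array<std::uint8_t, 16>& left,
                                          const std::array<std::uint8_t, 16>& right,
                                          std::uint8_t imm)
{
    const int leftOffset = ((imm >> 2) & 1) * 4; // imm[2]: start at byte 0 or 4 of left
    const int rightOffset = (imm & 3) * 4;       // imm[1:0]: 4-byte block of right
    std::array<std::uint16_t, 8> result{};
    for (int i = 0; i < 8; ++i)
    {
        int sum = 0;
        for (int k = 0; k < 4; ++k)
        {
            sum += std::abs(int(left[leftOffset + i + k]) - int(right[rightOffset + k]));
        }
        result[i] = static_cast<std::uint16_t>(sum); // at most 4 * 255, fits in 16 bits
    }
    return result;
}
```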

@khushal1996 We start with it undefined here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L1694C28-L1694C43

If the return type is a struct, we then initialize simdBaseJitType and sizeBytes based on the information extracted from it: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L1700; thus we get CORINFO_TYPE_USHORT and 64

We have a path that may adjust this here, by looking at the type of the input arguments: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L1765

However, such a path only triggers if HW_Flag_BaseTypeFromFirstArg or HW_Flag_BaseTypeFromSecondArg is set: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L732, otherwise it preserves the original value.

It can also be updated if HW_Flag_NormalizeSmallTypeToInt is set: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L1847-L1851

Since the category and flags on AVX10v2_V512_MultipleSumAbsoluteDifferences are HW_Category_IMM and HW_Flag_FullRangeIMM | HW_Flag_EmbMaskingCompatible, debugging should show that the base type is computed to be CORINFO_TYPE_USHORT.

Is that not the case?
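The selection steps described above can be modeled roughly as follows. This is a simplified sketch: the enum, the flag bit values, and the function are hypothetical (only the flag names mirror the JIT's), and HW_Flag_NormalizeSmallTypeToInt as well as non-struct return types are left out.

```cpp
#include <cassert>
#include <cstdint>

enum class BaseType { Byte, UByte, Short, UShort, Int, UInt, Long, ULong, Float, Double };

// Flag names borrowed from the discussion; bit values are made up for this sketch.
enum HwIntrinsicFlag : std::uint32_t
{
    HW_Flag_NoFlag                = 0,
    HW_Flag_BaseTypeFromFirstArg  = 1u << 0,
    HW_Flag_BaseTypeFromSecondArg = 1u << 1,
    HW_Flag_FullRangeIMM          = 1u << 2,
    HW_Flag_EmbMaskingCompatible  = 1u << 3,
};

BaseType ResolveBaseType(BaseType returnBaseType, BaseType firstArgBaseType,
                         BaseType secondArgBaseType, std::uint32_t flags)
{
    // Default: the base type of the SIMD return type (Vector512<ushort> -> UShort).
    BaseType baseType = returnBaseType;

    // Only intrinsics that opt in are re-keyed on an argument's base type.
    if ((flags & HW_Flag_BaseTypeFromFirstArg) != 0)
    {
        baseType = firstArgBaseType;
    }
    else if ((flags & HW_Flag_BaseTypeFromSecondArg) != 0)
    {
        baseType = secondArgBaseType;
    }
    return baseType;
}
```

With the flags currently on the intrinsic, this model yields UShort; adding HW_Flag_BaseTypeFromFirstArg would re-key it on the Vector512<byte> argument and yield UByte.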

So from what you are saying, we can control the simdBaseJitType and we have to define an intrinsic based on that? There is no fixed way but multiple ways to define a HardwareIntrinsic?

For example, for this API:

HARDWARE_INTRINSIC(AVX10v2_V512, MultipleSumAbsoluteDifferences, 64, 3, {INS_invalid, INS_invalid, INS_invalid, INS_vmpsadbw, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IMM, HW_Flag_FullRangeIMM|HW_Flag_EmbMaskingCompatible)

HARDWARE_INTRINSIC(AVX10v2_V512, MultipleSumAbsoluteDifferences, 64, 3, {INS_invalid, INS_vmpsadbw, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IMM, HW_Flag_BaseTypeFromFirstArg|HW_Flag_FullRangeIMM|HW_Flag_EmbMaskingCompatible)

So both of them will be handled well and there can be more than one way to handle them?

So from what you are saying, we can control the simdBaseJitType and we have to define an intrinsic based on that

Right, with the intent being that the defaults are good enough in most cases and we only deviate from that default when something requires it; such as needing a different base type to determine the correct instruction to emit.

So both of them will be handled well and there can be more than one way to handle them?

Yes, assuming of course that the difference between TYP_UBYTE and TYP_USHORT is otherwise irrelevant to the other phases. If you needed to key off the base type of the parameter such as to determine if embedded masking/containment was valid instead, that's a different story and you'd want the latter.
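Why both rows can work: the instruction list of a HARDWARE_INTRINSIC row is indexed by the node's resolved base type, using the column order of the rows quoted in this PR (byte, ubyte, short, ushort, int, uint, long, ulong, float, double). The sketch below is hypothetical and models only that lookup. Each variant finds vmpsadbw as long as the instruction sits in the slot matching the base type the node ends up with; a mismatched slot yields INS_invalid.

```cpp
#include <array>
#include <cassert>
#include <string>

// Column order of the instruction list in a HARDWARE_INTRINSIC row.
enum BaseTypeSlot { kByte, kUByte, kShort, kUShort, kInt, kUInt, kLong, kULong, kFloat, kDouble, kSlotCount };

using InstructionRow = std::array<std::string, kSlotCount>;

// Hypothetical helper: pick the instruction for the node's resolved base type.
std::string LookupInstruction(const InstructionRow& row, BaseTypeSlot baseType)
{
    return row[baseType];
}

// Variant 1 (no BaseTypeFrom* flag): base type comes from Vector512<ushort>.
const InstructionRow kKeyedOnReturn = {"INS_invalid", "INS_invalid", "INS_invalid", "INS_vmpsadbw", "INS_invalid",
                                       "INS_invalid", "INS_invalid", "INS_invalid", "INS_invalid", "INS_invalid"};

// Variant 2 (HW_Flag_BaseTypeFromFirstArg): base type comes from Vector512<byte>.
const InstructionRow kKeyedOnFirstArg = {"INS_invalid", "INS_vmpsadbw", "INS_invalid", "INS_invalid", "INS_invalid",
                                         "INS_invalid", "INS_invalid", "INS_invalid", "INS_invalid", "INS_invalid"};
```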

Got it, thanks for the detailed explanation. I have identified the actual bug and will soon raise a separate PR for the failing case. Before that, I also want to evaluate the embedded masking case, since it is supported here. Let me investigate and then raise that PR.

src/coreclr/jit/instr.h

tannergooding · 2025-01-23T17:28:09Z

The changes overall LGTM. Just a couple small cleanup asks.

It should generally be good for secondary review. CC @dotnet/jit-contrib

BruceForstall

Looks good to me except for one nit and one thing in lookupInstructionSet that looks like a bug.

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Avx10v2.cs

src/coreclr/jit/hwintrinsicxarch.cpp

tannergooding · 2025-01-24T16:29:33Z

Thanks for all the work here @khushal1996!

BruceForstall · 2025-01-24T18:11:25Z

+1 Thanks @khushal1996!

khushal1996 · 2025-01-24T18:32:52Z

Thanks @tannergooding @BruceForstall

dotnet-issue-labeler bot added the area-CodeGen-coreclr and new-api-needs-documentation labels Jan 8, 2025

dotnet-policy-service bot added the community-contribution label Jan 8, 2025

khushal1996 force-pushed the kcm-avx102-api-public-pr branch from ec7732d to 216999c January 8, 2025 18:55

khushal1996 and others added 20 commits January 8, 2025 11:58

Add support for AVX10.2. Add AVX10.2 API surface and template tests. …

9ba454d

…Lower Avx10.2 nodes accordingly.

Add support and template tests for AVX10v2_V512

6aa4048

Add new coredistools.dll build from latest llvm repo

b0f4e6c

Limit JIT unit suite within the subsets which are stable in SDE.

8304105

Rename API as per latest API proposal discussions

64e328a

fix sample tests in handwritten project

08c7c26

Revert "Limit JIT unit suite within the subsets which are stable in S…

ef3101c

…DE." This reverts commit 067e31e.

Limit JIT unit suite within the subsets which are stable in SDE.

b4de426

Allow a prefix of 0x00 for AVX10.2 instructions.

a2aba38

Revert "Limit JIT unit suite within the subsets which are stable in S…

abac88e

…DE." This reverts commit 067e31e.

Limit JIT unit suite within the subsets which are stable in SDE.

154988b

remove developer comments from files

47f3e5a

Enable all template tests and enable ymm embedded rounding

e6004f5

Make emitter independent of ISa and based on insOpts for ymm embedded…

ae223f8

… rounding

Enable ymm embedded rounding based on architecture

885f1cb

Revert "Make emitter independent of ISa and based on insOpts for ymm …

12a5a26

…embedded rounding" This reverts commit 493572f.

Separate Avx10.2 unit testing framework from APX framework

161c3e9

Revert "Limit JIT unit suite within the subsets which are stable in S…

5de4944

…DE." This reverts commit 067e31e.

Revert "Add new coredistools.dll build from latest llvm repo"

2a9b3f8

This reverts commit 61719f8.

Fix formatting

83868ab

khushal1996 force-pushed the kcm-avx102-api-public-pr branch from 7f2b5ae to 83868ab January 8, 2025 21:52

build-analysis bot mentioned this pull request Jan 9, 2025

ModuleNotFoundError: No module named 'pkg_resources' dotnet/dnceng#4756

Open

Use new keyword for class V512 to hide Avx10v1.V512 and correct CI er…

ca860a3

…rors

khushal1996 added 5 commits January 21, 2025 09:05

Format code

e6e0f4b

Handle sizePrefix = 0 case when decoding evex instruction

abad2f7

Add assert in appropriate places

a1e7cb4

Club similar instructions together in perf calculation in emitxarch

c88ee06

Run formatting

a53b88d

This was referenced Jan 21, 2025

Libraries test fails with "Method has zero rva" error #108641

Open

[WASI] Sockets - unknown handle index #108726

Open

khushal1996 added 2 commits January 22, 2025 12:05

Add assembly prints for debug assembly capturing for Avx10.2

e9ae4e2

Use correct size when running emitter tests

be2abc0

tannergooding reviewed Jan 23, 2025

src/coreclr/jit/instr.h (resolved)

Ad appropriate comments and make review changes

e8ff022

BruceForstall added the avx10 label and removed the apx label Jan 23, 2025

BruceForstall reviewed Jan 23, 2025

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Avx10v2.cs (outdated, resolved)

src/coreclr/jit/hwintrinsicxarch.cpp (outdated, resolved)

khushal1996 mentioned this pull request Jan 24, 2025

Use Avx10.2 Instructions in Floating Point Conversions #111775

Merged

Apply suggestions from code review

deea887

Co-authored-by: Bruce Forstall <brucefo@microsoft.com>

build-analysis bot mentioned this pull request Jan 24, 2025

/root/helix/work/correlation/scripts/<hash>/execute.sh: Permission denied dotnet/dnceng#3412

Open

tannergooding approved these changes Jan 24, 2025

tannergooding merged commit 03b2d3d into dotnet:main Jan 24, 2025
165 of 169 checks passed

tannergooding mentioned this pull request Jan 31, 2025

[API Proposal]: Add AVX10v2 API to add Avx10.2 support #109083

Closed

khushal1996 mentioned this pull request Feb 6, 2025

Fix the failing template test for AVX10.2 #112252

Merged
