[API Proposal]: Add AVX10v2 API to add Avx10.2 support #109083

DeepakRajendrakumaran · 2024-10-21T19:00:03Z

Background and motivation

Intel has announced the features available in the next version of AVX10 (10.2). To support them, .NET needs to expand the Avx10 intrinsics to include the new APIs.

Avx10.2 spec. Sections 7-14 of this spec cover the newly added instructions. Notable new features include MinMax and saturating conversions.

The original API proposal specified that each future Avx10 version would get its own class inheriting from Avx10v1.

API Proposal

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
    [Intrinsic]
    [CLSCompliant(false)]
    public abstract class Avx10v2 : Avx10v1
    {
        internal Avx10v2() { }

        public static new bool IsSupported { get => IsSupported; }

        // VMINMAXPD xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst, imm8
        public static Vector128<double> MinMax(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {sae}, imm8
        public static Vector256<double> MinMax(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst, imm8
        public static Vector128<float> MinMax(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {sae}, imm8
        public static Vector256<float> MinMax(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXSD xmm1{k1}{z}, xmm2, xmm3/m64 {sae}, imm8
        public static Vector128<double> MinMaxScalar(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VMINMAXSS xmm1{k1}{z}, xmm2, xmm3/m32 {sae}, imm8
        public static Vector128<float> MinMaxScalar(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VADDPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Add(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);

        // VADDPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Add(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);
                
        // VDIVPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Divide(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);
        
        // VDIVPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Divide(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);
        
        // VCVTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
                
        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
        
        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);
        
        // VCVTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
                
        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
        
        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

        // VCVTTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithTruncationSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithTruncationSaturationAndWidenToInt32(value);
                
        // VCVTTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<int> ConvertToByteWithTruncationSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithTruncationSaturationAndWidenToInt32(value);
                
        // VCVTTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithTruncationSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithTruncationSaturationAndWidenToUInt32(value);
                
        // VCVTTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<uint> ConvertToByteWithTruncationSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithTruncationSaturationAndWidenToUInt32(value);
        
        // VMOVD xmm1, xmm2/m32
        public static Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) => ConvertToVector128UInt32(value);
        
        // VMOVW xmm1, xmm2/m16
        public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ushort> value) => ConvertToVector128UInt16(value);
        
        // The instructions below add embedded rounding ({er})
        // overloads to existing APIs

        // VCVTDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<int> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);
        
        // VCVTPD2DQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<int> ConvertToVector128Int32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Int32(value, mode);

        // VCVTPD2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTPD2QQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPD2UDQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128UInt32(value, mode);

        // VCVTPD2UQQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTPS2DQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToVector256Int32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int32(value, mode);

        // VCVTPS2QQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPS2UDQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt32(value, mode);

        // VCVTPS2UQQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);

        // VCVTUDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);

        // VCVTUQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTUQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);
        
        // VMULPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Multiply(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);
        
        // VMULPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Multiply(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);
        
        // VSCALEFPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);
        
        // VSCALEFPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);
        
        // VSQRTPD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> Sqrt(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);
        
        // VSQRTPS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> Sqrt(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);
        
        // VSUBPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Subtract(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);
        
        // VSUBPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Subtract(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);
        
        [Intrinsic]
        public new abstract class X64 : Avx10v1.X64
        {
            internal X64() { }

            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class V512 : Avx10v1.V512
        {
            internal V512() { }

            public static new bool IsSupported { get => IsSupported; }
    
            // VMINMAXPD zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst {sae}, imm8
            public static Vector512<double> MinMax(Vector512<double> left, Vector512<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

            // VMINMAXPS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {sae}, imm8
            public static Vector512<float> MinMax(Vector512<float> left, Vector512<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);
            
            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
            
            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);
            
            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
            
            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

            // VCVTTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<int> ConvertToByteWithTruncationSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithTruncationSaturationAndWidenToInt32(value);
                        
            // VCVTTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<uint> ConvertToByteWithTruncationSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithTruncationSaturationAndWidenToUInt32(value);
            
            // This is a 512-bit extension of the previously existing 128/256-bit intrinsic
            // VMPSADBW zmm1{k1}{z}, zmm2, zmm3/m512, imm8
            public static Vector512<ushort> MultipleSumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask) => MultipleSumAbsoluteDifferences(left, right, mask);

            [Intrinsic]
            public new abstract class X64 : Avx10v1.V512.X64
            {
                internal X64() { }

                public static new bool IsSupported { get => IsSupported; }
            }
        }
    }
}
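To make the `FloatRoundingMode` overloads above more concrete, here is a minimal scalar sketch of what one lane of the proposed `ConvertToVector256Int32(Vector256<float>, FloatRoundingMode)` overload would compute under each rounding mode. It is an illustration only, not the JIT implementation: `ConvertLane` is a hypothetical helper, `System.MidpointRounding` is borrowed purely as a stand-in because it has members with the same four names as `FloatRoundingMode`, and NaN and out-of-range inputs are not modeled.

```csharp
using System;

// Hypothetical model of one lane of ConvertToVector256Int32(value, mode).
// MidpointRounding stands in for FloatRoundingMode (same four member names);
// NaN and out-of-range inputs are not modeled.
static int ConvertLane(float value, MidpointRounding mode) => mode switch
{
    MidpointRounding.ToEven => (int)Math.Round((double)value, MidpointRounding.ToEven),
    MidpointRounding.ToNegativeInfinity => (int)Math.Floor((double)value),
    MidpointRounding.ToPositiveInfinity => (int)Math.Ceiling((double)value),
    MidpointRounding.ToZero => (int)Math.Truncate((double)value),
    _ => throw new ArgumentOutOfRangeException(nameof(mode)),
};

// Usage: show how the same inputs round under each mode.
float[] inputs = { 2.5f, -2.5f, 1.7f, -1.7f };
MidpointRounding[] modes =
{
    MidpointRounding.ToEven,
    MidpointRounding.ToNegativeInfinity,
    MidpointRounding.ToPositiveInfinity,
    MidpointRounding.ToZero,
};
foreach (MidpointRounding mode in modes)
{
    int[] lanes = Array.ConvertAll(inputs, x => ConvertLane(x, mode));
    Console.WriteLine($"{mode}: {string.Join(", ", lanes)}");
}
```

The point of the per-call mode is that it overrides the MXCSR rounding mode for that one instruction only, which is why the proposal requires it to be a constant.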

API Usage

Vector128<float> v1 = Vector128.Create((float)someParam1);
Vector128<float> v2 = Vector128.Create((float)someParam2);
if (Avx10v2.IsSupported)
{
    Vector128<float> v3 = Avx10v2.MinMax(v1, v2, 0b00000000);
    // etc
}

Alternative Designs

No response

Risks

No response


dotnet-policy-service · 2024-10-21T19:18:31Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

DeepakRajendrakumaran · 2024-10-23T00:03:25Z

The following AVX10.2 instructions are not covered by the proposal above. They fall mostly into two groups: 16-bit floating-point instructions and FMA instructions.

Instructions skipped:
Entire section 7 in AVX10.2 manual

Parts of Section 8 in AVX10.2 manual
- VCOMXSH
- VUCOMXSH

Entire Section 9 in AVX10.2 manual

Parts of Section 10 in AVX10.2 manual
- VDPPHPS

Parts of Section 11 in AVX10.2 manual
- VMINMAXNEPBF16

Parts of Section 12 in AVX10.2 manual
- VADDPH
- VCMPPH
- VCVTDQ2PH
- VCVTPD2PH
- VCVTPH2DQ
- VCVTPH2PD
- VCVTPH2PS
- VCVTPH2PSX
- VCVTPH2QQ
- VCVTPH2UDQ
- VCVTPH2UQQ
- VCVTPH2UW
- VCVTPH2W
- VCVTPS2PH
- VCVTPS2PHX
- VCVTQQ2PH
- VCVTTPH2DQ
- VCVTTPH2QQ
- VCVTTPH2UDQ
- VCVTTPH2UQQ
- VCVTTPH2UW
- VCVTTPH2W
- VCVTUDQ2PH
- VCVTUQQ2PH
- VCVTUW2PH
- VCVTW2PH
- VDIVPH
- VFCMADDCPH
- VFCMULCPH
- VFMADD132PD - Prior instructions don't exist
- VFMADD132PH
- VFMADD132PS - Prior instructions don't exist
- VFMADD213PD - Prior instructions don't exist
- VFMADD213PH
- VFMADD213PS - Prior instructions don't exist
- VFMADD231PD - Prior instructions don't exist
- VFMADD231PH
- VFMADD231PS - Prior instructions don't exist
- VFMADDCPH
- VFMADDSUB132PD - Prior instructions don't exist
- VFMADDSUB132PH
- VFMADDSUB132PS - Prior instructions don't exist
- VFMADDSUB213PD - Prior instructions don't exist
- VFMADDSUB213PH
- VFMADDSUB213PS - Prior instructions don't exist
- VFMADDSUB231PD - Prior instructions don't exist
- VFMADDSUB231PH
- VFMADDSUB231PS - Prior instructions don't exist
- VFMSUB132PD - Prior instructions don't exist
- VFMSUB132PH
- VFMSUB132PS - Prior instructions don't exist
- VFMSUB213PD - Prior instructions don't exist
- VFMSUB213PH
- VFMSUB213PS - Prior instructions don't exist
- VFMSUB231PD - Prior instructions don't exist
- VFMSUB231PH
- VFMSUB231PS - Prior instructions don't exist
- VFMSUBADD132PD - Prior instructions don't exist
- VFMSUBADD132PH
- VFMSUBADD132PS - Prior instructions don't exist
- VFMSUBADD213PD - Prior instructions don't exist
- VFMSUBADD213PH
- VFMSUBADD213PS - Prior instructions don't exist
- VFMSUBADD231PD - Prior instructions don't exist
- VFMSUBADD231PH
- VFMSUBADD231PS - Prior instructions don't exist
- VFMULCPH
- VFNMADD132PD - Prior instructions don't exist
- VFNMADD132PH
- VFNMADD132PS - Prior instructions don't exist
- VFNMADD213PD - Prior instructions don't exist
- VFNMADD213PH
- VFNMADD213PS - Prior instructions don't exist
- VFNMADD231PD - Prior instructions don't exist
- VFNMADD231PH
- VFNMADD231PS - Prior instructions don't exist
- VFNMSUB132PD - Prior instructions don't exist
- VFNMSUB132PH
- VFNMSUB132PS - Prior instructions don't exist
- VFNMSUB213PD - Prior instructions don't exist
- VFNMSUB213PH
- VFNMSUB213PS - Prior instructions don't exist
- VFNMSUB231PD - Prior instructions don't exist
- VFNMSUB231PH
- VFNMSUB231PS - Prior instructions don't exist
- VGETEXPPH
- VGETMANTPH
- VMAXPH
- VMINPH
- VMULPH
- VREDUCEPH
- VRNDSCALEPH
- VSQRTPH
- VSUBPH

Parts of Section 13 in AVX10.2 manual
- VCVT[,T]NEBF162I[,U]BS
- VCVT[,T]PH2I[,U]BS

tannergooding · 2024-10-23T20:44:27Z

Haven't finished going through the list, but as initial feedback:

MinMaxVector should be named just MinMax
MinMax should instead be named MinMaxScalar
The various Compare*Enhanced APIs are unnecessary; we can implicitly use these instructions for the existing Compare* APIs, since they simply set different flags, allowing more optimal codegen for subsequent branches or conditional moves
It'd be helpful to separate out (such as via a separate code block or proposal) the "new instruction forms" where they aren't new concepts, but rather just new overloads of existing APIs (typically taking V256<T> and FloatRoundingMode)
For APIs like ConvertWithSaturationPackedFloatToSignedByteInteger, we want to use the .NET type names, so Single is preferred over Float, SByte over SignedByte, etc
- for signed integers we have SByte, Int16, Int32, Int64
- for unsigned integers we have Byte, UInt16, UInt32, UInt64
- for floating-point we have Half, Single, Double
For APIs like ConvertWithSaturationPackedFloatToSignedByteInteger, we probably want to more closely match existing names like ConvertToVector128Int32WithTruncation and so would call it ConvertToVector128ByteWithSaturation

DeepakRajendrakumaran · 2024-10-24T00:24:53Z

Thank you. I will leave you a comment when I have made all required changes.

khushal1996 · 2024-10-24T14:30:20Z

@tannergooding Thanks for the review. About the nomenclature for the Convert APIs: for something like // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}, should we use ConvertToVector128ByteWithSaturation or ConvertToVector128SByteInt16WithSaturation? The instruction description reads:

These instructions convert four, eight or sixteen packed single-precision floating-point values in the
source operand to four, eight or sixteen signed or unsigned byte integers in the destination operand.
The downconverted 8-bit result is written inplace at the lower 8-bit of the corresponding 32-bit element.
The upper 3 bytes are zeroed. VCVTPS2IBS converts single-precision floating point elements into signed byte integer elements.

Let me know what you think.
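A minimal scalar sketch of one 32-bit lane, following the description quoted above (saturate to the byte range, write the result to the low 8 bits, zero the upper 3 bytes). `ConvertToSByteLane` and `ConvertToByteLane` are hypothetical helpers, not proposed APIs; round-to-nearest-even is assumed for the non-truncating forms (the default MXCSR mode), and NaN handling is not modeled. If the final spec sign-extends rather than zero-extends the signed result, only the last cast in `ConvertToSByteLane` would change.

```csharp
using System;

// Hypothetical model of one 32-bit lane of VCVTPS2IBS (truncate: false) or
// VCVTTPS2IBS (truncate: true), per the quoted description: saturate to the
// signed byte range, store in the low 8 bits, zero the upper 3 bytes.
// Assumes round-to-nearest-even for the non-truncating form; NaN is not modeled.
static int ConvertToSByteLane(float value, bool truncate)
{
    double rounded = truncate
        ? Math.Truncate((double)value)
        : Math.Round((double)value, MidpointRounding.ToEven);
    sbyte saturated = (sbyte)Math.Clamp(rounded, sbyte.MinValue, sbyte.MaxValue);
    return (byte)saturated; // reinterpret as unsigned so the upper 24 bits are zero
}

// Same model for VCVTPS2IUBS / VCVTTPS2IUBS: saturate to [0, 255].
static int ConvertToByteLane(float value, bool truncate)
{
    double rounded = truncate
        ? Math.Truncate((double)value)
        : Math.Round((double)value, MidpointRounding.ToEven);
    return (int)Math.Clamp(rounded, byte.MinValue, byte.MaxValue);
}

// Usage: print the 32-bit lanes for a few sample inputs.
foreach (float x in new[] { 3.7f, -1.2f, 300f, -300f })
{
    Console.WriteLine($"{x}: signed lane = 0x{ConvertToSByteLane(x, truncate: false):X8}, unsigned lane = 0x{ConvertToByteLane(x, truncate: false):X8}");
}
```

This is why the return type matters for naming: the lane holds a byte-range value, but the vector element type is 32-bit.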

tannergooding · 2024-10-24T14:51:13Z

I'll need to think about it more.

It is important we document the behavior which is conversion to byte so that users understand what the API is doing.
It is then important we document the return type of Vector128<int> so that it doesn't cause issues with overload resolution, since you cannot overload by return type.

It's functionally doing a ConvertToVector128ByteWithSaturationAndWidenToVector128Int32, which is a very verbose name.

khushal1996 · 2024-10-24T15:47:13Z

True. I was thinking along similar lines (ConvertToVector128ByteWithSaturationAndWidenToVector128Int32), but wanted to keep it a little shorter while still describing that it widens to Int32. How about ConvertToVector128SByteWithSaturationWidenToInt32? At least we can drop the Vector128 after Widen.

DeepakRajendrakumaran · 2024-10-24T23:19:08Z

I have updated the names and made the other changes. For the Widen ones, let me know how you want us to update them. This might apply to the Accumulated*DotProduct* intrinsics and to the convert intrinsics where widening happens.

DeepakRajendrakumaran · 2024-10-30T16:20:51Z

Hi Tanner, have you decided how you want the 'widen' APIs to be named?

tannergooding · 2024-10-30T16:36:55Z

I think we should default to the verbose name, which is the most consistent with our other APIs and the least problematic.

We'll likely discuss some of the alternatives in API review and it wouldn't hurt to have them listed.

Notably we have Vector128<int> ConvertToInt32(Vector128<float> value) in SSE2 (and similar for Int64/UInt32/UInt64 in other ISAs), so something like ConvertToByteWithSaturationAndWidenToInt32 might be a feasible shorter name that won't conflict, but I don't think we could get much shorter otherwise.

DeepakRajendrakumaran · 2024-10-31T00:44:29Z

I've updated the names for these. Do you think it makes sense to have 'Widen' in the name for the accumulated dot product ones as well?
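For context on the 'Widen' question, a minimal scalar sketch of VPDPBSSD / VPDPBSSDS (signed x signed bytes) shows where the widening happens: each 32-bit accumulator lane consumes four adjacent byte pairs, multiplies them at 32-bit precision, and adds their sum to the lane, either wrapping or saturating to the int32 range. `DotProductLane` and `AccumulatedByteDotProduct` here are hypothetical helpers over arrays, not the proposed intrinsics, and only the signed/signed variant is modeled.

```csharp
using System;

// Hypothetical model of one 32-bit lane of VPDPBSSD (saturate: false) or
// VPDPBSSDS (saturate: true): four signed byte products are widened to 32 bits,
// summed, and added to the accumulator lane.
static int DotProductLane(int acc, ReadOnlySpan<sbyte> left, ReadOnlySpan<sbyte> right, bool saturate)
{
    long sum = acc;
    for (int j = 0; j < 4; j++)
    {
        sum += left[j] * right[j]; // sbyte * sbyte is computed as int
    }
    return saturate
        ? (int)Math.Clamp(sum, int.MinValue, int.MaxValue) // signed saturation
        : unchecked((int)sum);                             // wraps modulo 2^32
}

// Applies the lane model across a whole "vector" held in arrays:
// acc.Length int lanes consume 4 * acc.Length bytes from each input.
static int[] AccumulatedByteDotProduct(int[] acc, sbyte[] left, sbyte[] right, bool saturate)
{
    var result = new int[acc.Length];
    for (int i = 0; i < acc.Length; i++)
    {
        result[i] = DotProductLane(acc[i], left.AsSpan(4 * i, 4), right.AsSpan(4 * i, 4), saturate);
    }
    return result;
}

// Usage: two int lanes, eight byte pairs.
int[] lanes = AccumulatedByteDotProduct(
    new[] { 10, 0 },
    new sbyte[] { 1, 2, 3, 4, -128, -128, -128, -128 },
    new sbyte[] { 5, 6, 7, 8, -128, -128, -128, -128 },
    saturate: false);
Console.WriteLine(string.Join(", ", lanes));
```

Unlike the byte conversions, the result here is a genuine 32-bit sum rather than a byte-range value stored in a 32-bit lane.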

DeepakRajendrakumaran · 2024-11-12T23:01:25Z

@tannergooding What are the next steps for this?

tannergooding · 2024-11-19T19:25:24Z

I've filtered out the VPDPB[SU,UU,SS]D[,S] instructions from the initial review, as the names aren't correct.

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
    [Intrinsic]
    [CLSCompliant(false)]
    public abstract class Avx10v2 : Avx10v1
    {
        // VPDPBSSD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProduct(Vector128<sbyte> left, Vector128<sbyte> right, Vector128<int> acc) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProduct(Vector128<sbyte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBUUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProduct(Vector128<byte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSSD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProduct(Vector256<sbyte> left, Vector256<sbyte> right, Vector256<int> acc) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProduct(Vector256<sbyte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBUUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProduct(Vector256<byte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProduct(left, right, acc);

        // VPDPBSSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<sbyte> left, Vector128<sbyte> right, Vector128<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBSUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<sbyte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBUUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedByteDotProductWithSaturation(Vector128<byte> left, Vector128<byte> right, Vector128<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBSSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<sbyte> left, Vector256<sbyte> right, Vector256<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBSUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<sbyte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPBUUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedByteDotProductWithSaturation(Vector256<byte> left, Vector256<byte> right, Vector256<int> acc) => AccumulatedByteDotProductWithSaturation(left, right, acc);

        // VPDPWSUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProduct(Vector128<short> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUSD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProduct(Vector128<ushort> left, Vector128<short> right, Vector128<int> acc) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUUD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProduct(Vector128<ushort> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWSUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProduct(Vector256<short> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUSD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProduct(Vector256<ushort> left, Vector256<short> right, Vector256<int> acc) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWUUD ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProduct(Vector256<ushort> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProduct(left, right, acc);

        // VPDPWSUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<short> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUSDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<ushort> left, Vector128<short> right, Vector128<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUUDS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
        public static Vector128<int> AccumulatedInt16DotProductWithSaturation(Vector128<ushort> left, Vector128<ushort> right, Vector128<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWSUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<short> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUSDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<ushort> left, Vector256<short> right, Vector256<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        // VPDPWUUDS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
        public static Vector256<int> AccumulatedInt16DotProductWithSaturation(Vector256<ushort> left, Vector256<ushort> right, Vector256<int> acc) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

        [Intrinsic]
        public new abstract class V512 : Avx10v1.V512
        {
            // VPDPWSUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProduct(Vector512<short> left, Vector512<ushort> right) => AccumulatedInt16DotProduct(left, right, acc);

            // VPDPWUSD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProduct(Vector512<ushort> left, Vector512<short> right) => AccumulatedInt16DotProduct(left, right, acc);

            // VPDPWUUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProduct(Vector512<ushort> left, Vector512<ushort> right) => AccumulatedInt16DotProduct(left, right, acc);

            // VPDPWSUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<short> left, Vector512<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

            // VPDPWUSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<ushort> left, Vector512<short> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

            // VPDPWUUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedInt16DotProductWithSaturation(Vector512<ushort> left, Vector512<ushort> right) => AccumulatedInt16DotProductWithSaturation(left, right, acc);

            // VPDPBSSD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProduct(Vector512<sbyte> left, Vector512<sbyte> right) => AccumulatedByteDotProduct(left, right, acc);

            // VPDPBSUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProduct(Vector512<sbyte> left, Vector512<byte> right) => AccumulatedByteDotProduct(left, right, acc);

            // VPDPBUUD zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProduct(Vector512<byte> left, Vector512<byte> right) => AccumulatedByteDotProduct(left, right, acc);

            // VPDPBSSDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<sbyte> left, Vector512<sbyte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

            // VPDPBSUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<sbyte> left, Vector512<byte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);

            // VPDPBUUDS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
            public static Vector512<int> AccumulatedByteDotProductWithSaturation(Vector512<byte> left, Vector512<byte> right) => AccumulatedByteDotProductWithSaturation(left, right, acc);
        }
    }
}
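For readers less familiar with the VNNI-style dot products above, the per-lane arithmetic can be sketched in scalar C#. This is a minimal, hypothetical model (the `DotProductLaneModel` class and its method names are illustrative only, not part of the proposal) of one 32-bit lane of VPDPWSUD and VPDPWSUDS, assuming the wrapping and saturating behavior described in Intel's AVX-VNNI-INT16 documentation: two signed 16-bit elements are multiplied by two unsigned 16-bit elements, and both products are added to the 32-bit accumulator.

```csharp
using System;

// Hypothetical scalar model of one 32-bit lane of VPDPWSUD / VPDPWSUDS.
// Not the proposed API; for illustration only.
public static class DotProductLaneModel
{
    // Non-saturating form (VPDPWSUD): acc + a0*b0 + a1*b1, wrapping on Int32 overflow.
    public static int AccumulateWrapping(int acc, short a0, short a1, ushort b0, ushort b1)
    {
        // Compute the exact sum in 64 bits, then keep the low 32 bits (wraparound).
        long sum = (long)acc + (long)a0 * b0 + (long)a1 * b1;
        return unchecked((int)sum);
    }

    // Saturating form (VPDPWSUDS): the exact sum is clamped to the Int32 range.
    public static int AccumulateSaturating(int acc, short a0, short a1, ushort b0, ushort b1)
    {
        long sum = (long)acc + (long)a0 * b0 + (long)a1 * b1;
        return (int)Math.Clamp(sum, int.MinValue, int.MaxValue);
    }
}
```

The only difference between the two forms is what happens when the accumulated sum leaves the Int32 range: `AccumulateWrapping(int.MaxValue, 1, 0, 1, 0)` wraps to `int.MinValue`, whereas the saturating variant stays at `int.MaxValue`.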

bartonjs · 2024-11-19T20:01:46Z

Video

- ConvertToByteWithTruncationSaturationAndWidenToUInt32 => ConvertToByteWithTruncatedSaturationAndWidenToUInt32 (Truncation => Truncated)
- Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) and similar overloads: rename to "ConvertScalarTo..."

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>Provides access to X86 AVX10.2 hardware instructions via intrinsics</summary>
    [Intrinsic]
    [CLSCompliant(false)]
    public abstract class Avx10v2 : Avx10v1
    {
        internal Avx10v2() { }

        public static new bool IsSupported { get => IsSupported; }

        // VMINMAXPD xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst, imm8
        public static Vector128<double> MinMax(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {sae}, imm8
        public static Vector256<double> MinMax(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst, imm8
        public static Vector128<float> MinMax(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {sae}, imm8
        public static Vector256<float> MinMax(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);

        // VMINMAXSD xmm1{k1}{z}, xmm2, xmm3/m64 {sae}, imm8
        public static Vector128<double> MinMaxScalar(Vector128<double> left, Vector128<double> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VMINMAXSS xmm1{k1}{z}, xmm2, xmm3/m32 {sae}, imm8
        public static Vector128<float> MinMaxScalar(Vector128<float> left, Vector128<float> right, [ConstantExpected] byte control) => MinMaxScalar(left, right, control);

        // VADDPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Add(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);

        // VADDPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Add(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Add(left, right, mode);
                
        // VDIVPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Divide(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);
        
        // VDIVPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Divide(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Divide(left, right, mode);
        
        // VCVTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
                
        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
        
        // VCVTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToByteWithSaturationAndWidenToInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);
        
        // VCVTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
                
        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
        
        // VCVTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

        // VCVTTPS2IBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector128<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);

        // VCVTTPS2IBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector256<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);
                
        // VCVTTPS2IUBS xmm1{k1}{z}, xmm2/m128/m32bcst
        public static Vector128<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector128<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);
                
        // VCVTTPS2IUBS ymm1{k1}{z}, ymm2/m256/m32bcst {sae}
        public static Vector256<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector256<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);
        
        // VMOVD xmm1, xmm2/m32
        public static Vector128<uint> ConvertScalarToVector128UInt32(Vector128<uint> value) => ConvertScalarToVector128UInt32(value);
        
        // VMOVW xmm1, xmm2/m16
        public static Vector128<ushort> ConvertScalarToVector128UInt16(Vector128<ushort> value) => ConvertScalarToVector128UInt16(value);
        
        // The instructions below are existing APIs
        // to which embedded rounding support
        // has been added

        // VCVTDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<int> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);
        
        // VCVTPD2DQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<int> ConvertToVector128Int32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Int32(value, mode);

        // VCVTPD2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTPD2QQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPD2UDQ xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128UInt32(value, mode);

        // VCVTPD2UQQ ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTPS2DQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<int> ConvertToVector256Int32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int32(value, mode);

        // VCVTPS2QQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Int64(value, mode);

        // VCVTPS2UDQ ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt32(value, mode);

        // VCVTPS2UQQ ymm1{k1}{z}, xmm2/m128/m32bcst {er}
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256UInt64(value, mode);

        // VCVTQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);

        // VCVTUDQ2PS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Single(value, mode);

        // VCVTUQQ2PS xmm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector128Single(value, mode);

        // VCVTUQQ2PD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToVector256Double(value, mode);
        
        // VMULPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Multiply(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);
        
        // VMULPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Multiply(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Multiply(left, right, mode);
        
        // VSCALEFPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);
        
        // VSCALEFPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Scale(left, right, mode);
        
        // VSQRTPD ymm1{k1}{z}, ymm2/m256/m64bcst {er}
        public static Vector256<double> Sqrt(Vector256<double> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);
        
        // VSQRTPS ymm1{k1}{z}, ymm2/m256/m32bcst {er}
        public static Vector256<float> Sqrt(Vector256<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Sqrt(value, mode);
        
        // VSUBPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst {er}
        public static Vector256<double> Subtract(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);
        
        // VSUBPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst {er}
        public static Vector256<float> Subtract(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => Subtract(left, right, mode);
        
        [Intrinsic]
        public new abstract class X64 : Avx10v1.X64
        {
            internal X64() { }

            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class V512 : Avx10v1.V512
        {
            internal V512() { }

            public static new bool IsSupported { get => IsSupported; }
    
            // VMINMAXPD zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst {sae}, imm8
            public static Vector512<double> MinMax(Vector512<double> left, Vector512<double> right, [ConstantExpected] byte control) => MinMax(left, right, control);

            // VMINMAXPS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst {sae}, imm8
            public static Vector512<float> MinMax(Vector512<float> left, Vector512<float> right, [ConstantExpected] byte control) => MinMax(left, right, control);
            
            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToInt32(value);
            
            // VCVTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<int> ConvertToByteWithSaturationAndWidenToInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToInt32(value, mode);
            
            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithSaturationAndWidenToUInt32(value);
            
            // VCVTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {er}
            public static Vector512<uint> ConvertToByteWithSaturationAndWidenToUInt32(Vector512<float> value, [ConstantExpected(Max = FloatRoundingMode.ToZero)] FloatRoundingMode mode) => ConvertToByteWithSaturationAndWidenToUInt32(value, mode);

            // VCVTTPS2IBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<int> ConvertToByteWithTruncatedSaturationAndWidenToInt32(Vector512<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToInt32(value);

            // VCVTTPS2IUBS zmm1{k1}{z}, zmm2/m512/m32bcst {sae}
            public static Vector512<uint> ConvertToByteWithTruncatedSaturationAndWidenToUInt32(Vector512<float> value) => ConvertToByteWithTruncatedSaturationAndWidenToUInt32(value);

            // This is a 512-bit extension of the previously existing 128/256-bit intrinsic
            // VMPSADBW zmm1{k1}{z}, zmm2, zmm3/m512, imm8
            public static Vector512<ushort> MultipleSumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask) => MultipleSumAbsoluteDifferences(left, right, mask);

            [Intrinsic]
            public new abstract class X64 : Avx10v1.V512.X64
            {
                internal X64() { }

                public static new bool IsSupported { get => IsSupported; }
            }
        }
    }
}
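The `{er}` overloads above take an explicit `FloatRoundingMode`, while the `Truncated` variants always round toward zero. As a rough, hypothetical illustration of how the two differ for VCVTPS2IUBS and VCVTTPS2IUBS, the scalar sketch below uses .NET's directed `MidpointRounding` modes as a stand-in for the four hardware rounding modes (an assumption of this sketch, not how the proposed intrinsics are parameterized). It saturates to the unsigned byte range and maps NaN to 0, as described in the AVX10.2 spec; `Cvtps2iubsModel` is an illustrative name, not a real API.

```csharp
using System;

// Hypothetical scalar model (not the proposed API) of one lane of VCVTPS2IUBS
// with an explicit rounding mode, and of its truncating sibling VCVTTPS2IUBS.
// Assumed mapping: ToEven ~ round-to-nearest-even, ToNegativeInfinity ~ round down,
// ToPositiveInfinity ~ round up, ToZero ~ truncate.
public static class Cvtps2iubsModel
{
    public static uint Lane(float value, MidpointRounding mode)
    {
        if (float.IsNaN(value))
        {
            return 0; // NaN converts to 0
        }

        // Directed MidpointRounding modes round every value, not just midpoints.
        double rounded = Math.Round((double)value, mode);

        // Saturate to the unsigned byte range [0, 255]; the upper 3 bytes stay zero.
        return (uint)Math.Clamp(rounded, byte.MinValue, byte.MaxValue);
    }

    // The "Truncated" variants always round toward zero, regardless of MXCSR.
    public static uint LaneTruncated(float value) => Lane(value, MidpointRounding.ToZero);
}
```

For example, 2.5f converts to 2 under round-to-nearest-even but to 3 when rounding up, and the truncating form turns 2.9f into 2.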

khushal1996 · 2024-11-25T19:04:30Z

I would also like to discuss the following APIs

        // VMOVD xmm1, xmm2/m32
        public static Vector128<uint> ConvertToVector128UInt32(Vector128<uint> value) => ConvertToVector128UInt32(value);

        // VMOVW xmm1, xmm2/m16
        public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ushort> value) => ConvertToVector128UInt16(value);

and would like to change them to

        public static unsafe void StoreLowDWord(byte* address, Vector128<uint> source) => StoreLowDWord(address, source);
        public static unsafe void StoreLowWord(byte* address, Vector128<ushort> source) => StoreLowWord(address, source);
        public static unsafe Vector128<uint> RetrieveLowDWord(byte* address) => RetrieveLowDWord(address);
        public static unsafe Vector128<ushort> RetrieveLowWord(byte* address) => RetrieveLowWord(address);

tannergooding · 2024-11-25T19:33:17Z

@khushal1996 the proposed signatures don't match the .NET naming conventions (we'd still use UInt32), and they also don't cover all the functionality the underlying API supports.

In particular, we already expose the existing movd/movq variants that move between general-purpose and SIMD registers, and which can therefore already load from or store to memory. These notably have a signature similar to static Vector128<int> ConvertScalarToVector128Int32(int value).

Likewise, while we expose some instructions like movss, where the managed signature is static Vector128<float> MoveScalar(Vector128<float> upper, Vector128<float> value), these preserve the upper bits.

The new movd/movw variants are most similar to the existing movq variant, which moves SIMD to SIMD and zeroes the upper bits. The latter is exposed in two ways today: MoveScalar for SIMD to/from SIMD and ConvertToVector128UInt64 for general-purpose to/from SIMD. So it might be goodness for these to similarly be MoveScalar rather than ConvertToVector128*.

khushal1996 · 2024-11-25T22:48:42Z

Thanks @tannergooding
To conclude, this is what you are proposing

public static Vector128<uint> MoveScalarUInt32(Vector128<uint> value)

tannergooding · 2024-11-26T17:39:53Z

In this case, since it's a move and no conversions are possible, it'd just be 2-4 MoveScalar overloads (depending on whether we only want unsigned or also signed overloads).

khushal1996 · 2024-12-09T18:07:03Z

Thanks @tannergooding

Concluding this discussion with addition of following APIs

public static Vector128<uint> MoveScalar(Vector128<uint> value)
public static Vector128<int> MoveScalar(Vector128<int> value)
public static Vector128<ushort> MoveScalar(Vector128<ushort> value)
public static Vector128<short> MoveScalar(Vector128<short> value)
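The behavior agreed on here (keep element 0 and zero the remaining elements, as the register-to-register form of VMOVD/VMOVW does) can be modeled in software with existing `Vector128` helpers. The `MoveScalarModel` class below is a hypothetical stand-in used only to show the expected result; it is not the proposed intrinsic and is not claimed to emit VMOVD/VMOVW.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical software model of the agreed-upon MoveScalar overloads:
// keep element 0 and zero every other element.
public static class MoveScalarModel
{
    public static Vector128<uint> MoveScalar(Vector128<uint> value) =>
        Vector128.CreateScalar(value.ToScalar());

    public static Vector128<int> MoveScalar(Vector128<int> value) =>
        Vector128.CreateScalar(value.ToScalar());

    public static Vector128<ushort> MoveScalar(Vector128<ushort> value) =>
        Vector128.CreateScalar(value.ToScalar());

    public static Vector128<short> MoveScalar(Vector128<short> value) =>
        Vector128.CreateScalar(value.ToScalar());
}
```

Because the upper elements are zeroed rather than preserved, this differs from the existing two-operand MoveScalar(upper, value) shape used for movss.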

DeepakRajendrakumaran · 2024-12-11T19:14:01Z

Thanks @tannergooding

Concluding this discussion with addition of following APIs

public static Vector128<uint> MoveScalar(Vector128<uint> value)
public static Vector128<int> MoveScalar(Vector128<int> value)
public static Vector128<ushort> MoveScalar(Vector128<ushort> value)
public static Vector128<short> MoveScalar(Vector128<short> value)

@tannergooding Assuming we are planning on this, do we add this to the proposal somewhere, now that the API name has changed from the originally approved proposal?

tannergooding · 2024-12-11T19:27:58Z

We'd need to extract them to their own proposal. API review is done for the year so we should get to them sometime in January. We shouldn't be blocked on doing any work around the other approved APIs in the meantime, however.

khushal1996 · 2024-12-12T22:28:05Z

I think we should default to the verbose name, which is the most consistent with our other APIs and the least problematic.

We'll likely discuss some of the alternatives in API review and it wouldn't hurt to have them listed.

Notably we have Vector128<int> ConvertToInt32(Vector128<float> value) in SSE2 (and similar for Int64/UInt32/UInt64 in other ISAs), so something like ConvertToByteWithSaturationAndWidenToInt32 might be a feasible shorter name that won't conflict, but I don't think we could get much shorter otherwise.

Since the APIs listed in the description and #109083 (comment) are different from what we agreed upon, just to confirm: for the vcvtps2ibs, vcvtps2iubs, vcvttps2ibs, and vcvttps2iubs APIs, we decided to go with the following names, respectively:

ConvertToSByteWithSaturationAndWidenToInt32
ConvertToByteWithSaturationAndWidenToInt32
ConvertToSByteWithTruncationSaturationAndWidenToInt32
ConvertToByteWithTruncationSaturationAndWidenToInt32

Let me know if this is correct

tannergooding · 2024-12-12T23:00:33Z

They should be matching the naming, particularly the first two.

The latter two match the general format but were changed from WithTruncationSaturation to instead be WithTruncatedSaturation which API review felt relayed the concept better.

This should match what is under the approving comment here: #109083 (comment)

khushal1996 · 2024-12-12T23:20:48Z

They should be matching the naming, particularly the first two.

The latter two match the general format but were changed from WithTruncationSaturation to instead be WithTruncatedSaturation which API review felt relayed the concept better.

This should match what is under the approving comment here: #109083 (comment)

I think there is a mismatch between what the approving comment says and what actually we were discussing here #109083 (comment). Our discussion had no conclusion and the approving comment went in the wrong direction because of the original proposal.

There is no such thing as ConvertToByteWithSaturationAndWidenToUInt32. The result is always Int32 no matter what. The conversion is done to a signed or unsigned byte, and the result is always a zero-extended integer.

tannergooding · 2024-12-13T03:25:17Z

The result is always Int32 no matter what

This is a bit of a nomenclature thing. The result of a byte being widened to Int32 and UInt32 is identical both bitwise and in the value represented after the widening.

It is only the result of an sbyte being widened where they differ: -1 becomes -1 as an Int32 but 4294967295 as a UInt32.
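The widening point can be checked with ordinary C# integer conversions, which zero-extend unsigned sources and sign-extend signed ones. The `WideningDemo` helpers below are hypothetical and exist only to make the four cases explicit.

```csharp
// Plain C# conversions: unsigned sources zero-extend, signed sources sign-extend.
public static class WideningDemo
{
    public static int ByteToInt32(byte value) => value;     // 0xFF -> 255
    public static uint ByteToUInt32(byte value) => value;   // 0xFF -> 255, same bits
    public static int SByteToInt32(sbyte value) => value;   // -1 -> -1
    public static uint SByteToUInt32(sbyte value) => unchecked((uint)value); // -1 -> 4294967295 (0xFFFFFFFF)
}
```

Only the sbyte cases produce different Int32 and UInt32 interpretations; for byte sources the widened bits and values are identical.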

The conversion is done to a signed or unsigned byte, and the result is always a zero-extended integer.

The spec gives the following:

The downconverted 8-bit result is written in place at the lower 8 bits of the corresponding 32-bit element.
The upper 3 bytes are zeroed.

VCVTPS2IBS converts single-precision floating-point elements into signed byte integer elements. When a conversion is inexact, floating-point precision exception is raised and the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked then: If value is too big, the INT_MAX value (2^(w-1)-1, where w represents the number of bits in the destination format) is returned. If value is too small, the INT_MIN value (-2^(w-1)) is returned. For NaN, (0) is returned.

VCVTPS2IUBS converts single-precision floating-point elements into unsigned byte integer elements. When a conversion is inexact, floating-point precision exception is raised and the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked then: If value is too big, the UINT_MAX value (2^w-1, where w represents the number of bits in the destination format) is returned. If value is too small, the UINT_MIN value (0) is returned. For NaN, (0) is returned.

The nuance is then the "upper 3 bytes are zeroed" portion which means that ConvertToSByteWithSaturationAndWidenToInt32 is "incorrect" because we aren't widening. Instead it should be ConvertToSByteWithSaturationAndZeroExtendToInt32 which means that -1 becomes 255 as an Int32 or UInt32, it doesn't become -1 as some might expect. You therefore have to extract every 4th byte and still manually upcast to int to get a correct result, rather than simply extracting the whole 32-bit integer like you can with ConvertToByteWithSaturationAndWidenTo...
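To make the zero-extension point concrete, here is a minimal C# sketch of a single VCVTPS2IBS lane following the spec text quoted above. It assumes the default round-to-nearest-even mode and masked floating-point exceptions; `Cvtps2ibsModel` and its methods are hypothetical names for illustration, not the .NET API.

```csharp
using System;

// Hypothetical scalar model of one 32-bit lane of VCVTPS2IBS (not the .NET API).
// Assumes round-to-nearest-even and masked floating-point exceptions.
public static class Cvtps2ibsModel
{
    public static uint Lane(float value)
    {
        sbyte result;
        if (float.IsNaN(value))
        {
            result = 0; // NaN converts to 0
        }
        else
        {
            // Round to nearest (ties to even), then saturate to [-128, 127].
            double rounded = Math.Round((double)value, MidpointRounding.ToEven);
            result = (sbyte)Math.Clamp(rounded, sbyte.MinValue, sbyte.MaxValue);
        }

        // The signed byte lands in bits 7:0 and bits 31:8 are zeroed, so the lane
        // holds the byte zero-extended, not sign-extended.
        return unchecked((byte)result);
    }

    // Recovering the signed value means reading only the low byte and sign-extending it.
    public static int LaneAsSignedValue(uint lane) => unchecked((sbyte)(byte)lane);
}
```

With this model, `Lane(-1.0f)` yields 255 rather than -1, and only after truncating to the low byte and sign-extending does `LaneAsSignedValue` give back -1, which is the extra step the ZeroExtend naming is meant to signal.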

khushal1996 · 2024-12-13T11:44:16Z

The nuance is then the "upper 3 bytes are zeroed" portion which means that ConvertToSByteWithSaturationAndWidenToInt32 is "incorrect" because we aren't widening. Instead it should be ConvertToSByteWithSaturationAndZeroExtendToInt32

True. I will change the APIs to the following

ConvertToSByteWithSaturationAndZeroExtendToInt32
ConvertToByteWithSaturationAndZeroExtendToInt32
ConvertToSByteWithTruncatedSaturationAndZeroExtendToInt32
ConvertToByteWithTruncatedSaturationAndZeroExtendToInt32

tannergooding · 2025-01-31T18:54:44Z

@khushal1996 this proposal was fully handled in #111209 and so can be closed now, correct?

khushal1996 · 2025-02-03T08:18:03Z

@tannergooding yes, this proposal can be considered fully handled and we will have 2 proposals for the remaining APIs

khushal1996 · 2025-02-19T19:35:17Z

We'd need to extract them to their own proposal. API review is done for the year so we should get to them sometime in January. We shouldn't be blocked on doing any work around the other approved APIs in the meantime, however.

Hi @tannergooding. I was going through the pending APIs, and we have yet to add these 4 APIs for AVX10.2. As discussed, I will open a new issue to discuss them so we can take it forward. Let me know if there are any concerns.

tannergooding · 2025-02-19T20:03:44Z

No concerns, thanks for following up!

DeepakRajendrakumaran added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 21, 2024

dotnet-issue-labeler bot added the area-System.Runtime.Intrinsics label Oct 21, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Oct 21, 2024

BruceForstall added the avx10 Related to the AVX10 architecture label Oct 23, 2024

jeffhandley added this to the 10.0.0 milestone Nov 3, 2024

jeffhandley removed the untriaged New issue has not been triaged by the area owner label Nov 3, 2024

tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Nov 18, 2024

khushal1996 mentioned this issue Nov 19, 2024: Adding CPUID for AVX10.2 #109302 (Merged)

bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Nov 19, 2024

DeepakRajendrakumaran mentioned this issue Nov 20, 2024: [API Proposal]: Add dot product intrinsics to AVX10v2 API #110032 (Open)

BruceForstall mentioned this issue Dec 12, 2024: Intel architecture improvements for .NET 10 #108869 (Open, 39 tasks)

khushal1996 mentioned this issue Jan 27, 2025: Add support for AVX10.2, Add AVX10.2 API surface and template tests #111209 (Merged)

tannergooding closed this as completed Feb 3, 2025

