Do not address expose a struct having 1 slot #40957

Merged (1 commit) on Sep 8, 2020

Member

@kunalspathak kunalspathak commented Aug 17, 2020

I noticed an inefficiency where we always push SIMD_8 or SIMD_16 values onto the stack. After talking to @CarolEidt, it turned out to be a simple fix that gives good gains.

Crossgen CodeSize Diffs for System.Private.CoreLib.dll for  protononjit.dll
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -17388 (-0.34% of base)
    diff is an improvement.
Top file improvements (bytes):
      -17388 : System.Private.CoreLib.dasm (-0.34% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions (bytes):
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Byte][System.Byte]:Equals(System.Runtime.Intrinsics.Vector64`1[Byte]):bool:this
          12 ( 5.77% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Double][System.Double]:Equals(System.Runtime.Intrinsics.Vector64`1[Double]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Int64][System.Int64]:Equals(System.Runtime.Intrinsics.Vector64`1[Int64]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Int32][System.Int32]:Equals(System.Runtime.Intrinsics.Vector64`1[Int32]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Int16][System.Int16]:Equals(System.Runtime.Intrinsics.Vector64`1[Int16]):bool:this
          12 ( 5.77% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Single][System.Single]:Equals(System.Runtime.Intrinsics.Vector64`1[Single]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[SByte][System.SByte]:Equals(System.Runtime.Intrinsics.Vector64`1[SByte]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[UInt64][System.UInt64]:Equals(System.Runtime.Intrinsics.Vector64`1[UInt64]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[UInt32][System.UInt32]:Equals(System.Runtime.Intrinsics.Vector64`1[UInt32]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[UInt16][System.UInt16]:Equals(System.Runtime.Intrinsics.Vector64`1[UInt16]):bool:this
Top method improvements (bytes):
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector256`1[Byte]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector256`1[Double]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector256`1[Int16]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int32]):System.Runtime.Intrinsics.Vector256`1[Int32]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Int64],System.Runtime.Intrinsics.Vector128`1[Int64]):System.Runtime.Intrinsics.Vector256`1[Int64]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[SByte],System.Runtime.Intrinsics.Vector128`1[SByte]):System.Runtime.Intrinsics.Vector256`1[SByte]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector256`1[Single]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector256`1[UInt16]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[UInt32],System.Runtime.Intrinsics.Vector128`1[UInt32]):System.Runtime.Intrinsics.Vector256`1[UInt32]
         -32 (-22.86% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[UInt64],System.Runtime.Intrinsics.Vector128`1[UInt64]):System.Runtime.Intrinsics.Vector256`1[UInt64]
         -24 (-42.86% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -24 (-42.86% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
Top method regressions (percentages):
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Int64][System.Int64]:Equals(System.Runtime.Intrinsics.Vector64`1[Int64]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Int32][System.Int32]:Equals(System.Runtime.Intrinsics.Vector64`1[Int32]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[UInt64][System.UInt64]:Equals(System.Runtime.Intrinsics.Vector64`1[UInt64]):bool:this
          12 ( 6.25% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[UInt32][System.UInt32]:Equals(System.Runtime.Intrinsics.Vector64`1[UInt32]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Byte][System.Byte]:Equals(System.Runtime.Intrinsics.Vector64`1[Byte]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Int16][System.Int16]:Equals(System.Runtime.Intrinsics.Vector64`1[Int16]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[SByte][System.SByte]:Equals(System.Runtime.Intrinsics.Vector64`1[SByte]):bool:this
          12 ( 6.00% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[UInt16][System.UInt16]:Equals(System.Runtime.Intrinsics.Vector64`1[UInt16]):bool:this
          12 ( 5.77% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Double][System.Double]:Equals(System.Runtime.Intrinsics.Vector64`1[Double]):bool:this
          12 ( 5.77% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1[Single][System.Single]:Equals(System.Runtime.Intrinsics.Vector64`1[Single]):bool:this
Top method improvements (percentages):
         -24 (-42.86% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -24 (-42.86% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|40_0(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|41_0(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|42_0(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int16]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|43_0(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|44_0(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector128`1[Int64]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|45_0(System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|46_0(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|47_0(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|48_0(System.Runtime.Intrinsics.Vector64`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
         -16 (-28.57% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|49_0(System.Runtime.Intrinsics.Vector64`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
         -24 (-26.09% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
2284 total methods with Code Size differences (2274 improved, 10 regressed), 25063 unchanged.
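
Each row in the listing above reports the byte delta followed by that delta as a percentage of the method's baseline size. The following is a minimal sketch of that arithmetic, useful for sanity-checking rows; the helper names `percentOfBase` and `estimateBaseBytes` are hypothetical and not part of jit-diff, and the baseline sizes they recover are inferred from the rounded percentages rather than reported by the tool.

```cpp
#include <cmath>

// Relative size change as a percentage of the baseline size.
// deltaBytes is (new size - baseline size); baseBytes is the baseline size.
double percentOfBase(int deltaBytes, int baseBytes)
{
    return 100.0 * static_cast<double>(deltaBytes) / static_cast<double>(baseBytes);
}

// Recover the approximate baseline size from a reported delta and its
// (rounded) percentage. The result is only as precise as the percentage.
long estimateBaseBytes(int deltaBytes, double percent)
{
    return std::lround(100.0 * static_cast<double>(deltaBytes) / percent);
}
```

For instance, the `Vector256:Create` rows (-32 bytes, -22.86%) are consistent with a baseline of about 140 bytes, and the `Vector64:Equals` regressions (12 bytes at 6.25%, 6.00% and 5.77%) are consistent with baselines of about 192, 200 and 208 bytes respectively.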

Below is the sample diff of TryFindFirstMatchedLane.

[image: sample diff of TryFindFirstMatchedLane]

The regression I am seeing inside System.Runtime.Intrinsics.Vector64`1[Byte][System.Byte]:Equals(System.Runtime.Intrinsics.Vector64`1[Byte]):bool:this is caused by the following diffs. I think we might be able to improve it, but I haven't done much investigation.

[image: diff of Vector64`1[Byte]:Equals]

I haven't done much investigation yet. I will probably sync up with @CarolEidt.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 17, 2020
@kunalspathak
Member Author

Failure is related to #40885.

@kunalspathak kunalspathak marked this pull request as ready for review August 18, 2020 23:39
@kunalspathak
Member Author

@dotnet/jit-contrib

@CarolEidt
Contributor

The regression you show is a size regression, but an improvement in PerfScore because we use a callee-save register and replace a load in the loop with a register move.
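
To illustrate how code size can grow while an execution-weighted score improves, here is a toy model. It assumes a score that multiplies each instruction's cost by how often its block is expected to run; the struct `InstrGroup`, the two layouts, and every cost and weight below are made-up values chosen for illustration, not figures taken from the JIT or from this PR's diffs.

```cpp
#include <vector>

// Hypothetical instruction record: encoded size, cost per execution, and
// expected number of executions (block weight).
struct InstrGroup
{
    int    sizeBytes;
    double cost;
    double weight;
};

int totalSize(const std::vector<InstrGroup>& code)
{
    int size = 0;
    for (const InstrGroup& g : code) size += g.sizeBytes;
    return size;
}

double weightedScore(const std::vector<InstrGroup>& code)
{
    double score = 0.0;
    for (const InstrGroup& g : code) score += g.cost * g.weight;
    return score;
}

// Before: the value is reloaded from the stack on every loop iteration.
std::vector<InstrGroup> beforeLayout()
{
    return { {4, 3.0, 8.0} };  // load inside the loop
}

// After: the value lives in a callee-saved register across the loop.
std::vector<InstrGroup> afterLayout()
{
    return {
        {4, 1.0, 1.0},  // save the callee-saved register (prolog)
        {4, 0.5, 1.0},  // copy the value into it once
        {4, 2.0, 1.0},  // restore the register (epilog)
        {4, 0.5, 8.0},  // register move inside the loop
    };
}
```

With these assumed numbers the second layout is 12 bytes larger but scores 7.5 instead of 24, because the extra instructions run once while the cheaper instruction runs on every iteration.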

Contributor

@CarolEidt CarolEidt left a comment

LGTM

@kunalspathak
Member Author

@BruceForstall, @AndyAyersMS - Should this go into .NET 5, or wait until we snap?

@AndyAyersMS
Member

I suspect this will not clear the current bar for .NET 5 changes.

@BruceForstall
Member

Agreed. It's too late for code quality improvements; we should only consider important bug fixes for 5.0.

@kunalspathak
Member Author

> The regression you show is a size regression, but an improvement in PerfScore because we use a callee-save register and replace a load in the loop with a register move.

@CarolEidt, I didn't realize that I could diff PerfScore. Here it is now:

Crossgen PerfScore Diffs for System.Private.CoreLib.dll for  protononjit.dll
Summary of Perf Score diffs:
(Lower is better)
Total PerfScoreUnits of diff: -6246.80 (-0.01% of base)
    diff is an improvement.
Top file improvements (PerfScoreUnits):
    -6246.80 : System.Private.CoreLib.dasm (-0.01% of base)
1 total files with Perf Score differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (PerfScoreUnits):
      -11.40 (-49.35% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
      -11.40 (-49.35% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector256`1[Byte]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector256`1[Double]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector256`1[Int16]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int32]):System.Runtime.Intrinsics.Vector256`1[Int32]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Int64],System.Runtime.Intrinsics.Vector128`1[Int64]):System.Runtime.Intrinsics.Vector256`1[Int64]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[SByte],System.Runtime.Intrinsics.Vector128`1[SByte]):System.Runtime.Intrinsics.Vector256`1[SByte]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector256`1[Single]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector256`1[UInt16]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[UInt32],System.Runtime.Intrinsics.Vector128`1[UInt32]):System.Runtime.Intrinsics.Vector256`1[UInt32]
      -11.20 (-18.98% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:Create(System.Runtime.Intrinsics.Vector128`1[UInt64],System.Runtime.Intrinsics.Vector128`1[UInt64]):System.Runtime.Intrinsics.Vector256`1[UInt64]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
       -9.40 (-27.49% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:WithUpper(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
Top method improvements (percentages):
      -11.40 (-49.35% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single],System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
      -11.40 (-49.35% of base) : System.Private.CoreLib.dasm - System.Numerics.VectorMath:ConditionalSelectBitwise(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
       -9.10 (-36.55% of base) : System.Private.CoreLib.dasm - System.Text.ASCIIUtility:ContainsNonAsciiByte(System.Runtime.Intrinsics.Vector128`1[Byte]):bool
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|40_0(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|41_0(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|42_0(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Int16]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|43_0(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector128`1[Int32]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|44_0(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector128`1[Int64]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|45_0(System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector128`1[SByte]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|46_0(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector128`1[Single]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|47_0(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[UInt16]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|48_0(System.Runtime.Intrinsics.Vector64`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector128`1[UInt32]
       -7.60 (-36.02% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128:<Create>g__SoftwareFallback|49_0(System.Runtime.Intrinsics.Vector64`1[UInt64],System.Runtime.Intrinsics.Vector64`1[UInt64]):System.Runtime.Intrinsics.Vector128`1[UInt64]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int32]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[SByte]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[UInt16]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector64`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32],System.Runtime.Intrinsics.Vector64`1[UInt32]):System.Runtime.Intrinsics.Vector64`1[UInt32]
       -4.20 (-33.07% of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Arm.AdvSimd:AbsoluteDifferenceAdd(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[Byte]
2284 total methods with Perf Score differences (2284 improved, 0 regressed), 25063 unchanged.

@sandreenko
Contributor

Could you please help me understand why we need to mark hfa arg as exposed or doNotEnreg in general?

@CarolEidt
Contributor

> Could you please help me understand why we need to mark hfa arg as exposed or doNotEnreg in general?

I'm not entirely sure why HFAs in general were marked that way, but I'm reasonably certain that the single-element HFAs being marked that way was an oversight. For the multi-element HFAs, I presume it was just an excess of caution.
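
The policy change discussed here can be sketched as a pair of predicates. This is a simplified illustration under stated assumptions, not the JIT's actual code: `StructArgInfo`, `keepOnStackOld` and `keepOnStackNew` are hypothetical names, and "keep on stack" stands in for marking the argument as address-exposed or do-not-enregister.

```cpp
// Hypothetical summary of a struct argument (not a JIT data structure).
struct StructArgInfo
{
    bool     isHfa;      // homogeneous floating-point/vector aggregate
    unsigned slotCount;  // number of elements (register slots) it occupies
};

// Sketch of the earlier behaviour described in the thread: every HFA
// argument is forced to live on the stack.
bool keepOnStackOld(const StructArgInfo& arg)
{
    return arg.isHfa;
}

// Sketch of the behaviour after this change: a single-slot HFA is left
// eligible for enregistration; multi-slot HFAs are still kept on the stack.
bool keepOnStackNew(const StructArgInfo& arg)
{
    return arg.isHfa && arg.slotCount > 1;
}
```

Under this sketch, a one-element HFA such as a single SIMD value would no longer be forced onto the stack, while a four-element HFA is treated the same as before.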

Contributor

@briansull briansull left a comment

LGTM

@kunalspathak kunalspathak merged commit de44989 into dotnet:master Sep 8, 2020
@kunalspathak kunalspathak deleted the enregister branch September 8, 2020 20:37
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020