Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement for Guid.Equals using SSE #53012

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 30 additions & 7 deletions src/libraries/System.Private.CoreLib/src/System/Guid.cs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using System.Runtime.Versioning;
using Internal.Runtime.CompilerServices;

Expand Down Expand Up @@ -801,17 +803,38 @@ public override int GetHashCode()

public bool Equals(Guid g) => EqualsCore(this, g);

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool EqualsCore(in Guid left, in Guid right)
{
ref int rA = ref Unsafe.AsRef(in left._a);
ref int rB = ref Unsafe.AsRef(in right._a);
if (Sse2.IsSupported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ARM64? We care about ARM64 as much as we care about x64.

Copy link
Contributor

@FilipToth FilipToth May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if SSE2 is not supported, we can just use a method without SSE2? Much like what we have been doing so far. But still, in most cases SSE2is supported.

{
Vector128<byte> g1 = Unsafe.As<Guid, Vector128<byte>>(ref Unsafe.AsRef(left));
Vector128<byte> g2 = Unsafe.As<Guid, Vector128<byte>>(ref Unsafe.AsRef(right));

if (Sse41.IsSupported)
{
var xor = Sse2.Xor(g1, g2);
return Sse41.TestZ(xor, xor);
}

var result = Sse2.CompareEqual(g1, g2);
return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;
Comment on lines +820 to +821
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var result = Sse2.CompareEqual(g1, g2);
return Sse2.MoveMask(result) == 0b1111_1111_1111_1111;
return g1.Equals(g2);

Results in same codegen and is simpler to read.
If this hinders inlining, etc. keep it as is (except var -> concrete type).

Ideally the JIT should emit code that takes SSE4.1's _mm_testz_si128 into account?
(Iif this variant is always faster on the supported cpus)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector128<T>.Equals(Vector128<T>) currently does not have a specific/optimized code path for SSE 4.1's _mm_testz_si128, which I why I called out an explicit code path for SSE 4.1 in Guid.EqualsCore.

But yes, under the SSE2 code path, it should be equivalent to invoke g1.Equals(g2) on the assumption that the JIT will inline the call, which it should because Vector128<T>.Equals(Vector128<T>) is decorated with "aggressive inlining".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently does not have a specific/optimized code path for SSE 4.1's

Yeah, my question was intented for @tannergooding (?) to have a look at this on the JIT-side 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the JIT should emit code that takes SSE4.1's _mm_testz_si128 into account?

There are a few improvements that could happen for Vector128.Equals. I've been waiting on #49397 before tackling any of them, however.

}

return SoftwareFallback(left, right);

// Compare each element
static bool SoftwareFallback(in Guid left, in Guid right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re in: just to double-check the discussion in the issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to take direction on this either way. I have found in my performance tests thus far that passing a Guid by reference outperforms passing by value - I suspect because the JIT has difficulty keeping the Guid parameters in registers because they have 11 fields.

{
ref int rA = ref Unsafe.AsRef(in left._a);
ref int rB = ref Unsafe.AsRef(in right._a);

return rA == rB
&& Unsafe.Add(ref rA, 1) == Unsafe.Add(ref rB, 1)
&& Unsafe.Add(ref rA, 2) == Unsafe.Add(ref rB, 2)
&& Unsafe.Add(ref rA, 3) == Unsafe.Add(ref rB, 3);
// Compare each element

return rA == rB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we special case 64-bits processes by comparing with longs?
E.g. on ARM 64 this code will be hit.
Cf. #35654

if (Environment.Is64BitProcess)
{
    // code with long
}
else
{
    // code with int
}

The "fast out behavior" should be given with longs too, but I'm not sure how about alignment on ARM 64.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to update with a special case for 64-bit software fallback if desired. The current Guid.EqualsCore method doesn't cater for this, but that's no reason to not cater for it now.

&& Unsafe.Add(ref rA, 1) == Unsafe.Add(ref rB, 1)
&& Unsafe.Add(ref rA, 2) == Unsafe.Add(ref rB, 2)
&& Unsafe.Add(ref rA, 3) == Unsafe.Add(ref rB, 3);
}
}

private static int GetResult(uint me, uint them) => me < them ? -1 : 1;
Expand Down