Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get index of first non ascii char #39507

Closed
wants to merge 4 commits into from

Conversation

pgovind
Copy link

@pgovind pgovind commented Jul 17, 2020

I think this is ready and can be reviewed now.

Implements GetIndexOfFirstNonAsciiChar from #35034

Updated Perf:

| Faster                                                                 | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Text.Experimental.Perf.IsNormalized_GetIndexOfFirstNonAsciiChar |      1.40 |        111784.29 |         79682.34 |         |

@jkotas jkotas added the NO-REVIEW Experimental/testing PR, do NOT review it label Jul 17, 2020
@pgovind pgovind force-pushed the GetIndexOfFirstNonAsciiChar branch from 265901b to 280db35 Compare July 30, 2020 00:09
@danmoseley danmoseley added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 5, 2020
@pgovind pgovind force-pushed the GetIndexOfFirstNonAsciiChar branch from 0807a27 to 9a7de4e Compare August 6, 2020 18:39
@pgovind pgovind added area-System.Runtime.Intrinsics and removed NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it labels Aug 6, 2020
@ghost
Copy link

ghost commented Aug 6, 2020

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

@@ -747,6 +787,18 @@ private static unsafe nuint GetIndexOfFirstNonAsciiChar_Sse2(char* pBuffer, nuin
goto FoundNonAsciiDataInFirstOrSecondVector;
}
}
else if (AdvSimd.Arm64.IsSupported)
{
currentMask = Unicode.Utf16Utility.GetNonAsciiBytes(AdvSimd.AddSaturate(combinedVector, asciiMaskForAddSaturate).AsByte(), bitmask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: The goto FoundNonAsciiDataInFirstOrSecondVector; statement is replicated across many branches, which could make the logic harder to modify longer-term. Consider factoring SIMD logic into a predicate method, e.g. bool ContainNonAsciiDataInFirstOrSecondVector() and then consume like so:

if (ContainNonAsciiDataInFirstOrSecondVector(..args..))
{
    goto FoundNonAsciiDataInFirstOrSecondVector;
}

@kunalspathak
Copy link
Member

Do you mind sending a PR to dotnet/performance to add IsNormalized_GetIndexOfFirstNonAsciiChar () benchmark? We need it to track .NET 3.1 vs. .NET 5 performance improvements in this area.


Vector128<byte> bitmask = BitConverter.IsLittleEndian ?
Vector128.Create(0x80402010_08040201).AsByte() :
Vector128.Create(0x01020408_10204080).AsByte();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also true for other PRs that Carlos sent, but can you remind me why we use bitMask for !IsLittleEndian if it is not supported? Is the idea that when we start supporting, the code will just work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's the idea.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not map the SSE2 logic for AdvSimd. AdvSimd performance can be better and doesn't have to go through AddSaturate() logic.

}
else if (AdvSimd.Arm64.IsSupported)
{
currentMask = Unicode.Utf16Utility.GetNonAsciiBytes(AdvSimd.AddSaturate(firstVector, asciiMaskForAddSaturate).AsByte(), bitmask);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check out excellent feedback from @TamarChristinaArm in #39050 (comment) and #39050 (comment) about optimizing this code. I think you should follow the same because in the end, all you are doing this calling BitOperations.TrailingZeroCount() on it.

Copy link
Contributor

@TamarChristinaArm TamarChristinaArm Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with splitting it off. In this case it looks like this code only cares about the index of the first non-zero element. So we can do even better here.

If you use a mask 0x0f00 and BIC you can get a much more efficient sequence (see https://github.com/ARM-software/optimized-routines/blob/224cb5f67b71757b99fe1e10b5a437c17a1d733c/string/aarch64/strlen.S#L164)

essentially

cmlt    v1.16b, v1.16b, #0
bic	v1.8h, 0x0f, lsl 8
umaxp	v1.16b, v1.16b, v1.16b
fmov	x1, d1
rbit	x0, x0
clz	x0, x1

as the sequence to get the first element that has the msb set. Of course for the cases where it's just doing if (currentMask != 0) you just need a maxp and fmov as I explained in the other post.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for little endian btw, for big-endian you need a slight variantion. though this can be avoided loading with LD1 instead of LDR (which I think is what LoadVector128 does here).

}
else if (AdvSimd.Arm64.IsSupported)
{
firstVector = AdvSimd.LoadVector128((ushort*)pBuffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note for the future, this would be a great place for LoadPair #39243

}
else if (AdvSimd.Arm64.IsSupported)
{
currentMask = Unicode.Utf16Utility.GetNonAsciiBytes(AdvSimd.AddSaturate(firstVector, asciiMaskForAddSaturate).AsByte(), bitmask);
Copy link
Contributor

@TamarChristinaArm TamarChristinaArm Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with splitting it off. In this case it looks like this code only cares about the index of the first non-zero element. So we can do even better here.

If you use a mask 0x0f00 and BIC you can get a much more efficient sequence (see https://github.com/ARM-software/optimized-routines/blob/224cb5f67b71757b99fe1e10b5a437c17a1d733c/string/aarch64/strlen.S#L164)

essentially

cmlt    v1.16b, v1.16b, #0
bic	v1.8h, 0x0f, lsl 8
umaxp	v1.16b, v1.16b, v1.16b
fmov	x1, d1
rbit	x0, x0
clz	x0, x1

as the sequence to get the first element that has the msb set. Of course for the cases where it's just doing if (currentMask != 0) you just need a maxp and fmov as I explained in the other post.

@pgovind
Copy link
Author

pgovind commented Aug 13, 2020

Updated Perf:

| Faster                                                                 | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Text.Experimental.Perf.IsNormalized_GetIndexOfFirstNonAsciiChar |      1.40 |        111784.29 |         79682.34 |         |

cc @kunalspathak

@pgovind
Copy link
Author

pgovind commented Aug 13, 2020

Filed #40805 to follow up on Tamar's suggestions here. Considering that we're already seeing decent perf improvements here, how do folks feel about merging this PR now and investigating the suggestions in a future PR?

@kunalspathak
Copy link
Member

Couple of things:

  • There was some feedback given in ARM64 intrinsics support for Utf8String.Experimental #39103 (comment) that I don't see we implemented. It was also brought up by Tamar, the key point is we should not mimic the SSE2 logic of AddSaturate() because ARM64 has better instructions to do similar operation. I agree that it will take some time to make it most efficient implementation, but we should at least think about ways to do the operation without AddSaturate(), etc. Currently, we are doing more than needed operations like 3 AddPairwise which might show improvement on WSL2 but might have degraded performance on different processors.
  • If I see the code of IsNormalized_GetIndexOfFirstNonAsciiChar() in https://github.com/dotnet/performance/pull/1445/files#diff-2d837d38e3d94ab0d7f80e232693857eR90 , I see that you are testing it on ascii data instead of non-ascii data. Was that intentional? Are the above results from the ascii version as seen in the PR in performance repo?

With that, I am not sure if we should rush this in for RC1.

@jeffhandley
Copy link
Member

@pgovind I'm going to close this PR since this work is on hold. When we get back around to this work, we can create a fresh PR.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants