Get index of first non ascii char #39507

pgovind · 2020-07-17T08:15:30Z

I think this is ready and can be reviewed now.

Implements GetIndexOfFirstNonAsciiChar from #35034

Updated Perf:

| Faster                                                                 | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Text.Experimental.Perf.IsNormalized_GetIndexOfFirstNonAsciiChar |      1.40 |        111784.29 |         79682.34 |         |

ghost · 2020-08-06T18:43:20Z

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

eiriktsarpalis · 2020-08-10T08:13:39Z

src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs

@@ -747,6 +787,18 @@ private static unsafe nuint GetIndexOfFirstNonAsciiChar_Sse2(char* pBuffer, nuin
                            goto FoundNonAsciiDataInFirstOrSecondVector;
                        }
                    }
+                    else if (AdvSimd.Arm64.IsSupported)
+                    {
+                        currentMask = Unicode.Utf16Utility.GetNonAsciiBytes(AdvSimd.AddSaturate(combinedVector, asciiMaskForAddSaturate).AsByte(), bitmask);


Minor nit: The goto FoundNonAsciiDataInFirstOrSecondVector; statement is replicated across many branches, which could make the logic harder to modify longer-term. Consider factoring SIMD logic into a predicate method, e.g. bool ContainNonAsciiDataInFirstOrSecondVector() and then consume like so:

if (ContainNonAsciiDataInFirstOrSecondVector(..args..)) { goto FoundNonAsciiDataInFirstOrSecondVector; }
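A rough sketch of what such a predicate could look like, in case it helps the discussion. The class and method names below are hypothetical and not part of ASCIIUtility.cs. The SSE2 branch assumes the 0x7F80 saturating-add constant and the 0xAAAA mask for the high byte of each char, and the fallback is a plain scalar loop rather than the AdvSimd path under review.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper type, for illustration only.
public static class NonAsciiPredicateSketch
{
    // Returns true if any of the 8 UTF-16 code units in 'chars' is above 0x7F.
    public static bool ContainsNonAsciiChar(Vector128<ushort> chars)
    {
        if (Sse2.IsSupported)
        {
            // Adding 0x7F80 with unsigned saturation sets bit 15 of a lane
            // exactly when the original value was >= 0x80.
            Vector128<ushort> addend = Vector128.Create((ushort)0x7F80);
            int mask = Sse2.MoveMask(Sse2.AddSaturate(chars, addend).AsByte());

            // MoveMask reports the top bit of every byte; only the high byte
            // of each char (the odd bit positions) is meaningful here.
            return (mask & 0xAAAA) != 0;
        }

        // Scalar fallback so the sketch runs on any hardware.
        for (int i = 0; i < Vector128<ushort>.Count; i++)
        {
            if (chars.GetElement(i) > 0x7F)
            {
                return true;
            }
        }

        return false;
    }

    // One predicate covering both vectors, so each call site needs a single check.
    public static bool ContainsNonAsciiDataInFirstOrSecondVector(
        Vector128<ushort> first, Vector128<ushort> second)
        => ContainsNonAsciiChar(first) || ContainsNonAsciiChar(second);
}
```

With something along these lines, each call site would reduce to one `if` followed by the existing `goto`, and the per-ISA details would live in a single place.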

kunalspathak · 2020-08-10T15:57:46Z

Do you mind sending a PR to dotnet/performance to add the IsNormalized_GetIndexOfFirstNonAsciiChar() benchmark? We need it to track .NET 3.1 vs. .NET 5 performance improvements in this area.

kunalspathak · 2020-08-10T15:59:36Z

src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs

+
+            Vector128<byte> bitmask = BitConverter.IsLittleEndian ?
+                Vector128.Create(0x80402010_08040201).AsByte() :
+                Vector128.Create(0x01020408_10204080).AsByte();


I think this is also true for other PRs that Carlos sent, but can you remind me why we use bitMask for !IsLittleEndian if it is not supported? Is the idea that when we start supporting, the code will just work?

Yup, that's the idea.

kunalspathak

You should not map the SSE2 logic for AdvSimd. AdvSimd performance can be better and doesn't have to go through AddSaturate() logic.

kunalspathak · 2020-08-10T16:02:18Z

src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs

+            }
+            else if (AdvSimd.Arm64.IsSupported)
+            {
+                currentMask = Unicode.Utf16Utility.GetNonAsciiBytes(AdvSimd.AddSaturate(firstVector, asciiMaskForAddSaturate).AsByte(), bitmask);


You should check out the excellent feedback from @TamarChristinaArm in #39050 (comment) and #39050 (comment) about optimizing this code. I think you should follow the same approach because, in the end, all you are doing is calling BitOperations.TrailingZeroCount() on it.

Agreed with splitting it off. In this case it looks like this code only cares about the index of the first non-zero element. So we can do even better here.

If you use a mask 0x0f00 and BIC you can get a much more efficient sequence (see https://github.com/ARM-software/optimized-routines/blob/224cb5f67b71757b99fe1e10b5a437c17a1d733c/string/aarch64/strlen.S#L164)

essentially

    cmlt v1.16b, v1.16b, #0
    bic v1.8h, 0x0f, lsl 8
    umaxp v1.16b, v1.16b, v1.16b
    fmov x1, d1
    rbit x0, x0
    clz x0, x1

as the sequence to get the first element that has the msb set. Of course for the cases where it's just doing if (currentMask != 0) you just need a maxp and fmov as I explained in the other post.

This is for little-endian, btw; for big-endian you need a slight variation, though that can be avoided by loading with LD1 instead of LDR (which I think is what LoadVector128 does here).
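To make the sequence above easier to follow, here is a scalar emulation of it over one 16-byte block. It is only a sketch under a few assumptions: little-endian lane order, byte input (as in the linked strlen routine) rather than UTF-16 chars, and a hypothetical helper name. The real code would use the corresponding AdvSimd intrinsics instead of a loop.

```csharp
using System;
using System.Numerics;

// Hypothetical helper type, for illustration only.
public static class NibbleMaskSketch
{
    // Returns the index of the first byte with its most significant bit set,
    // or -1 if all 16 bytes are ASCII.
    public static int IndexOfFirstNonAsciiByte(ReadOnlySpan<byte> block)
    {
        if (block.Length != 16)
        {
            throw new ArgumentException("Expected exactly 16 bytes.", nameof(block));
        }

        ulong syndrome = 0;
        for (int pair = 0; pair < 8; pair++)
        {
            // cmlt #0: each byte becomes 0xFF if its msb is set, else 0x00.
            byte even = (block[2 * pair] & 0x80) != 0 ? (byte)0xFF : (byte)0x00;
            byte odd = (block[2 * pair + 1] & 0x80) != 0 ? (byte)0xFF : (byte)0x00;

            // bic with 0x0f, lsl 8 on each halfword: clears the low nibble
            // of the upper (odd) byte, leaving 0xF0 when it was set.
            odd &= 0xF0;

            // umaxp: pairwise maximum narrows 16 bytes to 8, so the result
            // fits in one 64-bit register (the fmov step).
            byte max = Math.Max(even, odd);
            syndrome |= (ulong)max << (8 * pair);
        }

        if (syndrome == 0)
        {
            return -1;
        }

        // Every input byte owns 4 bits of the syndrome, so the bit index of the
        // first set bit divided by 4 is the byte index (the rbit + clz step).
        return BitOperations.TrailingZeroCount(syndrome) >> 2;
    }
}
```

The point of the 0x0f00 mask is that an even byte contributes 0xFF and an odd byte only 0xF0, so after the pairwise max the two bytes of a pair stay distinguishable by nibble.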

TamarChristinaArm · 2020-08-10T16:59:32Z

src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs

+                    }
+                    else if (AdvSimd.Arm64.IsSupported)
+                    {
+                        firstVector = AdvSimd.LoadVector128((ushort*)pBuffer);


Just a note for the future, this would be a great place for LoadPair #39243

TamarChristinaArm · 2020-08-10T18:30:36Z

src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs

+            }
+            else if (AdvSimd.Arm64.IsSupported)
+            {
+                currentMask = Unicode.Utf16Utility.GetNonAsciiBytes(AdvSimd.AddSaturate(firstVector, asciiMaskForAddSaturate).AsByte(), bitmask);


Agreed with splitting it off. In this case it looks like this code only cares about the index of the first non-zero element. So we can do even better here.

If you use a mask 0x0f00 and BIC you can get a much more efficient sequence (see https://github.com/ARM-software/optimized-routines/blob/224cb5f67b71757b99fe1e10b5a437c17a1d733c/string/aarch64/strlen.S#L164)

essentially

    cmlt v1.16b, v1.16b, #0
    bic v1.8h, 0x0f, lsl 8
    umaxp v1.16b, v1.16b, v1.16b
    fmov x1, d1
    rbit x0, x0
    clz x0, x1

as the sequence to get the first element that has the msb set. Of course for the cases where it's just doing if (currentMask != 0) you just need a maxp and fmov as I explained in the other post.

pgovind · 2020-08-13T22:05:23Z

Updated Perf:

| Faster                                                                 | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Text.Experimental.Perf.IsNormalized_GetIndexOfFirstNonAsciiChar |      1.40 |        111784.29 |         79682.34 |         |

cc @kunalspathak

pgovind · 2020-08-13T22:15:22Z

Filed #40805 to follow up on Tamar's suggestions here. Considering that we're already seeing decent perf improvements here, how do folks feel about merging this PR now and investigating the suggestions in a future PR?

kunalspathak · 2020-08-13T23:31:49Z

Couple of things:

1. Some of the feedback given in ARM64 intrinsics support for Utf8String.Experimental #39103 (comment) does not appear to be implemented here. Tamar brought it up as well: the key point is that we should not mimic the SSE2 AddSaturate() logic, because ARM64 has better instructions for a similar operation. I agree that it will take some time to reach the most efficient implementation, but we should at least think about ways to do the operation without AddSaturate(), etc. Currently we do more operations than needed, such as 3 AddPairwise calls, which might show an improvement on WSL2 but could degrade performance on other processors.
2. Looking at the code of IsNormalized_GetIndexOfFirstNonAsciiChar() in https://github.com/dotnet/performance/pull/1445/files#diff-2d837d38e3d94ab0d7f80e232693857eR90 , I see that you are testing it on ASCII data instead of non-ASCII data. Was that intentional? Are the above results from the ASCII version as seen in the PR in the performance repo?
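For the question about input data, a small sketch of how both kinds of input could be produced and sanity-checked. The helper names are hypothetical and this is not the code from the dotnet/performance PR. It also assumes the scalar reference returns the input length when no non-ASCII char is found.

```csharp
using System;

// Hypothetical helper type, for illustration only.
public static class BenchmarkInputSketch
{
    // Builds a string of 'a' characters; if nonAsciiIndex >= 0, the char at
    // that position is replaced by U+00E9 so the scan stops early there.
    public static string CreateInput(int length, int nonAsciiIndex)
    {
        char[] chars = new char[length];
        Array.Fill(chars, 'a');
        if (nonAsciiIndex >= 0)
        {
            chars[nonAsciiIndex] = '\u00E9';
        }

        return new string(chars);
    }

    // Scalar reference: index of the first char above 0x7F, or the length
    // of the input when every char is ASCII (an assumption of this sketch).
    public static int ScalarIndexOfFirstNonAsciiChar(ReadOnlySpan<char> text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] > 0x7F)
            {
                return i;
            }
        }

        return text.Length;
    }
}
```

An all-ASCII input exercises only the "no match" loop, while an input with a non-ASCII char somewhere in the middle also exercises the mask-to-index step, so the two would likely need separate benchmark cases.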

With that, I am not sure if we should rush this in for RC1.

jeffhandley · 2021-01-23T02:09:24Z

@pgovind I'm going to close this PR since this work is on hold. When we get back around to this work, we can create a fresh PR.

Dotnet-GitSync-Bot added the area-Infrastructure-coreclr label Jul 17, 2020

pgovind removed the area-Infrastructure-coreclr label Jul 17, 2020

jkotas added the NO-REVIEW Experimental/testing PR, do NOT review it label Jul 17, 2020

GetIndexOfFirstNonAsciiChar

280db35

pgovind force-pushed the GetIndexOfFirstNonAsciiChar branch from 265901b to 280db35 Compare July 30, 2020 00:09

Temp commit to change branches

b4edde6

danmoseley added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 5, 2020

Bug fix

9a7de4e

pgovind force-pushed the GetIndexOfFirstNonAsciiChar branch from 0807a27 to 9a7de4e Compare August 6, 2020 18:39

revert commenting out test case

15fd6a3

pgovind requested review from echesakov and kunalspathak August 6, 2020 18:42

pgovind added area-System.Runtime.Intrinsics and removed NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it labels Aug 6, 2020

eiriktsarpalis reviewed Aug 10, 2020

eiriktsarpalis approved these changes Aug 10, 2020

kunalspathak reviewed Aug 10, 2020

kunalspathak requested changes Aug 10, 2020

TamarChristinaArm reviewed Aug 10, 2020

pgovind mentioned this pull request Aug 12, 2020

Get index of first non ascii byte #39506

Merged

pgovind mentioned this pull request Aug 13, 2020

Investigate potential ARM64 perf improvements in AsciiUtility and Utf16Utility #40805

Open

BruceForstall mentioned this pull request Aug 24, 2020

Optimize library code using arm64 intrinsics #33308

Closed

jeffhandley mentioned this pull request Aug 24, 2020

Optimize System.Text.ASCIIUtility for arm64 using cross-platform intrinsics #41292

Closed

2 tasks

TamarChristinaArm mentioned this pull request Sep 9, 2020

[ARM64] Performance regression: Utf8Encoding #41699

Closed

jeffhandley closed this Jan 23, 2021

ghost locked as resolved and limited conversation to collaborators Feb 22, 2021
