Optimize write barriers for server gc on arm64 #106934

EgorBo · 2024-08-25T10:59:35Z

Introduce ServerGC-specific write barriers on arm64. The only difference in them is removed "is gen0" check:

-    // Branch to Exit if the reference is not in the Gen0 heap
-    ldr  x12, LOCAL_LABEL(wbs_ephemeral_low)
-    ldr  x17, LOCAL_LABEL(wbs_ephemeral_high)
-    cmp  x15, x12
-    ccmp x15, x17, #0x2, hs
-    bhs  LOCAL_LABEL(Exit)

since server gc has multiple heaps and this check doesn't work. Copied from x64 which has JIT_WriteBarrier_SVR64

I'll add windows and NAOT support if this is approved as the right thing to do

dotnet-policy-service · 2024-08-25T11:00:10Z

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

EgorBo · 2024-08-25T13:04:45Z

Looks like we should rather port Regions-specific write-barriers to ARM64 (#67389 did it only for x64) - *_Bit_Region64 and *_Byte_Region64.

EgorBo · 2024-08-26T15:09:42Z

@dotnet/gc is porting *_Bit_Region64 and *_Byte_Region64 write barriers to ARM64 on your radar? (#67389 introduced them, but for x64 only).

Maoni0 · 2024-08-27T00:17:33Z

we have not scheduled any work item for it. if you have cycles, you are of course welcome to work on it.

EgorBo · 2024-08-27T00:32:46Z

we have not scheduled any work item for it. if you have cycles, you are of course welcome to work on it.

It doesn't look to hard to implement to me, so I might try. Closing this PR since we're not going to need it if we implement the Regions-specific WBs.

mangod9 · 2024-08-27T00:36:38Z

Implementing precise WB was the initial impetus around the write barrier investigation recently. But that showed that there might be other improvements which are required as well. Does your recent experimentation show that might not be the case?

EgorBo · 2024-08-27T00:40:14Z

Implementing precise WB was the initial impetus around the write barrier investigation recently.

I've got an impression that that investigation was about raw performance of the current write-barrier rather than making it more precise. Correct me if I am wrong, but the region based write-barrier technically can be a bit slower. E.g. I see that the x64 PR had a lot of regressions in microbnehcmarks attached (while making good GC pause improvements for real world apps).

mangod9 · 2024-08-27T00:53:14Z

yeah the investigation was about improving overall arm64 WB perf since the measurements were showing that its apparently order of magnitude slower than x64. Hoping to have a concentrated effort to improve in 10, but the challenge is whether its perf would have material impact on real world scenarios.

Maoni0 · 2024-08-27T02:40:52Z

we have not scheduled any work item for it. if you have cycles, you are of course welcome to work on it.

It doesn't look to hard to implement to me, so I might try. Closing this PR since we're not going to need it if we implement the Regions-specific WBs.

I'm confused - the bit/byte versions are the regions specific WBs. they do not apply to segments.

last I heard the WB perf on arm is still significantly slower than on x64 so it'd be good to figure that out before implementing the bit/byte versions.

EgorBo · 2024-08-27T14:46:04Z

I'm confused - the bit/byte versions are the regions specific WBs. they do not apply to segments.

Should we spend efforts on making segments faster? My main motivation is to improve Linux-arm64 on big 1P apps, which, presumably, use Regions.

EgorBo · 2024-08-27T14:48:47Z

last I heard the WB perf on arm is still significantly slower than on x64 so it'd be good to figure that out before implementing the bit/byte versions.

I think there are two things currently:

Calls to WB almost always go through jump stubs because managed code can't reach libcoreclr with relocs (it happens on linux-x64 too, but a lot more rarely) - at least I do see jump stubs in all perf profiles. E.g.:
The "is card table already marked" is expensive

jkotas · 2024-08-27T16:51:06Z

I do see jump stubs in all perf profiles

Are arm64 jump stubs as efficient as possible on modern arm64 hardware?

EgorBo · 2024-08-27T16:54:00Z

I do see jump stubs in all perf profiles

Are arm64 jump stubs as efficient as possible on modern arm64 hardware?

Can't say for sure, but I had pretty confusing results recently, I compared two benchmarks where we were able to preallocate the loader heap next to libcoreclr (hence, direct call without jump stubs) vs default with a jump stub. For some reason the direct call was notably slower 😕

mangod9 · 2024-08-27T17:02:09Z

adding @janvorli here as well, since he had done some experimentation around jump stub perf during W^X work I think. Phoebe's results also showed a few anomalous results, so there seems to be a question over accuracy of perf tooling on arm64.

Optimize write barriers for server gc on arm64

9c3b1f4

dotnet-issue-labeler bot added the area-VM-coreclr label Aug 25, 2024

dotnet-policy-service bot assigned EgorBo Aug 25, 2024

This comment was marked as resolved.

Sign in to view

This was referenced Aug 25, 2024

System.IO.Net5Compat.Tests and System.IO.Tests suddenly exiting with error 137 #100558

Open

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Open

EgorBo closed this Aug 27, 2024

EgorBo deleted the arm64-wb-svr64 branch August 27, 2024 00:40

dotnet deleted a comment from EgorBot Aug 27, 2024

github-actions bot locked and limited conversation to collaborators Sep 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize write barriers for server gc on arm64 #106934

Optimize write barriers for server gc on arm64 #106934

EgorBo commented Aug 25, 2024 •

edited

Loading

dotnet-policy-service bot commented Aug 25, 2024

This comment was marked as resolved.

This comment was marked as resolved.

EgorBo commented Aug 25, 2024 •

edited

Loading

EgorBo commented Aug 26, 2024 •

edited

Loading

Maoni0 commented Aug 27, 2024

EgorBo commented Aug 27, 2024

mangod9 commented Aug 27, 2024

EgorBo commented Aug 27, 2024 •

edited

Loading

mangod9 commented Aug 27, 2024

Maoni0 commented Aug 27, 2024

EgorBo commented Aug 27, 2024

EgorBo commented Aug 27, 2024 •

edited

Loading

jkotas commented Aug 27, 2024

EgorBo commented Aug 27, 2024

mangod9 commented Aug 27, 2024

Optimize write barriers for server gc on arm64 #106934

Optimize write barriers for server gc on arm64 #106934

Conversation

EgorBo commented Aug 25, 2024 • edited Loading

dotnet-policy-service bot commented Aug 25, 2024

This comment was marked as resolved.

This comment was marked as resolved.

EgorBo commented Aug 25, 2024 • edited Loading

EgorBo commented Aug 26, 2024 • edited Loading

Maoni0 commented Aug 27, 2024

EgorBo commented Aug 27, 2024

mangod9 commented Aug 27, 2024

EgorBo commented Aug 27, 2024 • edited Loading

mangod9 commented Aug 27, 2024

Maoni0 commented Aug 27, 2024

EgorBo commented Aug 27, 2024

EgorBo commented Aug 27, 2024 • edited Loading

jkotas commented Aug 27, 2024

EgorBo commented Aug 27, 2024

mangod9 commented Aug 27, 2024

EgorBo commented Aug 25, 2024 •

edited

Loading

EgorBo commented Aug 25, 2024 •

edited

Loading

EgorBo commented Aug 26, 2024 •

edited

Loading

EgorBo commented Aug 27, 2024 •

edited

Loading

EgorBo commented Aug 27, 2024 •

edited

Loading