Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize write barriers for server gc on arm64 #106934

Closed
wants to merge 1 commit into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Aug 25, 2024

Introduce ServerGC-specific write barriers on arm64. The only difference in them is removed "is gen0" check:

-    // Branch to Exit if the reference is not in the Gen0 heap
-    ldr  x12, LOCAL_LABEL(wbs_ephemeral_low)
-    ldr  x17, LOCAL_LABEL(wbs_ephemeral_high)
-    cmp  x15, x12
-    ccmp x15, x17, #0x2, hs
-    bhs  LOCAL_LABEL(Exit)

since server gc has multiple heaps and this check doesn't work. Copied from x64 which has JIT_WriteBarrier_SVR64

I'll add windows and NAOT support if this is approved as the right thing to do

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@EgorBo

This comment was marked as resolved.

@EgorBot

This comment was marked as resolved.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 25, 2024

Looks like we should rather port Regions-specific write-barriers to ARM64 (#67389 did it only for x64) - *_Bit_Region64 and *_Byte_Region64.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 26, 2024

@dotnet/gc is porting *_Bit_Region64 and *_Byte_Region64 write barriers to ARM64 on your radar? (#67389 introduced them, but for x64 only).

@Maoni0
Copy link
Member

Maoni0 commented Aug 27, 2024

we have not scheduled any work item for it. if you have cycles, you are of course welcome to work on it.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 27, 2024

we have not scheduled any work item for it. if you have cycles, you are of course welcome to work on it.

It doesn't look to hard to implement to me, so I might try. Closing this PR since we're not going to need it if we implement the Regions-specific WBs.

@EgorBo EgorBo closed this Aug 27, 2024
@mangod9
Copy link
Member

mangod9 commented Aug 27, 2024

Implementing precise WB was the initial impetus around the write barrier investigation recently. But that showed that there might be other improvements which are required as well. Does your recent experimentation show that might not be the case?

@EgorBo
Copy link
Member Author

EgorBo commented Aug 27, 2024

Implementing precise WB was the initial impetus around the write barrier investigation recently.

I've got an impression that that investigation was about raw performance of the current write-barrier rather than making it more precise. Correct me if I am wrong, but the region based write-barrier technically can be a bit slower. E.g. I see that the x64 PR had a lot of regressions in microbnehcmarks attached (while making good GC pause improvements for real world apps).

@EgorBo EgorBo deleted the arm64-wb-svr64 branch August 27, 2024 00:40
@mangod9
Copy link
Member

mangod9 commented Aug 27, 2024

yeah the investigation was about improving overall arm64 WB perf since the measurements were showing that its apparently order of magnitude slower than x64. Hoping to have a concentrated effort to improve in 10, but the challenge is whether its perf would have material impact on real world scenarios.

@Maoni0
Copy link
Member

Maoni0 commented Aug 27, 2024

we have not scheduled any work item for it. if you have cycles, you are of course welcome to work on it.

It doesn't look to hard to implement to me, so I might try. Closing this PR since we're not going to need it if we implement the Regions-specific WBs.

I'm confused - the bit/byte versions are the regions specific WBs. they do not apply to segments.

last I heard the WB perf on arm is still significantly slower than on x64 so it'd be good to figure that out before implementing the bit/byte versions.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 27, 2024

I'm confused - the bit/byte versions are the regions specific WBs. they do not apply to segments.

Should we spend efforts on making segments faster? My main motivation is to improve Linux-arm64 on big 1P apps, which, presumably, use Regions.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 27, 2024

last I heard the WB perf on arm is still significantly slower than on x64 so it'd be good to figure that out before implementing the bit/byte versions.

I think there are two things currently:

  1. Calls to WB almost always go through jump stubs because managed code can't reach libcoreclr with relocs (it happens on linux-x64 too, but a lot more rarely) - at least I do see jump stubs in all perf profiles. E.g.:

  2. The "is card table already marked" is expensive

@dotnet dotnet deleted a comment from EgorBot Aug 27, 2024
@dotnet dotnet deleted a comment from EgorBot Aug 27, 2024
@dotnet dotnet deleted a comment from EgorBot Aug 27, 2024
@dotnet dotnet deleted a comment from EgorBot Aug 27, 2024
@jkotas
Copy link
Member

jkotas commented Aug 27, 2024

I do see jump stubs in all perf profiles

Are arm64 jump stubs as efficient as possible on modern arm64 hardware?

@EgorBo
Copy link
Member Author

EgorBo commented Aug 27, 2024

I do see jump stubs in all perf profiles

Are arm64 jump stubs as efficient as possible on modern arm64 hardware?

Can't say for sure, but I had pretty confusing results recently, I compared two benchmarks where we were able to preallocate the loader heap next to libcoreclr (hence, direct call without jump stubs) vs default with a jump stub. For some reason the direct call was notably slower 😕

@mangod9
Copy link
Member

mangod9 commented Aug 27, 2024

adding @janvorli here as well, since he had done some experimentation around jump stub perf during W^X work I think. Phoebe's results also showed a few anomalous results, so there seems to be a question over accuracy of perf tooling on arm64.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants