Faster thread local statics #63619
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
I do not think it is a good idea to deviate from the standard ABI by having a .NET-specific dedicated register that points to the thread-local block. It has ripple effects and comes with a number of trade-offs. Some things will get faster, some things will get slower.
There is already a dedicated register in the standard ABI that points to the thread-local block:
We can also work on reducing the number of indirections needed to go from the thread-local register to the actual static. There is at least one indirection that can be avoided by refactoring how thread statics work.
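To make the point about indirections concrete, here is a minimal C++ sketch. It assumes a hypothetical `Statics` layout and is not the actual CoreCLR data structure. Both variants rely on the standard `thread_local` mechanism, which on typical toolchains is addressed through the ABI's thread-pointer register. They differ only in how many dependent loads sit between that register and the static itself.

```cpp
// Hypothetical layout for illustration only; not the CoreCLR data structures.
struct Statics {
    int counter = 0;
};

// Variant A: the thread-local block stores only a pointer, so an access is
// typically thread pointer -> pointer slot -> heap block (two dependent loads).
// The block is never freed in this sketch.
thread_local Statics* t_indirect = nullptr;

int& counter_indirect() {
    if (t_indirect == nullptr) {
        t_indirect = new Statics();  // lazy allocation, once per thread
    }
    return t_indirect->counter;
}

// Variant B: the statics live inside the thread-local block itself, so an
// access is typically thread pointer + offset, with no extra pointer chase.
thread_local Statics t_inline;

int& counter_inline() {
    return t_inline.counter;
}
```

Calling `counter_indirect() += 1` and `counter_inline() += 1` updates each thread's own copy in both variants. The exact instruction sequence depends on the platform and the TLS model the compiler picks, so treat the load counts in the comments as the common case rather than a guarantee.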
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
I noticed that various Thread Statics helpers constantly pop up in native traces in micro and macro benchmarks, e.g. Platform-JSON TE benchmark:
A simple demo:
using System.Buffers; // for ArrayPool<T>

while (true)
{
    var arr = ArrayPool<int>.Shared.Rent(10000);
    ArrayPool<int>.Shared.Return(arr);
}
When it spins, it spends most of the time in
If only we had a dedicated register (I think we can afford one, at least on arm) that always pointed to the current thread, we could avoid that kind of overhead. Something like this (pseudocode):
making TLS access super cheap. As a bonus, it might help with some pinvoke/safepoint routines?
I noticed that various Thread Statics helpers constantly pop up in native traces in micro and macro benchmarks, e.g. Platform-JSON TE benchmark:
[image]
A simple demo:
using System.Buffers; // for ArrayPool<T>

while (true)
{
    var arr = ArrayPool<int>.Shared.Rent(10000);
    ArrayPool<int>.Shared.Return(arr);
}
When it spins, it spends most of the time in
[image]
[image]
JIT_GetSharedGCThreadStaticBaseDynamicClass, where it just early outs:
If only we had a dedicated register (I think we can afford one, at least on arm) that always pointed to the current thread, we could avoid that kind of overhead. Something like this (pseudocode):
making TLS access super cheap. As a bonus, it might help with some pinvoke/safepoint routines?
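For readers unfamiliar with the helper pattern discussed here, the following C++ sketch shows the general shape of a thread-static base lookup with an early-out fast path. It is an illustration under stated assumptions, not the runtime's code: `ThreadStaticsTable`, `get_thread_static_base`, `kMaxClassIds`, and `kBlockSize` are all hypothetical names, and `thread_local` stands in for whatever mechanism (TLS lookup or a dedicated register) locates the current thread's data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// All names and sizes are hypothetical; this is not the CoreCLR implementation.
constexpr std::size_t kMaxClassIds = 1024;  // assumed upper bound on class IDs
constexpr std::size_t kBlockSize = 64;      // assumed size of one statics block

// Per-thread table: one lazily allocated statics block per dynamic class ID.
struct ThreadStaticsTable {
    std::vector<std::unique_ptr<unsigned char[]>> bases;
};

// Stand-in for the mechanism that locates the current thread's data.
thread_local ThreadStaticsTable t_table;

// Slow path: runs the first time a thread touches a given class.
unsigned char* allocate_thread_static_base(ThreadStaticsTable& table,
                                           std::size_t class_id) {
    if (class_id >= table.bases.size()) {
        table.bases.resize(class_id + 1);
    }
    // make_unique<T[]>(n) value-initializes, so the block starts zeroed.
    table.bases[class_id] = std::make_unique<unsigned char[]>(kBlockSize);
    return table.bases[class_id].get();
}

// Fast path: early-out when the block for this class already exists.
unsigned char* get_thread_static_base(std::size_t class_id) {
    assert(class_id < kMaxClassIds);
    ThreadStaticsTable& table = t_table;  // locating the thread's table
    if (class_id < table.bases.size() && table.bases[class_id]) {
        return table.bases[class_id].get();
    }
    return allocate_thread_static_base(table, class_id);
}
```

After the first call for a given `class_id` on a thread, every later call takes only the early-out branch, so the remaining cost is mostly the step that locates `t_table`. That step is the part a cheaper way of reaching the current thread would target.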
Thoughts?
cc @jkotas
category:cq
theme:runtime
skill-level:intermediate
cost:small
impact:medium