Faster thread local statics #63619

Open
EgorBo opened this issue Jan 11, 2022 · 4 comments
Labels: area-CodeGen-coreclr, tenet-performance


EgorBo (Member) commented Jan 11, 2022

I noticed that various thread statics helpers constantly pop up in native traces in micro and macro benchmarks, e.g. the Platform-JSON TE benchmark:
image

A simple demo:

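using System.Buffers;

// Rent and immediately return a pooled array in a tight loop.
// ArrayPool<int>.Shared keeps per-thread caches, so each iteration touches thread statics.
while (true)
{
    var arr = ArrayPool<int>.Shared.Rent(10000);
    ArrayPool<int>.Shared.Return(arr);
}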

When it spins, it spends most of its time in JIT_GetSharedGCThreadStaticBaseDynamicClass:
image
where it just takes the early-out path:
image

If only we had a dedicated register (I think we can afford one, at least on Arm) that always pointed to the current thread, we could avoid that kind of overhead. Something like this (pseudocode):

if (!CTreg->tlsInited)
  initTls();
var field = CTreg->field;

That would make TLS access super cheap. As a bonus, it might help with some P/Invoke and safepoint routines?
Thoughts?
cc @jkotas
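
For readers who want to see the check-then-load shape of the pseudocode in compilable form, here is a minimal C++ sketch. It assumes a `thread_local` variable as a stand-in for the proposed register; `ThreadBlock`, `t_block`, `initCalls`, and the value 42 are hypothetical and not taken from CoreCLR.

```cpp
#include <cassert>

// Hypothetical per-thread block; the names are illustrative, not CoreCLR's.
struct ThreadBlock {
    bool tlsInited = false;  // has this thread's storage been set up yet?
    int  field     = 0;      // stands in for a thread-static field
    int  initCalls = 0;      // only here so the sketch can show init runs once
};

// The C++ toolchain reaches this through the platform's thread pointer,
// which plays the role of "CTreg" in the pseudocode above.
thread_local ThreadBlock t_block;

// Slow path, taken once per thread.
void initTls(ThreadBlock& block) {
    block.field = 42;        // arbitrary initial value for the sketch
    block.tlsInited = true;
    ++block.initCalls;
}

// Fast path: one flag check, then a direct load of the field.
int readField() {
    ThreadBlock& block = t_block;
    if (!block.tlsInited)
        initTls(block);
    assert(block.tlsInited);
    return block.field;
}
```

After the first call on a thread, every later `readField()` is just the flag check plus the load, which is the cost the proposal is aiming for.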

category:cq
theme:runtime
skill-level:intermediate
cost:small
impact:medium

EgorBo added the tenet-performance label Jan 11, 2022
dotnet-issue-labeler (bot) added the untriaged label Jan 11, 2022
dotnet-issue-labeler (bot) commented

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

jkotas (Member) commented Jan 11, 2022

I do not think it is a good idea to deviate from the standard ABI by having a .NET-specific dedicated register that points to the thread-local block. It has ripple effects and comes with a number of trade-offs. Some things will get faster, some things will get slower.

There is already a dedicated register in the standard ABI that points to the thread-local block: gs:[0] on Windows x64, etc. The JIT does not know how to inline this access today. If we wanted to improve things here, we should teach the JIT how to inline the gs:[xxx] statics access. It is basically the same idea as what you proposed, except that the .NET-specific dedicated register is replaced with the standard ABI one.
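
As a rough illustration of helper-call access versus inlined access, here is a small C++ sketch. It is written in C++ rather than C# because native compilers typically already emit thread-pointer-relative code for `thread_local` variables, though the exact instructions depend on the OS, ABI, and TLS model. `t_counter`, `getThreadStaticBase`, and the `bump*` functions are hypothetical names, not runtime APIs.

```cpp
#include <cassert>
#include <thread>

// One counter per thread.
thread_local int t_counter = 0;

// Helper-call style: the caller asks an out-of-line function for the address,
// loosely mirroring a runtime helper that returns the thread-static base.
int* getThreadStaticBase() { return &t_counter; }

int bumpViaHelper() { return ++*getThreadStaticBase(); }

// Inlined style: the compiler addresses the variable directly.
int bumpInline() { return ++t_counter; }

// Bumps the counter `times` times on a fresh thread and returns the last value,
// showing that the new thread starts from its own zero-initialized copy.
int bumpOnNewThread(int times) {
    assert(times >= 0);
    int result = 0;
    std::thread worker([&result, times] {
        for (int i = 0; i < times; ++i)
            result = bumpInline();
    });
    worker.join();
    return result;
}
```

Both styles touch the same per-thread storage. The difference is only whether the address computation happens at the access site or behind a call.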

We can also work on reducing the number of indirections needed to go from the thread-local register to the actual static. There is at least one indirection that can be avoided by refactoring how thread statics work.
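
To make the point about indirections concrete, here is a purely hypothetical C++ sketch comparing two layouts. These structs are assumptions for illustration only and do not describe CoreCLR's actual thread-statics data structures.

```cpp
#include <cassert>

// Hypothetical layouts for illustration only; NOT CoreCLR's real structures.
struct ClassStatics { int field; };

// Layout A: thread block -> per-module table -> per-class block -> field.
struct ModuleStatics { ClassStatics* classes[4]; };
struct ThreadBlockA  { ModuleStatics* module; };

// Layout B: the per-class blocks hang directly off the thread block.
struct ThreadBlockB  { ClassStatics* classes[4]; };

// Three dependent loads: module pointer, class pointer, then the field.
int readA(const ThreadBlockA& t, int classIndex) {
    assert(classIndex >= 0 && classIndex < 4);
    return t.module->classes[classIndex]->field;
}

// Two dependent loads: class pointer, then the field.
int readB(const ThreadBlockB& t, int classIndex) {
    assert(classIndex >= 0 && classIndex < 4);
    return t.classes[classIndex]->field;
}
```

Each removed level is one fewer dependent memory load on the path from the thread-local block to the static.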

jkotas added the area-CodeGen-coreclr label Jan 11, 2022
ghost commented Jan 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: EgorBo
Assignees: -
Labels: tenet-performance, area-CodeGen-coreclr, area-VM-coreclr, untriaged
Milestone: -

jkotas changed the title from "Keep a dedicated register for CurrentThread pointer" to "Faster thread local statics" Jan 11, 2022
EgorBo (Member, Author) commented Jan 11, 2022

@jkotas thanks for the feedback! I was about to ask "should I rename it to 'make TLS faster'?" 🙂
BTW, related: #63622

EgorBo added this to the Future milestone Jan 11, 2022
EgorBo removed the untriaged label Jan 11, 2022
EgorBo self-assigned this Jan 12, 2022