-
Notifications
You must be signed in to change notification settings - Fork 510
WIP: Move portable thread pool and timer implementation to shared partition #6880
Conversation
624a2c3
to
049724e
Compare
Uh, not quite there yet... I will need to move |
@jkotas / @stephentoub / @jkoritzinsky Do I assume correctly that it should be possible to move (Alternatively I can reduce the footprint by reusing existing |
Yes, that should be fine. It should be fine to do for pretty much anything to CoreRT CoreLib.Native to support sharing. The CoreRT CoreLib.Native may go away eventually. |
Thanks!
Not planning to work on that until the Environment move from CoreFX is finished. There is quite a lot of overlap. |
src/System.Private.CoreLib/shared/System/Threading/ThreadPool.cs
Outdated
Show resolved
Hide resolved
src/System.Private.CoreLib/shared/System.Private.CoreLib.Shared.projitems
Outdated
Show resolved
Hide resolved
cd5ae4a
to
a158995
Compare
The remaining build failures are because of the missing CoreFX addition to System.Native. |
a158995
to
109ccbb
Compare
@@ -1110,6 +1110,24 @@ | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Security\SecureString.Unix.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)System\TimeZoneInfo.Unix.cs" /> | |||
</ItemGroup> | |||
<ItemGroup Condition="'$(FeaturePortableThreadPool)' == 'true'"> | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Threading\ThreadPool.Portable.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Threading\ClrThreadPool.cs" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little bit confusing naming here, is ClrThreadPool equal to ThreadPool.Portable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the original naming of the files. There's ThreadPool.Portable.cs
which implements the ThreadPool
methods. The other files are the internal implementation classes, which are created and called from the ThreadPool.Portable.cs
code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the naming comes from an implementation inside CLR vs. Win32 Thread Pool API (https://docs.microsoft.com/en-us/windows/desktop/procthread/thread-pool-api).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to follow the existing naming convention. ThreadPool.cs for portable code, ThreadPool.CoreCLR.cs for CoreCLR runtime specific implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already ThreadPool.cs
shared between all implementations. Then there are three implementations of the actual thread pool (CoreCLR w/ unmanaged code and PAL, CortRT/Portable, CoreRT/Windows).
This moves one of the implementations (CoreRT/Portable) under feature flag to shared partition. This implementation is managed reimplementation of what CoreCLR does and it's currently used by CoreRT on Unix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to use the CoreRT implementation in Mono by simply adding
<FeaturePortableThreadPool>true</FeaturePortableThreadPool>
<FeaturePortableTimer>true</FeaturePortableTimer>
to Mono's System.Private.CoreLib.csproj. It could eventually be used in CoreCLR too, but that will require a lot of performance testing and changes to unmanaged code that are currently not feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps, it would be ok to create PortableThreadPool subdirectory and move the portable threadpool implementation there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps, it would be ok to create PortableThreadPool subdirectory and move the portable threadpool implementation there.
And rename ClrThreadPool.*.cs
to PortableThreadPool.*.cs
, presumably, or something along those lines?
I forgot to test Mono builds. This still has to be done before merge. |
Attempted the Mono build. These still have to be fixed:
|
208cdd2
to
0b029f7
Compare
0b029f7
to
776178c
Compare
I still haven't figured out adequate short term solution for That means Mono will have to implement it, but it's only four methods and it's likely to share backend implementation of (Long term there's a possibility to move the WaitSubsystem and related implementations as a feature to Shared partition, but that is a bigger project and may not make sense if Mono chooses to reuse its existing code) |
Does it have to be LowLevelLifoSemaphore? Could the code use Semaphore instead? |
I don't understand the code well enough to answer that. Short version: Long version:
|
… no longer used on Windows.
A regular It is certainly possible to have fully managed implementation of I'll see if I can make a simple implementation for use in Mono, but I would probably still keep it as Mono specific for now. |
Here's more information on why the Semaphore is implemented the way it is - dotnet/coreclr#13921. |
I'll do some benchmarks, but looks like |
<Compile Include="System\Threading\Interlocked.cs" /> | ||
<Compile Include="System\Threading\LockHolder.cs" /> | ||
<Compile Include="System\Threading\LowLevelLock.cs" /> | ||
<Compile Include="System\Threading\LowLevelMonitor.cs" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, people working on this found it convenient to be able to debug the WaitSystem on Windows. That's whe LowLevelMonitor was implemented on both Windows and Unix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I thought the point was to support debugging ThreadPool.Portable on Windows (the only usage outside of WaitSubsystem). That is still supported after this change. The whole WaitSubsystem was already guarded in the same Unix-only conditions.
I was more worried about introducing some performance bottleneck by replacing usages of LowLevelLock with Lock, but I didn't get to run the benchmarks yet to verify the impact.
If it turns out to be a problem I can revert the last few commits and take a different approach with bringing some implementation of LowLevelLock (and LowLevelSemaphore) to Mono.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to maintain being able to use the portable thread pool on Windows as well under at least a build-time flag for debugging and perf comparisons. There shouldn't be too many dependencies other than the few that were already identified here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to maintain being able to use the portable thread pool on Windows as well under at least a build-time flag for debugging and perf comparisons.
Definitely. I use that myself.
#if CORERT | ||
private readonly Lock _waitThreadLock = new Lock(); | ||
#else | ||
private object _waitThreadLock = new object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create class LowLevelLock : object
with a couple of methods on it to make these ifdefs unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered it as an option. It would likely result in some counter-part like [LowLevel]LockHolder
that can be used in using (new LockHolder(foo))
. There's only few places so #if's seemed like an option for now, but I am going to revisit the options before dropping the "WIP" tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm specifically going to evaluate the performance of LowLevelLock
(only used in two places; has the Windows PInvoke-heavy implementation that is not used anywhere else) vs. Lock
(used to implement Monitor, so should be pretty optimized) vs. lock (obj)
(portable, but goes through one extra hoop through ObjectHeader/SyncTable; the impact on the specific places might be negligible) to see whether there is any merit preferring one over the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation-wise, Lock
in CoreRT is very roughly based on the old implementation of Monitor
with some major differences. It is quite likely that currently it is not as good as Monitor
's current implementation in CoreCLR. Eventually that implementation would also have to be ported to CoreRT. In any case, see my other comment for my suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LowLevelLock
(only used in two places; has the Windows PInvoke-heavy implementation that is not used anywhere else)
LowLevelLock
does some (tiny) bit of spin-waiting and where it is used for the most part the locks are typically uncontended, so it's not really p/invoke-heavy in practice (if it turns out to be, the spin-waiting strategy could be improved to fix that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation behind this change was two-fold:
- Avoid bringing
LowLevelLock
/LowLevelMonitor
to shared CoreLib (and all the dependent code). - Use the same primitive as the
lock (...)
pattern. The premise was that this is what user code gets to use and it should be the one that is optimized better.
The problem is that it violates the condition that it is non-interruptible. Another problem is that I grossly underestimated the dependencies of LowLevelLifoSemaphore
, so in my revised plan I would bring the dependencies to shared code anyway and this change would become unnecessary.
Benchmarks of Here's a draft of implementation of The implementation is based on the code from https://github.com/dotnet/coreclr/blob/a28b25aacdcd2adb0fdfa70bd869f53ba6565976/src/vm/synch.cpp#L578-L981. It doesn't implement the spinning (as that is currently not used by the CoreRT code) and as the low-level primitive for waking up threads it relies on the portable I'm still contemplating about moving bulk of this new code into CoreRT (and in effect to the shared partition) and adding the spinning optimization from CoreCLR. Then it would be reduced to 4 methods in Mono ( |
3b33465
to
1a66e33
Compare
Yet another alternative would be to
That way there would be no Mono specific code at all. The downside is that there would be more native code that has to be transferred to CoreFX (the interop code for wrapping pthread_mutex_t/pthread_cond_t). |
Sorry I haven't looked at this yet, I'll get back soon |
No rush. I was reading through the code and I am only now getting to the point that I understand certain decisions behind the implementation. Sorry for some comments being messy, it follows my thought process. Btw, @kouvel, I used some of your benchmarks from other PR (thread pool sustained/burst loads), but I had to heavily modify them to run under BenchmarkDotNet. The original benchmarks were producing so inconsistent results on my machine that they were unusable (difference between two consecutive runs was often close to 100%). It is possibly due to tiered JIT compilation not being properly accounted for in the initial warm-up phase. If you have any pointers to some good benchmarks for ThreadPool/Timer I would be happy to use them. |
Common to all of
At the moment, tiered compilation has to be disabled ( These tests have to be ported to BDN at some point. I have not yet for several reasons that I won't get into, but one is that the harness I use is more useful in many ways for build-to-build comparisons with finer control over iterations and managing measurement errors. It would be nice to port the tests to BDN at some point to make it easier for others to run and for tracking purposes. If you have ported some of the tests, please put up a PR if you can! :)
@benaadams' task tests are good ones: https://github.com/benaadams/ThreadPoolTaskTesting. They also should be ported to BDN into the performance repo at some point. |
I haven't looked at the code changes yet, I should be able to next week hopefully |
I suspected that would be the reason for it.
I haven't run into this in my tests and I don't see the code paths where it would actually create the circular dependency, but I may have just been lucky. Thanks, the rest more or less confirmed what I figured from the code in the past few days. Accordingly I currently lean towards the following plan:
I'll extract the good bits of this PR into separate PR and submit any bigger changes separately. Once I get to the point where all dependencies are resolved I can get back to this PR and keep it as simple as possible.
Great, I will take a look! |
Does this potentially bring us closer to a mostly managed thread pool implementation? Would it be possible to try this out in .NET Core as well? |
Yes and eventually yes. The goal is primarily to bring this to Mono. Actually bringing it to CoreCLR is out of scope for me, but if someone else is interested I do have a version of the code that builds on top of CoreCLR. I will be happy to share that once all the problems from this PR are resolved. It will need a lot of benchmarks and tweaking before it would be ready as a replacement though. |
|
From their uses in the thread pool I don't think there is a circular dependency issue. Those types were initially created for the wait subsystem (not for the thread pool), and they were used in the portable thread pool implementation because there were handy. I was just trying to convey the reason for their existence, I don't think the circular dependency issue is of concern for this change. |
I didn't mean to use single one. I was thinking about one per thread and waking up specific threads on
Last time I checked it seemed it would not be easy. I will check again before writing some code :) |
Yea that is possible, and we considered something like that if nothing else was available. I think it would be easier to find a lower-level primitive that would offer the same behavior. IO completion ports use LIFO release order for good reason, perhaps there is also a low-level primitive on Unixes that offers LIFO waits and if so that may do well for this purpose. |
Closing for now. Will reopen once I sort out all the dependencies. |
@marek-safar This introduces ThreadPool implementation with the FeaturePortableThreadPool flag into the shared partition. It should be possible to use it as-is in Mono, but I didn't try to build it yet.