-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/7.0-staging] Make WindowsServiceLifetime gracefully stop #85656
[release/7.0-staging] Make WindowsServiceLifetime gracefully stop #85656
Conversation
* Make WindowsServiceLifetime gracefully stop WindowsServiceLifetime was not waiting for ServiceBase to stop the service. As a result we would sometimes end the process before notifying service control manager that the service had stopped -- resulting in an error in the eventlog and sometimes a service restart. We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through public API. We must not call SetServiceStatus again once the service is marked as stopped. * Alternate approach to ensuring we only ever set STATE_STOPPED once. * Avoid calling ServiceBase.Stop on stopped service I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will not be present on .NETFramework. Workaround that by avoiding calling ServiceBase.Stop when the service has already been stopped by SCM. * Add tests for WindowsServiceLifetime These tests leverage RemoteExecutor to avoid creating a separate service assembly. * Respond to feedback and add more tests. This better integrates with the RemoteExecutor component as well, by hooking up the service process and fetching its handle. This gives us the correct logging and exitcode handling from RemoteExecutor. * Honor Cancellation in StopAsync * Fix bindingRedirects in RemoteExecutor * Use Async lambdas for service testing * Fix issue on Win7 where duplicate service descriptions are disallowed * Respond to feedback * Fix comment and add timeout
Tagging subscribers to this area: @dotnet/area-extensions-hosting Issue DetailsBackport of #83892 DescriptionWindows services implemented with Microsoft.Extensions.Hosting.WindowsServices would not gracefully stop - due to exit without reporting STOPPED status - resulting in an error in Service Control Manager. If the service had configured error handling this would also cause the service to restart rather than stay stopped. In some cases the service would crash with on stop as well - generating WER dumps. The clean exit without reporting STOPPED was caused because nothing held up the main routine from exiting before SCM was notified that the service was stopped. This race condition was always present, but has gotten worse over the past two releases as we made the runtime faster - effectively giving the background thread less time to complete and report it's stop. The AV was caused by a double call to stop the service -- once happening when handing the SCM Stop request, and again from hosting when shutting down the application lifetime. This double call was violating the constraint of SetServiceStatus:
We've avoided calling stop twice and also guarded against user's calling ServiceBase.Stop when the service is already stopping. Customer ImpactNot all shutdown logic is allowed to run - which could result in data loss. Customers receive error reports about their services. Services may restart when intended to stop. TestingFix has been made in main for a month and validated on nightly builds by customers. We added numerous automated test cases to cover this scenario and cover the overall lifetime sequence. We also tested the service stop logic manually and through system shutdown (note that this bug still exists which causes problems on shutdown RiskRisk is low, but not zero due to the size of the change and the addition of cross-thread synchronization and locking. We've tried to mitigate this through lots of automated and manual testing. We also kept the locking to a minimum and run no user code under the added lock. Probably the biggest risk here is that we didn't completely fix everything folks care about in this path. For example: #83093 Note the changes to ServiceBase are not strictly required - which is good because we cannot change that type on .NETFramework where this same package is supported.
|
This is odd to me. I see that we've versioned Logging.Abstractions but I also see that we're building the package: Lines 9 to 10 in 12aaae4
I'm not sure why we'd hit this error. Building the package locally to see what's happening. |
I understand why it's happening. It's because we pull in the serviced project transitively through a reference to |
This package has been serviced and we compile against the serviced version of its assemblies. None of the directly referenced projects have been serviced so our package doesn't restore the serviced versions. Lift up the dependency on Logging.Abstractions to ensure we reference the serviced package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
OOB package authoring changes look good.
Just left some non-blocking questions.
<AutoGenerateBindingRedirects Condition="'$(AutoGenerateBindingRedirects)' == ''">true</AutoGenerateBindingRedirects> | ||
<GenerateBindingRedirectsOutputType Condition="'$(GenerateBindingRedirectsOutputType)' == ''">true</GenerateBindingRedirectsOutputType> | ||
</PropertyGroup> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the main PR I found no description for this change except "Fixing binding redirects for remote excutor". For my own education, would you mind explaining this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't generating any bindingRedirects for tests, but they need them.
When running from XUnit we were working because XUnit has an AssemblyResolve hook that does a similar thing (though it can't do the exact same thing).
When I added out-of-proc services with RemoteExecutor -- really any use of remote executor -- loads failed since they didn't have this resolve handler. Ensuring the app.config is generated makes sure that remote executor gets it. It will also avoid us needing to rely on the AssemblyResolve hook in Xunit (and it's fragility).
Here's the discussion: #83892 (comment)
<GeneratePackageOnBuild>false</GeneratePackageOnBuild> | ||
<ServicingVersion>0</ServicingVersion> | ||
<GeneratePackageOnBuild>true</GeneratePackageOnBuild> | ||
<ServicingVersion>1</ServicingVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for also adding the servicing bump to M.W.Compat.
@@ -27,6 +29,7 @@ | |||
|
|||
<ItemGroup> | |||
<ProjectReference Include="$(LibrariesProjectRoot)Microsoft.Extensions.Hosting\src\Microsoft.Extensions.Hosting.csproj" /> | |||
<ProjectReference Include="$(LibrariesProjectRoot)Microsoft.Extensions.Logging.Abstractions\src\Microsoft.Extensions.Logging.Abstractions.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What code in WindowsServiceLifetime.cs caused the need to add this new project reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is the transitive dependency problem you described here, right? #85656 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. It's a side-effect of incremental servicing and our package testing caught it. Good reason to pay attention to those failures 👍
@@ -31,6 +31,7 @@ public class ServiceBase : Component | |||
private bool _commandPropsFrozen; // set to true once we've use the Can... properties. | |||
private bool _disposed; | |||
private bool _initialized; | |||
private object _stopLock = new object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock was a great addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix didn't specifically require it but it helps users of ServiceBase from tripping over the same problem. Might be we should lock on all state changes, but for now I focused just on stop since that's documented to be bad to call more than once.
Backport of #83892
Fixes #62579 on 7.0
Description
Windows services implemented with Microsoft.Extensions.Hosting.WindowsServices would not gracefully stop - due to exit without reporting STOPPED status - resulting in an error in Service Control Manager. If the service had configured error handling this would also cause the service to restart rather than stay stopped. In some cases the service would crash with on stop as well - generating WER dumps.
The clean exit without reporting STOPPED was caused because nothing held up the main routine from exiting before SCM was notified that the service was stopped. This race condition was always present, but has gotten worse over the past two releases as we made the runtime faster - effectively giving the background thread less time to complete and report it's stop.
The AV was caused by a double call to stop the service -- once happening when handing the SCM Stop request, and again from hosting when shutting down the application lifetime. This double call was violating the constraint of SetServiceStatus:
We've avoided calling stop twice and also guarded against user's calling ServiceBase.Stop when the service is already stopping.
Customer Impact
Not all shutdown logic is allowed to run - which could result in data loss. Customers receive error reports about their services. Services may restart when intended to stop.
Testing
Fix has been made in main for a month and validated on nightly builds by customers. We added numerous automated test cases to cover this scenario and cover the overall lifetime sequence. We also tested the service stop logic manually and through system shutdown (note that this bug still exists which causes problems on shutdown
Risk
Risk is low, but not zero due to the size of the change and the addition of cross-thread synchronization and locking. We've tried to mitigate this through lots of automated and manual testing. We also kept the locking to a minimum and run no user code under the added lock. Probably the biggest risk here is that we didn't completely fix everything folks care about in this path. For example: #83093
Note the changes to ServiceBase are not strictly required - which is good because we cannot change that type on .NETFramework where this same package is supported.