Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unterminated EtwSessions #897

Closed
fuzolan opened this issue Jul 9, 2020 · 6 comments · Fixed by #1124
Closed

Unterminated EtwSessions #897

fuzolan opened this issue Jul 9, 2020 · 6 comments · Fixed by #1124
Labels
bug Something isn't working

Comments

@fuzolan
Copy link

fuzolan commented Jul 9, 2020

I have an app on IIS. It's a Core Project which runs under .net full framework.
Everything ist fine for some time and after a while a Error about the gc Metrics is thrown

[02:06:20 Warning] Elastic.Apm {SystemTotalCpuProvider} TraceEventSession initialization failed - GC metrics won't be collected System.Runtime.InteropServices.COMException (0x800705AA): Nicht gen�gend Systemressourcen, um den angeforderten Dienst auszuf�hren. (Ausnahme von HRESULT: 0x800705AA) bei System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo) bei Microsoft.Diagnostics.Tracing.Session.TraceEventSession.EnableProvider(Guid providerGuid, TraceEventLevel providerLevel, UInt64 matchAnyKeywords, TraceEventProviderOptions options) bei Elastic.Apm.Metrics.MetricsProvider.GcMetricsProvider.<.ctor>b__15_0()

If I start logman query -ets then multiple sessions are registered:
EtwSessionForCLRElasticApm_c4ec3ef4-a0ec-4728-9f2e-467f0f6c3873 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_ed27d9c7-32d6-4fb6-89d6-1df9c94d1a13 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_bb52852d-149b-4553-9ee4-c919410768d8 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_13b0a9f6-4deb-41d1-93fd-3ae720d25f50 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_af988b00-cf24-4aa2-9f95-323c998ff07f Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_b416b7d5-d038-47a1-a9a6-db4eaa82d6c3 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_5a71802d-8e58-4611-abc2-649c658a8b59 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_1b85478c-cee8-4a55-8744-615d60eab5bd Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_df4a69c2-a513-4a8e-9bfc-a675147520c8 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_ce828e91-0b60-46d9-a685-90c388521aea Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_d09e3f1a-876d-4ee8-a20f-2004671d0f60 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_929b327f-e251-4b08-8d26-8e5d03f03e57 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_2091e548-8a59-452c-ad3e-170bab302aa9 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_bc5e9af9-9f6b-43bd-b58e-006fcdfb50b8 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_733a117d-4686-49ed-a6fe-e5094660b9f5 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_f683fd92-0110-4db4-9659-cf1314102c94 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_0cd42c20-c694-40ff-9a85-343152b6bdff Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_48b3f16b-a970-4798-af75-7c2b7abf0460 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_5ae413de-18d4-4e44-a988-076ae2191b7e Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_ef80e1d6-cbee-46a5-859d-2ab1fd125e67 Ablaufverfolgung Wird ausgeführt EtwSessionForCLRElasticApm_602e5d76-d36f-418a-9f40-ca0e9d488a50 Ablaufverfolgung Wird ausgeführt

What should I do?

@fuzolan fuzolan added the bug Something isn't working label Jul 9, 2020
@gregkalapos
Copy link
Contributor

As a quick (and definitely not perfect) workaround you can kill the leftover ETW processes with logman stop <SessionName> -ets.

@fuzolan
Copy link
Author

fuzolan commented Jul 9, 2020

thank you!

You can also use the windows gui "perfmon" to kill it via mouse

So it's perhaps better to use the servicename instead of a new guid in GcMetricsProvider.

var sessionName = SessionNamePrefix + Guid.NewGuid();

@WillGunn
Copy link

WillGunn commented Jan 6, 2021

This issue is reproducible in our aspnet core service, and I may have found the culprit.

based on the sample code from MS, you should be calling dispose on the source first:
https://github.com/microsoft/dotnet-samples/blob/master/Microsoft.Diagnostics.Tracing/TraceEvent/TraceEvent/10_SimpleEventSourceMonitor.cs#L145

For good measure, you should probably also call _traceEventSession.Stop()
https://github.com/microsoft/perfview/blob/master/src/TraceEvent/TraceEventSession.cs#L828

I can open a PR if you would like.

WillGunn pushed a commit to WillGunn/apm-agent-dotnet that referenced this issue Jan 12, 2021
build and run the new sample project standalone, not in IIS Express
in powershell run Get-EtwTraceSession -Name EtwSessionForCLRElastic*
quit sample project
run Get-EtwTraceSession -Name EtwSessionForCLRElastic* again
you'll see there is a leftover Etw Trace Session
@WillGunn
Copy link

To help diagnose, I setup a reproduction sample app that mimics our internal setup close enough.
#1120

@russcam
Copy link
Contributor

russcam commented Jan 13, 2021

This has been fixed in #1106, in f23b044. I've added an additional assertion in 138f5c8 to verify the behaviour (which currently fails on the master branch).

By default, TraceEventSession is configured to stop the session on dispose, and dispose the source

namespace Microsoft.Diagnostics.Tracing.Session
{
  /// <summary>
  /// A TraceEventSession represents a single ETW Tracing Session.   A session is and event sink that
  /// can enable or disable event logging from event providers).    TraceEventSessions can log their
  /// events either to a file, or by issuing callbacks when events arrive (a so-called 'real time'
  /// session).
  /// <para>
  /// Session are MACHINE wide and unlike most OS resources the operating system does NOT reclaim
  /// them when the process that created it dies.  By default TraceEventSession tries is best to
  /// do this reclamation, but it is possible that for 'orphan' session to accidentally survive
  /// if the process is ended abruptly (e.g. by the debugger or a user explicitly killing it).  It is
  /// possible to turn off TraceEventSession automatic reclamation by setting the StopOnDispose
  /// property to false (its default is true).
  /// </para>
  /// <para>
  /// Kernel events have additional restrictions.   In particular there is a special API (EnableKernelProvider).
  /// Before Windows 8, there was a restriction that kernel events could only be enabled from a session
  /// with a special name (see KernelTraceEventParser.KernelSessionName) and thus there could only be a single
  /// session that could log kernel events (and that session could not log non-kernel events).  These
  /// restrictions were dropped in windows 8.
  /// </para>
  /// </summary>
  public sealed class TraceEventSession : IDisposable
  {
    // ... other implementation details
   
    /// <summary>
    /// Close the session and clean up any resources associated with the session.     It is OK to call this more than once.
    /// This API is OK to call from one thread while Process() is being run on another.   Calling Dispose is on
    /// a real time session is the way you can force a real time session to stop in a timely manner.
    /// </summary>
    public void Dispose()
    {
      lock (this)
      {
        if (this.m_StopOnDispose)
        {
          this.m_StopOnDispose = false;
          this.Stop(true);
        }
        if (this.m_SessionHandle != ulong.MaxValue)
        {
          TraceEventNativeMethods.CloseTrace(this.m_SessionHandle);
          this.m_SessionHandle = ulong.MaxValue;
        }
        if (this.m_source != null)
        {
          this.m_source.Dispose();
          this.m_source = (ETWTraceEventSource) null;
        }
        if (this.m_kernelSession != null)
        {
          this.m_kernelSession.Dispose();
          this.m_kernelSession = (TraceEventSession) null;
        }
        GC.SuppressFinalize((object) this);
      }
    }
}

The issue is caused by the using around _traceEventSession in the GcMetricsProvider ctor, and the Task.Run() inside it, which leaves the session in a running state that can't be deleted on disposing the GcMetricsProvider. @gregkalapos, should I split the commits out that fix this, from #1106?

@WillGunn Thank you for the reproduction sample app. Rather than opening a reproduction as a PR in future, you can just link to the branch in your fork/clone that has the sample on the issue.

@gregkalapos
Copy link
Contributor

@gregkalapos, should I split the commits out that fix this, from #1106?

I think the easiest would be to just merge #1106 as it is.

russcam added a commit to russcam/apm-agent-dotnet that referenced this issue Jan 13, 2021
This commit fixes a bug in collecting GC Metrics on
.NET Framework where the underlying ETW session created by
TraceEventSession was not stopped and deleted when
GcMetricsProvider was disposed. The issue arises due
to the using around _traceEventSession in the
GcMetricsProvider's ctor, and the Task.Run() inside it,
which leaves the session in a running state that can't
be deleted on disposing the GcMetricsProvider.

An internal TraceEventSession name property exposes
the name of the session to allow tests to assert that
the session is deleted on dispose.

Fixes elastic#897
russcam added a commit that referenced this issue Jan 13, 2021
This commit fixes a bug in collecting GC Metrics on
.NET Framework where the underlying ETW session created by
TraceEventSession was not stopped and deleted when
GcMetricsProvider was disposed. The issue arises due
to the using around _traceEventSession in the
GcMetricsProvider's ctor, and the Task.Run() inside it,
which leaves the session in a running state that can't
be deleted on disposing the GcMetricsProvider.

An internal TraceEventSession name property exposes
the name of the session to allow tests to assert that
the session is deleted on dispose.

Fixes #897
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants