AppInsights blocks Node.JS process from exiting for hours #958

PavelBansky · 2022-05-02T23:01:34Z

Even very simple telemetry submission code prevents the Node.js process from exiting, sometimes for minutes, sometimes for hours. The problem is especially noticeable on Mac OS X, where the delay is always at least a minute.

This is being used in an Azure DevOps extension that adds a build task to the pipeline. Adding minutes or hours to the build is a serious problem.

Removing AppInsights makes the code run in under a second.

  try {
      addBuildVariableTelemetry();           
      const totalTimeTaken = Math.round((Date.now() - startTime) / 1000); // get time in seconds
      addMetric("ExecutionTime", totalTimeTaken);             

      const telemetryClient = new appInsights.TelemetryClient(TELEMETRY_KEY);
      
      /* None of this helps 
      telemetryClient.config.maxBatchIntervalMs = 0;
      telemetryClient.config.maxBatchSize = 0;        
      telemetryClient.config.httpsAgent = new https.Agent();
      telemetryClient.config.enableAutoCollectPerformance = false ; 
      telemetryClient.config.enableUseAsyncHooks = false;
      */
      
      telemetryClient.trackEvent({ name: taskName, properties: EventProperties as any, measurements: Metrics as any });
      if (Traces.length > 0) {
          telemetryClient.trackTrace({ message: Traces.join("\n"), severity: appInsights.Contracts.SeverityLevel.Information, properties: EventProperties as any});
      }

      telemetryClient.flush();        
      tl.debug("Telemetry submitted");
      hasPublishedFinalTelemetry = true;
  }
  catch (error) {
      console.log("##vso[task.LogIssue type=warning;]Telemetry wrapper:", error); // don't let telemetry crash us
  }

hectorhdzg · 2022-05-03T21:24:23Z

@PavelBansky I need more information to determine the problem and hope you can help with that. Can you enable debug and warning logs to see whether any errors are coming from the SDK? Also, flush has a callback parameter that is called when flushing completes; can you try waiting for that callback? I wonder if this is related to the way Azure DevOps runs the Node.js process.

const telemetryClient = new appInsights.TelemetryClient();
telemetryClient.config.enableInternalDebugLogging = true;
telemetryClient.config.enableInternalWarningLogging = true;
telemetryClient.trackEvent({
    name: "TestEvent"
});
telemetryClient.flush({ callback: () => { console.log("Done flushing") } });

patroza · 2022-05-06T10:49:27Z

This is real. We had a similar experience when upgrading from 2.3.1 to 2.3.2: on GitHub Actions, our e2e tests spin up the HTTP server (per test or per file), which is wrapped with Application Insights. Only the first of 6 files completes; all tests in the other 5 fail with a test timeout.
Reverting Application Insights (after a long and painful process of elimination) confirmed it as the culprit.
I don't have a small repro available, though.

PavelBansky · 2022-05-06T16:55:35Z

@hectorhdzg, I have tried both. With or without the callback, the result is always the same.

function publishTelemetryEvents(isFinalTelemetry: boolean, callback: (v: string) => void, attemptNumber = 1) {
    const maxAttempts = 5;

    if (hasPublishedFinalTelemetry) {
        // do not log telemetry after we have already logged final telemetry
        // this also stops recursive calls from process.beforeExit handler
        return;
    }

    try {
        addBuildVariableTelemetry();           
        const totalTimeTaken = Math.round((Date.now() - startTime) / 1000); // get time in seconds
        addMetric("ExecutionTime", totalTimeTaken);

        const telemetryClient = new appInsights.TelemetryClient(TELEMETRY_KEY);
        telemetryClient.trackEvent({ name: taskName, properties: EventProperties as any, measurements: Metrics as any });
        if (Traces.length > 0) {
            telemetryClient.trackTrace({ message: Traces.join("\n"), severity: appInsights.Contracts.SeverityLevel.Information, properties: EventProperties as any});
        }
        telemetryClient.flush({
            callback: (response) => {
                try {
                    const parsedObject = JSON.parse(response);
                    if (parsedObject.errors.length) {
                        throw parsedObject.errors; // this will reach the outer catch and is handled the same as an unparseable response.
                    }
                } catch (parseError) {
                    if (attemptNumber < maxAttempts) {
                        // This is a failed telemetry send case. If we encounter an error, we retry, unless we reach max attempts..
                        publishTelemetryEvents(isFinalTelemetry, callback, attemptNumber + 1);
                        return; 
                    }
                    console.log("##vso[task.LogIssue type=warning;]Telemetry flush:", parseError);
                }
                hasPublishedFinalTelemetry = hasPublishedFinalTelemetry || isFinalTelemetry;
                tl.debug("Telemetry submitted");

                callback(response);
            }
        });
    }
    catch (error) {
        console.log("##vso[task.LogIssue type=warning;]Telemetry wrapper:", error); // don't let telemetry crash us
    }
}

I will try with the debugging enabled.

PavelBansky · 2022-05-09T19:05:12Z

@hectorhdzg: I have enabled logging and see no warnings or errors on the console.

telemetryClient.config.enableInternalDebugLogging = true;
telemetryClient.config.enableInternalWarningLogging = true;

At this point we will have to drop AppInsights altogether until this is resolved :(

hectorhdzg · 2022-05-09T19:12:46Z

@PavelBansky I don't have a repro of the issue and there are no logs, so there is not much I can do here. If you can share your Azure DevOps build privately, that would be helpful.

antoine-coulon · 2022-07-01T15:21:35Z

Hey @hectorhdzg,

I'm currently having the same problem on 2.3.3, on macOS with Node.js 16.14.0, with a very minimal repro:

import { post } from "superagent";
import * as appInsights from "applicationinsights";

appInsights
  .setup(
    "InstrumentationKey=ENV_KEY;IngestionEndpoint=https://westeurope-5.in.applicationinsights.azure.com"
  )
  .start();

post("https://westeurope-5.in.applicationinsights.azure.com/v2.1/track")
  .send([fakeTelemetryRequestBody])
  .then(appInsights.defaultClient.flush)
  .catch(console.error);

When I start the script, the Node.js process hangs for a few minutes before exiting about 7 times out of 10 (from what I could observe). Sometimes the process exits immediately, but it's clearly not consistent.

Do you have any information to share about that?

Thanks

hectorhdzg · 2022-07-05T23:52:19Z

@antoine-coulon the flush method has a callback you can pass to make sure it is done; can you try that?

(async function main() {
    post("https://www.bing.com/")
        .send([{}])
        .then(async () => {
            await new Promise((resolve) => {
                appInsights.defaultClient.flush({
                    callback: (msg) => {
                        console.log(msg);
                        resolve();
                    }
                });
            });
        })
        .catch(console.error);
})();
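
The callback-based wait above can also be bounded, so that a flush whose callback never fires cannot stall the caller indefinitely. The following is a minimal sketch, not part of the SDK: `FlushableClient` and `flushWithTimeout` are hypothetical names, and the only assumption about the client is the `flush({ callback })` shape used in this thread.

```typescript
// Assumed minimal client shape, based only on the flush({ callback }) usage in this thread.
interface FlushableClient {
  flush(options?: { callback?: (response: string) => void }): void;
}

// Hypothetical helper: resolves with the flush response, or with "timeout"
// if the callback has not been invoked within timeoutMs.
function flushWithTimeout(client: FlushableClient, timeoutMs: number): Promise<string> {
  return new Promise<string>((resolve) => {
    // Start the deadline first so a synchronous callback can still cancel it.
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    client.flush({
      callback: (response) => {
        clearTimeout(timer); // flush finished in time; cancel the deadline
        resolve(response);
      },
    });
  });
}
```

Bounding the wait only limits how long the caller waits for the callback. It does not release any handle that keeps the event loop alive, so a caller that must terminate promptly would still need to call `process.exit()` afterwards, at the cost of abandoning whatever work is still in flight.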

antoine-coulon · 2022-07-06T15:19:24Z

@hectorhdzg I already tried that when you first suggested it, but it doesn't work either.
The callback is invoked with {"itemsReceived":1,"itemsAccepted":1,"errors":[]}, which seems expected once the HTTP request has been sent. However, the process still hangs afterwards.

Here is an example with an even more minimal reproduction:

appInsights.setup(
  "InstrumentationKey=KEY;IngestionEndpoint=https://westeurope-5.in.applicationinsights.azure.com"
);

appInsights.start();
appInsights.defaultClient.flush();

Do you agree that this should allow the Node.js process to exit? Even if flush is asynchronous, there is nothing to process, so everything should be flushed almost immediately, allowing resources to be torn down.
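
For context on the question above: Node.js exits on its own only when no referenced handles (timers, sockets, servers) remain on the event loop, so a single long-lived timer or open socket inside any dependency is enough to delay exit. The sketch below illustrates this with a plain timer. It is not taken from the SDK, and the 60-second delay is an arbitrary illustrative value.

```typescript
// A referenced timer counts as pending work: on its own it would keep
// the process alive until it fires, 60 seconds from now.
const timer = setTimeout(() => console.log("timer fired"), 60_000);
const referencedBefore = timer.hasRef();

// unref() tells Node.js not to wait for this timer when deciding whether to exit.
timer.unref();
const referencedAfter = timer.hasRef();

console.log({ referencedBefore, referencedAfter });
```

Run as written, the script exits right after logging. Without the `unref()` call it would stay alive until the timer fires.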

hectorhdzg · 2022-07-06T20:48:12Z

@antoine-coulon I cannot reproduce this on Windows or Linux, but I'm working on getting a macOS machine to try it out. Is there anything else involved in your repro? Any other packages or code that could be affecting the behavior?

antoine-coulon · 2022-07-06T22:12:10Z

@hectorhdzg No, nothing else. I've just tested it in a bare project with only the appinsights dependency.

As a reminder, here is my setup:
OS: macOS Big Sur 11.5.1
Node.js: 16.14.0 & 18.4.0
appinsights: *

JamieMagee · 2022-08-02T00:39:35Z

Is it possible to add timeouts to all HTTP calls? This falls under CWE-1088¹:

CWE-1088: Synchronous Access of Remote Resource without Timeout

The code has a synchronous call to a remote resource, but there is no timeout for the call, or the timeout is set to infinite.
This issue can prevent the software from running reliably, since an outage for the remote resource can cause the software to hang. If the relevant code is reachable by an attacker, then this reliability problem might introduce a vulnerability.

¹ https://cwe.mitre.org/data/definitions/1088.html
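
As an illustration of the kind of timeout being requested, here is a minimal sketch using the built-in Node.js `http` module. It is not the SDK's implementation; `getWithTimeout` is a hypothetical helper name, and the error message text is made up for the example.

```typescript
import * as http from "node:http";

// Hypothetical helper: resolves with the response status code, or rejects
// if the socket stays idle for longer than timeoutMs.
function getWithTimeout(url: string, timeoutMs: number): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // drain the body so the socket can be released
      resolve(res.statusCode ?? 0);
    });
    // The 'timeout' event only reports that the socket has been idle;
    // the request has to be destroyed explicitly to actually abort it.
    req.on("timeout", () => {
      req.destroy(new Error(`request timed out after ${timeoutMs} ms`));
    });
    // destroy(err) surfaces here, as do connection failures.
    req.on("error", reject);
  });
}
```

The point of the design is that every outgoing request gets an upper bound, so an unresponsive endpoint produces an error the caller can handle rather than an open socket that keeps the process waiting.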

hectorhdzg added the investigate label May 3, 2022

hectorhdzg mentioned this issue Jul 12, 2022: Adding timeout to Azure Metadata service call #988 (Merged)

hectorhdzg removed the investigate label Aug 2, 2022

hectorhdzg assigned JacksonWeber Aug 2, 2022

JacksonWeber mentioned this issue Aug 9, 2022: Add HTTP Request Timeout #1001 (Merged)

JacksonWeber closed this as completed in #1001 Aug 11, 2022
