Sync SDK - Async resource detection #1484

dyladan · 2020-09-02T13:02:50Z

This change allows the resource supplied to the providers to be either a Resource or a Promise<Resource>, so async detection can be applied without waiting for it to finish before starting the SDK.

Summary of changes:

Tracer and Meter Providers

config.resource may now be a Resource or a Promise<Resource>
this.resource is now Promise<Resource>

Tracer and Meter

config.resource may now be a Resource or a Promise<Resource>
this.resource is now Promise<Resource>

Span

this.resource property is now Promise<Resource>
On span start/end, the resource is now awaited before calling the span processor onStart/onEnd

Metric base class

constructor accepts a Promise<Resource> instead of Resource
this.resource is now Promise<Resource>
awaits the resource in getMetricRecord

*Metric classes

constructor accepts a Promise<Resource> instead of Resource
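
The provider and span changes listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: SketchTracerProvider, SketchSpan and the SpanProcessor shape are hypothetical, simplified stand-ins for the SDK classes.

```typescript
// Hypothetical, simplified stand-ins for the SDK types; not the real API.
type Resource = { attributes: Record<string, string> };

interface SpanProcessor {
  onEnd(span: SketchSpan, resource: Resource): void;
}

class SketchSpan {
  constructor(
    readonly name: string,
    private readonly resource: Promise<Resource>,
    private readonly processor: SpanProcessor
  ) {}

  // end() stays synchronous for the caller, but the processor is only
  // notified once the resource promise has resolved.
  end(): void {
    this.resource.then(resource => this.processor.onEnd(this, resource));
  }
}

class SketchTracerProvider {
  readonly resource: Promise<Resource>;

  constructor(
    config: { resource: Resource | Promise<Resource> },
    private readonly processor: SpanProcessor
  ) {
    // Promise.resolve accepts both a plain value and a promise.
    this.resource = Promise.resolve(config.resource);
  }

  startSpan(name: string): SketchSpan {
    return new SketchSpan(name, this.resource, this.processor);
  }
}

// Usage: even with an already-available Resource, onEnd runs on a later microtask.
const ended: string[] = [];
const provider = new SketchTracerProvider(
  { resource: { attributes: { 'service.name': 'demo' } } },
  { onEnd: span => ended.push(span.name) }
);
provider.startSpan('work').end();
const endedSynchronously = ended.length; // processor has not been called yet
```

Normalizing with Promise.resolve keeps a single code path for both input types, at the cost of always deferring the processor callback by at least one microtask.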

Tests

Many of the tests assumed that the export would happen synchronously. For instance, it was not uncommon to see the following in tests:

const span = tracer.startSpan(name);
span.end();
const spans = exporter.getFinishedSpans();
// assert with spans

Now that the span start/end may wait before calling the processor, it is necessary to yield to the event loop before getting finished spans:

const span = tracer.startSpan(name);
span.end();
await new Promise(resolve => setTimeout(resolve)); // wait for 0 ms
const spans = exporter.getFinishedSpans();
// assert with spans

codecov · 2020-09-02T13:11:21Z

Codecov Report

Merging #1484 into master will decrease coverage by 0.45%.
The diff coverage is 88.46%.

@@            Coverage Diff             @@
##           master    #1484      +/-   ##
==========================================
- Coverage   93.74%   93.28%   -0.46%     
==========================================
  Files         152      138      -14     
  Lines        4635     3858     -777     
  Branches      931      776     -155     
==========================================
- Hits         4345     3599     -746     
+ Misses        290      259      -31

Impacted Files	Coverage Δ
...es/opentelemetry-metrics/src/BaseObserverMetric.ts	`100.00% <ø> (ø)`
...s/opentelemetry-metrics/src/BatchObserverMetric.ts	`93.10% <ø> (ø)`
...ackages/opentelemetry-metrics/src/CounterMetric.ts	`100.00% <ø> (ø)`
...ges/opentelemetry-metrics/src/SumObserverMetric.ts	`100.00% <ø> (ø)`
...s/opentelemetry-metrics/src/UpDownCounterMetric.ts	`100.00% <ø> (ø)`
...entelemetry-metrics/src/UpDownSumObserverMetric.ts	`100.00% <ø> (ø)`
...s/opentelemetry-metrics/src/ValueObserverMetric.ts	`100.00% <ø> (ø)`
...s/opentelemetry-metrics/src/ValueRecorderMetric.ts	`100.00% <ø> (ø)`
packages/opentelemetry-metrics/src/types.ts	`100.00% <ø> (ø)`
packages/opentelemetry-tracing/src/Tracer.ts	`100.00% <ø> (ø)`
... and 22 more

obecny

Generally it looks very good.
I have one big concern regarding getFinishedSpans, and the following piece of code only strengthens it:

await new Promise(resolve => setTimeout(resolve));
const spans = memoryExporter.getFinishedSpans();

Because getFinishedSpans is a public API, people might be confused about why it doesn't contain the finished spans after this change. Based on that, I would make getFinishedSpans return a Promise.
This way you would be able to call await memoryExporter.getFinishedSpans(), which looks more intuitive.
Otherwise this might produce many unwanted bugs and confusion about why the finished spans are missing even though the spans have already ended. WDYT?

dyladan · 2020-09-04T18:07:29Z

The problem is not that getFinishedSpans is sync. It is that the spans are actually not finished. When you call span.end(), spanProcessor.onEnd() is not called until the resource promise is resolved. Even if I wanted to make getFinishedSpans async to wait for spans to finish, there is no way for getFinishedSpans to know how many spans to wait for. For example:

const span1 = tracer.startSpan('1');
const span2 = tracer.startSpan('2');

span1.end();
await exporter.getFinishedSpans(); // does this wait for 1 or 2 spans?

Also, the exporter doesn't even know the span exists until the span processor tells it, and that only happens after the span ends. So there would be no way for the exporter to even know if any spans are started, let alone how many to wait for.
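
A small simulation of this point, assuming a hypothetical SketchMemoryExporter and a resource promise that is resolved by hand (neither is the SDK's real class): yielding once to the event loop is not enough while detection is still pending, and the exporter has no record of the span it would need to wait for.

```typescript
// Hypothetical in-memory exporter: a simplified stand-in, not the SDK class.
class SketchMemoryExporter {
  private readonly finished: string[] = [];
  export(spanName: string): void {
    this.finished.push(spanName);
  }
  getFinishedSpans(): string[] {
    return [...this.finished];
  }
}

const memoryExporter = new SketchMemoryExporter();

// Stand-in for resource detection that we resolve by hand.
let resolveResource: () => void = () => {};
const slowResource = new Promise<void>(resolve => {
  resolveResource = () => resolve();
});

// end() returns immediately; the exporter only hears about the span
// once the resource promise has resolved.
function endSpan(spanName: string): void {
  slowResource.then(() => memoryExporter.export(spanName));
}

async function demo(): Promise<{ afterTick: number; afterResource: number }> {
  endSpan('1');
  // Yield to the event loop once, as the updated tests do.
  await new Promise<void>(resolve => setTimeout(() => resolve(), 0));
  const afterTick = memoryExporter.getFinishedSpans().length; // detection still pending

  resolveResource();
  await slowResource;
  const afterResource = memoryExporter.getFinishedSpans().length; // span delivered
  return { afterTick, afterResource };
}
```

In this sketch afterTick is 0 and afterResource is 1: the exporter only sees the span after the resource resolves, however many ticks the caller waits beforehand.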

obecny · 2020-09-04T18:14:18Z

@dyladan I was thinking of making it a promise that resolves on the next tick, so in the example mentioned:

const span1 = tracer.startSpan('1');
const span2 = tracer.startSpan('2');
span1.end();
const finishedSpans = await exporter.getFinishedSpans();

finishedSpans would then contain span1.

Currently the same piece of code produces exactly that result (you get span1). After this change, a synchronous exporter.getFinishedSpans() call made right after span1.end() will not return span1 even though it was ended, so it is hard to guess how long you should wait. With the proper API call, await getFinishedSpans(), you would get it.

obecny · 2020-09-04T18:17:01Z

> Also, the exporter doesn't even know the span exists until the span processor tells it, and that only happens after the span ends. So there would be no way for the exporter to even know if any spans are started, let alone how many to wait for.

And span.end is also sync, so you would expect getFinishedSpans to already have the span, as it does now.

obecny · 2020-09-04T18:19:42Z

Or we should convert span.end() to return a promise.

dyladan · 2020-09-04T19:43:30Z

~~@obecny this PR is currently broken because the global shutdown testing is broken. It doesn't seem to have anything to do with this code. I think I've passed some listener limit.~~

edit: fixed, but I still don't like the global SIGTERM handling. IMO that should be handled by the end user app

vmarchaud · 2020-09-05T09:21:59Z

Could we have a TestMemoryExporter, used internally, that implements this await new Promise(resolve => setTimeout(resolve)); mechanism directly? I understand why it needs to be there, but I fear that people might forget to write this line when adding new tests. WDYT?
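
One possible shape for such a test helper, as a sketch only: getFinishedSpansAfterTick and the FinishedSpanSource interface are hypothetical names, not part of the SDK. It only helps when the resource promise has already resolved by the time the timer fires.

```typescript
// Hypothetical test helper; the name and shape are illustrative only.
interface FinishedSpanSource<T> {
  getFinishedSpans(): T[];
}

async function getFinishedSpansAfterTick<T>(
  exporter: FinishedSpanSource<T>
): Promise<T[]> {
  // Yield to the event loop once (0 ms timer) so deferred onEnd calls can run.
  await new Promise<void>(resolve => setTimeout(() => resolve(), 0));
  return exporter.getFinishedSpans();
}

// Usage with a stand-in exporter whose span arrives on a later microtask.
const pendingSpans: string[] = [];
const fakeExporter = { getFinishedSpans: () => pendingSpans };
Promise.resolve().then(() => pendingSpans.push('span-1'));
const finishedLater = getFinishedSpansAfterTick(fakeExporter);
```

Centralizing the wait in one helper means a test author cannot forget it, which is the concern raised above.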

dyladan · 2020-09-08T18:55:46Z

packages/opentelemetry-tracing/test/MultiSpanProcessor.test.ts

  });

-  it('should export spans on graceful shutdown from two span processor', () => {
+  it('should export spans on graceful shutdown from two span processors', done => {


@obecny this test fails in browser because the unload event handler is actually never called. It wasn't called previously, but it was never caught because the assertions are never run. I changed the test to have a done callback and wait for finish, and it never finishes.

Flarna · 2020-09-12T00:32:20Z

I expect that forceFlush also needs to be adapted to wait a tick.
A typical use case for forceFlush is AWS Lambda, which may suspend forever just after the function (represented by a span) is finished.

dyladan · 2020-09-14T15:28:01Z

@obecny @mwear The longer this PR is open and the more I work on it, the less I like this solution. My primary issue with it is the export pipeline becoming asynchronous. For an example of why this is a problem, take a lambda function. In lambda, you want to force flush on every invocation because you don't know if your function will be called again in the future and you don't want to lose the traces. Currently, it is possible to do something like this:

async function handler(event) {
    const span = tracer.startSpan("invoke");
    // some work
    span.end();
    await exporter.forceFlush();
    return someObj;
}

If we land this PR, the forceFlush will not export the span because the resource promise will not yet be resolved and the span will not yet have been sent to the span processor and exporter.
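
The failure mode can be reproduced in isolation. The sketch below uses a hypothetical SketchProcessor with a buffer and a forceFlush method, plus a hand-resolved detection promise; it does not use the real SDK classes.

```typescript
// Hypothetical buffering processor; a simplified stand-in for the SDK's.
class SketchProcessor {
  private buffer: string[] = [];
  readonly exported: string[] = [];

  onEnd(spanName: string): void {
    this.buffer.push(spanName);
  }

  // Exports whatever has reached the processor so far.
  async forceFlush(): Promise<void> {
    this.exported.push(...this.buffer);
    this.buffer = [];
  }
}

const processor = new SketchProcessor();

// Stand-in for resource detection that has not finished yet.
let finishDetection: () => void = () => {};
const detection = new Promise<void>(resolve => {
  finishDetection = () => resolve();
});

// With the PR's behavior, ending a span only reaches the processor
// after the resource promise resolves.
function endSpanDeferred(spanName: string): void {
  detection.then(() => processor.onEnd(spanName));
}

async function sketchHandler(): Promise<number> {
  endSpanDeferred('invoke');
  await processor.forceFlush();
  // Nothing was flushed: the span had not reached the processor yet.
  return processor.exported.length;
}
```

Here sketchHandler resolves with 0. If the function is suspended right after returning, the span is lost even though forceFlush was awaited.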

obecny · 2020-09-15T01:57:00Z

That's why I originally deferred the span resource on end, not on start. This way the span was already in the processor and you could simply wait for it to end. I was thinking of moving this logic into the span processor and deferring the spans that need to wait for the resource.

dyladan · 2020-09-15T13:08:35Z

That still has the same problem of making the export pipeline async. I am working on another approach right now that makes Resource.getAttributes return a promise, which is awaited in the exporters. It is a much simpler change, allows sync SDK creation, and defers awaiting resources until as late as possible in the chain.
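
A rough sketch of what that alternative might look like, under stated assumptions: AsyncResource, FinishedSpan and AttributeAwaitingExporter are hypothetical names, and the real Resource class in the SDK did not expose a promise-returning getAttributes at the time of this thread.

```typescript
type Attributes = Record<string, string>;

// Hypothetical resource whose attributes may still be resolving.
class AsyncResource {
  private readonly attrs: Promise<Attributes>;

  constructor(attrs: Attributes | Promise<Attributes>) {
    this.attrs = Promise.resolve(attrs);
  }

  getAttributes(): Promise<Attributes> {
    return this.attrs;
  }
}

interface FinishedSpan {
  name: string;
  resource: AsyncResource;
}

// Hypothetical exporter: the only place where the attributes are awaited.
class AttributeAwaitingExporter {
  readonly sent: Array<{ name: string; attributes: Attributes }> = [];

  async export(spans: FinishedSpan[]): Promise<void> {
    for (const span of spans) {
      const attributes = await span.resource.getAttributes();
      this.sent.push({ name: span.name, attributes });
    }
  }
}

// Usage: the span reaches the exporter synchronously; only the final
// serialization step waits for detection to finish.
const lateResource = new AsyncResource(
  new Promise<Attributes>(resolve =>
    setTimeout(() => resolve({ 'cloud.provider': 'gcp' }), 5)
  )
);
const attrExporter = new AttributeAwaitingExporter();
const exportDone = attrExporter.export([{ name: 'invoke', resource: lateResource }]);
```

With this shape, span start/end and the span processor callbacks stay synchronous, and awaiting the export call also waits for the resource.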

obecny · 2020-10-02T19:10:49Z

@dyladan what's next with this?

dyladan · 2020-10-02T20:40:53Z

After talking to the maintainers group and the spec, they want us to hold off. None of the other SIGs have async/remote resource detection, and if the user needs that they're expected to do it themselves. They want to specify something here after GA.

dobesv · 2020-10-28T06:02:51Z

What is the current workaround for sync startup, then? So far I just set autoDetectResources: false which seems to let the system put its hooks into all the modules in a sync manner, but I suppose I might lose something from this, I don't yet know what.

vmarchaud · 2020-10-28T07:03:58Z

Well, I wrote this workaround that works for me:

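// Imports for `api`, NodeTracerProvider and ResourceAttributes were missing
// from the original snippet; the package names below are assumed.
import * as api from '@opentelemetry/api'
import { NodeTracerProvider } from '@opentelemetry/node'
import { detectResources, Resource, ResourceAttributes } from '@opentelemetry/resources'
import { gcpDetector } from '@opentelemetry/resource-detector-gcp'

// A Resource whose attributes can be extended after construction.
export class CustomResource extends Resource {
  public attributes: ResourceAttributes = {}

  addAttributes (attributes: ResourceAttributes) {
    this.attributes = Object.assign(this.attributes, attributes)
    return this
  }
}

// Not part of the original snippet (assumed): the shared resource instance.
// `logger` is likewise assumed to be defined elsewhere in the application.
const resource = new CustomResource({})

const run = () => {
  // Detection runs in the background; its attributes are merged in when ready.
  detectResources({
    detectors: [ gcpDetector ],
    logger
  }).then((detectedResources) => {
    resource.addAttributes(detectedResources.attributes)
  }).catch(err => {
    logger?.error(`Error while detecting resources`, err)
  })
  // The provider is created synchronously with the (initially empty) resource.
  const provider = new NodeTracerProvider({
    resource
  })
  api.trace.setGlobalTracerProvider(provider)
}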

However, this is non-compliant with the spec. Sadly, I think it would be the easiest way to solve this problem :/

dobesv · 2020-10-28T16:19:03Z

Could resources be "pre-detected", maybe put an environment variable that specifies exactly what is being detected there in a way it can be loaded synchronously? I'm not running this in a variety of environments, so I don't need auto-detection.

dyladan · 2020-10-30T16:01:05Z

Definitely. You can always pass a resource to the constructor of the TracerProvider or MeterProvider
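
As an illustration of the "pre-detected" idea, the sketch below parses a comma-separated key=value string (the format used by the OTEL_RESOURCE_ATTRIBUTES environment variable) into a plain attributes object synchronously. parseResourceAttributes is a hypothetical helper, not an SDK function, and it ignores escaping and percent-decoding.

```typescript
// Hypothetical helper: parses "key1=value1,key2=value2" synchronously.
function parseResourceAttributes(raw: string | undefined): Record<string, string> {
  const attributes: Record<string, string> = {};
  if (!raw) return attributes;

  for (const pair of raw.split(',')) {
    const index = pair.indexOf('=');
    if (index <= 0) continue; // skip entries without a key or without '='
    const key = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    if (key) attributes[key] = value;
  }
  return attributes;
}

// In an application the input would come from the environment, for example
// process.env.OTEL_RESOURCE_ATTRIBUTES; a literal is used here for clarity.
const preDetected = parseResourceAttributes('service.name=checkout,cloud.provider=gcp');
```

The resulting object could then be wrapped in a Resource and passed as the resource option of the provider constructor, so no async detection is needed at startup.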

tedsuo · 2020-11-09T09:36:42Z

FWIW I have been using this pattern for async loading, and it's been working for me:

Assume my service is started by calling node server.js. In a new, separate initialization file, server_init.js, I manage my NodeSDK lifecycle. Here, I only require my original application after the SDK finishes loading.

const opentelemetry = require("@opentelemetry/sdk-node");
const process = require("process");

const sdk = new opentelemetry.NodeSDK({
  // configure exporters, etc.
});

// Start the SDK and wait for any installed resource detection to run.
sdk.start().then(() => {
  // Require your original application startup file here.
  require('./server');
});

function shutdown() {
  sdk.shutdown()
    .then(
      () => console.log("SDK shut down successfully"),
      (err) => console.log("Error shutting down SDK", err),
    )
    .finally(() => process.exit(0));
}

process.on('beforeExit', shutdown);
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);

This allows for async sdk loading, without the risk of pre-loading any of my application modules or making a hash out of my original application startup in server.js. It also makes it easy to copy-paste and apply to many services, and it also allows me to easily run my application without loading opentelemetry, should I want to do that.

Obviously, if async resources are going away, a sync approach is easier. But the approach above feels clean, and requires no changes to what we have today.

Am I missing something important? Every other approach I have tried has been a pretzel, but this two-phase pattern feels elegant, or at least easy.

dobesv · 2020-11-09T17:31:15Z

The problem with the two-phase approach is that the scripts I am running with instrumentation are not instrumentation-aware and use if(require.main === module) { ... } and they use process.argv to parse their arguments. If I replace the main script I'll have to change the startup convention for all those scripts, which is a nuisance. I was hoping to retain the same sync startup method I was able to do with elastic-apm-node where I can just pass node -r path/to/instrument.js to load instrumentation.

tedsuo · 2020-11-10T00:11:58Z

Thanks for the clarification @dobesv, I had forgotten about that pattern.

anuraaga · 2021-06-24T08:33:50Z

Hi @dyladan - just curious if this PR may get resurrected? The issue of resource detection being async has come up in the Lambda support because the SDK may not be ready on the first invocation, since the detectors are async. It would be nice if there were a way to keep the Resource from blocking initialization of the SDK.

aws-observability/aws-otel-lambda#106

dyladan · 2021-06-30T13:29:03Z

No chance. It actually isn't implementable this way in the current spec, because the spec requires spanProcessor.onStart to run synchronously when a span is created and spanProcessor.onEnd to run synchronously when it ends. Since the ReadableSpan needs to have access to the resource, there is no way to wait for it before calling onEnd without making it async.

dyladan · 2021-06-30T13:30:19Z

There have been quite a few conversations around making resources appendable, which would in my opinion be a much better solution to this problem, but every time someone tries to make the spec change there are people who fight it. I have tried several times and at this point it feels to me like it's just not going to happen.

dyladan added 3 commits September 1, 2020 16:46

chore: wip sync sdk

64abb52

chore: update http tests

fcb4e7f

chore: grpc tests

df94801

dyladan requested review from legendecas, markwolff, mayurkale22, mwear, naseemkullah, obecny, OlivierAlbertini and vmarchaud as code owners September 2, 2020 13:02

dyladan added the enhancement New feature or request label Sep 2, 2020

chore: lint

5195c8c

chore: remove browser incompatible setImmediate

003c27f

dyladan mentioned this pull request Sep 2, 2020

chore: more advanced types for sync/async resource obecny/opentelemetry-js#2

Closed

dyladan linked an issue Sep 2, 2020 that may be closed by this pull request

Resource auto-detection and SDK initialization. #1410

Closed

2 tasks

dyladan mentioned this pull request Sep 2, 2020

chore: refactoring sdk start so that it can be a real sync #1400

Closed

obecny reviewed Sep 4, 2020

dyladan added 3 commits September 4, 2020 14:27

Merge remote-tracking branch 'origin/master' into sync-sdk

aa200a2

chore: fix tests

e5ce7b9

chore: revert shut down changes

90c1fb9

dyladan added 2 commits September 4, 2020 15:58

chore: fix tests

63fb976

chore: lint

2b47712

dyladan commented Sep 8, 2020

chore: dispatch unload even instead of window.close in tests

d303360

dyladan mentioned this pull request Sep 14, 2020

Handling of async resources #1533

Closed

mwear mentioned this pull request Sep 18, 2020

Update getting started guide to use node-sdk package #1534

Closed

Base automatically changed from master to main January 25, 2021 19:26

dyladan requested review from Flarna and johnbley as code owners January 25, 2021 19:26

anuraaga mentioned this pull request Jun 24, 2021

get tracer in aws auto instrumented lambda aws-observability/aws-otel-lambda#106

Closed

dyladan closed this Jun 30, 2021

vmarchaud mentioned this pull request Aug 7, 2021

Resource doesn't use env detector to detect OTEL_RESOURCE_ATTRIBUTES #2259

Closed

maorleger mentioned this pull request Aug 26, 2021

Add nodejs container and demo lmolkova/azuresdk_tracing_demo#1

Merged

pichlermarc pushed a commit to dynatrace-oss-contrib/opentelemetry-js that referenced this pull request Dec 15, 2023

chore: release main (open-telemetry#1484)

418b6f6

* chore: release main * chore: release main

martinkuba pushed a commit to martinkuba/opentelemetry-js that referenced this pull request Mar 13, 2024

chore: release main (open-telemetry#1484)

9af37b3

* chore: release main * chore: release main

martinkuba pushed a commit to martinkuba/opentelemetry-js that referenced this pull request Mar 16, 2024

chore: release main (open-telemetry#1484)

a8712c7

* chore: release main * chore: release main
