Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync SDK - Async resource detection #1484

Closed
wants to merge 12 commits into from

Conversation

dyladan
Copy link
Member

@dyladan dyladan commented Sep 2, 2020

This makes it so that the resource supplied to the providers may be a Resource or a Promise<Resource> which allows async detection to be applied without waiting for it to finish before starting the SDK.

Summary of changes:

Tracer and Meter Providers

  • config.resource may now be a Resource or a Promise<Resource>
  • this.resource is now Promise<Resource>

Tracer and Meter

  • config.resource may now be a Resource or a Promise<Resource>
  • this.resource is now Promise<Resource>

Span

  • this.resource property is now Promise<Resource>
  • On span start/end, the resource is now awaited before calling the span processor onStart/onEnd

Metric base class

  • constructor accepts a Promise<Resource> instead of Resource
  • this.resource is now Promise<Resource>
  • awaits the resource in getMetricRecord

*Metric classes

  • constructor accepts a Promise<Resource> instead of Resource

Tests

Many of the tests assumed that the export would happen synchronously. For instance, it was not uncommon to see the following in tests:

const span = tracer.startSpan(name);
span.end();
const spans = exporter.getFinishedSpans();
// assert with spans

Now that the span start/end may wait before calling the processor, it is necessary to yield to the event loop before getting finished spans:

const span = tracer.startSpan(name);
span.end();
await new Promise(resolve => setTimeout(resolve)); // wait for 0 ms
const spans = exporter.getFinishedSpans();
// assert with spans

@codecov
Copy link

codecov bot commented Sep 2, 2020

Codecov Report

Merging #1484 into master will decrease coverage by 0.45%.
The diff coverage is 88.46%.

@@            Coverage Diff             @@
##           master    #1484      +/-   ##
==========================================
- Coverage   93.74%   93.28%   -0.46%     
==========================================
  Files         152      138      -14     
  Lines        4635     3858     -777     
  Branches      931      776     -155     
==========================================
- Hits         4345     3599     -746     
+ Misses        290      259      -31     
Impacted Files Coverage Δ
...es/opentelemetry-metrics/src/BaseObserverMetric.ts 100.00% <ø> (ø)
...s/opentelemetry-metrics/src/BatchObserverMetric.ts 93.10% <ø> (ø)
...ackages/opentelemetry-metrics/src/CounterMetric.ts 100.00% <ø> (ø)
...ges/opentelemetry-metrics/src/SumObserverMetric.ts 100.00% <ø> (ø)
...s/opentelemetry-metrics/src/UpDownCounterMetric.ts 100.00% <ø> (ø)
...entelemetry-metrics/src/UpDownSumObserverMetric.ts 100.00% <ø> (ø)
...s/opentelemetry-metrics/src/ValueObserverMetric.ts 100.00% <ø> (ø)
...s/opentelemetry-metrics/src/ValueRecorderMetric.ts 100.00% <ø> (ø)
packages/opentelemetry-metrics/src/types.ts 100.00% <ø> (ø)
packages/opentelemetry-tracing/src/Tracer.ts 100.00% <ø> (ø)
... and 22 more

Copy link
Member

@obecny obecny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it looks very good.
I have one big concern with regards getFinishedSpans. The following piece of code even strengthen that

await new Promise(resolve => setTimeout(resolve));
const spans = memoryExporter.getFinishedSpans();

And because getFinishedSpans is a public api so people might be confused why it doesn't contain the finished spans after this change. Based on that I would make getFinishedSpans to be a Promise.
This way you would be able to call await memoryExporter.getFinishedSpans() which will look more intuitive.
Otherwise this might produce many unwanted bugs and confusion why something doesn't work and why I'm not getting finished spans if they are already finished. WDYT ?

@dyladan
Copy link
Member Author

dyladan commented Sep 4, 2020

Generally it looks very good.
I have one big concern with regards getFinishedSpans. The following piece of code even strengthen that

await new Promise(resolve => setTimeout(resolve));
const spans = memoryExporter.getFinishedSpans();

And because getFinishedSpans is a public api so people might be confused why it doesn't contain the finished spans after this change. Based on that I would make getFinishedSpans to be a Promise.
This way you would be able to call await memoryExporter.getFinishedSpans() which will look more intuitive.
Otherwise this might produce many unwanted bugs and confusion why something doesn't work and why I'm not getting finished spans if they are already finished. WDYT ?

The problem is not that getFinishedSpans is sync. It is that the spans are actually not finished. When you call span.end(), spanProcessor.onEnd() is not called until the resource promise is resolved. Even if I wanted to make getFinishedSpans async to wait for spans to finish, there is no way for getFinishedSpans to know how many spans to wait for. For example:

const span1 = tracer.startSpan('1');
const span2 = tracer.startSpan('2');

span1.end();
await exporter.getFinishedSpans(); // does this wait for 1 or 2 spans?

Also, the exporter doesn't even know the span exists until the span processor tells it, and that only happens after the span ends. So there would be no way for the exporter to even know if any spans are started, let alone how many to wait for.

@obecny
Copy link
Member

obecny commented Sep 4, 2020

@dyladan I was thinking of keeping it as a promise so it can resolve with next tick so in the mentioned example

const span1 = tracer.startSpan('1');
const span2 = tracer.startSpan('2');
span1.end();
const finishedSpans = await exporter.getFinishedSpans(); 

finishedSpans would then contain span1

As currently the same piece of code would produce exactly the same effect (you would get span1 ). After change exporter.getFinishedSpans() will never return span1 even though it was finished. So it might be hard to guess how long you should really wait for that, whereas with the correct api call await getFinishedSpans you should get it.

@obecny
Copy link
Member

obecny commented Sep 4, 2020

Also, the exporter doesn't even know the span exists until the span processor tells it, and that only happens after the span ends. So there would be no way for the exporter to even know if any spans are started, let alone how many to wait for.

And method span.end is also sync so you could expect that the getFinishedSpans should already have it - like it is now

@obecny
Copy link
Member

obecny commented Sep 4, 2020

or we should then convert method span.end() to be a promise.

@dyladan
Copy link
Member Author

dyladan commented Sep 4, 2020

@obecny this PR is currently broken because the global shutdown testing is broken. It doesn't seem to have anything to do with this code. I think I've passed some listener limit.

edit: fixed, but I still don't like the global SIGTERM handling. IMO that should be handled by the end user app

@vmarchaud
Copy link
Member

Could we just have a TestMemoryExporter that we use internaly implement this await new Promise(resolve => setTimeout(resolve)); mechanism directly ? I understand why it need to be there but i fear that some people might forget about writing this line when making new tests. WDYT ?

});

it('should export spans on graceful shutdown from two span processor', () => {
it('should export spans on graceful shutdown from two span processors', done => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obecny this test fails in browser because the unload event handler is actually never called. It wasn't called previously, but it was never caught because the assertions are never run. I changed the test to have a done callback and wait for finish, and it never finishes.

@Flarna
Copy link
Member

Flarna commented Sep 12, 2020

I expect that forcedFlush needs to be also adapted to wait a tick.
A typical usecase for forcedFlush is AWS lambda which may suspend forever just after the the function (represented by a span) is finished.

@dyladan
Copy link
Member Author

dyladan commented Sep 14, 2020

@obecny @mwear The longer this PR is open and the more I work on it, the less I like this solution. My primary issue with it is the export pipeline becoming asynchronous. For an example of why this is a problem, take a lambda function. In lambda, you want to force flush on every invocation because you don't know if your function will be called again in the future and you don't want to lose the traces. Currently, it is possible to do something like this:

async function handler(event) {
	const span = tracer.startSpan("invoke");
    // some work
    span.end()
    await exporter.forceFlush();
    return someObj;
}

If we land this PR, the forceFlush will not export the span because the resource promise will not yet be resolved and the span will not yet have been sent to the span processor and exporter.

@obecny
Copy link
Member

obecny commented Sep 15, 2020

@obecny @mwear The longer this PR is open and the more I work on it, the less I like this solution. My primary issue with it is the export pipeline becoming asynchronous. For an example of why this is a problem, take a lambda function. In lambda, you want to force flush on every invocation because you don't know if your function will be called again in the future and you don't want to lose the traces. Currently, it is possible to do something like this:

async function handler(event) {
	const span = tracer.startSpan("invoke");
    // some work
    span.end()
    await exporter.forceFlush();
    return someObj;
}

If we land this PR, the forceFlush will not export the span because the resource promise will not yet be resolved and the span will not yet have been sent to the span processor and exporter.

That's why I originally deferred the span resource on end not on start. This way the span was already in processor and then you could simply wait for it until it ends. I was thinking of moving this logic to span processor and then defer the spans that needs to wait for the resource.

@dyladan
Copy link
Member Author

dyladan commented Sep 15, 2020

@obecny @mwear The longer this PR is open and the more I work on it, the less I like this solution. My primary issue with it is the export pipeline becoming asynchronous. For an example of why this is a problem, take a lambda function. In lambda, you want to force flush on every invocation because you don't know if your function will be called again in the future and you don't want to lose the traces. Currently, it is possible to do something like this:

async function handler(event) {
	const span = tracer.startSpan("invoke");
    // some work
    span.end()
    await exporter.forceFlush();
    return someObj;
}

If we land this PR, the forceFlush will not export the span because the resource promise will not yet be resolved and the span will not yet have been sent to the span processor and exporter.

That's why I originally deferred the span resource on end not on start. This way the span was already in processor and then you could simply wait for it until it ends. I was thinking of moving this logic to span processor and then defer the spans that needs to wait for the resource.

That still has the same problem of making the export pipeline async. I am working on another approach right now that makes the Resource.getAttributes return a promise which should be awaited in the exporters. It is a much simpler change, allows the sync sdk creation, and defers awaiting resources until as late as possible in the chain.

@obecny
Copy link
Member

obecny commented Oct 2, 2020

@dyladan what next with this ?

@dyladan
Copy link
Member Author

dyladan commented Oct 2, 2020

After talking to the maintainers group and the spec, they want us to hold off. None of the other sigs have async/remote resource detection and if the user needs that they're expected to do it themselves. They want to specify something here after GA

@dobesv
Copy link
Contributor

dobesv commented Oct 28, 2020

What is the current workaround for sync startup, then? So far I just set autoDetectResources: false which seems to let the system put its hooks into all the modules in a sync manner, but I suppose I might lose something from this, I don't yet know what.

@vmarchaud
Copy link
Member

Well i've wrote this workaround that work for me:

import { detectResources, Resource } from '@opentelemetry/resources'
import { gcpDetector } from '@opentelemetry/resource-detector-gcp'

export class CustomResource extends Resource {
  public attributes: ResourceAttributes = {}

  addAttributes (attributes: ResourceAttributes) {
    this.attributes = Object.assign(this.attributes, attributes)
    return this
  }
}

const run = () => {
  detectResources({
    detectors: [ gcpDetector ],
    logger
  }).then((detectedResources) => {
    resource.addAttributes(detectedResources.attributes)
  }).catch(err => {
    logger?.error(`Error while detecting ressources`, err)
  })
  const provider = new NodeTracerProvider({
    resource
  })
  api.trace.setGlobalTracerProvider(provider)
}

However this is non-compliant in regard of the spec, sadly i think that would be the easiest way to solve this problem :/

@dobesv
Copy link
Contributor

dobesv commented Oct 28, 2020

Could resources be "pre-detected", maybe put an environment variable that specifies exactly what is being detected there in a way it can be loaded synchronously? I'm not running this in a variety of environments, so I don't need auto-detection.

@dyladan
Copy link
Member Author

dyladan commented Oct 30, 2020

Could resources be "pre-detected", maybe put an environment variable that specifies exactly what is being detected there in a way it can be loaded synchronously? I'm not running this in a variety of environments, so I don't need auto-detection.

Definitely. You can always pass a resource to the constructor of the TracerProvider or MeterProvider

@tedsuo
Copy link

tedsuo commented Nov 9, 2020

FWIW I have been using this pattern for async loading, and it's been working for me:

Assume my service is started by calling node server.js. In a new, separate initialization file, server_init.js, I manage my NodeSDK lifecycle. Here, I only require my original application after the SDK finishes loading.

const opentelemetry = require("@opentelemetry/sdk-node");
const process = require("process");

const sdk = new opentelemetry.NodeSDK({
// configure exporters,  etc
})

// start the sdk and wait for any installed resource detection to run.
sdk.start().then(() => {
  // require your original application startup file here.
  require('./server');
});

function shutdown(){
  sdk.shutdown()
    .then(
      () => console.log("SDK shut down successfully"),
      (err) => console.log("Error shutting down SDK", err),
    )
    .finally(() => process.exit(0))
};

process.on('beforeExit', shutdown);
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);

This allows for async sdk loading, without the risk of pre-loading any of my application modules or making a hash out of my original application startup in server.js. It also makes it easy to copy-paste and apply to many services, and it also allows me to easily run my application without loading opentelemetry, should I want to do that.

Obviously, if async resources are going away, a sync approach is easier. But the approach above feels clean, and requires no changes to what we have today.

Am I missing something important? Every other approach I have tried has been a pretzel, but this two-phase pattern feels elegant, or at least easy.

@dobesv
Copy link
Contributor

dobesv commented Nov 9, 2020

The problem with the two-phase approach is that the scripts I am running with instrumentation are not instrumentation-aware and use if(require.main === module) { ... } and they use process.argv to parse their arguments. If I replace the main script I'll have to change the startup convention for all those scripts, which is a nuisance. I was hoping to retain the same sync startup method I was able to do with elastic-apm-node where I can just pass node -r path/to/instrument.js to load instrumentation.

@tedsuo
Copy link

tedsuo commented Nov 10, 2020

Thank for the clarification @dobesv, I had forgotten about that pattern.

@anuraaga
Copy link
Contributor

Hi @dyladan - just curious if this PR may get resurrected? The issue of resource detection being async has come up in the lambda support because the SDK may not be ready on the first due to the detectors being async, it would be nice if there's a way to allow the Resource to not block initialization of the SDK

aws-observability/aws-otel-lambda#106

@dyladan
Copy link
Member Author

dyladan commented Jun 30, 2021

Hi @dyladan - just curious if this PR may get resurrected? The issue of resource detection being async has come up in the lambda support because the SDK may not be ready on the first due to the detectors being async, it would be nice if there's a way to allow the Resource to not block initialization of the SDK

aws-observability/aws-otel-lambda#106

No chance. It actually isn't implementable this way in the current spec because the spec requires spanProcessor.onStart to run synchronously when span is created and spanProcessor.onEnd to run synchronously when it ends. Since the ReadableSpan needs to have access to the resource, there is no way to wait for it before calling onEnd without making it async.

@dyladan dyladan closed this Jun 30, 2021
@dyladan
Copy link
Member Author

dyladan commented Jun 30, 2021

There have been quite a few conversations around making resources appendable, which would in my opinion be a much better solution to this problem, but every time someone tries to make the spec change there are people who fight it. I have tried several times and at this point it feels to me like it's just not going to happen.

pichlermarc pushed a commit to dynatrace-oss-contrib/opentelemetry-js that referenced this pull request Dec 15, 2023
* chore: release main

* chore: release main
martinkuba pushed a commit to martinkuba/opentelemetry-js that referenced this pull request Mar 13, 2024
* chore: release main

* chore: release main
martinkuba pushed a commit to martinkuba/opentelemetry-js that referenced this pull request Mar 16, 2024
* chore: release main

* chore: release main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource auto-detection and SDK initialization.
7 participants