Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ObservationThreadLocalAccessor.restore() generating errors in tests #3943

Closed
nvervelle opened this issue Jun 28, 2023 · 15 comments
Closed

ObservationThreadLocalAccessor.restore() generating errors in tests #3943

nvervelle opened this issue Jun 28, 2023 · 15 comments

Comments

@nvervelle
Copy link
Contributor

nvervelle commented Jun 28, 2023

Describe the bug
When upgrading from spring-boot 3.0.4 (micrometer 1.10.4) to spring-boot 3.0.8 or 3.1.1 (micrometer 1.11.1), some of my tests are now failing with a message comparing a NoopObservation with null :

Observation <io.micrometer.observation.NoopObservation@390cd91e> to which we're restoring is not the same as the one set as this scope's parent observation <null> . Most likely a manually created Observation has a scope opened that was never closed. This may lead to thread polluting and memory leaks

It seems related to #3831 which introduced NullObservation and the verification in ObservationThreadLocalAccessor.restore().

Environment

  • Micrometer version 1.11.1
  • OS: Ubuntu
  • Java version: openjdk version "17.0.1" 2021-10-19

To Reproduce

I'm trying to extract a test case from my own code to show how to reproduce it. I will post it here once I'm able to reproduce it on a sample application.

For the moment, it seems related to observations created by Mono.tap(Micrometer.observation(registry)).

Exemple of test that fails with the new version :

  @Test
  @DisplayName("Save when the bucket does not exist. Must terminate with errors.")
  void saveWhenBucketDoesNotExist() throws IOException {
    // GIVEN
    final Resource fileResource = ClasspathResourceLoader.fromFile("input/ID-card.png");

    // WHEN
    final Mono<FileReference> responseFileReferenceMono =
        s3Repository.save(fileResource.getFile().toPath());

    // THEN
    StepVerifier.create(responseFileReferenceMono)
        .expectError(NoSuchBucketException.class)
        .verify();
  }

Code under test :

  public Mono<FileReference> save(final Path path) {
    return save(AsyncRequestBody.fromFile(path), path.toFile().length());
  }

  private Mono<FileReference> save(
      final AsyncRequestBody asyncRequestBody,
      final long length) {
    return Mono.fromFuture(() -> putObject(asyncRequestBody, fileReference, length))
        .name("s3 put-object")
        .doOnError(NoSuchBucketException.class, e -> LOGGER.error("Unknown bucket"))
        .tap(Micrometer.observation(observationService.registry()))
        .thenReturn(fileReference);
  }

  private CompletableFuture<PutObjectResponse> putObject(
      final AsyncRequestBody asyncRequestBody,
      final long length) {
    final PutObjectRequest request =
        PutObjectRequest.builder()
            .bucket("test")
            .key("myKey")
            .contentLength(length)
            .contentType(MediaType.APPLICATION_OCTET_STREAM.getType())
            .build();
    return s3client.putObject(request, asyncRequestBody);
  }

Expected behavior
Test should not fail when testing an error happening inside a reactive stream which is observed.

Additional context

@pp-dusza
Copy link

Came across the same issue when invoking a rest endpoint. Spring Boot 3.0.8 or higher.

@nvervelle
Copy link
Contributor Author

nvervelle commented Jun 28, 2023

I'm trying to reduce the problem to a simple example, but it's quite difficult...

For the moment, depending on where I put the following test, it either succeeds (in a simple JUnit test) or fails (in a test in a class annotated with @SpringBootTest). The test that fails with spring boot 3.1.1 succeeds with spring boot 3.0.4. So something is going wrong when spring boot is loaded with its beans. My spring boot test is still in a complete environment, so I will try with a reduced environment...

import io.micrometer.observation.ObservationRegistry;
import java.util.concurrent.CompletableFuture;
import org.junit.jupiter.api.Test;
import reactor.core.observability.micrometer.Micrometer;
import reactor.core.publisher.Mono;
import reactor.test.StepVerifier;

public class MicrometerTest {

  @Test
  void monoWithoutException() {
    final Mono<String> response =
        Mono.fromFuture(CompletableFuture.supplyAsync(() -> "OK"))
            .name("mono-without-exception")
            .tap(Micrometer.observation(ObservationRegistry.NOOP))
            .thenReturn("OK");

    StepVerifier.create(response).expectNext("OK").expectComplete().verify();
  }
}

@nvervelle
Copy link
Contributor Author

I'm not sure why the test fails with @SpringBootTest and succeeds without, but here's what I think happens in ObservationThreadLocalAccessor.restore() and causes the problem :

My understanding is that :

  • NoopObservation.INSTANCE can act both as the current and previous observation (as it's the always the current Observation returned by NoopObservationRegistry)
  • On the other hand NoopObservation.Scope.INSTANCE isn't chained to itself, so depending on the path you take it can be the previous scope (if you go by observation) or not (if you go by the scope itself)
  • ObservationThreadLocalAccessor.restore() doesn't properly handle this situation

@nvervelle
Copy link
Contributor Author

nvervelle commented Jun 29, 2023

Strange, I tried following step by step in the debugger what's happening, with a breakpoint in MicrometerObservationListener constructor :

  • 👍 When it calls Observation.createNotStarted(), as we use NoopObservationRegistry, the first test evals to true
  • 👍 The method then returns NOOP
  • 👎 But NOOP is not properly initialized (debugger says its value is null, while NoopObservation.INSTANCE is properly set) : is there a race condition somewhere leading to NOOP incorrectly set ? Maybe using directly NoopObservation.INSTANCE instead of duplicating its value would be better ?
  • MicrometerObservationListener then tries to set the contextual name on a null object, resulting in a NPE thrown and caught by MonoTapRestoringThreadLocals which put the reactive pipeline in error, which in turn result in some calls to restore the thread local with NoopObservation.INSTANCE

@nvervelle
Copy link
Contributor Author

I've opened a PR to fix the NPE (I tested locally that it indeed fixes the NPE), but this doesn't solve the initial issue, so I will have to dwelve deeper...

@nvervelle
Copy link
Contributor Author

First PR #3946 merged on 1.10.x branch, I will try another PR for fixing the Scope hierarchy when using NoopObservation.

@marcingrzejszczak
Copy link
Contributor

I'm starting to think that OTLA should take into account whether we're using a NOOP observation registry or sth like that. I'll also look into this topic

@nvervelle
Copy link
Contributor Author

Yes, that's what I'm trying to do too

@marcingrzejszczak
Copy link
Contributor

I think I fixed it, let me file a PR

marcingrzejszczak added a commit that referenced this issue Jun 29, 2023
without this change there was a problem with OTLA, default ObservationRegistry and NoOpRegistry interop. They didn't see each other scopes

with this change we're allowing scopes for NoOpRegistry and NoOpObservations

fixes gh-3943
@marcingrzejszczak
Copy link
Contributor

Related PR - #3947

@marcingrzejszczak
Copy link
Contributor

Can someone check the latest micrometer snapshots if they helped?

@nvervelle
Copy link
Contributor Author

Hi @marcingrzejszczak ,

I've managed to force micrometer version on the project that had failing tests, here's what I see :

  • 1.10.4: 🟢
  • 1.10.5: 🟢
  • 1.10.6: 🟢
  • 1.10.7: 🟢
  • 1.10.8 🔴
  • 1.10.9-SNAPSHOT: 🟢

So it seems a lot better 👍

@marcingrzejszczak
Copy link
Contributor

Fantastic! Thank you for checking it out. I'll close the issue for now and will reopen it if there are any issues

@nvervelle
Copy link
Contributor Author

I guess it's also been fixed on the 1.11.x branch and the main 👍

@marcingrzejszczak
Copy link
Contributor

Yes, I merged it forward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants