Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XRay multi threading - Failed to end subsegment: subsegment cannot be found #227

Open
dan-lind opened this issue Oct 18, 2020 · 6 comments
Assignees
Labels

Comments

@dan-lind
Copy link

I have an issue using Xray in a multithreaded environment with my REST API. I'm using the Xray Auto instrumentation agent for Java, and Spring boot.

I've tried to follow the example found here https://docs.aws.amazon.com/xray/latest/devguide/scorekeep-workerthreads.html

However when ending my subsegment, I get a log output of "Suppressing AWS X-Ray context missing exception (SubsegmentNotFoundException): Failed to end subsegment: subsegment cannot be found."

I suspect what happens is that the request has been fully handled and response returned to the client, before the CompletableFuture is done. I assume this means that the Xray segment has been closed and that might explain why I'm seeing this issue. What I'm wondering is if there´s anything I can do to fix this?

I could probably live with this if it was only the trace within the CompletableFuture that was lost, but it looks like the entire trace is lost (there are calls to other HTTP services, dynamo DB etc, which won't show up unless I remove the piece of code below)

    Entity segment = AWSXRay.getGlobalRecorder().getTraceEntity();

    CompletableFuture.runAsync(() -> {
        AWSXRay.getGlobalRecorder().setTraceEntity(segment);

        AWSXRay.beginSubsegment("PostTest");
        try {
            defaultService.createStatus(accessToken, statusRequest);
        } finally {
            AWSXRay.endSubsegment();
        }
    })
@willarmiros
Copy link
Contributor

Hi @dan-lind,

The X-Ray auto-instrumentation agent actually automatically propagates X-Ray context between threads, so you shouldn't need to do the getTraceEntity and setTraceEntity workarounds described in those docs. Could you try removing those lines and rerunning your program? If the problem persists, it would be very helpful to have the full stack trace and/or a functional snippet of reproduction code so I can try to diagnose the problem on my end.

@dan-lind
Copy link
Author

Hi @willarmiros ,

I have actually tried that too, but it yields almost the same result, but instead the error is
"Failed to end segment: segment cannot be found.".

Will see if I can make a small reproducer or atleast get you the full stacktrace

@dan-lind
Copy link
Author

@willarmiros I've created a repro which should clearly display the issue I'm having.
Please find it here: https://github.com/dan-lind/xray-agent-segment-issue

@willarmiros
Copy link
Contributor

@dan-lind Thank you very much for the repro code, I will investigate this issue as soon as possible.

@willarmiros
Copy link
Contributor

willarmiros commented Nov 3, 2020

Hi @dan-lind,

I've figured out your problem after testing out your sample app. The problem in the withIssue block is that you're relying on AWSXRay.endSubsegment() in your async block. This method uses the magic of context to retrieve an in-progress subsegment and end it. The problem is that when your app's servlet returns a response, it attempts (and fails because of the in-progress subsegment) to end the segment and clears the context. Clearing context is necessary because the thread that was handling that request to your servlet may be reused or destroyed now that the response has been returned, so we can no longer confidently maintain context within it.

So, the solution is to not rely on context within your async block. This is simple enough, changing the withIssue method to the following worked for me:

    @GetMapping("/withIssue")
    public ResponseEntity<?> withIssue() {
        CompletableFuture.runAsync(() -> {
            try {
                Subsegment sub = AWSXRay.beginSubsegment("Sugsegment");
                Thread.sleep(3000);
                AWSXRay.getGlobalRecorder().endSubsegment(sub);  // This method doesn't touch context storage
            } catch (InterruptedException e) {
                //NOOP
            }
        });
        return ResponseEntity.ok().build();
    }

Since this is a pretty common use case that is particularly obscured by the magic of the agent, I'll do 2 things:

  1. Move the non-context-based endSubsegment(sub) to the top-level AWSXRay class for better visibility.

  2. Update documentation to better cover this use case.

@dan-lind
Copy link
Author

dan-lind commented Nov 4, 2020

Thanks @willarmiros, I will test this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants