Possible bug with `ThreadLocal.asContextElement` #985

alamothe · 2019-02-15T00:30:26Z

Hello,
I'm observing a strange behavior with the following code:

	val t: ThreadLocal<String> = ThreadLocal.withInitial { "a" }

	@Test
	fun testContext1() = runBlocking {
		println(t.get())
		withContext(coroutineContext.plus(t.asContextElement("b"))) {
			println(t.get())
		}
		println(t.get())
	}

	@Test
	fun testContext2() = runBlocking {
		println(t.get())
		withContext(coroutineContext.plus(t.asContextElement("b"))) {
			println(t.get())
			delay(Random.nextLong(100))
			println(t.get())
		}
		println(t.get())
	}

	@Test
	fun testContext3() = runBlocking {
		println(t.get())
		async(coroutineContext.plus(t.asContextElement("b"))) {
			println(t.get())
			delay(Random.nextLong(100))
			println(t.get())
		}.await()
		println(t.get())
	}

The result is:

testContext1 prints aba - expected
testContext2 prints abbb - It looks like the value is not restored for some reason. Is this a bug? What's the explanation?
testContext3 prints abba - expected

Please help!

I'm using org.jetbrains.kotlinx:kotlinx-coroutines-core:1.1.1

The text was updated successfully, but these errors were encountered:

elizarov · 2019-02-15T09:47:35Z

It is not exactly a bug, but a side-effect of the way asContextElement is implemented. See, your main coroutine (in runBlocking) does not have t's context element, so this is interpreted as "main coroutine do not care about this thread local". So then, depending on specific ways you get back to the main tread, the value of t can be either restored or not restored for the main coroutine.

If you write runBlocking(t.asContextElement("a")) { ... } in all your tests, then they all are going to run as expected.

Also note, then when you use withContext function you don't have to write withContext(coroutineContext.plus(other)) { ... }, because that merging of current context with the other context is exactly what withContext does (see its docs), so you can simply write withContext(other) { ... }.

Does it help?

alamothe · 2019-02-16T20:35:18Z

Hey @elizarov thank you for your reply & the tip for withContext.

I changed the test like this:

	@Test
	fun testContext2() {
		println(t.get())
		runBlocking {
			println(t.get())
			withContext(t.asContextElement("b")) {
				println(t.get())
				delay(Random.nextLong(100))
				println(t.get())
			}
			println(t.get())
		}
		println(t.get())
	}

It prints a a b b b a.

Are you basically saying that, between the outermost asContextElement and the end of the coroutine, the value of ThreadLocal is undefined? I find that very troubling. In my opinion, such undefined behavior could lead to disastrous bugs.

How about after the coroutine? Is it guaranteed to be restored?

alamothe · 2019-02-16T20:42:47Z

Consider also this case where the value depends on previous value (not uncommon):

	val tInt: ThreadLocal<Int> = ThreadLocal.withInitial { 0 }

	@Test
	fun testContext4() {
		println(tInt.get())
		runBlocking {
			println(tInt.get())
			withContext(tInt.asContextElement(tInt.get() + 1)) {
				println(tInt.get())
				// Comment for other behavior
				delay(Random.nextLong(100))
			}
			println(tInt.get())
			withContext(tInt.asContextElement(tInt.get() + 1)) {
				println(tInt.get())
				// Comment for other behavior
				delay(Random.nextLong(100))
			}
			println(tInt.get())
		}
		println(tInt.get())
	}

So now in the second withContext, the value could be both 1 or 2. In practice, withContext can be deep in the stack and have no idea if it is outermost asContextElement or not.

alamothe · 2019-02-16T20:46:45Z

If I understand correctly, the only workaround is to remember to set as a context element every time a coroutine is started. But what if that's happening at 100 places (not uncommon, consider the codebase which is migrating to coroutines)

elizarov · 2019-02-22T09:08:31Z

The design principle of t.asContextElement(value) is this -- a coroutine with this element in its context will have value set in the corresponding thread local. Indeed, there are no guarantees for coroutines that do have this element in their context, as they are not supposed to ever care about it. If they do care about this thread-local, they should have it in their context.

You don't need to remember to always set it if you use the structured concurrency approach and always launch your coroutines as children of other coroutines. Once you set the corresponding element in the context of your top-level scope, it is going to be inherited by all your coroutines.

alamothe · 2019-02-25T08:54:48Z

You don't need to remember to always set it if you use the structured concurrency approach and always launch your coroutines as children of other coroutines. Once you set the corresponding element in the context of your top-level scope, it is going to be inherited by all your coroutines.

What I was referring to is usage in a servlet container. In this case, each request will start outside of a coroutine (as a separate thread), and then a coroutine will be started at some point via runBlocking. Code outside of a coroutine will care about a thread-local value. To be able to access it both inside and outside of a coroutine - isn't that precisely the idea behind ThreadLocal.asContextElement?

So in this scenario, does the implementation guarantee that the value will be restored? In my example, outside of runBlocking? (my tests do show the value being restored after runBlocking)

I hope the answer is yes. But even so, you should realize the inconsistency here:

Value of a thread-local is well-defined inside a coroutine which has it in its context.
Value of a thread-local is well-defined outside of coroutine.
Value of a thread-local is undefined inside a coroutine which doesn't have it in its context.

Isn't it well reasoned to expect 2 and 3 to act the same? Outside of a coroutine there is no context at all. It's troublesome that the behavior is different. I am sure that someone will get bitten by this.

elizarov · 2019-02-25T11:33:16Z

@alamothe The value of thread-local is guaranteed to be restored when you go outside of the scope of any of the coroutines. If you write the following code:

// not in a coroutine here
runBlocking(t.asContextElement()) {

}
// the value of t is guaranteed to be restored here

alamothe · 2019-02-25T21:32:06Z

Got it. The way I solved it is to explicitly add to context every time I start a coroutine like that, and even implement as a helper function.

I still consider this behavior unintuitive, and suggest adding a warning to the docs, stating that "the value of a thread-local is undefined inside a coroutine which doesn't have it in its context", with some examples.

Fixes #985

frost13it · 2019-09-24T18:02:36Z

It would also be nice to add the warning to the documentation of ThreadContextElement too.
At the moment it does not mention that restoreThreadContext() may not be called when it would be expected by users implementing ThreadContextElement themselves.

And it would be much better to have thread state restored outside withContext { }, because a library may not have control over initial context of the coroutine, so this feature becomes unusable there.

frost13it · 2019-09-25T09:12:07Z

The kotlinx-coroutines-slf4j module seems to be a bit broken because of this problem.
Running the following snippet:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.slf4j.MDCContext
import kotlinx.coroutines.withContext
import org.slf4j.LoggerFactory

private val logger = LoggerFactory.getLogger("MainKt")

suspend fun main() {
    logger.info("Before operation")
    withContext(MDCContext(mapOf("operationId" to "123"))) {
        withContext(Dispatchers.IO) {
            logger.info("Inside operation")
        }
    }
    logger.info("After operation")
}

Procuces the following output:

[] INFO  MainKt - Before operation
[operationId=123] INFO  MainKt - Inside operation
[operationId=123] INFO  MainKt - After operation

Here the expression in square brackets is Logback's %mdc.

As you can see, MDC is not cleaned up after exiting from withContext {}. And there is no top-level scope where MDCContext can be injected.

Even when this is possible, it is not convenient at all to put "empty" value of thread-local state in all root coroutines to ensure that it will not left undefined after running some code. Could you please fix it?

Should I create a separate issue and is there a chance that this design of ThreadContextElement will be changed in future and such usages will work?

elizarov · 2019-09-25T09:41:14Z

@frost13it Thanks for a use-case. I've reopened this way. Let's see if it would be somehow possible to fix it (ideally, in an efficient way).

elizarov · 2019-09-25T15:22:39Z

@frost13it I've looked at this problem again. Can you, please, elaborate on your actual use-case? In particular, you are giving an example where thread local value leaks into suspend fun main. Is it your real use-case?

Note, that you can wrap your top-level code into withContext(MDCContext(emptyMap())) { ... }. If you do this, you should see that your operationId is cleared. Would it be a solution for you?

frost13it · 2019-09-25T19:18:46Z

Unfortunately it is not a good solution for me.
I'm writing a logging library. What am I trying to achieve is to propagate contextual data (called LoggingContext) in a call graph consisting of both suspend and non-suspend functions without passing it explicitly.
Coroutine context allows to achieve that in suspend code, and ThreadLocal allows to achieve that in non-suspend single-threaded code. Provided that suspend functions can only be called from other suspend functions, I see the following solutions for this:

Take LoggingContext from the context of the coroutine currently running on the current thread. It would be possible if there was a ThreadLocal containing CoroutineContext of that coroutine. But as far as I know, no such ThreadLocal exists at the moment.
Make thread local value of LoggingContext match that in the coroutine context.
It seemed that ThreadContextElement fits perfectly here. But it is hard to guarantee that context of top-level coroutines will contain an initial value of LoggingContext. Forcing users of the library to add it manually is inconvenient and error-prone. Especially in the case with suspend fun main().
That is why I would love to see this issue fixed.

Don't you think that kotlinx-coroutines-slf4j (which works the same way) is pretty difficult to use correctly because of this issue? I'm afraid that for now the same is true for any similar library.

elizarov · 2019-09-26T09:58:41Z

We assume that kotlinx-coroutines-slf4j is used in the following way. Consider the application that is using SLF4J for logging, uses coroutines, and uses MDC. This application adds kotlinx-coroutines-slf4j to its dependencies and adds MDCContext(emptyMap()) to its top-level scope to give a "signal" to coroutine machinery that it is going to use MDCContext. After that, its MDC will be always correctly set/restored and reset and switching between different coroutines.

Unfortunately, this does not currently affect GlobalScope and requires some boiler-plate code from the application writer. E.g., if my application has code like val myScope = CoroutineScope(someContext) then I, as an application author, have to remember to add MDCContext(emptyMap()) to this scope's context.

What we can do to improve this situation is to give integration modules like kotlinx-coroutines-slf4j and ability to globally inject context elements to all scopes (including GlobalScope). This should considerably improve usability. All you'll need to do in this case is to add kotlinx-coroutines-slf4j to dependencies and that's it. What do you think?

P.S. The downside of this approach to consider is that anyone who adds kotlinx-coroutines-slf4j to their dependencies will get a performance hit even if they don't use MDC (as an afterthought -- there's no other reason to add kotlinx-coroutines-slf4j, but for MDC support).

frost13it · 2019-09-26T10:20:11Z

I think it would be sufficient in my case. It would be great to have such an ability in a library.

Note, that this fix has potentially severe performance impact since it always walk up the coroutine completion stack in search for parent UndispatchedCoroutine. :todo: figure out how to optimize it Fixes #985

frost13it · 2019-11-08T11:56:46Z

Are there any updates on this issue?

alamothe · 2020-06-08T01:34:44Z

Thanks @elizarov for your work and everyone else in the discussion.

We have solved this by making sure every time we start a coroutine, we put all thread locals that we care about in its context.

But I totally understand how this doesn't work for libraries.

akalini · 2020-07-28T14:36:11Z

Hello,

I am experiencing the same issue even with the code base that is not huge and is mostly under our control. Our services use MDC logging, opencensus tracing, thread locals, and GRPC contexts. It has been a huge challenge to wire everything up correctly to work with coroutine context propagation.

For the code under our control, we need to remember to seed all the different context elements every time somebody starts a coroutine. This leads to tight coupling, every piece of code that wants to start a coroutine needs to be aware of all the thread context elements in the entire codebase. And of course, it is very error-prone.

The problem is exacerbated by GRPC launching coroutines without giving everyone the opportunity to override the behavior. I imagine other libraries wanting to do the same.

The described behavior is surprising, to say the least. I don't totally understand why global injection is needed to fix this. In the examples above, there are clear scope/context entry and exit points. Could you explain why the restoreThreadContext is not invoked at the block's end? It's especially surprising behavior because if you remove delay invocation from the examples above, withContext works as expected!

qwwdfsad · 2020-10-20T15:35:34Z

One more request: https://youtrack.jetbrains.com/issue/KT-42582

This fix solves the problem of restoring thread-context when returning to another context in undispatched way. It impacts suspend/resume performance of coroutines that use ThreadContextElement since we have to walk up the coroutine completion stack in search for parent UndispatchedCoroutine. However, there is a fast-path to ensure that there is no performance impact in cases when ThreadContextElement is not used by a coroutine. Fixes #985

frost13it · 2021-02-01T13:30:49Z

Is there a decision on the #1577?

qwwdfsad · 2021-02-01T13:32:55Z

Yes, we are going to release the fix with the next release

frost13it · 2021-02-01T13:35:26Z

Magnificently!

This fix solves the problem of restoring thread-context when returning to another context in undispatched way. It impacts suspend/resume performance of coroutines that use ThreadContextElement and undispatched coroutines. The kotlinx.coroutines code poisons the context with special 'UndispatchedMarker' element and linear lookup is performed only when the marker is present. The code also contains description of an alternative approach in order to save a linear lookup in complex coroutines hierarchies. Fast-path of coroutine resumption is slowed down by a single context lookup. Fixes #985 Co-authored-by: Roman Elizarov <elizarov@gmail.com

This fix solves the problem of restoring thread-context when returning to another context in undispatched way. It impacts suspend/resume performance of coroutines that use ThreadContextElement and undispatched coroutines. The kotlinx.coroutines code poisons the context with special 'UndispatchedMarker' element and linear lookup is performed only when the marker is present. The code also contains description of an alternative approach in order to save a linear lookup in complex coroutines hierarchies. Fast-path of coroutine resumption is slowed down by a single context lookup. Fixes #985 Co-authored-by: Roman Elizarov <elizarov@gmail.com>

luispollo · 2021-02-28T00:44:58Z

Thanks for fixing this! I've been pulling my hair trying to understand why my MDC values were screwed up for months now without coming across this thread. Is there an ETA on the next release?

luispollo · 2021-02-28T19:10:00Z

We assume that kotlinx-coroutines-slf4j is used in the following way. Consider the application that is using SLF4J for logging, uses coroutines, and uses MDC. This application adds kotlinx-coroutines-slf4j to its dependencies and adds MDCContext(emptyMap()) to its top-level scope to give a "signal" to coroutine machinery that it is going to use MDCContext. After that, its MDC will be always correctly set/restored and reset and switching between different coroutines.

Until there's a new release, it would be great to have a clear example of what this "expected usage" looks like. Specifically, the documentation for kotlinx-coroutines-slf4j is extremely short and could benefit from a warning about the caveat caused by this bug and an example of a workaround. Might be best tracked as a separate issue, not sure.

cyberdelia · 2021-03-27T04:57:46Z

I'm seeing a strange side-effect which I believe is the result of this change. Using MDCContext will remove keys from the MDC if you call a suspendable function, which is a bit surprising, when in this simple example case you just call delay:

Without MDCContext:

runBlocking {
    MDC.put("key", "value")
    println(MDC.copyOfContextMap)
    delay(50)
    println(MDC.copyOfContextMap)
}

# {key=value}
# {key=value}

With MDCContext:

runBlocking(MDCContext()) {
    MDC.put("key", "value")
    println(MDC.copyOfContextMap)
    delay(50)
    println(MDC.copyOfContextMap)
}

# {key=value}
# null

qwwdfsad · 2021-03-27T16:16:01Z

Yes, this is the existing limitation of MDCContext exposed more strictly.

Quoting our KDoc:

Note that you cannot update MDC context from inside of the coroutine simply using MDC.put. These updates are going to be lost on the next suspension and reinstalled to the MDC context that was captured or explicitly specified in contextMap when this object was created on the next resumption.

This is a by-design limitation, that become much more clearer when more than one coroutine is launched.

E.g.:

runBlocking((MDCContext()) {
    launch(someDispatcher) {
        // mutate MDC
    }.join()
    // use MDC here as well
}

do you expect mutated MDC to be available in runBlocking coroutine. What if someDispatcher is Dispatchers.Unconfined and what if it's concurrent? And what if it's concurrent, but it happened to be dispatched on the same thread as runBlocking coroutine?

michail-nikolaev · 2021-05-07T13:26:48Z

It seems like there are some strange issues related to it - https://youtrack.jetbrains.com/issue/KT-46552

elizarov added the question label Feb 15, 2019

qwwdfsad added the docs KDoc and API reference label Feb 27, 2019

qwwdfsad added a commit that referenced this issue Feb 27, 2019

Clarify thread locals documentation

0b91810

Fixes #985

qwwdfsad mentioned this issue Feb 27, 2019

Clarify thread locals documentation #1016

Merged

qwwdfsad added a commit that referenced this issue Mar 6, 2019

Clarify thread locals documentation

b782adb

Fixes #985

qwwdfsad closed this as completed in ab5bbfc Mar 26, 2019

elizarov reopened this Sep 25, 2019

elizarov removed the question label Sep 25, 2019

elizarov added bug and removed docs KDoc and API reference labels Sep 28, 2019

frost13it mentioned this issue Dec 10, 2019

Add ability to inject context elements into GlobalScope #1696

Closed

qwwdfsad closed this as completed in 727c38f Mar 3, 2021

luispollo mentioned this issue Mar 4, 2021

fix(tracing): Ensure clean MDC context at top coroutine scopes spinnaker/keel#1845

Merged

frost13it mentioned this issue Sep 13, 2021

Thread local is not cleaned up sometimes #2930

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible bug with `ThreadLocal.asContextElement` #985

Possible bug with `ThreadLocal.asContextElement` #985

alamothe commented Feb 15, 2019 •

edited

Loading

elizarov commented Feb 15, 2019

alamothe commented Feb 16, 2019 •

edited

Loading

alamothe commented Feb 16, 2019 •

edited

Loading

alamothe commented Feb 16, 2019

elizarov commented Feb 22, 2019

alamothe commented Feb 25, 2019

elizarov commented Feb 25, 2019

alamothe commented Feb 25, 2019

frost13it commented Sep 24, 2019

frost13it commented Sep 25, 2019

elizarov commented Sep 25, 2019

elizarov commented Sep 25, 2019

frost13it commented Sep 25, 2019

elizarov commented Sep 26, 2019 •

edited

Loading

frost13it commented Sep 26, 2019

frost13it commented Nov 8, 2019

alamothe commented Jun 8, 2020

akalini commented Jul 28, 2020

qwwdfsad commented Oct 20, 2020

frost13it commented Feb 1, 2021

qwwdfsad commented Feb 1, 2021 •

edited

Loading

frost13it commented Feb 1, 2021

luispollo commented Feb 28, 2021 •

edited

Loading

luispollo commented Feb 28, 2021

cyberdelia commented Mar 27, 2021

qwwdfsad commented Mar 27, 2021

michail-nikolaev commented May 7, 2021

Possible bug with ThreadLocal.asContextElement #985

Possible bug with ThreadLocal.asContextElement #985

Comments

alamothe commented Feb 15, 2019 • edited Loading

elizarov commented Feb 15, 2019

alamothe commented Feb 16, 2019 • edited Loading

alamothe commented Feb 16, 2019 • edited Loading

alamothe commented Feb 16, 2019

elizarov commented Feb 22, 2019

alamothe commented Feb 25, 2019

elizarov commented Feb 25, 2019

alamothe commented Feb 25, 2019

frost13it commented Sep 24, 2019

frost13it commented Sep 25, 2019

elizarov commented Sep 25, 2019

elizarov commented Sep 25, 2019

frost13it commented Sep 25, 2019

elizarov commented Sep 26, 2019 • edited Loading

frost13it commented Sep 26, 2019

frost13it commented Nov 8, 2019

alamothe commented Jun 8, 2020

akalini commented Jul 28, 2020

qwwdfsad commented Oct 20, 2020

frost13it commented Feb 1, 2021

qwwdfsad commented Feb 1, 2021 • edited Loading

frost13it commented Feb 1, 2021

luispollo commented Feb 28, 2021 • edited Loading

luispollo commented Feb 28, 2021

cyberdelia commented Mar 27, 2021

qwwdfsad commented Mar 27, 2021

michail-nikolaev commented May 7, 2021

Possible bug with `ThreadLocal.asContextElement` #985

Possible bug with `ThreadLocal.asContextElement` #985

alamothe commented Feb 15, 2019 •

edited

Loading

alamothe commented Feb 16, 2019 •

edited

Loading

alamothe commented Feb 16, 2019 •

edited

Loading

elizarov commented Sep 26, 2019 •

edited

Loading

qwwdfsad commented Feb 1, 2021 •

edited

Loading

luispollo commented Feb 28, 2021 •

edited

Loading