Skip to content

Essential Information on Virtual Threads

Richard Reingruber edited this page Dec 6, 2024 · 2 revisions

Virtual Threads are a new and exciting feature of Java. They promise performance and compatibility but there are also subtle differences to ordinary threads and even pitfalls which you should be aware of before making use of virtual threads in important systems.

On this page we collect essential information and references to other sources (first and foremost JEP 444 on Virtual Threads).

Being JDK and JVM developers we don't have own experience from building systems using virtual threads nevertheless we have gained knowledge porting VM continuations in the hotspot VM to PPC and supporting SapMachine users in their endeavors with virtual threads.

When to Use Virtual Threads

Virtual threads are a means to scale up and keep CPUs busy while waiting for something (typically I/O). Replacing platform threads with virtual threads alone will probably not improve the performance of the system. It is the scaling beyond the limited number of platform threads that will bring improvements in throughput.

So you should use virtual threads if your application:

  • Does a significant amount of I/O.
    It might also help if there are waits for java.util.concurrent locks and conditions if they are not an obstacle themselves for scaling.

  • Has at least 10,000 independent tasks at every point in time.

Vice versa, if your application is mostly doing computations, never waiting for input or if there are not that many independent tasks then you cannot expect improvements from virtual threads. In the former case (computation heavy) performance might even decrease. In the latter case it might be possible to refactor the application to create more independent tasks to get a speedup from virtual threads.

How to Use Virtual Threads

It is recommended to start a new temporary virtual thread per request. This will work well as creation and start of virtual threads is extremely lightweight. It merely means allocating a few objects and calling the runnable.

With virtual threads you will be able to achieve a higher throughput of requests since you can have more virtual threads than platform threads.

You might also want to improve the latency of requests. You can make use of virtual threads for this too if handling a request involves independent I/O operations (e.g. querying other systems) or accessing java.util.concurrent data structures that might block. Executing the operations concurrently in new temporary virtual threads will reduce the latency of the request if the system is not yet fully loaded. This comes at a cost though: stack traces of the temporary threads won't show the request context they are running in. This problem resembles the issues of asynchronous programming with callbacks.

ThreadLocal vs Scoped Values (2nd Preview)

Thread local values have design flaws that can have bigger impact with virtual threads (example is given here). Scoped values (still in preview) are supposed to reduce complexity and improve security and performance. It is recommended to use scoped values instead of thread local values.

Structured Concurrency (2nd Preview)

When splitting a task into subtasks to be executed concurrently in virtual threads it is easy to make mistakes that can cause issues like leaks and you will lose the context (e.g. the request handled by the parent task) which complicates analysis of thread dumps. Since jdk 19 ExecutorService can be used in try-with-resources statements which is helpful but Structured Concurrency will go beyond. It imposes the static structure of code blocks on dynamically created subtasks thereby forming a tree which is used to control the life time of subtasks.

Also, when dumping stacks with jcmd <pid> Thread.dump_to_file -format=json subtasks will be grouped and put into the context of their parent (see JDK 21 Documentation).

What NOT to do

Doing I/O while having a Java object locked

It is in general not optimal to hold a lock while doing I/O because I/O has significant latency which is added to the threads waiting for the lock. With virtual threads it is especially problematic because a virtual thread is pinned to its carrier thread while it has Java objects locked (see limitations).

Thread pools

A thread pool is the anti-pattern of how you should use virtual threads (see above). Creation and start of a virtual thread is (at least) an order of magnitude faster than creating a platform thread because it literally takes just few object allocations plus initialization to spin up a virtual thread. So the overhead of a pool is not amortized. Another issue with pooling could be that once GC has visited virtual threads (and their StackChunks) certain fast paths are not available anymore. The micro benchmark below demonstrates this.

Busy waits

Busy waits are never preempted by the scheduler (see limitations). Busy waiting threads block their carrier threads, maybe even preventing another virtual thread from running and doing the work they are waiting for causing a dead lock.

Long Calculations without Wait for I/O, Synchronization

The virtual thread scheduler does not preempt calculations so there is no value added by executing pure calculation tasks in virtual threads. After creating a couple of virtual threads for such tasks you won't be able to create more until prior tasks have completed. If a deadlock occurs might depend on the load of the system.

Limitations and Issues to be Aware of

All relevant limitations are related to situations were a virtual thread cannot unmount from the carrier.

If this is due to locking of a Java object monitor or due to JNI then the virtual thread is said to be pinned. In this case the scheduler will not compensate, in all other cases it will compensate for the blocked carrier by temporarily adding a new one to the pool (see also section "Executing virtual threads" in JEP 444)

The limitations can become bottlenecks and cause even deadlocks if all carriers are occupied by threads that don't unmount. How severely your application is affected will also depend on the load so it is advisable to conduct stress tests under high load.

Pinned: Java Object Monitor (synchronized Statement)

A thread is pinned if a thread is executing inside a synchronized block or method or if it is waiting to enter a synchronized block or method.

Should this become an issue it is recommended to replace the synchronization with classes from java.util.concurrent if possible.

Work has been done to overcome this limitation. It is still experimental but you can test drive early access builds of it.

Pinned: JNI

Active JNI calls pin a virtual thread to its carrier. This, for instance, can happen with classloading. If the loading class has a static initializer then the vm will call it with active JNI calls on the stack. The thread will be pinned while the static initializer is running.

Object::wait and Other 'Compensating' Operations

A virtual thread blocked in Object::wait cannot unmout. The scheduler compensates for this by adding a temporary carrier to the pool if needed (see Object::wait and Block::begin).

It is rather common that a larger number of threads are blocked in Object::wait to be notified about an event. These cases need the compensation to avoid deadlocks. In contrast to that it is uncommon that many threads are blocked in the attempt to enter a synchronized block or method as it would prevent scaling. So the compensation is not needed. At least in most cases it is not needed. There may be exceptional situations where many threads are blocked waiting to enter a synchronized block. These are problematic and can cause deadlocks.

Early access builds contain experimental work that allows to unmount a virtual thread when it calls Object::wait.

There are other operations handled likewise. Most of them are related to file I/O (filesystems rarely support asynchronous I/O).

java.util.concurrent Synchronizers and Locks

Locks in java.util.concurrent.lock might cause issues with the order that threads are notified when a lock becomes available.

If, e.g., virtual threads v1 and v2 are waiting for a lock that is released and v1 is notified instead of v2 which is pinned to its carrier then this can lead to a deadlock if no carrier is available for v1.

No Time Slice Based Preemption

The scheduler does not preempt virtual threads based on CPU time consumption. This can become an issue if there are too many virtual threads doing long computations.

Porting Existing Applications and Libraries to Virtual Threads

Due to the compatibility of the new APIs (e.g. VirtualThread is a subclass of java.lang.Thread), the initial effort might be not too high to change an existing application to make use of virtual threads.

Though chances are that under real load you face issues caused by the limitations of the current implementation (jdk 21).

Also it is not unlikely that the existing application requires several tuning cycles before you see performance improvements you hope for if there are bottlenecks that where not yet triggered because of the the limits of platform threads.

Trivial Micro Benchmark

A TrivialVthreadMicroBenchmark demonstrates some of the claims given here.

Configuration

jdk.virtualThreadScheduler.parallelism

Property to control the number of carrier threads in the pool. By default set to Runtime.getRuntime().availableProcessors().

jdk.virtualThreadScheduler.maxPoolSize

Property to control the maximum number of carrier threads in the pool. By default set to 256.

jdk.virtualThreadScheduler.minRunnable

Property to control when to temporarily add carrier threads to the pool. By default set to the maximum of 1 and parallelism / 2. If less carrier threads are available for execution of virtual threads, then the number of carriers may temporarily increase. This can improve throughput but starting a new platform thread also means overhead.

Refer to limitations for conditions where carrier threads are temporarily added to the pool

Observing Virtual Threads

Stack Dumps

Only the jcmd command Thread.dump_to_file will show the stacks of virtual threads.

The carrier threads are normally from a ForkJoin thread pool with the name prefix "ForkJoinPool-1".

In this example carrier threads that have a virtual thread mounted have stacks with the following frames on top

java.base/jdk.internal.vm.Continuation.run(Continuation.java:248)
java.base/java.lang.VirtualThread.runContinuation(VirtualThread.java:221)

The carriers are blocked running virtual threads. From looking at their stacks it is not possible to say what the virtual threads are doing. In the example all virtual threads are either waiting to start or blocked waiting for a Java object monitor pinned to their carrier except for one thread that owns the monitor and executes the method consumeCPU(). Note that the pinning is the bottleneck that prevented all but 4 virtual threads from starting. 4 threads could be started because 4 carrier threads were configured using the property jdk.virtualThreadScheduler.parallelism

Pinning can be avoided by using a java.util.concurrent.locks.ReentrantLock. All threads were able to start because when blocking on the lock the threads unmount from their carrier. Now only one carrier has a virtual thread mounted. Others are waiting for a virtual thread to become runnable.

If you select -format=json then tasks created with the Structured Concurrent API will be grouped according to the scopes (an example is given here)

Tracing

By setting jdk.tracePinnedThreads you will get a stack trace if a virtual thread cannot be unmounted because it is pinned. For a shorter trace you can specify -Djdk.tracePinnedThreads=short. Each stack will be printed at most once.

JFR

There are a bunch of events related to virtual threads.

Debugging

Virtual threads are hidden by default when debugging. You don't see them and they don't stop at breakpoints. You can change this behaviour when launching your application. One way to do this is to pass the option includevirtualthreads=y to the JDWP agent.

Example
java -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n,includevirtualthreads=y Example.java

One reason for hiding virtual threads is that stopping at a breakpoint pins the virtual thread. If to many threads hit the breakpoint this will deadlock the vm.

Local variables in virtual threads can currently only be changed in the topmost frame.

Glossary

Carrier Thread

A Platform thread that the scheduler uses to mount a virtual thread for execution. The scheduler unmounts the virtual thread if an operation blocks unless the virtual thread is pinned or unmounting is not possible with the operation (see limitations).

Mounting and Unmounting a Virtual Thread

Mounting/unmounting is effectively switching context between carrier and virtual thread. The switching is done in user mode.

A carrier is blocked while a virtual thread is mounted. A virtual thread is blocked while unmounted.

Pinned

A virtual thread is said to be pinned to its carrier thread if

  • it is executing in a synchronized method or block
    (fixed in JDK 24 by JEP 491)
  • waiting to enter a synchronized method or block
    (fixed in JDK 24 by JEP 491)
  • or has a JNI call on stack.

The virtual thread cannot unmount when executing a blocking operation while pinned, i.e. it blocks its carrier. The scheduler does not compensate for this (see limitations).

Platform Thread

A Java thread that is not a virtual thread. It is mapped to a dedicated OS thread for execution. The mapping is never modified.

Virtual Thread

A Java thread handled entirely by the Java runtime in user mode. For execution it is mounted on a carrier thread. It is lightweight compared to platform threads. It can be created faster, it consumes less memory, it executes as fast as platform threads, and there can be many more virtual than platform threads.

References

JEP 444: Virtual Threads

https://openjdk.org/jeps/444

JEP 462: Structured Concurrency (Second Preview)

https://openjdk.org/jeps/462

JEP 464: Scoped Values (Second Preview)

https://openjdk.org/jeps/464

JDK 21 Documentation: Java Core Libraries, Concurrency

Covers Virtual Threads, Structured Concurrency, and Thread-Local Variables.
https://docs.oracle.com/en/java/javase/21/core/concurrency.html

Clone this wiki locally