Essential Information on Virtual Threads
Virtual Threads are a new and exciting feature of Java. They promise performance and compatibility, but there are also subtle differences from ordinary threads and even pitfalls which you should be aware of before making use of virtual threads in important systems.
On this page we collect essential information and references to other sources (first and foremost JEP 444 on Virtual Threads).
Being JDK and JVM developers, we don't have first-hand experience building systems with virtual threads. Nevertheless, we have gained knowledge by porting VM continuations in the HotSpot VM to PPC and by supporting SapMachine users in their endeavors with virtual threads.
Virtual threads are a means to scale up and keep CPUs busy while waiting for something (typically I/O). Replacing platform threads with virtual threads alone will probably not improve the performance of the system. It is the scaling beyond the limited number of platform threads that will bring improvements in throughput.
So you should use virtual threads if your application:
- Does a significant amount of I/O. It might also help if there are waits for java.util.concurrent locks and conditions, provided they are not themselves an obstacle to scaling.
- Has at least 10,000 independent tasks at every point in time.
Vice versa, if your application is mostly doing computations and never waits for input, or if there are not that many independent tasks, then you cannot expect improvements from virtual threads. In the former case (computation heavy) performance might even decrease. In the latter case it might be possible to refactor the application to create more independent tasks and get a speedup from virtual threads.
It is recommended to start a new temporary virtual thread per request. This works well because creation and start of virtual threads are extremely lightweight: it merely means allocating a few objects and calling the runnable.
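As a sketch of this thread-per-request style (the request loop and the simulated I/O are only illustrative, not part of the original text), a virtual-thread-per-task executor can be used like this:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadPerRequest {
    public static void main(String[] args) {
        // Starts a new virtual thread for every submitted task;
        // close() at the end of try-with-resources waits for all tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                int requestId = i; // stands in for an incoming request
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // simulated blocking I/O; the virtual thread unmounts here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return "response for request " + requestId;
                });
            }
        }
    }
}
```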
With virtual threads you will be able to achieve a higher throughput of requests since you can have more virtual threads than platform threads.
You might also want to improve the latency of requests. You can make use of virtual threads for this too if handling a request involves independent I/O operations (e.g. querying other systems) or accessing java.util.concurrent data structures that might block. Executing the operations concurrently in new temporary virtual threads will reduce the latency of the request if the system is not yet fully loaded. This comes at a cost though: stack traces of the temporary threads won't show the request context they are running in. This problem resembles the issues of asynchronous programming with callbacks.
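A minimal sketch of such a fan-out; the service names and the simulated blocking call are made up for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RequestFanOut {
    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // Two independent (simulated) I/O operations run in their own temporary
            // virtual threads, so the request only waits for the slower of the two.
            Future<String> user  = executor.submit(() -> fetch("user-service"));
            Future<String> stock = executor.submit(() -> fetch("stock-service"));
            System.out.println(user.get() + ", " + stock.get());
        }
    }

    // Placeholder for a blocking call to another system.
    static String fetch(String system) throws InterruptedException {
        Thread.sleep(200);
        return "result from " + system;
    }
}
```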
Thread-local values have design flaws that can have a bigger impact with virtual threads (an example is given here). Scoped values (still in preview) are supposed to reduce complexity and improve security and performance. It is recommended to use scoped values instead of thread-local values.
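A minimal sketch of a scoped value (a preview feature in JDK 21, so it needs --enable-preview; the request id is a made-up example):

```java
public class ScopedValueExample {
    // The binding is immutable and only visible while run(...) executes.
    private static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

    public static void main(String[] args) {
        ScopedValue.where(REQUEST_ID, "req-42").run(ScopedValueExample::handle);
    }

    static void handle() {
        // Code called from here can read the value but never change it,
        // and there is nothing to clean up afterwards.
        System.out.println("handling " + REQUEST_ID.get());
    }
}
```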
When splitting a task into subtasks to be executed concurrently in virtual threads, it is easy to make mistakes that can cause issues like leaks, and you lose the context (e.g. the request handled by the parent task), which complicates analysis of thread dumps. Since JDK 19, ExecutorService can be used in try-with-resources statements, which is helpful, but Structured Concurrency goes beyond that: it imposes the static structure of code blocks on dynamically created subtasks, thereby forming a tree which is used to control the lifetime of subtasks.
Also, when dumping stacks with jcmd <pid> Thread.dump_to_file -format=json, subtasks will be grouped and put into the context of their parent (see the JDK 21 Documentation).
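A minimal sketch with the Structured Concurrency API (a preview API in JDK 21, so it needs --enable-preview and may still change; the service names are made up):

```java
import java.util.concurrent.StructuredTaskScope;

public class StructuredFanOut {
    public static void main(String[] args) throws Exception {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            // Each fork runs in a new virtual thread and remains a child of this scope,
            // so thread dumps can group the subtasks under their parent.
            StructuredTaskScope.Subtask<String> user  = scope.fork(() -> fetch("user-service"));
            StructuredTaskScope.Subtask<String> stock = scope.fork(() -> fetch("stock-service"));

            scope.join().throwIfFailed(); // wait for both, propagate the first failure
            System.out.println(user.get() + ", " + stock.get());
        } // leaving the block guarantees that no subtask outlives its parent
    }

    static String fetch(String system) throws InterruptedException {
        Thread.sleep(200);
        return "result from " + system;
    }
}
```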
It is in general not optimal to hold a lock while doing I/O because the significant latency of the I/O is added to the wait time of the threads blocked on the lock. With virtual threads it is especially problematic because a virtual thread is pinned to its carrier thread while it has Java objects locked (see limitations).
A thread pool is an anti-pattern for virtual threads (see above). Creation and start of a virtual thread is (at least) an order of magnitude faster than creating a platform thread because it literally takes just a few object allocations plus initialization to spin up a virtual thread. So the overhead of a pool is not amortized. Another issue with pooling could be that once the GC has visited virtual threads (and their StackChunks) certain fast paths are not available anymore. The micro benchmark below demonstrates this.
Busy waits are never preempted by the scheduler (see limitations). Busy-waiting threads block their carrier threads, possibly even preventing another virtual thread from running and doing the work they are waiting for, which causes a deadlock.
The virtual thread scheduler does not preempt calculations, so there is no value added by executing pure calculation tasks in virtual threads. After creating a couple of virtual threads for such tasks, further virtual threads will not get to run until the prior tasks have completed. Whether a deadlock occurs might depend on the load of the system.
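A minimal sketch of the busy-wait pitfall described above (class name, flag, and thread counts are only illustrative); with the default scheduler configuration the flag is likely never set because the busy waiters occupy all carriers:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BusyWaitPitfall {
    public static void main(String[] args) throws InterruptedException {
        int carriers = Runtime.getRuntime().availableProcessors();
        AtomicBoolean ready = new AtomicBoolean(false);

        // Occupy every carrier with a busy-waiting virtual thread that never unmounts.
        for (int i = 0; i < carriers; i++) {
            Thread.ofVirtual().start(() -> {
                while (!ready.get()) { /* busy wait, no blocking call, no preemption */ }
            });
        }

        // This virtual thread would set the flag, but it may never get a carrier.
        Thread.ofVirtual().start(() -> ready.set(true));

        Thread.sleep(5_000);
        System.out.println("ready = " + ready.get()); // likely still false
    }
}
```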
All relevant limitations are related to situations where a virtual thread cannot unmount from its carrier.
If this is due to locking of a Java object monitor or due to JNI, then the virtual thread is said to be pinned. In this case the scheduler will not compensate; in all other cases it will compensate for the blocked carrier by temporarily adding a new one to the pool (see also section "Executing virtual threads" in JEP 444).
The limitations can become bottlenecks and cause even deadlocks if all carriers are occupied by threads that don't unmount. How severely your application is affected will also depend on the load so it is advisable to conduct stress tests under high load.
Fixed in JDK 24 by JEP 491: Synchronize Virtual Threads without Pinning
A thread is pinned if it is executing inside a synchronized block or method or if it is waiting to enter a synchronized block or method.
Should this become an issue it is recommended to replace the synchronization with classes from java.util.concurrent if possible.
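A sketch of such a replacement: guarding the critical section with a java.util.concurrent.locks.ReentrantLock instead of synchronized, so that a virtual thread blocked on the lock can unmount from its carrier (relevant before JEP 491 / JDK 24). The Counter class is only an illustration:

```java
import java.util.concurrent.locks.ReentrantLock;

public class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    void increment() {
        lock.lock();      // a virtual thread blocked here unmounts from its carrier
        try {
            value++;      // critical section
        } finally {
            lock.unlock();
        }
    }
}
```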
Work has been done to overcome this limitation. It is still experimental but you can test drive early access builds of it.
Active JNI calls pin a virtual thread to its carrier. This can happen, for instance, with class loading: if the class being loaded has a static initializer, the VM calls it with active JNI calls on the stack, and the thread is pinned while the static initializer is running.
Fixed in JDK 24 by JEP 491: Synchronize Virtual Threads without Pinning
A virtual thread blocked in Object::wait cannot unmount. The scheduler compensates for this by adding a temporary carrier to the pool if needed (see Object::wait and Blocker::begin).
It is rather common that a larger number of threads are blocked in Object::wait to be notified about an event. These cases need the compensation to avoid deadlocks. In contrast, it is uncommon that many threads are blocked in the attempt to enter a synchronized block or method, as that would prevent scaling, so the compensation is not needed, at least in most cases. There may be exceptional situations where many threads are blocked waiting to enter a synchronized block; these are problematic and can cause deadlocks.
Early access builds contain experimental work that allows a virtual thread to unmount when it calls Object::wait.
There are other operations handled likewise. Most of them are related to file I/O (filesystems rarely support asynchronous I/O).
Locks in java.util.concurrent.locks might cause issues with the order in which threads are notified when a lock becomes available.
If, e.g., virtual threads v1 and v2 are waiting for a lock that is released and v1 is notified instead of v2 which is pinned to its carrier then this can lead to a deadlock if no carrier is available for v1.
The scheduler does not preempt virtual threads based on CPU time consumption. This can become an issue if there are too many virtual threads doing long computations.
Due to the compatibility of the new APIs (e.g. VirtualThread is a subclass of java.lang.Thread), the initial effort to change an existing application to make use of virtual threads might not be too high.
Though chances are that under real load you will face issues caused by the limitations of the current implementation (JDK 21).
Also, it is not unlikely that the existing application requires several tuning cycles before you see the performance improvements you hope for, if there are bottlenecks that were not yet triggered because of the limits of platform threads.
A TrivialVthreadMicroBenchmark demonstrates some of the claims given here:
- Virtual threads start up much faster.
- There can be many more virtual threads than platform threads.
- Pooling is needed for platform threads but not for virtual threads.
- Better performance with virtual threads than with platform threads. A first guess would be: better cache performance when using the stack memory of the few carrier threads.
- With thousands of platform threads it takes extremely long until the process has terminated.
- 100K virtual threads are faster than 30K platform threads.
- Slowdown if GC has visited the virtual threads.
- The benchmark uses java.util.concurrent synchronization to avoid the issues of Java monitors when used with virtual threads.
If you want, you can try to write an alternative implementation of CountDownLatch that is based on Java monitors (i.e. synchronized). The benchmark will then deadlock because calling Object::wait() pins the virtual thread.
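Such a monitor-based latch could look like the hypothetical sketch below (not part of the benchmark); on JDK 21 the wait() call pins the calling virtual thread, which is what makes the benchmark deadlock:

```java
public class MonitorLatch {
    private int count;

    public MonitorLatch(int count) { this.count = count; }

    public synchronized void countDown() {
        if (count > 0 && --count == 0) {
            notifyAll();
        }
    }

    public synchronized void await() throws InterruptedException {
        while (count > 0) {
            wait(); // pins the virtual thread to its carrier on JDK 21
        }
    }
}
```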
The property jdk.virtualThreadScheduler.parallelism controls the number of carrier threads in the pool. By default it is set to Runtime.getRuntime().availableProcessors().
The property jdk.virtualThreadScheduler.maxPoolSize controls the maximum number of carrier threads in the pool. By default it is set to 256.
The property jdk.virtualThreadScheduler.minRunnable controls when carrier threads are temporarily added to the pool. By default it is set to the maximum of 1 and parallelism / 2. If fewer carrier threads are available for execution of virtual threads, then the number of carriers may temporarily increase. This can improve throughput, but starting a new platform thread also means overhead.
Refer to limitations for conditions where carrier threads are temporarily added to the pool
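For example, the scheduler could be configured on the command line like this (the values are only illustrative, and Example.java stands in for your application):
java -Djdk.virtualThreadScheduler.parallelism=4 -Djdk.virtualThreadScheduler.maxPoolSize=16 Example.java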
Only the jcmd command Thread.dump_to_file will show the stacks of virtual threads.
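For example, a plain-text dump can be written like this (the output path is just an example):
jcmd <pid> Thread.dump_to_file /tmp/threads.txt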
The carrier threads are normally from a ForkJoin thread pool with the name prefix "ForkJoinPool-1".
In this example carrier threads that have a virtual thread mounted have stacks with the following frames on top
java.base/jdk.internal.vm.Continuation.run(Continuation.java:248)
java.base/java.lang.VirtualThread.runContinuation(VirtualThread.java:221)
The carriers are blocked running virtual threads. From looking at their stacks it is not possible to say what the virtual threads are doing. In the example all virtual threads are either waiting to start or blocked waiting for a Java object monitor, pinned to their carrier, except for one thread that owns the monitor and executes the method consumeCPU(). Note that the pinning is the bottleneck that prevented all but 4 virtual threads from starting. 4 threads could be started because 4 carrier threads were configured using the property jdk.virtualThreadScheduler.parallelism.
Pinning can be avoided by using a java.util.concurrent.locks.ReentrantLock. All threads were able to start because when blocking on the lock the threads unmount from their carrier. Now only one carrier has a virtual thread mounted. Others are waiting for a virtual thread to become runnable.
If you select -format=json, then tasks created with the Structured Concurrency API will be grouped according to their scopes (an example is given here).
By setting jdk.tracePinnedThreads you will get a stack trace if a virtual thread cannot be unmounted because it is pinned. For a shorter trace you can specify -Djdk.tracePinnedThreads=short. Each stack will be printed at most once.
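For example (Example.java stands in for your application):
java -Djdk.tracePinnedThreads=short Example.java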
JDK Flight Recorder (JFR) provides a number of events related to virtual threads.
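For example, assuming a recording is started at launch, pinning-related events could be inspected with the jfr tool afterwards (the file name and event selection are only illustrative):
java -XX:StartFlightRecording:filename=recording.jfr Example.java
jfr print --events jdk.VirtualThreadPinned,jdk.VirtualThreadSubmitFailed recording.jfr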
Virtual threads are hidden by default when debugging. You don't see them and they don't stop at breakpoints. You can change this behaviour when launching your application. One way to do this is to pass the option includevirtualthreads=y
to the JDWP agent.
java -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n,includevirtualthreads=y Example.java
One reason for hiding virtual threads is that stopping at a breakpoint pins the virtual thread. If too many threads hit the breakpoint, this will deadlock the VM.
Local variables in virtual threads can currently only be changed in the topmost frame.
A platform thread that the scheduler uses to mount a virtual thread for execution. The scheduler unmounts the virtual thread if an operation blocks, unless the virtual thread is pinned or unmounting is not possible for the operation (see limitations).
Mounting/unmounting is effectively switching context between carrier and virtual thread. The switching is done in user mode.
A carrier is blocked while a virtual thread is mounted. A virtual thread is blocked while unmounted.
A virtual thread is said to be pinned to its carrier thread if
- it is executing in a synchronized method or block (fixed in JDK 24 by JEP 491),
- it is waiting to enter a synchronized method or block (fixed in JDK 24 by JEP 491),
- or it has a JNI call on the stack.
The virtual thread cannot unmount when executing a blocking operation while pinned, i.e. it blocks its carrier. The scheduler does not compensate for this (see limitations).
A Java thread that is not a virtual thread. It is mapped to a dedicated OS thread for execution. The mapping is never modified.
A Java thread handled entirely by the Java runtime in user mode. For execution it is mounted on a carrier thread. It is lightweight compared to platform threads. It can be created faster, it consumes less memory, it executes as fast as platform threads, and there can be many more virtual than platform threads.
Covers Virtual Threads, Structured Concurrency, and Thread-Local Variables.
https://docs.oracle.com/en/java/javase/21/core/concurrency.html