Improve Garbage Collection #8154
Conversation
I didn't get through the whole review, but I left a few comments: some suggestions, some questions, and some observations.
pub fn run_gc(
    &self,
    idle: bool,
    _turbo_tasks: &dyn TurboTasksBackendApi<MemoryBackend>,
Can this argument be removed?
Because this is such a huge change, it would be nice if we could include some high-level benchmarks alongside it to make sure we're not significantly regressing on peak memory usage or CPU.
This all seems a bit different from generational GC as I'm familiar with it (in the context of tracing GCs). Maybe the terminology "generation" being used in a different context is what's confusing to me, though it's not wrong as a description of what this is. This seems more like a bucketed LRU cache.
In my understanding of generational GC:
- Objects in the older generation are less likely to be collected / iterated over. This seems like the inverse of that.
- Older objects get moved to a separate tier (possibly one of many tiers) of "survivors" that are less frequently traversed. We don't have any sort of logic for that.
That leads me to a few thoughts on potential future ways to improve this:
- Consider a "segmented LRU", which shares some similarities to generational tracing GCs: https://memcached.org/blog/modern-lru/
- There are edge cases where LRU cache eviction can severely degrade if the cache is too small to contain all frequently accessed items. If we think this is a potential concern, we could add some amount of randomization to cache eviction, which would lead to more graceful degradation.
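To make the randomization suggestion concrete, here is one hypothetical shape it could take (the function name and the `jitter` knob are invented for this sketch, not part of the PR): blend a uniform random factor into the eviction score so that, when the cache is too small for the working set, evictions spread across entries instead of repeatedly hitting the same hot set.

```rust
// Hypothetical sketch: `jitter` in [0, 1] controls how much randomness
// is mixed into the eviction score; `rand01` is a uniform sample in
// [0, 1) supplied by the caller. jitter = 0.0 reproduces the
// deterministic score exactly, jitter = 1.0 is fully random.
fn randomized_score(score: f64, jitter: f64, rand01: f64) -> f64 {
    score * ((1.0 - jitter) + jitter * rand01)
}
```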
Yep, that's true. We don't write a real GC in the sense of collecting unreferenced memory, as "unreferenced" doesn't exist in our system. We write a cache, where any cache entry might be accessed anytime in the future, but we can also evict any cache entry anytime without hurting correctness (only performance). So we want the opposite of a generational GC: we want older cache entries to be more likely to be evicted. But we also want the memory usage and compute time of cache entries to influence the eviction behavior. So we bucket cache entries into buckets of (currently) 100,000 items. We call them generations. When under memory pressure, we start processing the oldest generation. We select the 30% of cache entries with the highest GC priority and collect them. The remaining 70% of cache entries are pushed into the bucket of the freshest generation, so they are intermixed with that generation. Those entries will be reconsidered for GC once we have cycled through all old generations. There is a maximum of (currently) 200,000 items per bucket. Buckets are split evenly into two buckets when they get too full. Using buckets for age is kind of nice as it avoids having to include age (which is continuously increasing) in the priority, and it also avoids having to sort all cache items into a very big priority queue. We basically don't have to sort anything until GC is invoked, and then we only have to sort a "small" bucket of items.
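The GC priority mentioned here is spelled out in the PR description as `(memory_usage + C1) / (compute_duration + C2)`. A minimal sketch of that calculation, with invented values for the tuning constants (the real C1/C2 are not given here):

```rust
// C1 and C2 are tuning constants per the description; these concrete
// values and units are invented for the example.
const C1: u64 = 1_024; // softens the memory term (bytes)
const C2: u64 = 1_000; // softens the compute term (microseconds)

/// Higher score => better eviction candidate: a large footprint that
/// was cheap to compute is evicted before a small, expensive one.
fn gc_priority(memory_usage: u64, compute_duration_us: u64) -> f64 {
    (memory_usage + C1) as f64 / (compute_duration_us + C2) as f64
}
```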
Description
Simplify and improve GC
This improves the GC queue.
The job of the GC queue is to find tasks that should be garbage collected. There are three factors which influence that:
- Age of the task: time since last access.
- Memory usage of the task.
- Compute duration of the task: CPU time spent to compute the task.
Memory usage and compute duration combine into a GC priority by calculating:
`(memory_usage + C1) / (compute_duration + C2)`
C1 and C2 are constants to fine-tune the priority. The age of the task is constantly changing, so a different scheme is used for it:
Every task has a generation in which it was last accessed.
The generation is increased every 100,000 tasks.
We accumulate tasks in the current generation in a concurrent queue. Once 100,000 tasks are reached (atomic counter), we increase the generation and pop 100,000 tasks from the queue into an `OldGeneration`. These old generations are stored in another queue. No sorting is applied so far; these are just lists of task ids.
Once we need to perform GC, we pop the oldest old generation from the queue, filter out all tasks that are in a higher generation (they have been accessed in the meantime), and sort the list by GC priority.
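The accumulate-and-seal step described above (atomic counter, generation bump every 100,000 tasks) might look roughly like this; the threshold is shrunk here for readability, and the plain counter reset is a simplification that the real concurrent code has to handle more carefully:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 100,000 in the PR; tiny here so the example is easy to follow.
const GENERATION_SIZE: usize = 4;

struct GenerationCounter {
    count: AtomicUsize,      // tasks accumulated in the current generation
    generation: AtomicUsize, // index of the current generation
}

impl GenerationCounter {
    /// Record one task entering the current generation. Returns
    /// Some(sealed_generation) when the threshold is reached and the
    /// generation number is bumped.
    fn record(&self) -> Option<usize> {
        if self.count.fetch_add(1, Ordering::Relaxed) + 1 == GENERATION_SIZE {
            self.count.store(0, Ordering::Relaxed);
            Some(self.generation.fetch_add(1, Ordering::Relaxed))
        } else {
            None
        }
    }
}
```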
Then we take the top 30% of tasks and garbage collect them.
The remaining tasks are pushed to the front of the queue again, intermixed with other tasks into existing old generations, until we reach a maximum of 200,000 tasks in a generation item. In that case, the generation item is split evenly into two items.
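Putting the eviction cycle together, a simplified sketch under stated assumptions: the 30% eviction ratio and the split-when-too-full rule come from this description, survivors join the freshest bucket as described in the review discussion, and all names, the priority closure, and the tiny bucket limit (200,000 in the PR) are invented for the example.

```rust
use std::collections::VecDeque;

type TaskId = u32;

// 200,000 in the PR; tiny here so the example is easy to follow.
const MAX_BUCKET: usize = 8;

struct GcQueue {
    generations: VecDeque<Vec<TaskId>>, // front = oldest
}

impl GcQueue {
    /// Pop the oldest generation, evict the 30% of tasks with the
    /// highest GC priority, and push the survivors back so they are
    /// reconsidered after a full cycle. Returns the evicted task ids.
    fn collect(&mut self, priority: impl Fn(TaskId) -> f64) -> Vec<TaskId> {
        let mut oldest = match self.generations.pop_front() {
            Some(g) => g,
            None => return Vec::new(),
        };
        // Highest GC priority first (large memory, cheap to recompute).
        oldest.sort_by(|a, b| priority(*b).partial_cmp(&priority(*a)).unwrap());
        let survivors = oldest.split_off(oldest.len() * 30 / 100);
        self.push_survivors(survivors);
        oldest
    }

    /// Intermix survivors with the freshest generation, splitting the
    /// bucket evenly in two if it grows past MAX_BUCKET.
    fn push_survivors(&mut self, survivors: Vec<TaskId>) {
        match self.generations.back_mut() {
            Some(bucket) => bucket.extend(survivors),
            None => self.generations.push_back(survivors),
        }
        let needs_split = self.generations.back().map_or(false, |b| b.len() > MAX_BUCKET);
        if needs_split {
            let bucket = self.generations.back_mut().unwrap();
            let upper = bucket.split_off(bucket.len() / 2);
            self.generations.push_back(upper);
        }
    }
}
```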
Testing Instructions