
Improve Garbage Collection #8154

Merged
merged 15 commits into from
May 17, 2024
Conversation

sokra
Member

@sokra sokra commented May 15, 2024

Description

Simplify and improve GC

This improves the GC queue.

The job of the GC queue is to find tasks that should be garbage collected. Three factors influence that:

  • age of the task: time since last access
  • memory usage of the task
  • compute duration of the task: CPU time spent computing the task

Memory usage and compute duration are combined into a GC priority by calculating (memory_usage + C1) / (compute_duration + C2). C1 and C2 are constants used to fine-tune the priority.
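As a sketch of that formula (the constant values here are illustrative placeholders, not the actual tuned values used in Turbopack):

```rust
// GC priority sketch: a higher value means more expendable, i.e. the task
// is memory-heavy but cheap to recompute. C1/C2 are made-up placeholders.
const C1: f64 = 1024.0; // bytes
const C2: f64 = 100.0; // microseconds

fn gc_priority(memory_usage_bytes: u64, compute_duration_us: u64) -> f64 {
    (memory_usage_bytes as f64 + C1) / (compute_duration_us as f64 + C2)
}
```

The constants keep the ratio well-behaved for tasks with near-zero memory usage or compute time.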

The age of a task changes constantly, so a different scheme is used for it:

Every task records the generation in which it was last accessed.
The generation is incremented every 100,000 tasks.

We accumulate tasks of the current generation in a concurrent queue. Once 100,000 tasks are reached (tracked by an atomic counter), we increase the generation and pop 100,000 tasks from the queue into an OldGeneration. These old generations are stored in another queue. No sorting is applied at this point; these are just lists of task ids.
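A minimal single-threaded sketch of this generation bookkeeping (the real implementation uses a concurrent queue and an atomic counter; the type and method names here are hypothetical, only the 100,000 threshold comes from the PR):

```rust
use std::collections::VecDeque;
use std::mem;

// Threshold from the PR description.
const TASKS_PER_GENERATION: usize = 100_000;

struct GcQueue {
    generation: u32,
    current: Vec<u32>,                   // task ids accessed in the current generation
    old_generations: VecDeque<Vec<u32>>, // oldest generation at the front
}

impl GcQueue {
    fn new() -> Self {
        GcQueue { generation: 0, current: Vec::new(), old_generations: VecDeque::new() }
    }

    // Record an access; returns the generation the task was accessed in.
    fn task_accessed(&mut self, task_id: u32) -> u32 {
        let accessed_in = self.generation;
        self.current.push(task_id);
        if self.current.len() >= TASKS_PER_GENERATION {
            // Seal the current generation into an unsorted list of task ids
            // and start a new one. No sorting happens here.
            let sealed = mem::take(&mut self.current);
            self.old_generations.push_back(sealed);
            self.generation += 1;
        }
        accessed_in
    }
}
```

The key point is that sealing a generation is just moving a list of ids; all sorting is deferred until a GC pass actually runs.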

Once we need to perform GC, we pop the oldest old generation from the queue, filter out all tasks that are in a higher generation (they have been accessed in the meantime), and sort the remaining list by GC priority.
We then take the top 30% of tasks and garbage collect them.
The remaining tasks are pushed to the front of the queue again, intermixed with other tasks in existing old generations, until a generation item reaches a maximum of 200,000 tasks. In that case the generation item is split into two items.
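The filter/sort/collect steps of a single GC pass can be sketched as follows; the `Task` struct, the constants, and the function names are hypothetical stand-ins for the real backend types, not the actual implementation:

```rust
// One GC pass over the oldest old generation (illustrative sketch).
#[derive(Clone)]
struct Task {
    id: u32,
    generation: u32,       // generation of last access
    memory_usage: u64,     // bytes
    compute_duration: u64, // microseconds
}

fn gc_priority(t: &Task) -> f64 {
    // Placeholder constants, not the tuned values.
    (t.memory_usage as f64 + 1024.0) / (t.compute_duration as f64 + 100.0)
}

/// Returns (tasks to collect, survivors to re-enqueue into fresher generations).
fn gc_step(old_generation: Vec<Task>, bucket_generation: u32) -> (Vec<Task>, Vec<Task>) {
    // 1. Drop tasks accessed again after this bucket was sealed.
    let mut candidates: Vec<Task> = old_generation
        .into_iter()
        .filter(|t| t.generation <= bucket_generation)
        .collect();
    // 2. Sort by GC priority, most expendable first.
    candidates.sort_by(|a, b| gc_priority(b).partial_cmp(&gc_priority(a)).unwrap());
    // 3. Collect the top 30%; the remaining 70% survive for now.
    let cut = candidates.len() * 30 / 100;
    let survivors = candidates.split_off(cut);
    (candidates, survivors)
}
```

Note that only one "small" bucket is ever sorted per pass, never the whole cache.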

Testing Instructions

@sokra sokra requested a review from a team as a code owner May 15, 2024 21:43

vercel bot commented May 15, 2024

The latest updates on your projects:

Name | Status | Updated (UTC)
examples-nonmonorepo | ✅ Ready | May 17, 2024 2:47pm
rust-docs | ✅ Ready | May 17, 2024 2:47pm

8 Ignored Deployments (all ⬜️ Ignored, May 17, 2024 2:47pm): examples-basic-web, examples-designsystem-docs, examples-gatsby-web, examples-kitchensink-blog, examples-native-web, examples-svelte-web, examples-tailwind-web, examples-vite-web

Contributor

github-actions bot commented May 15, 2024

🟢 Turbopack Benchmark CI successful 🟢

Thanks

Contributor

github-actions bot commented May 15, 2024

⚠️ This change may fail to build next-swc.

Logs

error: failed to select a version for `swc_common`.
    ... required by package `swc_core v0.92.5`
    ... which satisfies dependency `swc_core = "^0.92.5"` of package `turbopack-binding v0.1.0 (https://github.com/vercel/turbo?rev=2f789bdf72d82754f49bf41c86752b3d831d9a6d#11e59b96)`
    ... which satisfies git dependency `turbopack-binding` (locked to 0.1.0) of package `next-swc-napi v0.0.0 (/root/actions-runner/_work/turbo/turbo/packages/next-swc/crates/napi)`
versions that meet the requirements `^0.33.26` are: 0.33.26

all possible versions conflict with previously selected packages.

  previously selected package `swc_common v0.33.24`
    ... which satisfies dependency `swc_common = "^0.33.20"` (locked to 0.33.24) of package `swc_core v0.90.33`
    ... which satisfies dependency `swc_core = "^0.90.33"` (locked to 0.90.33) of package `wasm v0.0.0 (/root/actions-runner/_work/turbo/turbo/packages/next-swc/crates/wasm)`

failed to select a version for `swc_common` which could resolve this conflict

See job summary for details

Contributor

github-actions bot commented May 15, 2024

⚠️ CI failed ⚠️

The following steps have failed in CI:

  • Turbopack Rust tests (mac/win, non-blocking)

See workflow summary for details

Member

@bgw bgw left a comment


I didn't get through the whole review, but I left a few comments: some suggestions, some questions, and some observations.

pub fn run_gc(
    &self,
    idle: bool,
    _turbo_tasks: &dyn TurboTasksBackendApi<MemoryBackend>,
Member


Can this argument be removed?

Member

@bgw bgw left a comment


Because this is such a huge change, it would be nice if we could include some high-level benchmarks alongside it to make sure we're not significantly regressing on peak memory usage or CPU.


This all seems a bit different from generational GC as I'm familiar with it (in the context of tracing GCs). Maybe the terminology "generation" being used in a different context is what's confusing me, though it's not wrong as a description of what this is. This seems more like a bucketed LRU cache.

In my understanding of generational GC:

  • Objects in the older generation are less likely to be collected / iterated over. This seems like the inverse of that.
  • Older objects get moved to a separate tier (possibly one of many tiers) of "survivors" that are less frequently traversed. We don't have any sort of logic for that.

That leads me to a few thoughts on potential future ways to improve this:

@sokra
Member Author

sokra commented May 17, 2024

generational GC:

  • Objects in the older generation are less likely to be collected / iterated over. This seems like the inverse of that.

Yep, that's true. We are not writing a real GC in the sense of collecting unreferenced memory, as "unreferenced" doesn't exist in our system. We are writing a cache, where any cache entry might be accessed at any time in the future, but where we can also evict any cache entry at any time without hurting correctness (only performance).

So we want the opposite of a generational GC: we want older cache entries to be more likely to be evicted. But we also want the memory usage and compute time of cache entries to influence the eviction behavior.

So we bucket cache entries into buckets of (currently) 100,000 items. We call them generations. When under memory pressure we start processing the oldest generation. We select the 30% of cache entries that have the highest GC priority and collect them. The remaining 70% of cache entries are pushed into the bucket of the freshest generation, so they are intermixed with that generation. That way these entries are reconsidered for GC once we have cycled through all old generations.

There is a maximum of (currently) 200,000 items per bucket. Buckets are split evenly into two buckets when they get too full.

Using buckets for age is nice, as it avoids having to include age (which is continuously increasing) in the priority, and it also avoids having to sort all cache items into a very big priority queue. We basically don't have to sort anything until GC is invoked, and then we only have to sort a "small" bucket of items.
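The bucket-overflow split mentioned above can be sketched like this; the function name and signature are illustrative, only the 200,000 maximum comes from the comment:

```rust
// Maximum bucket size from the comment above.
const MAX_BUCKET_SIZE: usize = 200_000;

// Push surviving task ids into an existing generation bucket; if the
// bucket overflows, split it evenly and return the new second bucket.
fn push_survivors(bucket: &mut Vec<u32>, survivors: Vec<u32>) -> Option<Vec<u32>> {
    bucket.extend(survivors);
    if bucket.len() > MAX_BUCKET_SIZE {
        let mid = bucket.len() / 2;
        return Some(bucket.split_off(mid));
    }
    None
}
```

Splitting evenly keeps each bucket "small" enough that a GC pass never has to sort more than a bounded number of items.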

@sokra sokra merged commit 864a6ad into main May 17, 2024
47 of 50 checks passed
@sokra sokra deleted the sokra/gc branch May 17, 2024 15:26
Neosoulink pushed a commit to Neosoulink/turbo that referenced this pull request Jun 14, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 25, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 1, 2024