Reorganize src/scheduler/gc_work.rs #1279

wks · 2025-02-26T04:58:48Z

The file src/scheduler/gc_work.rs contains about 1200 lines of code, with more than 20 structs and more than 50 impl items.

There are simply too many items, and
they are not ordered or grouped in the way they are used. Not by use cases, and not in the temporal order in a GC.

This made it difficult for developers to find the right type or function when they want to make modification to MMTk. But one design goal of MMTk is to make this easy.

Reorganizing this file into multiple files

We can categorize the work packets and traits by their use cases.

Unless specified, items listed below are struct items.

Generic GC scheduling work packets.
- ScheduleCollection
- StopMutators
- Prepare
- PrepareMutator
- PrepareCollector
- Release
- ReleaseMutator
- ReleaseCollector
- VMPostForwarding (running concurrently with Release)
Root scanning
- ScanMutatorRoots
- ScanVMSpecificRoots
- ProcessEdgesWorkRootsWorkFactory (implements RootsWorkFactory defined in src/vm/scanning.rs)
- enum RootsKind
Tracing (transitive closure)
- Slots/edges
  - ProcessEdgesBase (This is all about processing slots instead of edges).
  - trait ProcessEdgesWork (This part really needs to be refactored. ProcessEdgesWork::trace_object shouldn't be specific to slot-enqueued tracing.)
  - SFTProcessEdges
  - PlanProcessEdges
  - ProcessEdgesWorkTracer (implements ObjectTracer defined in src/vm/scanning.rs)
  - UnsupportedProcessEdges
- Nodes
  - trait ScanObjectsWork
  - ScanObjects
  - PlanScanObjects
  - ProcessRootNode (for pinning and transitive-pinning roots)
Finalizer and weak references
- ProcessEdgesWorkTracerContext (implements TracerContext defined in src/vm/scanning.rs)
- VMProcessWeakRefs
- VMForwardWeakRefs

Refactoring slots-processing work packets

In the early stage of the Rust port, MMTk was very focused on slot-enqueuing tracing because that's what OpenJDK and JikesRVM do. One design decision which I think is bad is that it mixed up "edge processing" and "slots processing".

Slot processing: For each slot in a list of slots, load from it, call trace_object, and store back forwarded reference.
Edge visiting: The trace_object(obj) -> new_obj method. This effectively visits an object graph edge that points to obj, regardless whether obj is loaded from a given Slot or obtained in VM-specific ways in Scanning::scan_object_and_trace_edges. In the end, if a VM binding only supports node-enqueuing tracing (such as Ruby), we have to wrap ProcessEdgesWork behind the ObjectTracer trait which only provides the trace_object method (that is, ProcessEdgesWorkTracer).

I had a branch that refactors the trace_object into a dedicated type and leaves the ProcessEdgesWork dedicated to slot processing. The code is here: #1278, but it is very old.

And #599 contains more information about my intended refactoring.

The text was updated successfully, but these errors were encountered:

k-sareen · 2025-02-26T05:08:17Z

We really should rename all of them at the same time. For example ProcessEdgesWorkTracerContext and ProcessEdgesWorkRootsWorkFactory should be renamed. They're too wordy and while are related to ProcessEdgesWork, they don't need to have it in their name.

wks · 2025-02-26T05:20:01Z

We really should rename all of them at the same time. For example ProcessEdgesWorkTracerContext and ProcessEdgesWorkRootsWorkFactory should be renamed. They're too wordy and while are related to ProcessEdgesWork, they don't need to have it in their name.

Maybe we just call it DefaultRootsWorkFactory. There shouldn't be more than one implementation of RootsWorkFactory inside mmtk-core, and they are only instantiated inside mmtk-core, not in the VM binding.

qinsoon · 2025-02-26T05:30:34Z

I think 'too long' is not a good reason to rename a type, if it is clear and not ambiguous. We don't want a short name that is not clear while the old name is more or less accepted.

The criteria for choosing a name when it is first introduced should be different from renaming when the name was already introduced. We need a better reason to rename.

k-sareen · 2025-02-26T06:01:49Z

I mean frankly, they are confusing as well. The verbosity doesn't help.

qinsoon · 2025-02-26T06:05:34Z

I mean frankly, they are confusing as well. The verbosity doesn't help.

Can you elaborate in which way they are confusing?

k-sareen · 2025-02-26T06:13:40Z

I just don't understand why ProcessEdgesWork needs to be in the type name there. All it's suggesting is that the RootsWorkFactory implementation is using ProcessEdgesWork work packet underneath. But that's already clear from the generic type signature, so why put it in the name as well? Same for ProcessEdgesWorkTracer{,Context}.

ProcessEdgesWorkTracerContext is further confusing because there is the similarly named ProcessEdgesWorkTracer and the ObjectTracerContext and then to understand that whole system, you have to look at all three implementations and keep it in your working memory.

In general, I think we are eager to abstract away things into traits when they may not really be necessary. For example, the aforementioned ProcessEdgesWorkTracer and ProcessEdgesWorkTracerContext are the only users of the ObjectTracer and ObjectTracerContext traits. But like I don't understand why it had to be made into a separate trait for example.

wks · 2025-02-26T06:23:01Z

Regarding #1281, maybe we can move VectorQueue back to the scheduler module, specifically in src/scheduler/gc_work/tracing/mod.rs. Although ObjectQueue is about tracing, so are all the ProcessEdges/ScanObjects work packets. And it mainly serves the trace_object method (wherever it is defined).

The trait ObjectQueue was intended to make the trace_object method generic so that newly visited object can be queued anywhere, not just a vector inside ProcessEdgesBase. Specifically, LXR needs to queue objects differently from the regular ProcessEdgesWork work packet and the Scanning::process_weak_ref function. Instead of queuing the objects, LXR actually needs a callback for newly traced objects. But the name ObjectQueue is still OK I think because we all know what "enqueuing" mean in tracing.

wks · 2025-02-26T06:28:32Z

I just don't understand why ProcessEdgesWork needs to be in the type name there. All it's suggesting is that the RootsWorkFactory implementation is using ProcessEdgesWork work packet underneath. But that's already clear from the generic type signature, so why put it in the name as well? Same for ProcessEdgesWorkTracer{,Context}.

I added it because I think ProcessEdgesWork is something that needs to go away. I imagine when we do the refactoring, if we can't get rid of all ProcessEdgesWork in one commit, we will keep it around for a while, and we will have another implementation, such as TracerDelegateRootsWorkFactory (if TracerDelegate were the right name, but I highly doubt). Then we can remove ProcessEdgesWorkRootsWorkFactory when the refactoring finishes.

ProcessEdgesWorkTracerContext is further confusing because there is the similarly named ProcessEdgesWorkTracer and the ObjectTracerContext and then to understand that whole system, you have to look at all three implementations and keep it in your working memory.

Maybe we can rename it to ProcessEdgesWorkObjectTracerContext just to make the naming consistent.

In general, I think we are eager to abstract away things into traits when they may not really be necessary. For example, the aforementioned ProcessEdgesWorkTracer and ProcessEdgesWorkTracerContext are the only users of the ObjectTracer and ObjectTracerContext traits. But like I don't understand why it had to be made into a separate trait for example.

They are made into traits because we don't want to expose their implementations to the users. And a principle of defining a general API is depending on interfaces (traits) instead of depending on implementations. And as I said before, there may be multiple implementations when we do the refactoring. So using traits can hide all those details from the VM binding.

k-sareen · 2025-02-26T06:31:56Z

Maybe we can rename it to ProcessEdgesWorkObjectTracerContext just to make the naming consistent.

I feel like that is making it worse, personally. I feel like this may need to be discussed with everyone, because I do still believe longer type names are bad since it just adds to the noise.

wks · 2025-02-26T06:37:38Z

... longer type names are bad since it just adds to the noise.

That's intended. I want everyone to know it is bad so we want to remove it every time we see it. It is a constant reminder that the status quo is not ideal and it needs some fundamental changes.

qinsoon · 2025-02-26T07:34:21Z

I want everyone to know it is bad so we want to remove it every time we see it.

Please don't do this. The better way is to document and write this down clearly. For example, I have no clue that you deliberately use bad names to 'convey' the idea.

wks · 2025-02-26T07:49:29Z

I want everyone to know it is bad so we want to remove it every time we see it.

Please don't do this. The better way is to document and write this down clearly. For example, I have no clue that you deliberately use bad names to 'convey' the idea.

You are right. It's better to mention this intention in the comment.

wks mentioned this issue Feb 26, 2025

Make work packet sizes easily configurable #1281

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorganize src/scheduler/gc_work.rs #1279

Reorganize src/scheduler/gc_work.rs #1279

wks commented Feb 26, 2025

k-sareen commented Feb 26, 2025

wks commented Feb 26, 2025

qinsoon commented Feb 26, 2025

k-sareen commented Feb 26, 2025

qinsoon commented Feb 26, 2025

k-sareen commented Feb 26, 2025 •

edited

Loading

wks commented Feb 26, 2025

wks commented Feb 26, 2025

k-sareen commented Feb 26, 2025

wks commented Feb 26, 2025

qinsoon commented Feb 26, 2025

wks commented Feb 26, 2025

Reorganize src/scheduler/gc_work.rs #1279

Reorganize src/scheduler/gc_work.rs #1279

Comments

wks commented Feb 26, 2025

Reorganizing this file into multiple files

Refactoring slots-processing work packets

k-sareen commented Feb 26, 2025

wks commented Feb 26, 2025

qinsoon commented Feb 26, 2025

k-sareen commented Feb 26, 2025

qinsoon commented Feb 26, 2025

k-sareen commented Feb 26, 2025 • edited Loading

wks commented Feb 26, 2025

wks commented Feb 26, 2025

k-sareen commented Feb 26, 2025

wks commented Feb 26, 2025

qinsoon commented Feb 26, 2025

wks commented Feb 26, 2025

k-sareen commented Feb 26, 2025 •

edited

Loading