Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize src/scheduler/gc_work.rs #1279

Open
wks opened this issue Feb 26, 2025 · 12 comments
Open

Reorganize src/scheduler/gc_work.rs #1279

wks opened this issue Feb 26, 2025 · 12 comments

Comments

@wks
Copy link
Collaborator

wks commented Feb 26, 2025

The file src/scheduler/gc_work.rs contains about 1200 lines of code, with more than 20 structs and more than 50 impl items.

  • There are simply too many items, and
  • they are not ordered or grouped in the way they are used. Not by use cases, and not in the temporal order in a GC.

This made it difficult for developers to find the right type or function when they want to make modification to MMTk. But one design goal of MMTk is to make this easy.

Reorganizing this file into multiple files

We can categorize the work packets and traits by their use cases.

Unless specified, items listed below are struct items.

  • Generic GC scheduling work packets.
    • ScheduleCollection
    • StopMutators
    • Prepare
    • PrepareMutator
    • PrepareCollector
    • Release
    • ReleaseMutator
    • ReleaseCollector
    • VMPostForwarding (running concurrently with Release)
  • Root scanning
    • ScanMutatorRoots
    • ScanVMSpecificRoots
    • ProcessEdgesWorkRootsWorkFactory (implements RootsWorkFactory defined in src/vm/scanning.rs)
    • enum RootsKind
  • Tracing (transitive closure)
    • Slots/edges
      • ProcessEdgesBase (This is all about processing slots instead of edges).
      • trait ProcessEdgesWork (This part really needs to be refactored. ProcessEdgesWork::trace_object shouldn't be specific to slot-enqueued tracing.)
      • SFTProcessEdges
      • PlanProcessEdges
      • ProcessEdgesWorkTracer (implements ObjectTracer defined in src/vm/scanning.rs)
      • UnsupportedProcessEdges
    • Nodes
      • trait ScanObjectsWork
      • ScanObjects
      • PlanScanObjects
      • ProcessRootNode (for pinning and transitive-pinning roots)
  • Finalizer and weak references
    • ProcessEdgesWorkTracerContext (implements TracerContext defined in src/vm/scanning.rs)
    • VMProcessWeakRefs
    • VMForwardWeakRefs

Refactoring slots-processing work packets

In the early stage of the Rust port, MMTk was very focused on slot-enqueuing tracing because that's what OpenJDK and JikesRVM do. One design decision which I think is bad is that it mixed up "edge processing" and "slots processing".

  • Slot processing: For each slot in a list of slots, load from it, call trace_object, and store back forwarded reference.
  • Edge visiting: The trace_object(obj) -> new_obj method. This effectively visits an object graph edge that points to obj, regardless whether obj is loaded from a given Slot or obtained in VM-specific ways in Scanning::scan_object_and_trace_edges. In the end, if a VM binding only supports node-enqueuing tracing (such as Ruby), we have to wrap ProcessEdgesWork behind the ObjectTracer trait which only provides the trace_object method (that is, ProcessEdgesWorkTracer).

I had a branch that refactors the trace_object into a dedicated type and leaves the ProcessEdgesWork dedicated to slot processing. The code is here: #1278, but it is very old.

And #599 contains more information about my intended refactoring.

@k-sareen
Copy link
Collaborator

We really should rename all of them at the same time. For example ProcessEdgesWorkTracerContext and ProcessEdgesWorkRootsWorkFactory should be renamed. They're too wordy and while are related to ProcessEdgesWork, they don't need to have it in their name.

@wks
Copy link
Collaborator Author

wks commented Feb 26, 2025

We really should rename all of them at the same time. For example ProcessEdgesWorkTracerContext and ProcessEdgesWorkRootsWorkFactory should be renamed. They're too wordy and while are related to ProcessEdgesWork, they don't need to have it in their name.

Maybe we just call it DefaultRootsWorkFactory. There shouldn't be more than one implementation of RootsWorkFactory inside mmtk-core, and they are only instantiated inside mmtk-core, not in the VM binding.

@qinsoon
Copy link
Member

qinsoon commented Feb 26, 2025

I think 'too long' is not a good reason to rename a type, if it is clear and not ambiguous. We don't want a short name that is not clear while the old name is more or less accepted.

The criteria for choosing a name when it is first introduced should be different from renaming when the name was already introduced. We need a better reason to rename.

@k-sareen
Copy link
Collaborator

I mean frankly, they are confusing as well. The verbosity doesn't help.

@qinsoon
Copy link
Member

qinsoon commented Feb 26, 2025

I mean frankly, they are confusing as well. The verbosity doesn't help.

Can you elaborate in which way they are confusing?

@k-sareen
Copy link
Collaborator

k-sareen commented Feb 26, 2025

I just don't understand why ProcessEdgesWork needs to be in the type name there. All it's suggesting is that the RootsWorkFactory implementation is using ProcessEdgesWork work packet underneath. But that's already clear from the generic type signature, so why put it in the name as well? Same for ProcessEdgesWorkTracer{,Context}.

ProcessEdgesWorkTracerContext is further confusing because there is the similarly named ProcessEdgesWorkTracer and the ObjectTracerContext and then to understand that whole system, you have to look at all three implementations and keep it in your working memory.

In general, I think we are eager to abstract away things into traits when they may not really be necessary. For example, the aforementioned ProcessEdgesWorkTracer and ProcessEdgesWorkTracerContext are the only users of the ObjectTracer and ObjectTracerContext traits. But like I don't understand why it had to be made into a separate trait for example.

@wks
Copy link
Collaborator Author

wks commented Feb 26, 2025

Regarding #1281, maybe we can move VectorQueue back to the scheduler module, specifically in src/scheduler/gc_work/tracing/mod.rs. Although ObjectQueue is about tracing, so are all the ProcessEdges/ScanObjects work packets. And it mainly serves the trace_object method (wherever it is defined).

The trait ObjectQueue was intended to make the trace_object method generic so that newly visited object can be queued anywhere, not just a vector inside ProcessEdgesBase. Specifically, LXR needs to queue objects differently from the regular ProcessEdgesWork work packet and the Scanning::process_weak_ref function. Instead of queuing the objects, LXR actually needs a callback for newly traced objects. But the name ObjectQueue is still OK I think because we all know what "enqueuing" mean in tracing.

@wks
Copy link
Collaborator Author

wks commented Feb 26, 2025

I just don't understand why ProcessEdgesWork needs to be in the type name there. All it's suggesting is that the RootsWorkFactory implementation is using ProcessEdgesWork work packet underneath. But that's already clear from the generic type signature, so why put it in the name as well? Same for ProcessEdgesWorkTracer{,Context}.

I added it because I think ProcessEdgesWork is something that needs to go away. I imagine when we do the refactoring, if we can't get rid of all ProcessEdgesWork in one commit, we will keep it around for a while, and we will have another implementation, such as TracerDelegateRootsWorkFactory (if TracerDelegate were the right name, but I highly doubt). Then we can remove ProcessEdgesWorkRootsWorkFactory when the refactoring finishes.

ProcessEdgesWorkTracerContext is further confusing because there is the similarly named ProcessEdgesWorkTracer and the ObjectTracerContext and then to understand that whole system, you have to look at all three implementations and keep it in your working memory.

Maybe we can rename it to ProcessEdgesWorkObjectTracerContext just to make the naming consistent.

In general, I think we are eager to abstract away things into traits when they may not really be necessary. For example, the aforementioned ProcessEdgesWorkTracer and ProcessEdgesWorkTracerContext are the only users of the ObjectTracer and ObjectTracerContext traits. But like I don't understand why it had to be made into a separate trait for example.

They are made into traits because we don't want to expose their implementations to the users. And a principle of defining a general API is depending on interfaces (traits) instead of depending on implementations. And as I said before, there may be multiple implementations when we do the refactoring. So using traits can hide all those details from the VM binding.

@k-sareen
Copy link
Collaborator

Maybe we can rename it to ProcessEdgesWorkObjectTracerContext just to make the naming consistent.

I feel like that is making it worse, personally. I feel like this may need to be discussed with everyone, because I do still believe longer type names are bad since it just adds to the noise.

@wks
Copy link
Collaborator Author

wks commented Feb 26, 2025

... longer type names are bad since it just adds to the noise.

That's intended. I want everyone to know it is bad so we want to remove it every time we see it. It is a constant reminder that the status quo is not ideal and it needs some fundamental changes.

@qinsoon
Copy link
Member

qinsoon commented Feb 26, 2025

I want everyone to know it is bad so we want to remove it every time we see it.

Please don't do this. The better way is to document and write this down clearly. For example, I have no clue that you deliberately use bad names to 'convey' the idea.

@wks
Copy link
Collaborator Author

wks commented Feb 26, 2025

I want everyone to know it is bad so we want to remove it every time we see it.

Please don't do this. The better way is to document and write this down clearly. For example, I have no clue that you deliberately use bad names to 'convey' the idea.

You are right. It's better to mention this intention in the comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants