Axioms that good definitions of ObjectReference must satisfy #1044
Comments
One thing that concerns me is that most of the 'axioms' are deduced from the current code. As the code may not be correct, the conclusions drawn from the code may not be correct. We should define the semantics of I will discuss them one by one.
Did you mean
We do not require
These are all based on the current implementation. The discussion already mentioned alternatives that do not compare equality. I think the actual question we need to answer is whether we allow different object references to refer to the same object.
Yes. They are deduced from the current code, for good reasons. MMTk implements the GC, and MMTk needs a concept to express the idea that satisfies the axioms I listed. For example, when copying an object, the object will have a from-space copy and a to-space copy, and they are different, when seen from the GC's point of view. Currently, we say that they have two different
It should be able to instantiate
OK. "It must be possible to ..." should be a more accurate expression.
Yes. That's the concern. That is, whether two ObjectReference instances that have different bit-by-bit representations may be considered equal. Another important question is whether we consider the from-space copy and the to-space copy as equal, as I discussed before. (I also updated the original post and added one axiom for that.)
With issue #1170 addressed and PR #1195 merged, we now define ObjectReference as a non-zero word-aligned address within an object, and we agree with this definition at least for the current implementation of mmtk-core. This issue summarizes discussions about the definition of ObjectReference before that change so that we can come back and find our previous discussions if we discuss this topic again.
TL;DR: From time to time, we discuss the possibility of changing the current definition of ObjectReference. In theory, it should be opaque to mmtk-core. But I have discovered that not all definitions are good. This issue summarizes what a good definition should be, enumerates some popular definitions, and discusses whether they are good.
Related links:
Not all definitions of ObjectReference are good
There is an argument that ObjectReference may be opaque to mmtk-core. However, some definitions won't work at all, and others won't work efficiently. I am trying to list statements that need to be true for all good definitions of ObjectReference. Any definition that satisfies all of those statements should work.
It must be possible to instantiate ObjectReference from Address
Copying GCs will copy objects, and create a new ObjectReference for the to-space copy.
Linear scanning will identify objects at addresses (using the global VO bit or a local bitmap), and generate ObjectReference for those addresses.
Conservative stack scanning scans the stack for addresses with VO bit set, then we convert the Address to ObjectReference verbatim.
Handles don't satisfy this statement. Handles are implemented with indirection tables, and creating a handle implies adding a new entry to the indirection table. This simply costs too much, and probably needs synchronization. After forwarding, we would also need to delete old handles from indirection tables because they point to from-space copies. Moreover, handles are often local to mutator threads. A more reasonable approach is to define the content of an indirection table entry (which is an address) as an ObjectReference. In this way, if an object is moved, mmtk-core will use the new address as the new ObjectReference, and "forwarding a reference in a slot" will become "updating the address in the indirection table entry for the handle in the slot".
It must be efficient to get the start address of the object from the ObjectReference.
Given an ObjectReference, it must be possible to get the start address of the object (i.e. whatever alloc returns). Its current API is ObjectModel::ref_to_object_start().
It must be efficient to get a unique address inside the object from the ObjectReference.
Given an ObjectReference, it must be possible to get an address that is guaranteed to be inside the object, and this address needs to be unique for the same object. Its current API is ObjectModel::ref_to_address().
That address is used for:
ObjectReference points to an object in a given space
In all cases, the address is guaranteed to be in the same space where the object is allocated.
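The two conversions above can be sketched as constant-offset arithmetic. This is an illustration under assumed conditions, not the mmtk-core implementation: the types are hypothetical stand-ins, the 16-byte offset is an invented layout in which a header precedes the address the reference points to, and every object is assumed to be larger than that offset so the reference itself lies inside the object.

```rust
/// Hypothetical stand-in for mmtk-core's `Address`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(pub usize);

/// Hypothetical `ObjectReference` that points a fixed offset past the
/// object start.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

/// Assumed layout: the reference points 16 bytes past what `alloc` returned.
pub const OBJECT_REF_OFFSET: usize = 16;

/// Plays the role of `ObjectModel::ref_to_object_start()`: one subtraction.
pub fn ref_to_object_start(object: ObjectReference) -> Address {
    Address(object.0 - OBJECT_REF_OFFSET)
}

/// Plays the role of `ObjectModel::ref_to_address()`: the reference value
/// itself, which is unique per object and inside the object under the
/// layout assumed above.
pub fn ref_to_address(object: ObjectReference) -> Address {
    Address(object.0)
}

/// Example consumer: a space membership test on the in-object address,
/// for a space covering the half-open range `[space_start, space_end)`.
pub fn is_in_space(object: ObjectReference, space_start: Address, space_end: Address) -> bool {
    let addr = ref_to_address(object);
    space_start.0 <= addr.0 && addr.0 < space_end.0
}
```

Because both conversions are a single arithmetic operation, they stay cheap even when called once per traced object.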
It must be efficient to do equality test for ObjectReference.
Currently we do equality tests between ObjectReference values in a few places:
trace_object, we test if the object has been forwarded by comparing new_object == object.
trace_object so that it returns Option<ObjectReference> so that we know if an object is forwarded without equality tests.
HashSet (Bug. See: Proper implementation of the treadmill algorithm #517)
ReferenceProcessor where it uses HashSet to de-duplicate ObjectReference instances.
ObjectReference to implement Eq.
HashSet to record visited objects.
We may refactor them to make Eq unnecessary, but it will be counterintuitive if we can't compare ObjectReference for equality.
It must be hashable
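A minimal sketch of how both equality and hashing come for free when ObjectReference wraps an address: the derived Eq and Hash compare and hash a single word. The ForwardingTable type and its HashMap are hypothetical and exist only to make the example self-contained (a real copying GC would typically keep forwarding pointers in object headers or side metadata), and the Option-returning trace_object is only the refactoring idea mentioned above, not the current mmtk-core signature.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical address-based `ObjectReference`; `Eq` and `Hash` are
/// derived, so both operate on a single machine word.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

/// Hypothetical forwarding table mapping from-space references to
/// to-space references.
pub struct ForwardingTable {
    pub forwarded: HashMap<ObjectReference, ObjectReference>,
}

impl ForwardingTable {
    /// Sketch of a `trace_object` variant that reports movement directly:
    /// `Some(new)` if the object was forwarded, `None` if it stayed in
    /// place. The caller needs no `new_object == object` comparison.
    pub fn trace_object(&self, object: ObjectReference) -> Option<ObjectReference> {
        self.forwarded.get(&object).copied()
    }
}

/// De-duplicate references with a `HashSet`, keeping first-seen order.
pub fn dedup(refs: &[ObjectReference]) -> Vec<ObjectReference> {
    let mut seen = HashSet::new();
    // `insert` returns false for a value that is already in the set.
    refs.iter().copied().filter(|r| seen.insert(*r)).collect()
}
```
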
As mentioned above, we sometimes put ObjectReference inside hash sets.
When copied, the from-space copy and the to-space copy are considered different objects.
This means that when copying an object, the original ObjectReference refers to the from-space copy of the object, the to-space copy of the object will have a different ObjectReference, and the two must not compare equal. The process of "forwarding a reference in a slot" means replacing the old ObjectReference in the slot with the new ObjectReference so that it now points to the to-space copy.
At the language level, "a reference to an object" does not change even if the GC moves the object. In other words, the high-level language is oblivious to object movement as a result of GC (unless object pinning is performed, which allows the user to reveal the address of an object safely). The high-level language is also oblivious to duplicated copies of objects in concurrent copying GCs, such as Shenandoah, ZGC and Sapphire. That's why the VM (or the GC?) must implement a kind of equality operator that compares the two copies as equal at the language level during concurrent copying, when the object has two copies simultaneously. This means language-level identities (such as unique IDs of language-level objects) are not good definitions of ObjectReference.
Other statements that should be true
ObjectReference doesn't have to be the content of slots.
An object field (slot) can hold a handle, a fat pointer, an interior pointer, a tagged pointer, etc.
For handles, we can define ObjectReference as the address in the indirection table entry.
For fat pointers (e.g. (pointer, offset)), we can define ObjectReference as the pointer part of the fat pointer.
For interior pointers, we can define ObjectReference as the highest address that (1) is not higher than the interior pointer, and (2) has the VO bit set.
For tagged pointers, we can define ObjectReference as the address without the tag bits.
In all cases, we can update the slot if an object is forwarded.
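As an illustration of the tagged-pointer case, the sketch below keeps the tag bits entirely on the slot side: loading strips them to produce an ObjectReference, and storing a forwarded reference puts the original tag back. The TaggedSlot type, the two-bit tag width, and the method names are assumptions made for this example, and the slot is assumed to hold either null or a tagged reference (never a non-reference value).

```rust
/// Hypothetical address-based `ObjectReference` (no tag bits).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

/// Assumed encoding: the low two bits of a slot word carry tags.
const TAG_MASK: usize = 0b11;

/// Hypothetical slot that holds a tagged pointer.
pub struct TaggedSlot<'a> {
    word: &'a mut usize,
}

impl<'a> TaggedSlot<'a> {
    pub fn new(word: &'a mut usize) -> Self {
        TaggedSlot { word }
    }

    /// Strip the tag bits. The GC side only ever sees the plain address.
    pub fn load(&self) -> Option<ObjectReference> {
        let addr = *self.word & !TAG_MASK;
        if addr == 0 {
            None
        } else {
            Some(ObjectReference(addr))
        }
    }

    /// Write a (possibly forwarded) reference back, preserving the tag
    /// that was already in the slot.
    pub fn store(&mut self, object: ObjectReference) {
        debug_assert_eq!(object.0 & TAG_MASK, 0, "reference must be untagged");
        let tag = *self.word & TAG_MASK;
        *self.word = object.0 | tag;
    }
}
```

With this split, the core never needs to know which bits are tags; only the binding's slot implementation does.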
Examples of valid definitions
Starting address
Obviously. OpenJDK uses starting addresses of objects as ObjectReference.
Address at an offset from the object start.
JikesRVM does this.
Potential definitions
Tagged union of pointer and non-pointer value
Ruby does this. If a Ruby VALUE points to an object, its last three bits are all 0. The pointer will not have any tag bits. Other values (true, false, nil, small integers, etc.) are not references to objects. So we can simply define ObjectReference as "starting address" (or "address at an offset" if we add additional data in the front).
Tagged pointer without type info
V8 does this. The last bit is 1 if a slot holds a reference. The second last bit is 0 if it is a strong reference, and 1 if it is a weak reference. We may define ObjectReference as the address without the tag bits. MMTk won't be aware of those bits, and the binding is still able to update fields for forwarding.
We may instead define "the address with tag bits" (i.e. the slot content) as ObjectReference. MMTk will be able to generate such references from addresses, but always with 0b01 as the last two bits. It is trivial to get the starting address and an in-object address by removing the tags. However, the VM binding will need to implement the Eq and Hash traits manually and ignore the tag bits. This may not be the most efficient way to do it.
Tagged pointer or fat pointer with embedded type info
Some VMs may embed type information inside the pointer, or fat pointer. I heard JRockit did this, but never saw its implementation. This probably will not work because MMTk will have a hard time getting the type info when constructing an ObjectReference from an Address. It's not completely impossible, but it will need to load the type information from the object body, which may be inefficient. As I mentioned above, for such VMs, we can define the address part of the tagged pointer or fat pointer as ObjectReference.
Interior pointer
Probably not a good idea because every time it needs to get the object start or the unique "in-object address", it needs to scan the VO bit bitmap backwards. We may introduce an InteriorPointer type in mmtk-core, but as I mentioned above, it is not necessary.
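To make the cost of the interior-pointer definition concrete, here is a rough sketch of the backward scan it would require on every conversion. The VoBits type is a hypothetical, simplified stand-in for a VO-bit table (one bool per 8-byte word, covering memory from a base address); it is not the mmtk-core metadata layout. The scan also does not check that the interior pointer falls within the extent of the object it finds.

```rust
/// Assumed granularity: one VO bit per 8-byte word.
const WORD: usize = 8;

/// Hypothetical VO-bit table covering memory that starts at `base`.
pub struct VoBits {
    base: usize,
    bits: Vec<bool>,
}

impl VoBits {
    pub fn new(base: usize, words: usize) -> Self {
        VoBits { base, bits: vec![false; words] }
    }

    /// Mark `addr` as the address of a valid object.
    pub fn set(&mut self, addr: usize) {
        self.bits[(addr - self.base) / WORD] = true;
    }

    /// Find the highest address that is not higher than `interior` and has
    /// its VO bit set. The cost is linear in the distance scanned, which is
    /// why paying it on every conversion is unattractive.
    pub fn find_object(&self, interior: usize) -> Option<usize> {
        if interior < self.base {
            return None;
        }
        let index = (interior - self.base) / WORD;
        if index >= self.bits.len() {
            return None;
        }
        (0..=index)
            .rev()
            .find(|&i| self.bits[i])
            .map(|i| self.base + i * WORD)
    }
}
```

Doing this scan once, when a slot is decoded, and then passing the resulting address around as the ObjectReference avoids repeating the scan for every later query.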