ObjectReference should be opaque #686

Open

steveblackburn opened this issue Oct 18, 2022 · 7 comments

Labels
P-normal Priority: Normal.

Comments

@steveblackburn
Contributor

ObjectReference is supposed to be an opaque type defined by the VM.

However, it has not been consistently implemented this way, with some points in the code assuming that it is an address.

This is a case of abstraction leakage. The way the runtime encodes an object reference should be opaque to MMTk; this has been the design intention since the original MMTk. This opacity should not incur any cost attributable to MMTk's design: for a VM in which an ObjectReference is encoded as the address of the start of the object, the conversion will cost nothing; for VMs (such as JikesRVM) which use a constant offset, the conversion will cost an addition; and for VMs that use an indirection (i.e. a handle), the conversion will cost an indirection.
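
To illustrate the three costs, here is a minimal sketch using made-up stand-in types (not mmtk-core's actual types):

type Address = usize; // stand-in for MMTk's Address type

// VM A: the ObjectReference is the address of the object start.
#[derive(Clone, Copy)]
struct DirectRef(Address);
impl DirectRef {
    fn object_start(self) -> Address {
        self.0 // identity: the conversion costs nothing
    }
}

// VM B (JikesRVM-style): the reference is at a constant offset from the start.
#[derive(Clone, Copy)]
struct OffsetRef(Address);
impl OffsetRef {
    const OFFSET: usize = 8; // hypothetical constant
    fn object_start(self) -> Address {
        self.0 - Self::OFFSET // one addition/subtraction
    }
}

// VM C: the reference is a handle, i.e. an index into an indirection table.
#[derive(Clone, Copy)]
struct HandleRef(usize);
impl HandleRef {
    fn object_start(self, handle_table: &[Address]) -> Address {
        handle_table[self.0] // one indirection
    }
}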

@wks
Collaborator

wks commented Oct 19, 2022

I grepped for all uses of ObjectReference::to_address() in mmtk-core.

SFT                 src/memory_manager.rs:652:    SFT_MAP.get_checked(object.to_address()).is_in_space(object)
page meta           src/policy/mallocspace/global.rs:92:        let page_addr = conversions::page_align_down(object.to_address());
debug               src/policy/mallocspace/global.rs:314:        let address = object.to_address();
page meta           src/policy/mallocspace/global.rs:427:                let current_page = object.to_address().align_down(BYTES_IN_PAGE);
VO-bit              src/policy/mallocspace/metadata.rs:157:    has_object_alloced_by_malloc(object.to_address())
debug               src/policy/sft.rs:171:            conversions::chunk_align_down(object.to_address()),
VO-bit              src/policy/markcompactspace.rs:380:            alloc_bit::unset_addr_alloc_bit(obj.to_address());
VO-bit              src/policy/markcompactspace.rs:395:                to = new_object.to_address() + copied_size;
debug               src/policy/sft_map.rs:70:        let object_sft = self.get_checked(object.to_address());
fwd meta            src/util/object_forwarding.rs:89:            new_object.to_address().as_usize() | ((FORWARDED as usize) << shift),
fwd meta            src/util/object_forwarding.rs:188:        new_object.to_address().as_usize(),
side meta impl      src/util/metadata/global.rs:51:            MetadataSpec::OnSide(metadata_spec) => metadata_spec.load(object.to_address()),
side meta impl      src/util/metadata/global.rs:76:                metadata_spec.load_atomic(object.to_address(), ordering)
side meta impl      src/util/metadata/global.rs:103:                metadata_spec.store(object.to_address(), val);
side meta impl      src/util/metadata/global.rs:129:                metadata_spec.store_atomic(object.to_address(), val, ordering);
side meta impl      src/util/metadata/global.rs:165:                object.to_address(),
side meta impl      src/util/metadata/global.rs:203:                metadata_spec.fetch_add_atomic(object.to_address(), val, order)
side meta impl      src/util/metadata/global.rs:229:                metadata_spec.fetch_sub_atomic(object.to_address(), val, order)
side meta impl      src/util/metadata/global.rs:255:                metadata_spec.fetch_and_atomic(object.to_address(), val, order)
side meta impl      src/util/metadata/global.rs:281:                metadata_spec.fetch_or_atomic(object.to_address(), val, order)
side meta impl      src/util/metadata/global.rs:313:                metadata_spec.fetch_update_atomic(object.to_address(), set_order, fetch_order, f)
hdr meta impl       src/util/metadata/header_metadata.rs:73:        object.to_address() + self.byte_offset()
debug               src/util/reference_processor.rs:408:            trace!(" ~> {:?} (retained)", referent.to_address());
sanity (deprecated) src/util/sanity/memory_scan.rs:27:                if object.to_address() == unsafe { Address::from_usize(value) } {
VO-bit impl         src/util/alloc_bit.rs:26:    ALLOC_SIDE_METADATA_SPEC.store_atomic::<u8>(object.to_address(), 1, Ordering::SeqCst);
VO-bit impl         src/util/alloc_bit.rs:40:    ALLOC_SIDE_METADATA_SPEC.store_atomic::<u8>(object.to_address(), 0, Ordering::SeqCst);
VO-bit impl         src/util/alloc_bit.rs:49:    ALLOC_SIDE_METADATA_SPEC.store::<u8>(object.to_address(), 0);
VO-bit impl         src/util/alloc_bit.rs:53:    is_alloced_object(object.to_address())
SFT                 src/scheduler/gc_work.rs:533:        let sft = unsafe { crate::mmtk::SFT_MAP.get_unchecked(object.to_address()) };

These can be categorised:

  • SFT: (2) finding the SFT of an object.
  • page meta: (2) accessing the "active page" metadata (ACTIVE_PAGE_METADATA_SPEC). Specific to MallocSpace, but currently unused.
  • VO-bit: (3) accessing the VO-bit metadata using an address.
  • VO-bit impl: (4) wrapper functions that access the VO-bit metadata using an ObjectReference.
  • fwd meta: (2) accessing the forwarding metadata (LOCAL_FORWARDING_POINTER_SPEC).
  • hdr meta impl: (1) finding the location of the header bit. Part of the header metadata implementation.
  • side meta impl: (10) using the "address of the ObjectReference" to access side metadata. Part of the metadata implementation.
  • debug: (4) displaying the "address of the ObjectReference" in logging or panic! messages.
  • sanity: (1) a deprecated use case in the sanity GC.

So MMTk core attempts to get the address of an ObjectReference mainly for the purpose of accessing metadata. It also uses the address to find the SFT. I think both are easy to adapt to the case where ObjectReference can be anything, but different ObjectReference implementations may have different performance and different opportunities for optimisation. I'll discuss that later.

@wks
Collaborator

wks commented Oct 19, 2022

How do we get the SFT without ObjectReference::to_address()?

We can use ObjectModel::ref_to_address instead. Its contract, according to the documentation, is:

Return an address guaranteed to be inside the storage associated with an object.

I think this is enough. The SFTMap maps much coarser regions (chunks, spaces) to SFT instances. As long as no object crosses chunk boundaries, it is OK to use any byte within an object.

We use the SFT during tracing, and it is performance critical. But I don't think a subtraction matters much (as long as ObjectModel::ref_to_address is implemented as a simple subtraction, without actually loading from the object or from any indirection table, and the function can be properly inlined by Rust), because tracing is essentially memory bound, not CPU bound. Even the SFT lookup itself is memory bound; the SFT table lookup should be the real bottleneck.
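
For concreteness, a sketch (not a patch; the surrounding generics are elided, and I'm assuming ref_to_address keeps the contract quoted above) of the first SFT use case, src/memory_manager.rs:652:

// Today: derive the address straight from the ObjectReference.
SFT_MAP.get_checked(object.to_address()).is_in_space(object)

// After the change: ask the binding's ObjectModel for an address that is
// merely guaranteed to be inside the object's storage. That is sufficient
// because the SFT map is chunk-granular and no object crosses chunks.
SFT_MAP.get_checked(VM::VMObjectModel::ref_to_address(object)).is_in_space(object)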

@wks
Collaborator

wks commented Oct 19, 2022

For metadata access, I think it depends on the granularity of the metadata, and on whether it is header metadata or side metadata.

The finest-grained metadata I know of is the field-logging bitmap. It has one bit per word, or per half word if OpenJDK uses compressed OOPs. It has nothing to do with ObjectReference itself; it is indexed by field addresses.

Per-object header metadata is already implemented by the VM. Currently we allow the VM to configure its bit_offset and num_of_bits, but the contract of in-header metadata should simply be something that can perform load/store/atomic read-modify-write operations on a per-object basis, like this:

trait PerObjectMetadataSpec<T> {  // T is the result type, such as u8, u16, ...
    fn load(objref: ObjectReference) -> T;
    fn store(objref: ObjectReference, new_value: T);
    fn atomic_add(objref: ObjectReference, rhs: T) -> T;
    fn atomic_update<F>(objref: ObjectReference, f: F) -> T where F: Fn(T) -> T;
    // more atomic read-modify-write operations go here
}

Because ObjectReference is opaque, the VM may look up the location of the bits anywhere (not necessarily in the "header", or even in the object). Both num_of_bits and bit_offset can then be implementation details of the VM.
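
For example, a hypothetical binding-side implementation of the trait above (a sketch with a stand-in ObjectReference; this VM happens to keep the byte just below the reference, but a side table or an indirection would satisfy the same contract):

use std::sync::atomic::{AtomicU8, Ordering};

#[derive(Clone, Copy)]
struct ObjectReference(usize); // stand-in for the opaque type

const GC_BYTE_OFFSET: usize = 1; // an implementation detail of this VM

struct GcByteSpec;

impl GcByteSpec {
    // Where the bits live is invisible to mmtk-core.
    fn gc_byte(objref: ObjectReference) -> &'static AtomicU8 {
        unsafe { &*((objref.0 - GC_BYTE_OFFSET) as *const AtomicU8) }
    }
}

impl PerObjectMetadataSpec<u8> for GcByteSpec {
    fn load(objref: ObjectReference) -> u8 {
        Self::gc_byte(objref).load(Ordering::Relaxed)
    }
    fn store(objref: ObjectReference, new_value: u8) {
        Self::gc_byte(objref).store(new_value, Ordering::Relaxed)
    }
    fn atomic_add(objref: ObjectReference, rhs: u8) -> u8 {
        Self::gc_byte(objref).fetch_add(rhs, Ordering::SeqCst)
    }
    fn atomic_update<F>(objref: ObjectReference, f: F) -> u8
    where
        F: Fn(u8) -> u8,
    {
        // Retry a compare-and-swap until it succeeds.
        Self::gc_byte(objref)
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |v| Some(f(v)))
            .unwrap()
    }
}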

Per-object side metadata needs to be implemented with MMTk core's aid. Side metadata essentially maps addresses in MMTk spaces, not objects, to metadata bits. So the mapping from ObjectReference to side-metadata bits is actually

ObjectReference ---> address in the object ---> metadata associated with that address

To make it "per-object", there must be a one-to-one correspondence (bijection) between ObjectReference and the "address in an object".

While using the result of ObjectModel::object_start_ref(object) is sound, it may not be ideal if computing it requires loading from the object itself (for example, in JikesRVM with address-based hashing enabled, the offset between the object start and the ObjectReference is not constant).

ObjectModel::ref_to_address may be OK, but its contract is not strong enough. The contract (documentation) says

Return an address guaranteed to be inside the storage associated with an object.

But it does not require ObjectModel::ref_to_address(objref) to always return the same address for the same objref while the object is not moved. If it can return different addresses at different times (in which case it is not a function in the mathematical sense), it will not always map to the same side metadata bits.

I suggest we introduce a new concept. I made up the terms "canonical object reference address", "canonical in-object address", and "canonical side metadata address of an object". Whatever we call it, that address shall satisfy:

  • Given the same ObjectReference, that address is unique. (so each object always maps to the same metadata bits)
  • Different ObjectReferences must map to different addresses. (so no two objects are mapped to the same metadata bits)
  • It must be in the storage region of the object. (to guarantee the metadata address is mapped as long as the object is allocated)
  • It should be easy to compute from ObjectReference. (for performance, not for correctness)

So we can look up per-object side metadata like metadata_spec.load(ObjectModel::to_canonical_in_object_address(objref)).
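
As a sketch (made-up names and stand-in types), the simplest conforming implementation, for a VM where the reference sits at a constant offset within the object:

type Address = usize; // stand-in for MMTk's Address type

#[derive(Clone, Copy, PartialEq, Eq)]
struct ObjectReference(usize);

const CANONICAL_OFFSET: usize = 8; // hypothetical, fixed for this VM

fn to_canonical_in_object_address(objref: ObjectReference) -> Address {
    // (1) a pure function of objref: the same reference always yields the
    //     same address;
    // (2) injective: distinct references yield distinct addresses;
    // (3) inside the object's storage, assuming this VM places the reference
    //     at least CANONICAL_OFFSET bytes past the object start;
    // (4) a single subtraction, so cheap to compute.
    objref.0 - CANONICAL_OFFSET
}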

Coarser-grained side metadata includes page-granular and space-granular metadata.

MallocSpace has "active pages" metadata, but it is currently unused. I don't know what the replacement should be.

If we consider the SFT table as one-word-per-chunk metadata, we can use ObjectModel::ref_to_address as I discussed in the previous post (#686 (comment)).

@steveblackburn
Contributor Author

steveblackburn commented Oct 25, 2022

Summarizing comments I've made elsewhere:

We should have two opaque types:

  • ObjectReference which refers to a VM heap object
  • InternalReference which refers to a field within a VM heap object

Both of these types should have functions to load a value from the heap and store to the heap:

  • from_heap() loads a reference from a heap location
  • to_heap() stores a reference to a heap location

The above functions deal with the likelihood that the in-heap representation of each type differs from the default value representation. For example, these functions can abstract over pointer compression.

Both of these types should have an object_start() function which returns the address of the start of the underlying object.

Additionally, InternalReference should have a field_address() function which returns the address of the field represented by the internal reference.

The implementation of the types (data and function) is left to the VM binding. The core must treat them as opaque.
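
A sketch of the proposed surface (hypothetical trait names and signatures, just to make the shape concrete; Address is a stand-in type):

type Address = usize; // stand-in for MMTk's Address type

// Opaque reference to a VM heap object.
trait ObjectReference: Copy {
    fn from_heap(slot: Address) -> Self; // decode (e.g. decompress) from a heap slot
    fn to_heap(self, slot: Address);     // encode (e.g. compress) into a heap slot
    fn object_start(self) -> Address;    // start of the underlying object
}

// Opaque reference to a field within a VM heap object.
trait InternalReference: Copy {
    fn from_heap(slot: Address) -> Self;
    fn to_heap(self, slot: Address);
    fn object_start(self) -> Address;    // start of the enclosing object
    fn field_address(self) -> Address;   // address of the referenced field
}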

When enqueuing edges to be traced, these may be either pointers to ObjectReferences or pointers to InternalReferences. We propose to treat these separately (i.e. not use something like a tagged union for both kinds of pointers to references).

@qinsoon
Member

qinsoon commented Oct 26, 2022

Both of these types should have a object_start() function which returns the address of the start of the underlying object.

When you say 'the address of the start of the underlying object', is it the same as the allocation address that we return from our alloc() function? If not, do we need methods to convert between the object start and the allocation address?

@qinsoon
Member

qinsoon commented Oct 26, 2022

Both of these types should have a object_start() function which returns the address of the start of the underlying object.

When you say 'the address of the start of the underlying object', is it the same as the allocation address that we return from our alloc() function? If not, do we need methods to convert between the object start and the allocation address?

We discussed this. Our alloc() function returns the object start (at mutator time). The VM can put its object header and payload at the object start. However, Steve said that if MMTk copies the object and the binding wants to increase the object size during copying to store auxiliary data, the object start is not what the copying allocator returns.

I feel this could cause some confusion as well: our alloc() API returns the object start, but what our allocator's alloc() returns may or may not be the object start (depending on whether we are in the mutator phase or the GC phase).

With object_start(), MMTk would set any kind of side metadata for an object reference at its object start, including the valid-object bit. Should ObjectReference provide a method by which MMTk can get an ObjectReference from an address? For example, when MMTk is doing a linear scan, it finds the valid-object bit set at a certain address. Can MMTk give the binding that address and get an ObjectReference back?

@qinsoon
Member

qinsoon commented Jun 23, 2023

We need to be clear about what needs to be done for this issue.
