Major refactoring of address ranges (formerly PMAs) #312

Open
wants to merge 12 commits into
base: refactor/uarch-state-accesses

@diegonehab diegonehab commented Mar 5, 2025

This is a major restructuring of the way we deal with occupied physical address ranges in the machine.

In the past, a pma_entry object used static polymorphism to hold a pma_memory, pma_device, or pma_empty object.
There was a peek function pointer that could read from the address range.
pma_memory owned an actual chunk of host memory, either mmapped or calloced, corresponding to the address range.
pma_device had a driver pointer and a context pointer.
The driver structure had function pointers that received the context pointer to read from and write to the device.
VirtIO was divided further into a virtual device interface that held more context information.
The whole thing was all backward and counter-intuitive.
It relied too much on PMA (Physical Memory Attributes) from RISC-V.

We now have an honest-to-god class hierarchy that uses dynamic polymorphism instead of a context and function pointers.
The PMA are now exclusively used to define how the address range will be visible from inside the machine.
The rest of the interface uses normal words rather than obscure PMA flag names.
Here is the hierarchy:

address_range
	pristine_address_range
		plic_address_range
		htif_address_range
		clint_address_range
		virtio_address_range
			virtio_console_address_range
			virtio_p9fs_address_range
			virtio_net_address_range
				virtio_net_user_address_range
				virtio_net_tuntap_address_range
	memory_address_range
	shadow_state_address_range
	shadow_uarch_state_address_range
	shadow_tlb_state_address_range

address_range is the base class.
It defines a description, a start, a length, and the pma_flags.
address_range has a virtual interface for memory-like ranges that is active when the virtual is_memory() returns true.
These include methods such as get_host_memory(), mark_dirty_page() etc.
Similarly, address_range has a virtual interface for device-like ranges that is active when is_device() returns true.
This includes read_device() and write_device().
Finally, it has a virtual peek() method that exposes the content that goes into the Merkle tree.
There are no pure-virtual methods: these interfaces by default simply gracefully fail/are ignored when used.
This means there are no exceptions when they are used.
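
As a rough illustration of the base class described above, here is a minimal sketch. The method names come from the description; every signature, the member names, and the use of a plain uint64_t for the flags are assumptions made for the example, not the actual declarations in this PR.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Sketch only: signatures and member names are assumptions.
class address_range {
    std::string m_description;
    uint64_t m_start;
    uint64_t m_length;
    uint64_t m_pma_flags; // stand-in; the real flags type is not shown here

public:
    address_range(std::string description, uint64_t start, uint64_t length, uint64_t pma_flags) :
        m_description(std::move(description)), m_start(start), m_length(length), m_pma_flags(pma_flags) {}
    virtual ~address_range() = default;

    const std::string &get_description() const { return m_description; }
    uint64_t get_start() const { return m_start; }
    uint64_t get_length() const { return m_length; }
    uint64_t get_pma_flags() const { return m_pma_flags; }

    // Tell callers which optional interface is active.
    virtual bool is_memory() const { return false; }
    virtual bool is_device() const { return false; }

    // Memory-like interface: the defaults do nothing and never throw.
    virtual unsigned char *get_host_memory() { return nullptr; }
    virtual void mark_dirty_page(uint64_t /*offset*/) {}

    // Device-like interface: the defaults report failure and never throw.
    virtual bool read_device(uint64_t /*offset*/, int /*log2_size*/, uint64_t * /*value*/) { return false; }
    virtual bool write_device(uint64_t /*offset*/, int /*log2_size*/, uint64_t /*value*/) { return false; }

    // Content exposed to the Merkle tree; the default fails (assumption).
    virtual bool peek(uint64_t /*offset*/, uint64_t /*length*/, unsigned char * /*out*/) const { return false; }
};
```

Because nothing is pure virtual, a derived class only overrides the half of the interface it supports, and callers can probe any range without guarding against exceptions.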

address_range has contains_relative() and contains_absolute() methods that tell whether a subrange is inside it, depending on whether the subrange is given in absolute physical addresses or in addresses relative to the start of the address range itself.
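
The two containment checks can be sketched as follows. The struct name range_bounds and the parameter lists are hypothetical; the point is only that both checks can be written so that they never overflow, even for subranges near the top of the 64-bit address space.

```cpp
#include <cstdint>

// Hypothetical stand-in holding just the bounds of an address range.
struct range_bounds {
    uint64_t start;
    uint64_t length;

    // Subrange given as an offset from the start of this range.
    // Written as two comparisons so that offset + sub_length is never computed.
    bool contains_relative(uint64_t offset, uint64_t sub_length) const {
        return sub_length <= length && offset <= length - sub_length;
    }

    // Subrange given as an absolute physical address.
    bool contains_absolute(uint64_t paddr, uint64_t sub_length) const {
        return paddr >= start && contains_relative(paddr - start, sub_length);
    }
};
```

For example, with start 0x80000000 and length 0x1000, an 8-byte access at relative offset 0xff8 is inside the range, while a 16-byte access at the same offset is not.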

pm-type-name was renamed to poor-type-name.

All device address ranges derive from pristine_address_range and override is_device() to return true.
They appear as zeroed out from the point of view of peek().
The state of a device is either irrelevant to the Merkle tree (e.g. VirtIO is not reproducible) or its state is kept somewhere else (e.g., in a shadow address range).
They do not implement get_host_memory() or mark_dirty_page(), but do implement read_device() and write_device().

memory_address_range models a range that behaves like real memory, and is_memory() returns true for it.
It allocates and owns host memory, either by mmap or calloc.
get_host_memory() and mark_dirty_page() work as expected, but read_device() and write_device() fail.
peek() returns the contents stored in the associated host memory.
The ranges for RAM, uarch RAM, the DTB, flash drives, and PMAs are all memory_address_range objects.
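
To make the split between device and memory ranges concrete, here is a minimal sketch built on a reduced stand-in for the base class. The class toy_device_address_range and its single register are invented for the example, all signatures are assumptions, and a std::vector replaces the mmap/calloc allocation the real memory_address_range uses.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reduced stand-in for the base class; signatures are assumptions.
class address_range {
public:
    virtual ~address_range() = default;
    virtual bool is_memory() const { return false; }
    virtual bool is_device() const { return false; }
    virtual unsigned char *get_host_memory() { return nullptr; }
    virtual bool read_device(uint64_t, int, uint64_t *) { return false; }
    virtual bool write_device(uint64_t, int, uint64_t) { return false; }
    virtual bool peek(uint64_t, uint64_t, unsigned char *) const { return false; }
};

// Appears as all zeros to peek(), whatever state a subclass holds.
class pristine_address_range : public address_range {
    uint64_t m_length;
public:
    explicit pristine_address_range(uint64_t length) : m_length(length) {}
    bool peek(uint64_t offset, uint64_t length, unsigned char *out) const override {
        if (length > m_length || offset > m_length - length) { return false; }
        std::fill_n(out, length, static_cast<unsigned char>(0));
        return true;
    }
};

// Hypothetical device with one 64-bit register at offset 0.
class toy_device_address_range : public pristine_address_range {
    uint64_t m_reg = 0;
public:
    using pristine_address_range::pristine_address_range;
    bool is_device() const override { return true; }
    bool read_device(uint64_t offset, int log2_size, uint64_t *value) override {
        if (offset != 0 || log2_size != 3) { return false; }
        *value = m_reg;
        return true;
    }
    bool write_device(uint64_t offset, int log2_size, uint64_t value) override {
        if (offset != 0 || log2_size != 3) { return false; }
        m_reg = value;
        return true;
    }
};

// Owns host memory; std::vector stands in for mmap/calloc.
class memory_address_range : public address_range {
    std::vector<unsigned char> m_host;
public:
    explicit memory_address_range(uint64_t length) : m_host(length, 0) {}
    bool is_memory() const override { return true; }
    unsigned char *get_host_memory() override { return m_host.data(); }
    bool peek(uint64_t offset, uint64_t length, unsigned char *out) const override {
        if (length > m_host.size() || offset > m_host.size() - length) { return false; }
        std::copy_n(m_host.data() + offset, length, out);
        return true;
    }
};
```

In this sketch the device keeps its register value across writes, yet peek() still reports zeros for it, while the memory range reports exactly what sits in its host buffer.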

The three shadow address ranges only implement a special peek method.
Shadow state, shadow uarch state, and shadow TLB do not store the data for the range in host memory.
These get reconstructed on demand when peeked.
They are invisible from the inside of the machine.
These shadow address ranges are not memory and neither are they devices.
Both is_memory() and is_device() return false for them.
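
The idea of rebuilding the contents on demand can be sketched like this. The toy_state struct, the class name, and the little-endian register layout are all assumptions for illustration; the real shadow ranges cover the full machine, uarch, and TLB state.

```cpp
#include <array>
#include <cstdint>

// Hypothetical machine state: four 64-bit registers.
struct toy_state {
    std::array<uint64_t, 4> x{};
};

// Stores no copy of the data: peek() serializes the live state each time.
class toy_shadow_state_address_range {
    const toy_state &m_s;
public:
    explicit toy_shadow_state_address_range(const toy_state &s) : m_s(s) {}
    bool is_memory() const { return false; }
    bool is_device() const { return false; }
    uint64_t get_length() const { return m_s.x.size() * sizeof(uint64_t); }

    // Writes the requested bytes, one register after another, little-endian.
    bool peek(uint64_t offset, uint64_t length, unsigned char *out) const {
        if (length > get_length() || offset > get_length() - length) { return false; }
        for (uint64_t i = 0; i < length; ++i) {
            const uint64_t byte = offset + i;
            out[i] = static_cast<unsigned char>(m_s.x[byte / 8] >> (8 * (byte % 8)));
        }
        return true;
    }
};
```

Since the range holds only a reference to the state, a peek made after a register changes reflects the new value without any explicit synchronization.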

We used to have a shadow PMAs address range that was a device.
It is now a simple memory address range.
Much like the DTB address range, it gets filled from the configuration during initialization.

All the factory code was removed.
We now have one .cpp and one .h file for each of these different address range classes.

PMAs are now simply the description of what an address range is from the inside of the machine.
That is, the pair of uint64_t values, istart and ilength.
istart is divided into the actual start of the range, and the flags (with fields M, IO, R, W, X, IR, IW, DID).
Notably, there is no E field anymore.
An empty PMA is a PMA with length == 0.
The bit was reallocated to the DID, so there are now 32 possible values for the field.
This is now spelled out as Driver ID, since it is also used for things that are not devices.
DID = 0 is now reserved for empty PMAs.
The memory DID is now 1.
This means a completely zeroed out PMA istart/ilength pair maps perfectly to the empty PMA.
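
The encoding can be illustrated with the sketch below. The bit positions are an assumption (flags packed into the low 12 bits of istart, with a 5-bit DID on top of the seven single-bit fields); only the field names, the 32 possible DID values, DID 0 for empty, DID 1 for memory, and the length == 0 emptiness test come from the description. All identifiers here are hypothetical.

```cpp
#include <cstdint>

// ASSUMED bit layout; the real one lives in pmas-defines.h.
constexpr uint64_t TOY_PMA_M = UINT64_C(1) << 0;
constexpr uint64_t TOY_PMA_IO = UINT64_C(1) << 1;
constexpr uint64_t TOY_PMA_R = UINT64_C(1) << 2;
constexpr uint64_t TOY_PMA_W = UINT64_C(1) << 3;
constexpr uint64_t TOY_PMA_X = UINT64_C(1) << 4;
constexpr uint64_t TOY_PMA_IR = UINT64_C(1) << 5;
constexpr uint64_t TOY_PMA_IW = UINT64_C(1) << 6;
constexpr int TOY_PMA_DID_SHIFT = 7;
constexpr uint64_t TOY_PMA_DID_MASK = 0x1f;   // 5 bits, 32 values
constexpr uint64_t TOY_PMA_FLAGS_MASK = 0xfff; // assumes page-aligned starts

constexpr uint64_t TOY_DID_EMPTY = 0;
constexpr uint64_t TOY_DID_MEMORY = 1;

// The two words visible from inside the machine.
struct toy_pma {
    uint64_t istart;
    uint64_t ilength;
};

constexpr toy_pma make_toy_pma(uint64_t start, uint64_t length, uint64_t flags, uint64_t did) {
    return toy_pma{(start & ~TOY_PMA_FLAGS_MASK) | flags | ((did & TOY_PMA_DID_MASK) << TOY_PMA_DID_SHIFT),
        length};
}

constexpr uint64_t toy_pma_start(const toy_pma &p) { return p.istart & ~TOY_PMA_FLAGS_MASK; }
constexpr uint64_t toy_pma_did(const toy_pma &p) { return (p.istart >> TOY_PMA_DID_SHIFT) & TOY_PMA_DID_MASK; }

// No E bit: emptiness is decided by the length alone.
constexpr bool toy_pma_is_empty(const toy_pma &p) { return p.ilength == 0; }
```

Under this layout, an all-zero pair decodes as length 0 and DID 0, which is the property the description relies on for empty PMAs.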

PMA-related code was moved to pmas.h, pmas-defines.h, and pmas-constants.h.
The constants for the start and length of all address ranges have been moved to address-range-defines.h and address-range-constants.h.
They are prefixed AR_ instead of PMA_.
This has been changed everywhere, including the Lua bindings and the assembly tests, unfortunately.

memory_range_descr has been renamed address_range_description and the file renamed accordingly.
machine::get_memory_ranges() was renamed to machine::get_address_ranges().
This is because some of these address ranges are for devices.
The old name was misleading.

The machine now has an m_ars vector of unique_ptr to address_range objects.
This is the vector that is searched by the new machine::find_address_range() method (replacing the old machine::find_pma_entry()).
It also has an m_s.pmas vector, holding indices into m_ars.
This is the one that the new find_pma() function in find-pma.h (replacing the old find_pma_entry() in find-pma-entry.h) goes over.
It is also the one that is reflected in the new pmas memory address range (old shadow pmas).
It also has an m_merkle_ars vector, holding indices into m_ars.
This is sorted by start, and is the list that is used by the Merkle tree.
Finally, it has an m_virtio_ars vector, holding indices into m_ars for the VirtIO address ranges.
There is no longer any need for an empty sentinel in any of these vectors.
An empty address range PMA is /still/ needed to mark the end of the list of PMAs in the PMAs memory address range.
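
The ownership scheme above (one owning vector plus index vectors) might look roughly like the sketch below. The class toy_machine, the registration method, the choice to put every range into the Merkle list, and the nullptr result for a failed lookup are assumptions; only the vector names and find_address_range() come from the description.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal range: just bounds and an overflow-safe containment test.
struct toy_address_range {
    uint64_t start;
    uint64_t length;
    toy_address_range(uint64_t s, uint64_t l) : start(s), length(l) {}
    bool contains_absolute(uint64_t paddr, uint64_t len) const {
        return paddr >= start && len <= length && paddr - start <= length - len;
    }
};

class toy_machine {
    std::vector<std::unique_ptr<toy_address_range>> m_ars; // owns every range
    std::vector<std::size_t> m_pmas;       // indices visible inside the machine (m_s.pmas in the PR)
    std::vector<std::size_t> m_merkle_ars; // indices sorted by start

public:
    // Takes ownership and records the index in the secondary vectors.
    std::size_t register_address_range(std::unique_ptr<toy_address_range> ar, bool visible_inside) {
        const std::size_t index = m_ars.size();
        m_ars.push_back(std::move(ar));
        if (visible_inside) {
            m_pmas.push_back(index);
        }
        // Insert at the position that keeps m_merkle_ars ordered by start.
        const auto pos = std::upper_bound(m_merkle_ars.begin(), m_merkle_ars.end(), index,
            [this](std::size_t a, std::size_t b) { return m_ars[a]->start < m_ars[b]->start; });
        m_merkle_ars.insert(pos, index);
        return index;
    }

    // Linear search over all ranges; nullptr when nothing contains the subrange.
    const toy_address_range *find_address_range(uint64_t paddr, uint64_t length) const {
        for (const auto &ar : m_ars) {
            if (ar->contains_absolute(paddr, length)) {
                return ar.get();
            }
        }
        return nullptr;
    }

    const std::vector<std::size_t> &pmas() const { return m_pmas; }
    const std::vector<std::size_t> &merkle_ars() const { return m_merkle_ars; }
};
```

Storing indices rather than pointers in the secondary vectors keeps them valid when m_ars grows, and no empty sentinel entry is needed because a failed search is reported directly.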

The method i_state_access::read_pma_entry() is now simply i_state_access::read_pma(), and it always returns an address_range indexed indirectly via m_s.pmas then m_ars.
State access methods do not need to declare a pma_entry type as before.
The new machine::read_pma() method implements this, so state-access classes can simply call it.

The new mock_address_range class is now simply an std::variant with all address range classes that can appear during replays.
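
A variant-based mock along these lines can dispatch without virtual calls. The alternative types below are invented placeholders; the real mock_address_range lists the actual address range classes that can appear during replays.

```cpp
#include <cstdint>
#include <type_traits>
#include <variant>

// Hypothetical alternatives, standing in for the real address range classes.
struct toy_memory_range {
    uint64_t start;
    uint64_t length;
    bool is_memory() const { return true; }
};

struct toy_device_range {
    uint64_t start;
    uint64_t length;
    bool is_memory() const { return false; }
};

// std::monostate models "no range selected yet" (an assumption of this sketch).
using toy_mock_address_range = std::variant<std::monostate, toy_memory_range, toy_device_range>;

// Forwards the query to whichever alternative is currently held.
inline bool is_memory(const toy_mock_address_range &ar) {
    return std::visit(
        [](const auto &r) -> bool {
            if constexpr (std::is_same_v<std::decay_t<decltype(r)>, std::monostate>) {
                return false;
            } else {
                return r.is_memory();
            }
        },
        ar);
}
```

The variant holds its alternative by value, so a replay can build and discard mock ranges without heap allocation.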

Renamed i-virtual-machine.h to i-machine.h, jsonrpc-virtual-machine.h to jsonrpc-machine.h, virtual-machine.h to local-machine.h, and clua-i-virtual-machine.cpp to clua-i-machine.cpp.
Having the name "virtual" in the class/file names was confusing.

There is now a single machine::store_address_range() method that does the right thing depending on whether it is a device or memory address range.

Reimplemented machine::read_memory(), machine::fill_memory(), machine::write_memory() in a much simpler way.

---- TODO

We should reduce the log2_tree_size() to 56 (or whatever is the address space of RISC-V).

We should change the top tree part to be Patricia.

We should add a message to every use of assert as cond && "message"
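
For reference, the cond && "message" idiom looks like this. The function page_index and the 4 KiB page size are hypothetical; the idiom works because a string literal is a non-null pointer, so it never changes the truth value of the condition but does show up in the failure text printed by assert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: index of the 4 KiB page containing offset.
inline uint64_t page_index(uint64_t offset, uint64_t range_length) {
    // On failure, the message is printed as part of the asserted expression.
    assert(offset < range_length && "offset must fall inside the address range");
    return offset >> 12;
}
```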

We now have PAGE_ this or that defined in riscv/merkle-tree/PMA. We should use a single definition.

There is now read_memory, which works even outside of memory address ranges.
This is a pity.
read_address_range returns the actual address_range object, not the contents thereof.
Maybe add contents to the name of these functions? Maybe this is overkill.

We should use std::format instead of all the other ways we create formatted strings, for example when we want to throw an exception.
But we need to wait for gcc-13 to be available in Debian and we can't use it on the uarch.

@diegonehab diegonehab changed the base branch from main to refactor/uarch-state-accesses March 5, 2025 14:29
@diegonehab diegonehab force-pushed the refactor/pma branch 3 times, most recently from d597e5a to b4b8330 Compare March 5, 2025 20:26
diegonehab and others added 8 commits March 5, 2025 20:29
PMA is only used for the two-word description of address ranges in memory
All other constants use AR (for address range)
Also simplified machine::read_memory()
This increases the space for DID
Checking for empty address range simply checks its length.
Shadow address ranges are neither device nor memory.
This is because they cannot be read/written with read_device/write_device.
They do not implement get_host_memory() either.
i-virtual-machine        -> i-machine
virtual-machine          -> local-machine
jsonrpc-virtual-machine  -> jsonrpc-machine
clua-i-virtual-machine   -> clua-i-machine
poor-type-name is more legible