Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul DUT Database Structure #40

Merged
merged 8 commits into from
Dec 29, 2019
Merged

Overhaul DUT Database Structure #40

merged 8 commits into from
Dec 29, 2019

Conversation

ginty
Copy link
Member

@ginty ginty commented Dec 27, 2019

This update overhauls how the DUT data and state are stored within Rust.
Originally it was stored in a hierarchical tree structure that closely followed the logical structure of the device (an OO approach, similar to the model architecture from O1) and this changes it to be more like a traditional database where each object type is stored in a dedicated collection, analogous to a database table.

The driver for this change was that it was becoming increasingly convoluted to get an object that was deeply nested within the DUT. For example, something like this was required to get a register object:

// The caller would have to supply the names of all objects in the hierarchy, something like:
// get_reg("dut.core0.adc0", "my_memory_map", "my_address_block", "my_reg");
let dut = DUT.lock().unwrap();
let model = dut.get_model(&self.model_path)?;
let map = model.memory_maps.get(&self.memory_map).unwrap();
let ab = map.address_blocks.get(&self.address_block).unwrap();
let reg = ab.registers.get(&self.reg).unwrap();

Whereas, now a reg can be extracted directly if you have it's ID:

// Simply called via something like:
// get_reg(5);
let dut = DUT.lock().unwrap();
let reg = dut.get_reg(self.reg_id)?;

Another problem was that it became clear that it was going to be really difficult to create collections of objects with different parentage, for example to create a bit collection containing bits from multiple registers. With this update such things become trivial - a bit collection is simply a vector of bit IDs and that can be easily resolved to bit objects by fetching them from the database table.

The crux of this change is contained in https://github.com/Origen-SDK/o2/blob/ids/rust/origen/src/core/dut.rs

The DUT now contains collections (vectors) for each primary object type.
Note that nesting can still be employed within each object type, however this should be reserved for cases where there is tight coupling and it doesn't make much sense to ever hold the child object without the parent. e.g. for regs I plan to have the data about the named fields stored as a child of the reg instead of in a dedicated collection.

Each collection is simply a regular Rust vector and the ID of an object is its position within the vector.

The vectors are intentionally not made public to ensure that we have an abstraction in case we need to tweak the internal structure in future and to guard against un-intentional corruption of the database (e.g. by deleting an object which would screw up all IDs). Instead, access to the database is via a common set of methods implemented at the DUT top-level - each type has a create_<type>, get_<type> and get_mut_<type> methods.

I took a look at this id_arena create as an alternative approach but I couldn't really see the point in it. It has the same constraints as the vector approach - you can't delete records and each collection can contain only one type.
The major problem with it though is that there is no way to clear/empty it, which would be a problem for us since we use a global/static DUT and need to clear out the database when the target is changed. With the vector approach that is trivial by keeping the same vectors but simply calling clear() on them to get rid of the previous DUT's data.
@pderouen, note that in the id_arena documentation it shows an example of building up an AST structure, however you could equally use a std vector as I have done here.

@coreyeng, I'm afraid that one consequence of this change (having already asked you to change name references to id), is that I have had to go back to using name for the main string identifier for objects. id is used to refer to the numeric ID. Sorry.

Objects may or may not contain their own ID, but that can be easily added if there is a need for it. Objects can easily store references to other objects through their ID - e.g. a register contains a vector of bit IDs.
I think that in general this approach is going to nicely sidestep all of the borrow checker pain that would come from storing direct references to objects in Rust.

Avoiding Deadlocks on DUT

In the course of making this update I came across an issue whereby Rust can deadlock if a higher level method locks the DUT and then a called lower-level method tries to do the same.

The solution for this is that we will need to be strict about only allowing top-level functions (i.e. those invoked from Python) to lock the DUT and these will then pass the dut reference down to lower level functions that need it.

Here is an example of how to implement a function that takes a dut reference - https://github.com/Origen-SDK/o2/blob/ids/rust/origen/src/core/model/mod.rs#L64

@coreyeng
Copy link
Member

So the IDs are just a counter then? With the DUT at 0? Could the path be used as the ID instead of just a counter? The path should always be unique. Is looks like its still floating around as a more-readable hierarchical view.

Also, if you're storing IDs like this would a HashMap be more efficient? Especially for larger projects?

Have you run into any issues with borrowing from the structure whilst modifying it? I've run into that with HashMaps, but I think the same exists with vectors? I guess I haven't tried it. But, would something like moving or copying a register from one ID to another cause the borrow checker to complain (since its being borrowed as immutable to read, then mutable again to edit the other). In this example you'd just make a new one and replace it afterwards to work around the borrower checker, but if things get deeply nested it might not be as easy (or maybe it still is?).

I really like this change though. It should make moving around the DUT much easier and still not have to deal lifecycles or unsafe mechanisms. Thanks!

My pin stuff I think is still pretty self-contained so it should be easy to move over to this.

@ginty
Copy link
Member Author

ginty commented Dec 28, 2019

Hi @coreyeng,

So the IDs are just a counter then? With the DUT at 0? Could the path be used as the ID instead of just a counter? The path should always be unique. Is looks like its still floating around as a more-readable hierarchical view.

Yeah, though note that you seem to be thinking about the model collection here when really we need an approach that works with the other collections too.
I meant to mention how the top-level model was handled in the notes, so thanks for bringing that up. It had to be handled slightly differently than all other models because it is unique in not having a parent_id. So I decided to just handle it manually and have it such that the model with ID 0 is always the top-level.

The model path could be used as a unique ID unless we ran into problems if we ever needed to instantiate models which are not part of the main hierarchy. However, other objects like registers are more problematic. Their name is not guaranteed to be unique, so we would need to concatenate it with the parent's path or something, and then we are getting into the complication of manually guaranteeing uniqueness without any obvious benefit over this automatically-guaranteed-to-be-unique system.

One advantage of using a integers as keys vs. strings are that you don't really need to worry about dealing with shared references (though to be fair, shared refs to strings are handled pretty easily). Most of the time I've just used usize as the key arg rather than &usize since they automatically copy due to them living on the stack and copying from there being such a cheap operation.

Also, if you're storing IDs like this would a HashMap be more efficient? Especially for larger projects?

Not sure why it would be more efficient?
I did think of using a HashMap, but I thought that my_vec[20] would be quicker than my_hash["my_id"] or even my_hash[20] since at the very least the step of hashing the key is absent.
The current arch of accessing the data via an API does mean that we could change the backend to a HashMap easily enough if it was more performant though.

One potential performance issue with the vector is that the vector needs to be stored in contiguous memory, and so there can be a penalty if it outgrows is current memory allocation and then needs to be copied to a bigger location in order to grow. However, the vector API allows you to pre-reserve space to prevent this, so if this becomes an issue in practice we could pre-reserve 20,000 reg slots (for example), or even make that under app control and pull it in from the app config.

Have you run into any issues with borrowing from the structure whilst modifying it? I've run into that with HashMaps, but I think the same exists with vectors? I guess I haven't tried it. But, would something like moving or copying a register from one ID to another cause the borrow checker to complain (since its being borrowed as immutable to read, then mutable again to edit the other). In this example you'd just make a new one and replace it afterwards to work around the borrower checker, but if things get deeply nested it might not be as easy (or maybe it still is?).

Haven't really run into problems, but then again assigning to a new ID is not something I envisaged at all.
I don't think its really any different to the old system in this respect. Basically if you just want to read it use DUT.lock().unwrap().get_<type>(my_id) and if you want a mutable reference for edit do the same but use get_mut_<type>(my_id).
Then just modify it and discard the reference when you are done, there is no concept of saving back to the database and/or changing to a new ID after modification.
As an aside, I have used in-line scopes a few times recently to get around borrow issues and that works well:

// Within some function, open a scope to do something
{
  let my_reg = DUT.lock().unwrap().get_mut_reg(my_id);
  my_reg.some_attribute = some_val;
  // Now close the scope which will drop the mutable reference used above
}
// Continue with function

I really like this change though. It should make moving around the DUT much easier and still not have to deal lifecycles or unsafe mechanisms. Thanks!

My pin stuff I think is still pretty self-contained so it should be easy to move over to this.

Excellent!

@info-rchitect
Copy link
Member

@ginty I preface this question with the fact I have not reviewed the Rust code. If two registers have the same name but are stored under different sub-blocks or address maps, how does the DUT know how to find the correct ID?

@ginty
Copy link
Member Author

ginty commented Dec 28, 2019

Hi @info-rchitect, actually fetching a register by ID is a low-level concept and user APIs would not know anything about the ID.
The address map which contains the register of interest will contain a HashMap that matches reg names to IDs. From a user perspective, a register would always be fetched by something like my_address_map.reg("blah").
Then internally it will call a method on the address map to get the ID of the given register, then fetch it from the table.

@info-rchitect
Copy link
Member

Hi @info-rchitect, actually fetching a register by ID is a low-level concept and user APIs would not know anything about the ID.
The address map which contains the register of interest will contain a HashMap that matches reg names to IDs. From a user perspective, a register would always be fetched by something like my_address_map.reg("blah").
Then internally it will call a method on the address map to get the ID of the given register, then fetch it from the table.

OK that makes sense, I was thinking that map had to be made somewhere.

@ginty
Copy link
Member Author

ginty commented Dec 28, 2019

OK that makes sense, I was thinking that map had to be made somewhere.

Yeah, the relationship info is all still in there. The difference is that previously the relationship was inherent by the data being stored hierarchically, whereas now the data is stored flat but IDs are embedded within the objects to define the relationships.

@coreyeng
Copy link
Member

For pretty small projects, a Vector vs. HashMap is probably moot. But I'm not sure how the Vector search could optimize past a just an exhaustive one without further interaction from us. We could implement the comparison methods and see if get will use a binary search on the IDs, but I don't think it'll just do that unless I'm missing something. If we get to where we need 20,000 registers, even with the hashing overhead the HashMap should win by pretty big margin. An exhaustive search each time you want a register will add up a lot. Though, like you said, its an implementation detail and we can always revisit it if need be.

The model path could be used as a unique ID unless we ran into problems if we ever needed to instantiate models which are not part of the main hierarchy. However, other objects like registers are more problematic. Their name is not guaranteed to be unique, so we would need to concatenate it with the parent's path or something, and then we are getting into the complication of manually guaranteeing uniqueness without any obvious benefit over this automatically-guaranteed-to-be-unique system.

This is fair and since this is hidden from the user anyway it'd only be useful for debugging.

Anyway, looks good to me! If you want to merge this I'll pull it into #21 and give it a go. Thanks!

@ginty
Copy link
Member Author

ginty commented Dec 29, 2019

@coreyeng, there's no dynamic search aspect to looking up an item from a vector via its index.
The size of the object type contained by the vector is known at compile-time, so when looking up an item Rust will simply do vector_start_address + (index * size_per_item).
Therefore the lookup time is constant and does not increase with the number of items in the vector.

@ginty ginty merged commit 978a1de into master Dec 29, 2019
@ginty ginty deleted the ids branch December 29, 2019 19:22
@coreyeng coreyeng mentioned this pull request Dec 31, 2019
@coreyeng coreyeng mentioned this pull request Jan 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants