Shared memory between calls #673
Conversation
…per memory allocation
cc @DaniPopes who may also have thoughts
In general I'm very supportive of this; tbh I would have expected the same or better performance. I'll play with it a little to check it out. I would like to see if we can remove the `Rc<RefCell<>>`; maybe the overhead we see is related to the dynamic borrow checks inside `RefCell`. Snailtracer should be a good benchmark to check, as EVM init is small in comparison to the work the interpreter is doing.
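For context on that last point, here is a minimal sketch (hypothetical names, not revm code) of why `RefCell`'s dynamic borrow checks can cost something in a hot loop: every `borrow_mut()` performs a runtime flag check, while a plain `&mut` is verified at compile time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for the interpreter's memory buffer.
struct Memory {
    data: Vec<u8>,
}

// Shared ownership: each access pays a runtime borrow-flag check
// (and panics if a conflicting borrow is outstanding).
fn write_shared(mem: &Rc<RefCell<Memory>>, byte: u8) {
    mem.borrow_mut().data.push(byte);
}

// Exclusive reference: borrow rules are checked statically, no runtime cost.
fn write_direct(mem: &mut Memory, byte: u8) {
    mem.data.push(byte);
}
```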
I'm very happy you like it! After this PR I can try to do a
I've found a way to remove the
Same as #660 (comment)
#582 was merged, can you rebase this PR? @lorenzofero
I can't really perform a rebase because I have conflicts in multiple commits and it's really cumbersome to work through. I'm doing a merge with conflict resolution instead 👍. I hope that's not a problem! It will take some time though; I'd like to stay as close as possible to how you handled some of the memory functions.
Hey @rakita @DaniPopes @gakonst, the tests are passing now if you want to check it out again. I managed to keep all of Dani's updates to the memory-related functions inside. Lastly, is there a new way to run benches now? I've seen in the readme that something has changed, but it seems out of date. Thanks!
You can run
cachegrind:

- main (8206193): 431,577,511
- this PR #673 (c7945eb): 445,967,393
This is great btw; I think we can get the perf regression down to neutral or positive.
Thank you for the very detailed feedback; I'll try to resolve all the comments as soon as possible.
I think this is what's causing tests to fail
Today, if I have time, I'll take a look at the failing tests with:

```rust
fn set_ptr(&mut self, checkpoint: usize) {
    if self.data.len() > 0 {
        assume!(checkpoint < self.data.len());
        self.current_ptr = unsafe { self.data.as_mut_ptr().add(checkpoint) };
    }
}
```
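For readers unfamiliar with the macro: an `assume!` along these lines (a sketch of the usual pattern, not necessarily revm's exact definition) tells the optimizer a condition holds so bounds checks can be elided, while still failing loudly in debug builds:

```rust
// Sketch of an assume!-style macro; illustrative, not revm's actual code.
macro_rules! assume {
    ($cond:expr) => {
        if !$cond {
            if cfg!(debug_assertions) {
                // In debug builds, surface violated assumptions immediately.
                panic!("assumption failed: {}", stringify!($cond));
            } else {
                // SAFETY: the caller promises $cond always holds here; this
                // lets the optimizer drop the corresponding runtime checks.
                unsafe { core::hint::unreachable_unchecked() }
            }
        }
    };
}
```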
Hi @lorenzofero, how is the performance on this? In essence, I expected this to be more impactful, but if this is not the case there is unfortunately no good reason to include it.
Hey @rakita, I tried running Cachegrind again after Dani's first measurements:

which is around -3.5% in performance. After the current changes, this is what I get:

which is around -2.7%. @DaniPopes, maybe I can try with this now to see if we can squeeze out a little bit more.
Yeah, I get that the more complex the transaction, the better the performance. For simple stuff I expect it to be somewhat worse on average, as the benches suggest.
Hey, I hope you don't find all of these comments pedantic. I tried a different model for the shared memory, similar to ethereum/evmone#481 (comment), which you can see on my fork: thedevbirb#2. This model has some benefits imo:

Here is the result of running

If you like it, I can bring the changes to this branch or open a new PR.
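For illustration, here is a condensed sketch of that evmone-like model (simplified; names and details may differ from the fork): one flat buffer plus a stack of per-call checkpoints, so entering a call records an offset and returning truncates back to it:

```rust
/// Simplified sketch of checkpoint-based shared memory; illustrative only.
pub struct SharedMemory {
    data: Vec<u8>,
    /// Start offset of each call context's region.
    checkpoints: Vec<usize>,
}

impl SharedMemory {
    pub fn new() -> Self {
        Self { data: Vec::new(), checkpoints: Vec::new() }
    }

    /// Entering a call: the new context's memory begins at the current end.
    pub fn new_context(&mut self) {
        self.checkpoints.push(self.data.len());
    }

    /// Leaving a call: everything the inner context allocated is discarded.
    pub fn free_context(&mut self) {
        if let Some(checkpoint) = self.checkpoints.pop() {
            self.data.truncate(checkpoint);
        }
    }

    /// Bytes used by the current (innermost) context.
    pub fn context_len(&self) -> usize {
        self.data.len() - self.checkpoints.last().copied().unwrap_or(0)
    }
}
```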
It is fine; this is a tough decision, as you invested a lot of time into this and the benefits are unfortunately small. I have a few examples where an idea didn't pan out as expected (gas block, I am looking at you: we had a performance boost of 5-7% and the code was merged, but it was very hard to use, so after a few months it was reverted), so in the end it was more like a research effort to get data.

This seems better: not all transactions are going to be 1024 calls deep or use 30M gas, so this is more reasonable. But with this,

And spilling context (in this case memory) from one Interpreter to another would be fine if we gained something significant; on the other hand, having just one place for memory opens things up for new ideas. This is probably going to look better if we switch from recursive calls to loop-based calls (if the loop approach turns out okay). I am on the fence here, but let's include it; I will review it in detail in the next few days (@DaniPopes already did an amazing job there).
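To illustrate the loop idea mentioned above (entirely hypothetical, not revm's code): instead of each sub-call recursing into a new `Interpreter`, a loop keeps an explicit frame stack next to the single shared buffer, so entering and leaving a context is just a push/pop plus a checkpoint update:

```rust
// Hypothetical sketch of loop-based call handling; no names here are revm's
// actual API.
struct Frame; // stand-in for one call's interpreter state

enum Step {
    Call(Frame), // the current frame performs a sub-call
    Return,      // the current frame finishes
}

fn run(mut step: impl FnMut(&mut Frame, &mut Vec<u8>) -> Step) {
    let mut memory: Vec<u8> = Vec::new();         // the single shared buffer
    let mut checkpoints: Vec<usize> = Vec::new(); // one entry per active sub-call
    let mut frames = vec![Frame];
    while let Some(frame) = frames.last_mut() {
        match step(frame, &mut memory) {
            Step::Call(new_frame) => {
                checkpoints.push(memory.len()); // record where the child starts
                frames.push(new_frame);
            }
            Step::Return => {
                if let Some(cp) = checkpoints.pop() {
                    memory.truncate(cp); // discard the child's allocations
                }
                frames.pop();
            }
        }
    }
}
```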
Ok, I brought the changes here for review. However, I was wondering whether it would be worth opening a new PR that supersedes this one with the new model, since both the commit history and the GitHub conversation here are becoming a little messy imo. Let me know what you think!
It is messy, but it is fine for it to be in one place so people can follow what is happening.
We should reintroduce the memory limit.
Other parts look good!
```rust
($interp:expr, $offset:expr, $len:expr) => {
    if let Some(new_size) =
        crate::interpreter::next_multiple_of_32($offset.saturating_add($len))
    {
        #[cfg(feature = "memory_limit")]
        if new_size > ($interp.memory_limit as usize) {
```
Should we reintroduce the memory limit?
```rust
/// Memory checkpoints for each depth
checkpoints: Vec<usize>,
/// How much memory has been used in the current context
current_len: usize,
```
We can probably put the memory limit here; it feels like a better place.
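A sketch of what that could look like (hypothetical placement, gated the same way as the existing check):

```rust
pub struct SharedMemory {
    data: Vec<u8>,
    /// Memory checkpoints for each depth.
    checkpoints: Vec<usize>,
    /// How much memory has been used in the current context.
    current_len: usize,
    /// Upper bound on memory growth, enforced on resize.
    #[cfg(feature = "memory_limit")]
    memory_limit: u64,
}
```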
```rust
/// Get the last memory checkpoint
#[inline(always)]
fn last_checkpoint(&self) -> usize {
    *self.checkpoints.last().unwrap_or(&0)
}
```
```diff
- *self.checkpoints.last().unwrap_or(&0)
+ self.checkpoints.last().cloned().unwrap_or_default()
```

`cloned` would remove the reference from the option.
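To spell out the types (an illustrative snippet, not from the PR): `.last()` yields an `Option<&usize>`, and `.cloned()` turns it into an `Option<usize>`, so neither the manual deref nor the `&0` default is needed:

```rust
fn main() {
    let checkpoints: Vec<usize> = vec![0, 32, 96];
    // Manual deref of the Option<&usize>:
    let a: usize = *checkpoints.last().unwrap_or(&0);
    // `.cloned()` converts Option<&usize> into Option<usize> first:
    let b: usize = checkpoints.last().cloned().unwrap_or_default();
    assert_eq!(a, b);
}
```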
It should be good now!
lgtm! Amazing work @lorenzofero!
I'm very happy we've found the right approach for this and got some good performance improvements. Thanks a lot @rakita and @DaniPopes for all the support over the last month; it has been a pleasure working with you.
I tried to give #445 a shot, just for fun and as a challenge, with this PR. This introduces a new struct called `SharedMemory`, which is indeed shared between calls.

About the implementation

- It is based on the current `Memory` implementation
- It has a `current_slice` raw pointer which is internally used to refer to the portion of data reserved for the current context. This requires two `unsafe` methods, `get_current_slice` and `get_current_slice_mut`, which dereference the raw pointer
- The `current_slice` pointer is updated when entering a new context (which is when a new `Interpreter` instance is created) using the `new_context_memory` method, and when exiting it, which happens when the `return_value` method of `Interpreter` is called
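A condensed sketch of that pointer scheme (simplified from the description above; the PR's actual signatures may differ):

```rust
/// Illustrative sketch of the `current_slice` model; not the PR's exact code.
pub struct SharedMemory {
    data: Vec<u8>,
    /// Raw pointer to the start of the current context's region within `data`.
    current_slice: *mut u8,
    /// Length of the current context's region.
    current_len: usize,
}

impl SharedMemory {
    /// # Safety
    /// `current_slice` must point into `data`, and `current_len` bytes
    /// starting there must be initialized.
    pub unsafe fn get_current_slice(&self) -> &[u8] {
        std::slice::from_raw_parts(self.current_slice, self.current_len)
    }

    /// # Safety
    /// Same invariants as `get_current_slice`.
    pub unsafe fn get_current_slice_mut(&mut self) -> &mut [u8] {
        std::slice::from_raw_parts_mut(self.current_slice, self.current_len)
    }
}
```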
Performance

I'd like feedback on this because:
Anyway, this is the result of running `cargo bench --all` on `main` and then on my branch:

I also tried to run cachegrind as explained in #582, and it is indeed slower, by about 10% or less. The gas limit required is between 2^22 and 2^23. Maybe here too I'm allocating memory which is not used very much; it depends on the bytecode of the snailtracer.
Thanks in advance for any feedback!