This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

proposal: set EnableMemory to false when tracing #502

Closed
lispc opened this issue May 9, 2022 · 21 comments

Comments

@lispc
Collaborator

lispc commented May 9, 2022

and "reconstruct" memory in rust side by re-interprete memory related opcodes like MSTORE/MLOAD/MSTORE8/COPYCODETOMEMORY/...

The reason for this is that "EnableMemory: true" costs a huge amount of geth server resources (CPU, memory, network). It has become unviable to set EnableMemory: true.

More details: scroll-tech/go-ethereum#94 (comment)
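
To give an idea of what this re-interpretation could look like, here is a rough Rust sketch (all type and field names are invented for illustration; a real implementation has to cover every memory-touching opcode and call-context boundaries):

```rust
// Rough illustration only: replay memory-writing opcodes from a trace taken
// with EnableMemory = false. All type/field names here are invented for the
// example; the real step struct and opcode coverage would be more complete.
use std::convert::TryInto;

/// One trace step: opcode name plus the stack *before* the step executes,
/// each entry a 32-byte big-endian word, top of stack last.
struct Step {
    op: &'static str,
    stack: Vec<[u8; 32]>,
}

#[derive(Default)]
struct Memory(Vec<u8>);

impl Memory {
    /// Expand to a multiple of 32 bytes, as the EVM does.
    fn expand(&mut self, end: usize) {
        if self.0.len() < end {
            self.0.resize((end + 31) / 32 * 32, 0);
        }
    }
    fn write(&mut self, offset: usize, data: &[u8]) {
        self.expand(offset + data.len());
        self.0[offset..offset + data.len()].copy_from_slice(data);
    }
}

/// Interpret a 32-byte stack word as usize (assumes offsets fit in 64 bits).
fn to_usize(w: &[u8; 32]) -> usize {
    u64::from_be_bytes(w[24..32].try_into().unwrap()) as usize
}

/// EVM COPY semantics: copy `len` bytes of `src_buf[src..]` into memory at
/// `dst`, zero-padding reads past the end of `src_buf`.
fn copy_into(mem: &mut Memory, dst: usize, src_buf: &[u8], src: usize, len: usize) {
    if len == 0 {
        return;
    }
    let mut chunk = vec![0u8; len];
    let avail = src_buf.len().saturating_sub(src).min(len);
    if avail > 0 {
        chunk[..avail].copy_from_slice(&src_buf[src..src + avail]);
    }
    mem.write(dst, &chunk);
}

/// Apply the memory effect of a single step. Only a few opcodes are shown; a
/// full version must also handle RETURNDATACOPY, EXTCODECOPY, the CALL family
/// writing return data, SHA3 expansion, etc.
fn apply_step(mem: &mut Memory, step: &Step, call_data: &[u8], code: &[u8]) {
    let top = |i: usize| &step.stack[step.stack.len() - 1 - i];
    match step.op {
        "MSTORE" => mem.write(to_usize(top(0)), top(1)),
        "MSTORE8" => mem.write(to_usize(top(0)), &top(1)[31..]),
        "CALLDATACOPY" => {
            copy_into(mem, to_usize(top(0)), call_data, to_usize(top(1)), to_usize(top(2)))
        }
        "CODECOPY" => {
            copy_into(mem, to_usize(top(0)), code, to_usize(top(1)), to_usize(top(2)))
        }
        "MLOAD" => mem.expand(to_usize(top(0)) + 32), // reads only expand memory
        _ => {}
    }
}
```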

@CPerezz
Member

CPerezz commented May 10, 2022

We benchmarked most of that and it did not seem really significant compared to the proving times, TBH.

Is there anything I'm missing here?

@ed255
Member

ed255 commented May 10, 2022

I have tested the same block mentioned in scroll-tech/go-ethereum#94 (comment) both with EnableMemory: true and EnableMemory: false and can confirm the results. With false it takes ~10 seconds; with true, after 2m30s the geth process runs out of memory (I tried on a machine with 32 GB of memory). I wonder how much memory would be needed to succeed, and how long it would take.

The proposal to reconstruct the memory based on the steps that touch it seems correct to me. We'll need to be careful about bugs, but it seems doable.

Here are other ideas that come to my mind:

  1. Assuming that there are no massive transactions, instead of querying the trace by block, query the trace for each tx (this way geth doesn't need to keep the whole block trace in memory at the same time). The problem with this solution is that a tx may be as big as the block.
  2. If we could modify geth, we could change the traces to include memory diffs (within the same call context) instead of the full memory dump per step, and then recalculate the full memory on our side from those diffs. The diff could be as simple as two parameters:
    • The changed bytes within a memory range. Example: 0x0000-0x0002: 0x0102
    • The new memory length (used to avoid sending zeros on memory expansion)

Another note: if geth requires so much memory to hold the memory dump of each step, we'll probably face the same issue as well, so maybe it makes sense that we don't try to keep the full memory dump of each step. This could be achieved by reconstructing memory at the same time we're iterating the steps to do the gen_associated_ops, keeping only a single memory dump per call context. This reconstruction could come from processing the memory opcodes like @lispc suggests, or, if we manage to get memory diffs, by applying the diffs.
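
To make the diff idea a bit more concrete, here is a rough sketch of applying such diffs to a single live memory per call context (the struct layout and names are hypothetical, not an existing geth or bus-mapping format):

```rust
use std::collections::HashMap;

/// Hypothetical per-step diff, mirroring the two parameters above. The example
/// "0x0000-0x0002: 0x0102" would be { offset: 0, changed: vec![0x01, 0x02], .. }.
struct MemoryDiff {
    offset: usize,    // start of the changed range
    changed: Vec<u8>, // new bytes in that range
    new_len: usize,   // memory length after the step (covers pure expansion)
}

/// One evolving memory per call context, instead of a full snapshot per step.
#[derive(Default)]
struct CallMemories {
    mems: HashMap<usize, Vec<u8>>, // call_id -> current memory contents
}

impl CallMemories {
    fn apply(&mut self, call_id: usize, diff: &MemoryDiff) {
        let mem = self.mems.entry(call_id).or_default();
        // Expansion first: zeros never need to be sent over the wire.
        if mem.len() < diff.new_len {
            mem.resize(diff.new_len, 0);
        }
        mem[diff.offset..diff.offset + diff.changed.len()].copy_from_slice(&diff.changed);
    }
}
```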

@lispc
Collaborator Author

lispc commented May 10, 2022

@ed255

"I wonder how much memory would be needed to succeed" I tried a 64GiB server, it also failed.
"query the trace for each tx": If your trace a tx on geth, geth will have to re-run every tx previous to the queried tx in that block. So the computing complexity will change from O(N) to O(N**2). I am afraid this is not a good idea.

@lispc
Collaborator Author

lispc commented May 10, 2022

@CPerezz I agree that this time is much smaller than the proving time, but the two cannot simply be compared and summed.

We cannot prove a 30M-gas block in 13 seconds; 10 minutes is probably a better estimate of the proving time. So for a real zkrollup system, we have to set up tens (if not hundreds) of proving servers.

If geth is optimized enough to serve traceBlock in "realtime", the final cluster will be: 1 geth server + tens of proving machines.
If we use the current geth and trace memory, a geth node takes minutes to respond to a "traceBlock" call and cannot serve multiple calls at the same time, so the final cluster will be: tens of geth servers + tens of proving machines.

@pinkiebell
Contributor

If we could modify geth, we could change the traces to include memory diffs

This can be achieved by using custom js tracers, example.
And once the performance of the js tracer becomes a problem, we can add a custom native Go tracer to geth.

@lispc
Collaborator Author

lispc commented May 13, 2022

If we could modify geth, we could change the traces to include memory diffs

This can be achieved by using custom js tracers, example. And once the performance of the js tracer becomes a problem, we can add a custom native Go tracer to geth.

A js tracer will be slow since we need to record stack/storage/gas etc. of every step. And for a custom Go tracer we need to modify geth code, which we should try to avoid unless we really have to, since it brings a long-term maintenance burden.

@lispc
Collaborator Author

lispc commented May 13, 2022

Another approach: ethereum/go-ethereum#24873
(I think even such a simple deduplication will help a lot)

@CPerezz
Member

CPerezz commented May 16, 2022

If we could modify geth, we could change the traces to include memory diffs

This can be achieved by using custom js tracers, example. And once the performance of the js tracer becomes a problem, we can add a custom native Go tracer to geth.

I think @pinkiebell has a point here.
While what @lispc mentioned in the geth issue is true, I think it's much more realistic and faster to go for a js tracer, make sure it works, and then implement a Go one which extracts exactly what we need. Otherwise, we will always be waiting on getting this added to geth, which might take time, and might not even be desired (although it seems like a good idea to me).

Should we open an issue and try to see the performance of a Go tracer first?

@ed255
Member

ed255 commented May 16, 2022

I think @pinkiebell has a point here. While what @lispc mentioned in the geth issue is true, I think it's much more realistic and faster to go for a js tracer, make sure it works, and then implement a Go one which extracts exactly what we need. Otherwise, we will always be waiting on getting this added to geth, which might take time, and might not even be desired (although it seems like a good idea to me).

Should we open an issue and try to see the performance of a Go tracer first?

While I don't have much faith in the performance of using the js tracer to obtain the memory dumps (probably differential memory dumps), I agree that the best approach is to try it and see how well it performs. I can explore this and get some results.

Actually we may use the internal tracer with EnableMemory = false and then a custom js tracer just for the memory; or a single js tracer with everything (depending on which one is faster).
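
Just to illustrate what combining the results could mean in practice, a minimal sketch (placeholder types, assuming both tracer calls report the same number of steps for the block):

```rust
/// Placeholder for a step from the regular struct logger (EnableMemory = false).
struct StructLogStep { /* pc, op, gas, stack, ... */ }
/// Placeholder for a per-step memory record from the custom js tracer.
struct MemRecord { /* memory diff or snapshot for the same step */ }

/// Zip the two traces step by step; both tracers walk the same execution, so
/// step i of one call should correspond to step i of the other.
fn combine(
    logs: Vec<StructLogStep>,
    mems: Vec<MemRecord>,
) -> Option<Vec<(StructLogStep, MemRecord)>> {
    if logs.len() != mems.len() {
        return None; // the two traces disagree; bail out instead of mis-pairing
    }
    Some(logs.into_iter().zip(mems).collect())
}
```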

@ed255
Member

ed255 commented May 17, 2022

While I don't have much faith in the performance of using the js tracer to obtain the memory dumps (probably differential memory dumps), I agree that the best approach is to try it and see how well it performs. I can explore this and get some results.

I've done a small exploration of the js tracer and found out that it doesn't expose any method to get the memory length in a step; adding that method is very straightforward, but since it's not there, it makes the exploration more difficult. I opened a PR in go-ethereum to add this method, but they are planning on replacing the JS engine, so my PR is on hold.

@s1na

s1na commented May 17, 2022

geth has to re-run every tx before the queried tx in that block, so the computing complexity changes from O(N) to O(N**2). I am afraid this is not a good idea.

This becomes a problem especially when the state is not available on disk or in memory and has to be regenerated for every tx. But if the state is available, the actual computing overhead shouldn't be too much. You can check that via this script.

@lispc
Collaborator Author

lispc commented May 20, 2022

I analyzed (traced with memory tracing enabled) every tx inside the mentioned block. There are 94 txs in total; 93 of them can be traced successfully (the biggest trace is ~300MiB, and all of them together sum to about ~900MiB). The one that fails (it always makes geth OOM) is an Arbitrum Sequencer tx with ~100KiB of calldata. In the Arbitrum contract, the calldata is copied to memory and hashed, so once memory has been expanded to ~100KiB, every subsequent step carries a ~100KiB snapshot and the per-step memory dumps become HUGE.

@ed255
Member

ed255 commented May 20, 2022

Now that the js tracer memory.length() method has been merged, we can access the memory snapshots nicely from within the js tracer.

I did some tests with a js tracer that returns memory diffs (so that a full memory snapshot can be reconstructed). Here's the tracer source and the script to test it

And here are some quick benchmark results on a 32GB ram machine:

  1. Get a regular trace with EnableMemory = false: 6s
  2. Run the js tracer to get memory diffs: 14s

In both cases the memory consumption is similar and barely grows.

I think these numbers are acceptable for our use case. We could actually make the two calls (1 and 2) and combine the results, or prepare a js tracer that mimics the regular tracer but replaces memory snapshots with diffs. And given these results, I think reconstructing the memory on our side will not be needed.

UPDATE: There seems to be a timeout in the test I did with the js tracer, which I need to investigate

@pinkiebell
Contributor

@ed255
You can increase the default timeout by adding a timeout property

{
  timeout: '1000s',
  tracer: '......'
}

@ed255
Member

ed255 commented May 20, 2022

@ed255 You can increase the default timeout by adding a timeout property

{
  timeout: '1000s',
  tracer: '......'
}

Thanks. So I've added a timeout of 60m and ran the diff-memory js tracer, and well, the memory consumption is stable, but the tracer has been running for 50 minutes now and hasn't finished yet. This tracer does a linear scan of the memory on each step; I guess that's not very efficient (I just implemented a simple diff algorithm).

I'll continue with a few more experiments, but reconstructing the memory as @lispc suggested may be the best option. It shouldn't be too complex and it would be optimal.

@lispc
Collaborator Author

lispc commented May 23, 2022

@ed255 Do you plan to work on this soon? If not, we plan to start working on this issue ("reconstructing the memory") in probably 2-3 weeks.

@ed255
Member

ed255 commented May 23, 2022

I've done another experiment on the js tracer with a different approach. In this approach I've used this tracer, which, instead of diffing the memory at each step, only reads the chunk of memory that changes based on the opcode and its arguments. So it's actually similar to the idea of reconstructing the memory, but in this case the script only reconstructs the position and length in memory that will change and stores whatever is there, so the result is still a memory diff for every step.
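
For reference, the opcode → written-range mapping such a tracer (or a Rust-side reconstruction) has to implement is roughly the following. This is a sketch only, with stack values simplified to u64, and it glosses over details such as the CALL family actually writing min(retLength, returndata size) bytes:

```rust
/// Which memory range will this step write? `stack` is the stack before the
/// step, top of stack last. Returns (offset, length), or None if the opcode
/// writes no memory. Illustrative sketch, not the actual tracer code.
fn write_range(op: &str, stack: &[u64]) -> Option<(u64, u64)> {
    let top = |i: usize| stack[stack.len() - 1 - i];
    match op {
        "MSTORE" => Some((top(0), 32)),
        "MSTORE8" => Some((top(0), 1)),
        "CALLDATACOPY" | "CODECOPY" | "RETURNDATACOPY" => Some((top(0), top(2))),
        "EXTCODECOPY" => Some((top(1), top(3))),
        // The CALL family writes return data into [retOffset, retOffset + retLength).
        "CALL" | "CALLCODE" => Some((top(5), top(6))),
        "DELEGATECALL" | "STATICCALL" => Some((top(4), top(5))),
        _ => None,
    }
}
```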

This tracer, when run on block 0xe0f2b8 (the same as in previous tests), takes ~10s. I think that's a viable solution! So with this result, I think both using this tracer and reconstructing memory on our side are valid options. Using the tracer is a bit less clean (because we'd need 2 RPC calls, one where we send js code), but we always get the correct memory contents; reconstructing memory on our side only requires 1 RPC call, but we may have incorrect results due to bugs in the short term (because we'll need to read from the source, which may not be implemented yet for some cases), although in the long term that will not be a problem.

So in summary I think both options are equally viable. And in either case, I think it will make sense to avoid keeping a complete memory snapshot per step, and only keep one "live" memory snapshot per call context, that evolves as we process each step.

@CPerezz
Member

CPerezz commented May 24, 2022

@ed255 That's huge news!!!!! Thanks for this exploratory work!!!!

Which of the two approaches are you leaning towards now?

@ed255
Member

ed255 commented May 24, 2022

@ed255 That's huge news!!!!! Thanks for this exploratory work!!!!

Which of the two approaches are you leaning towards now?

I really don't have a preference: I think both options are OK and once implemented can be equally clear and easy to understand. So I'd leave it as a decision for the person who implements the solution!

@ed255 Do you plan to work on this soon? If not, we plan to start working on this issue ("reconstructing the memory") in probably 2-3 weeks.

@lispc I haven't planned to work on this in the short term, so feel free to take this task :)

@lispc
Collaborator Author

lispc commented May 24, 2022

We have started the reconstruction on the Rust side. Maybe we can try it in two weeks.

@ed255
Member

ed255 commented Aug 6, 2022

Resolved via #614

@ed255 ed255 closed this as completed Aug 6, 2022
@ed255 ed255 moved this from 🆕 New to ✅ Done in zkEVM Community Edition Aug 24, 2022