This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Ability to fork network for testing purpose #12442

Closed
xlc opened this issue Oct 7, 2022 · 10 comments
Labels
J0-enhancement An additional feature request.

Comments

@xlc
Contributor

xlc commented Oct 7, 2022

I would like to fork a relay chain or parachain from the latest height to conduct testing. Ethereum has a lot of very good tooling that lets developers run a local fork of Eth mainnet for testing and experimentation.

While we do have tools like https://github.com/maxsam4/fork-off-substrate that export the on-chain data to create a new chain, they are not ideal for testing. Producing a new genesis is slow and can take a long time, but the main issue is that the new network's height starts from block 0. This makes some block-number-based code untestable on the fork.
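To make the block-height problem concrete, here is a minimal sketch in plain Rust. The `Schedule` type and `unlocked_at` function are hypothetical and not taken from any pallet; they only illustrate logic keyed on absolute block numbers.

```rust
/// Hypothetical linear vesting schedule keyed on absolute block numbers.
/// Not a real pallet type; for illustration only.
pub struct Schedule {
    pub start_block: u32,
    pub per_block: u128,
    pub locked: u128,
}

/// Amount unlocked at block `now`: grows linearly after `start_block`
/// and is capped at the total locked amount.
pub fn unlocked_at(s: &Schedule, now: u32) -> u128 {
    let elapsed = now.saturating_sub(s.start_block) as u128;
    (elapsed * s.per_block).min(s.locked)
}
```

With a schedule that starts at block 12,000,000, a fork that keeps the original height unlocks funds right away, while a chain restarted from block 0 reports nothing unlocked until it has produced millions of blocks.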

try-runtime is a bit closer to what I want, but it cannot produce new blocks on top of an existing network.

We could improve try-runtime to integrate with manual-seal, so that we get an instant fork of the mainnet and can submit transactions to it for testing. It should also implement RPC methods so we can query the on-chain state of the forked network.

cc @kianenigma

@bkchr
Member

bkchr commented Oct 7, 2022

This would also require overriding the signature verification host functions to always return true, so that we could include any kind of extrinsic, etc. This isn't very complicated; I just wanted to leave this here.
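A minimal sketch of that idea in plain Rust, with no Substrate dependencies. The `SignatureVerifier` trait, both verifier types, `toy_sign`, and `submit_extrinsic` are hypothetical names; the real host functions have different signatures, and the "strict" check below is a toy checksum, not cryptography.

```rust
/// Hypothetical stand-in for the host-side signature check.
pub trait SignatureVerifier {
    fn verify(&self, msg: &[u8], sig: u64, signer: u64) -> bool;
}

/// Toy "signature": a keyed checksum over the message. NOT real cryptography.
pub fn toy_sign(msg: &[u8], signer: u64) -> u64 {
    msg.iter()
        .fold(signer, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u64))
}

/// Behaves like a production node: rejects anything not properly signed.
pub struct StrictVerifier;

impl SignatureVerifier for StrictVerifier {
    fn verify(&self, msg: &[u8], sig: u64, signer: u64) -> bool {
        toy_sign(msg, signer) == sig
    }
}

/// Behaves like the forked test node: every signature is accepted.
pub struct AcceptAllVerifier;

impl SignatureVerifier for AcceptAllVerifier {
    fn verify(&self, _msg: &[u8], _sig: u64, _signer: u64) -> bool {
        true
    }
}

/// Admit an extrinsic only if the configured verifier accepts it.
pub fn submit_extrinsic<V: SignatureVerifier>(
    verifier: &V,
    payload: &[u8],
    sig: u64,
    signer: u64,
) -> Result<(), &'static str> {
    if verifier.verify(payload, sig, signer) {
        Ok(())
    } else {
        Err("bad signature")
    }
}
```

Passing `&AcceptAllVerifier` lets a test submit a call on behalf of any account without holding its key, which is the point of the override.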

@bkchr bkchr added the J0-enhancement An additional feature request. label Oct 7, 2022
@kianenigma
Contributor

https://github.com/polytope-labs/substrate-simnode is a good external tool for now.

Skipping signature checks in try-runtime (or in any case), as Basti said, is quite easy. We can also just add a new runtime API for try-runtime that receives an extrinsic and just applies the Call. Although then you miss out on some other aspects, like SignedExtra.

@kianenigma
Contributor

> We could improve try-runtime to integrate with manual-seal, so that we get an instant fork of the mainnet and can submit transactions to it for testing. It should also implement RPC methods so we can query the on-chain state of the forked network.

Currently, try-runtime is independent of the client. There is no client. This makes development on it considerably easier. I am not sure whether we can/should try to add what you are asking for now, or first integrate try-runtime into the client and attempt this afterwards.

@bkchr
Member

bkchr commented Oct 11, 2022

I have also thought about this over the years. I once wanted to create a "Polkadot Doppelgaenger" :D Basically, a different binary that has the special host functions to accept any signature. In general, I don't think we need that much from Substrate to make this work. In my head, your own chain would have some kind of separate binary that sets up the service to use manual seal, for example. Then you start it with the db from your main net. I have probably forgotten something, but this should work? I'm happy to accept PRs that improve the setup of such a node. However, in general it is probably more some kind of tutorial that is missing here?

@pepyakin
Contributor

I have a gut feeling that it's not enough to substitute only signature verification. I am pretty sure that in the long tail there are cases which would require doing something else from the Substrate Runtime Interface or maybe even won't work at all.

I think it's worth looking into making the runtime aware that it is being try-runtimed. There are two options:

  1. Runtime. There is a variable that is available for the Runtime to read (probably in a form of a host function). If it is true, then the runtime behaves in another way. There is a tiny possibility (depending on the exact substituted logic) that the code compiled in can be used for exploiting this machinery in production.
  2. Static. The runtime is compiled with a special feature flag that enables certain code paths for mocking.

For both cases, there is no need for the Substrate Runtime Interface cooperation (besides the variable for the runtime case). The runtime can override host functions (and this feature is already used for the swizzling of calls to storage in Cumulus) and change their behavior arbitrarily. Also, perhaps more importantly, it allows adding extra asserts [1].
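The two options can be contrasted with a small Rust sketch. The function names and the `try-runtime-mock` feature name are hypothetical; a closure stands in for the real signature check.

```rust
/// Option 1 (runtime): the mocked path is always compiled in and selected by
/// a flag the runtime reads (in practice probably via a host function).
/// Because the bypass exists in the production build, it is a potential
/// attack surface if the flag can ever be influenced.
pub fn verify_with_runtime_flag(
    is_try_runtime: bool,
    real_check: impl Fn() -> bool,
) -> bool {
    if is_try_runtime {
        true
    } else {
        real_check()
    }
}

/// Option 2 (static): the mocked path only exists when the runtime is built
/// with a special feature flag (hypothetical name: `try-runtime-mock`).
#[cfg(feature = "try-runtime-mock")]
pub fn verify_static(_real_check: impl Fn() -> bool) -> bool {
    true
}

/// Production build of option 2: no bypass is compiled in at all.
#[cfg(not(feature = "try-runtime-mock"))]
pub fn verify_static(real_check: impl Fn() -> bool) -> bool {
    real_check()
}
```

In a default build, `verify_static` contains only the real check, so the static option avoids shipping the bypass at the cost of testing a differently compiled binary.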


And while we are here and I have your attention, a bit of an aside: fuzzing. I hope I don't need to convince you of how effective fuzzing is. Unfortunately, it is often hard to apply. However, I would argue that runtimes and similar STFs (state transition functions) are good targets for fuzzing, provided you take out or mock the things we are discussing here. So I can't help but think that the very same switches could open up a road for fuzzing of the runtimes. The interaction between those features goes even further: one may want to fuzz from genesis state, but IMO it's also good to fuzz against some recent state.

Footnotes

  1. Writing this I remembered that there is this classic issue https://github.com/paritytech/substrate/issues/2082. If you think about it, you will realize that it's similar to try-runtime & co. So naturally it has some additional motivating examples that I wrote with my contracts hat on.

@bkchr
Member

bkchr commented Oct 11, 2022

> I have a gut feeling that it's not enough to substitute only signature verification. I am pretty sure that in the long tail there are cases which would require doing something else from the Substrate Runtime Interface or maybe even won't work at all.

For sure, but most of the stuff you are mentioning sounds to me like it could be solved with the "wasm override" feature. Aka you just fork off the main net. Keep this thing running and then use some wasm that is equal to the one on chain, but compiled with some special flags that give you more outputs or has special checks enabled.

> And while we are here and I have your attention, a bit of an aside: fuzzing. I hope I don't need to convince you of how effective fuzzing is. Unfortunately, it is often hard to apply. However, I would argue that runtimes and similar STFs (state transition functions) are good targets for fuzzing, provided you take out or mock the things we are discussing here. So I can't help but think that the very same switches could open up a road for fuzzing of the runtimes. The interaction between those features goes even further: one may want to fuzz from genesis state, but IMO it's also good to fuzz against some recent state.

For fuzzing, we recently had this issue: paritytech/polkadot-sdk#251. I think fuzzing is a great idea in general, and given the predefined interface of a runtime, aka the Call enum, it should be really easy to write a fuzzer for it. We would need a "learning fuzzer" that generates input based on the code paths taken. Having this running against something like the Polkadot runtime would be really nice. We could test all the possible inputs and check that we don't panic, etc. Combined with other ideas from @kianenigma et al. about having these extra invariant checks in the runtime, this could be really powerful. An extension could be to use the real runtime state; while I don't think it would bring that much, it could still be a nice addition. However, this clearly falls into try-runtime territory for me, aka download some recent state and use it for executing XY.
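As a rough illustration of fuzzing through a `Call`-style interface, here is a self-contained Rust sketch. Everything in it is hypothetical: the two-variant `Call` enum, the four-account `State`, and the invariant are toys, not FRAME types. It also uses plain random input from a small xorshift generator rather than the coverage-guided "learning fuzzer" described above, which would need an external tool.

```rust
/// Hypothetical, tiny stand-in for a runtime's `Call` enum.
#[derive(Debug, Clone, Copy)]
pub enum Call {
    Transfer { from: usize, to: usize, amount: u64 },
    Burn { who: usize, amount: u64 },
}

/// Toy runtime state: four accounts plus a tracked total issuance.
#[derive(Debug, Clone)]
pub struct State {
    pub balances: [u64; 4],
    pub total_issuance: u64,
}

impl State {
    pub fn new() -> Self {
        State { balances: [100; 4], total_issuance: 400 }
    }

    /// Apply one call. Returning an error is fine; panicking is a bug.
    pub fn dispatch(&mut self, call: Call) -> Result<(), &'static str> {
        match call {
            Call::Transfer { from, to, amount } => {
                let debited = self.balances[from]
                    .checked_sub(amount)
                    .ok_or("insufficient balance")?;
                self.balances[from] = debited;
                // Cannot overflow: the sum of all balances never exceeds 400.
                self.balances[to] += amount;
            }
            Call::Burn { who, amount } => {
                let debited = self.balances[who]
                    .checked_sub(amount)
                    .ok_or("insufficient balance")?;
                self.balances[who] = debited;
                self.total_issuance -= amount;
            }
        }
        Ok(())
    }

    /// Extra invariant check: balances must always add up to total issuance.
    pub fn invariants_hold(&self) -> bool {
        self.balances.iter().sum::<u64>() == self.total_issuance
    }
}

/// Minimal xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Build a random call with in-range account indices and a small amount.
pub fn random_call(rng: &mut XorShift64) -> Call {
    let a = (rng.next_u64() % 4) as usize;
    let b = (rng.next_u64() % 4) as usize;
    let amount = rng.next_u64() % 150;
    if rng.next_u64() % 2 == 0 {
        Call::Transfer { from: a, to: b, amount }
    } else {
        Call::Burn { who: a, amount }
    }
}

/// Dispatch `iterations` random calls, asserting the invariant after each.
/// Returns the final state and the number of calls that succeeded.
pub fn fuzz(seed: u64, iterations: usize) -> (State, usize) {
    let mut rng = XorShift64(seed.max(1)); // xorshift must not start at 0
    let mut state = State::new();
    let mut applied = 0;
    for _ in 0..iterations {
        let call = random_call(&mut rng);
        if state.dispatch(call).is_ok() {
            applied += 1;
        }
        assert!(state.invariants_hold(), "invariant broken after {:?}", call);
    }
    (state, applied)
}
```

Seeding `State` from downloaded chain state instead of `State::new()` would correspond to the "fuzz against some recent state" variant.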

@pepyakin
Contributor

pepyakin commented Oct 11, 2022

> For fuzzing, we recently had this issue: paritytech/polkadot-sdk#251

Yes, thanks, I am aware. Kian and I have already discussed those things over time. But I agree on the points.

> For sure, but most of the stuff you are mentioning sounds to me like it could be solved with the "wasm override" feature. Aka you just fork off the main net. Keep this thing running and then use some wasm that is equal to the one on chain, but compiled with some special flags that give you more outputs or has special checks enabled.

Yes, that's true, but that's not the point. I think it's an important consideration, so I will reiterate it here just to be sure we are on the same page.

I think it might be a good idea to place the responsibility for mocking on the wasm runtime and not on the host functions in Substrate Runtime Interface. That is, for the signature verification case, the wasm would swizzle the calls to the host functions.

In that case, Substrate Runtime Interface does not gain any special magical code paths for mocking. But as soon as we swallow the "runtime knows about try-runtime" pill, we can get additional benefits and cover other adjacent use cases, such as extra assertions, panic-on-warn, fuzzing, etc.

This kind of approach has a downside: testing the "golden master" [1] will not work as is. However, I don't think it's a big problem. Ad-hoc conditional compilation (meaning done by the developer) can be done right. We could introduce a "blessed" way of doing things. It can be anything, like a feature flag for FRAME-based runtimes that swizzles host functions and/or a CLI tool that accepts a wasm file as input and statically links it with an object file that provides mocked implementations of the interesting host functions, etc. Those are implementation details, but the point is that it should be good enough.

Footnotes

  1. i.e. the final version of the runtime that is about to get into voting for the upgrade.

@pmikolajczyk41
Contributor

Some time ago we (the Aleph Zero team) played a bit with such an idea:

  • we ran a chain for some time (~simulating a real chain like mainnet)
  • we killed the nodes and removed their keystores (~simulating a fork, where we do not have access to the session keys)
  • we ran new nodes (with new keys) with a temporary runtime/node (via the code-substitute mechanism) that accepted the new keys, so that block building could go forward
  • when the new session came, we gave up the substituted runtime (optionally: by updating to a target one) and thus we were working with a 'forked' chain (optionally: with a new runtime)

While we had the whole history (full database, blocks already populated from the initial run), we still needed some transactions (traffic) to actually test the new runtime. This is where fuzzing or automated traffic simulation could come in. As a side note related to paritytech/polkadot-sdk#251: we have already developed a mini-framework of bots that can perform short scenarios (such as approval aggregation for a multisig action, a bunch of random transfers, or vesting actions). This turned out to be a fairly decent way of simulating traffic and covering functionality.


In my humble opinion (feel free to convince me otherwise), extending try-runtime in this direction is not a good idea. For now, everything try-runtime offers works within RemoteExternalities, which is perfect for those use cases. However, a full fork should rather be run in full, i.e. with real externalities, a database, possibly multiple nodes, etc. Mixing two very different environments into one tool for everything will likely lead either to hard maintenance and tangled code dependencies, or to a bulky tool with clearly separate sub-functionalities.

@xlc
Contributor Author

xlc commented Oct 13, 2022

I spent two days trying to use smoldot and https://github.com/polytope-labs/substrate-simnode to implement this, but found that neither is suitable. smoldot is not mature enough, and it is simply not possible to build a Substrate-based node without pulling in the native runtime, which is something I want to avoid for a generic tool.

Maybe building a new test node from scratch will be easier. We can always pull in components from smoldot/Substrate, such as WasmExecutor, to avoid too much duplicated code.

This is my draft design: https://hackmd.io/VWG4_1rkSAaRl5KtZup_Pw

@kianenigma kianenigma moved this from 📕 Backlog to ⌛️ Sometime-soon in (Nominated) Proof of Stake Oct 19, 2022
@xlc
Contributor Author

xlc commented Oct 21, 2022

https://github.com/AcalaNetwork/chopsticks is now MVP ready

Projects
Status: Done

5 participants