Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: modular testutil.Network #18145

Closed
tac0turtle opened this issue Oct 17, 2023 · 4 comments · Fixed by #18389
Closed

[Feature]: modular testutil.Network #18145

tac0turtle opened this issue Oct 17, 2023 · 4 comments · Fixed by #18389

Comments

@tac0turtle
Copy link
Member

Summary

Testutil.Network is being used to spin up n number of validators using CometBFT. The Cosmos SDK and its users use this package in order to test their applications.

We have seen more testing frameworks come up for different needs, levels of testing and for different consensus

  • starship
  • cometMock
  • Rollkit

We should swap testutil.Network to an interface that can be used to against many different environments.

Problem Definition

Streamline testing with different environments without needing to rewrite tests

Proposed Feature

Modify testutil.Network into an interface that testing framework authors can implement so that e2e tests can be run against many different environments.

@alexanderbez
Copy link
Contributor

Specifically, I'd like to see us use more practical environments to test RCs and regression test against prior releases. Most likely Starship will help us a ton here.

@Anmol1696
Copy link
Contributor

@alexanderbez that is something we are working on. Here are some more details.
If there is access to a development kubernetes cluster, (which we could hook up with gh-actions) then we should be able to create a large scale multi-node environment, and depending on resource consumption (which we can optimize for), should be able to run regression tests parallely as well.

I am working towards full monitoring support (so we could also generate reports), and also perform benchmarking and performance reports as well, and be able to compare releases in a much more realistic and automated fashion.

The one thing i am not super sure about, is what exactly to be running in the regression tests. This is where reusing the testutil.Network and being able to run e2e tests (even a subset) against multiple backends (including Starship) will help alot.

Some of limitations of an e2e system that runs everything as a blackbox, is lack of access directly to keepers and app internals. This makes it slightly more tricky around testing weird edge cases, but it would really make sense for regression tests. If we can abstract out all the special calls to the network in the testutil.Network interface, then we could implment the handlers for Starship, to be able to reuse e2e tests for regression testing as well. This could even include some admin functions (which we could run partially with cometmock).

@tac0turtle tac0turtle moved this from 👀 To Do to 📚 In review in Cosmos-SDK Nov 7, 2023
@github-project-automation github-project-automation bot moved this from 📚 In review to 🥳 Done in Cosmos-SDK Nov 8, 2023
elias-orijtech added a commit to elias-orijtech/cosmos-sdk that referenced this issue Nov 10, 2023
This change implements a replacement for the current simulator based
on testutil/network. Most of the changes are porting the module specific
message generators to no longer rely on SimulationState, and to generate
"real" messages, not simulator messages. The simulator driver is in
simapp, as part of the IntegationTestSuite.

The new approach aims to improve simulation in two important ways:

- Simulation should more closely mimic a real network. The current
simulator message delivery is implemented parallel to non-simulator
message delivery, leading to loss of fidelity and higher maintenance.
One symptom is cosmos#13843.
- Simulation should be layered on top of modules, not part of modules.
This means that modules should not import simulation packages, nor refer
to its generator package (x/module/simulation). This should eventually
fix cosmos#7622.

There are also downsides, however. Where the current simulator is too
high level, testutil/network is too low level: it runs a real network
of validators which is difficult to control. For example:

- AppHashes differ between runs, because modules may depend on non-
deterministic state such as block header timestamps.
- The validators runs in separate goroutines, which makes it hard to
query app state without introducing race conditions.
- Blocks are produced according tot time, and not under control by the
test driver. This makes it hard to trigger processing of messages in
particular blocks, which ruins determinism.

Some of the issues may be worked around, for example by forcing the
block headers to be deterministic; however, the real fix is to make
testutil/network itself deterministic, providing the goldilock level
of simulation: close enough to a real network, yet deterministic enough
to generate the same chain state for a given random seed.

A deterministic testutil/network is part of cosmos#18145.

Future work includes:

- Porting of the remaining module message generators.
- Generating (and verifying) deterministic AppHashes, allowing reliable
replay when a problematic message is detected. Depends on cosmos#18145.
- Save/reload of state for faster debugging cycles.
- Removal of the old simulator, most importantly the reference to it from
 module code.

 Updates cosmos#14753 (Simulator rewrite epic)
 Updates cosmos#7622 (reducing imports from modules to simulator)
 Updates cosmos#13843 (using real message delivery for simulation)
@tac0turtle tac0turtle removed this from Cosmos-SDK Nov 14, 2023
@robert-zaremba
Copy link
Collaborator

I see that task has been closed. What is the decision and the future direction?

@tac0turtle
Copy link
Member Author

we merged the modularity and starship is working on integrating the interface

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants