Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(actions)!: add full sync test #3582

Merged
merged 29 commits into from
Mar 2, 2022
Merged

feat(actions)!: add full sync test #3582

merged 29 commits into from
Mar 2, 2022

Conversation

gustavovalverde
Copy link
Member

@gustavovalverde gustavovalverde commented Feb 18, 2022

Motivation

Most of the motivation for this PR is available at #1592

Specifications

  • We need to run a full sync test on every PR before merging to main
  • Preferably this test should run at least once before starting the merge process
  • Multiple tests must be done to accomplish the best balance between performance and cost in GCP

Designs

  • Use a custom entrypoint.sh to overcome some limitations from GCP docker parser

Solution

  • Execute the full sync test using an environment variable
  • WIP

Review

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

@teor2345
Copy link
Contributor

We got to 87%, then the job timed out at 6 hours:

Feb 22 02:03:48.658 INFO net="Main": zebrad::commands::start: estimated progress to chain tip sync_percent=87.748 % current_height=Height(1382111) remaining_sync_blocks=192980 time_since_last_state_block=PT0S

https://github.com/ZcashFoundation/zebra/runs/5279457879?check_suite_focus=true#step:7:8774

@gustavovalverde
Copy link
Member Author

Yeah, after 50% it started to get slow. I'll try a bigger machine to see if there's an impact there. Any other recommendations (including changes in the configuration file) are welcome @teor2345

@gustavovalverde
Copy link
Member Author

gustavovalverde commented Feb 22, 2022

Still running here: https://cloudlogging.app.goo.gl/RxUMH5twUULrijba9

@teor2345
Copy link
Contributor

teor2345 commented Feb 22, 2022

Yeah, after 50% it started to get slow. I'll try a bigger machine to see if there's an impact there. Any other recommendations (including changes in the configuration file) are welcome @teor2345

I'd suggest:

@teor2345
Copy link
Contributor

I can do the checkpoint lists now, they're the quick part of that ticket.

@gustavovalverde
Copy link
Member Author

Could it be something related to the amount of data (blocks) we're getting to be processed? At the beginning the network and disks had a big spike, and ended a bit after the sync call (just were the red rectangle ends), which makes me thing the disk shouldn't be the ones halting the process, as they can handled the spikes nicely.

image

I'll increase the machine type anyways.

@teor2345
Copy link
Contributor

I updated the checkpoint lists in:

If we use that PR and checkpoint_sync = true, the sync should be a lot faster.

@teor2345
Copy link
Contributor

Could it be something related to the amount of data (blocks) we're getting to be processed?

It could be a concurrency issue. We can limit the number of blocks in the queue by fixing:

But let's try the checkpoints first, they should speed things up a lot.

@teor2345
Copy link
Contributor

teor2345 commented Feb 28, 2022

I'm currently running this branch with:

PR #3662 didn't change any code, but it was half an hour faster than the original. This could have happened due to the switch from Rust 1.58 to 1.59.

If this performance is consistent, we might not need to make any more performance improvements right now.

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this PR is ready to merge yet, because it disables a lot of existing tests.

@teor2345 teor2345 dismissed their stale review February 28, 2022 22:21

We'll remove the blocking changes

@gustavovalverde gustavovalverde changed the title add(actions): full sync test feat(actions): full sync test Mar 1, 2022
@gustavovalverde gustavovalverde changed the title feat(actions): full sync test feat(actions): add full sync test Mar 1, 2022
@gustavovalverde gustavovalverde changed the title feat(actions): add full sync test feat(actions)!: add full sync test Mar 1, 2022
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're all done here, thanks for your patience working through all the sync issues.

@gustavovalverde gustavovalverde requested a review from a team as a code owner March 2, 2022 12:18
mergify bot added a commit that referenced this pull request Mar 2, 2022
mergify bot added a commit that referenced this pull request Mar 2, 2022
@mergify mergify bot merged commit db966f2 into main Mar 2, 2022
@mergify mergify bot deleted the add-full-sync-test branch March 2, 2022 14:15
upbqdn pushed a commit that referenced this pull request Mar 3, 2022
* feat(actions)!: add full sync test (#3582)

* add(tests): full sync test

* fix(test): add build

* fix(deploy): escape double dashes '--' correctly

* fix(test): remove unexpected --no-capture arg

error: Found argument '--nocapture' which wasn't expected, or isn't valid in this context

* refactor(docker): use default executable as entrypoint

* refactor(startup): add a custom entrypoint

* fix(test): add missing TEST_FULL_SYNC variable

* test(timeout): use the biggest machine

* fix

* fix(deploy): use latest successful image

* typo

* refactor(docker): generate config file at startup

* revert(build): changes were made to docker

* fix(docker): send variables correctly to the entrypoint

* test different conf file approach

* fix(env): add RUN_TEST env variable

* ref: use previous approach

* fix(color): use environment variable

* fix(resources): use our normal machine size

* fix(ci): double CPU and RAM for full sync test

* fix(test): check for zebrad test output in the correct order

The mempool is only activated once, so we must check for that log first.
After mempool activation, the stop regex is logged at least once.
(It might be logged before as well, but we can't rely on that.)

When checking that the mempool didn't activate,
wait for the `zebrad` command to exit,
then check the entire log.

* fix(ci): run full sync test with full compiler optimisations

* fix(tests): reintroduce tests and run full sync on approval

* fix(tests): reduce the changelog

Co-authored-by: teor <teor@riseup.net>

* fix(ci): update CI job path triggers (#3692)

* ci(test): re-run tests when snapshot data changes

* fix(ci): rebuild state when disk format changes

* fix(ci): rebuild rust docs when code or dependencies change

* doc(ci): explain why we run jobs when files change

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>

* fix(build): use the right multistage target (#3700)

* fix(review): only assign one reviewer to general Rust reviews (#3708)

If we assign two teams, GitHub assigns two reviewers.

* fix(ci): change the color-eyre ignore to a tracing-subscriber ignore

* fix(ci): ignore duplicate darling dependencies

* doc(ci): remove an alternative resolution doc

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
teor2345 added a commit that referenced this pull request Mar 7, 2022
* Upgrade some dependencies

* Upgrade some dependencies

* Upgrade dependencies for zebrad

* Upgrade tracing dependencies

* Revert `tor` & `arti`

* Upgrade `criterion` & `pin-project` in `deny.toml`

* Remove some dependencies from `skip-tree` in `deny.toml`

* Revert some the versions of dependencies because of duplicates

* Revert proptest regressions

* Upgrade dependencies, then ignore some more duplicates (#3716)

* feat(actions)!: add full sync test (#3582)

* add(tests): full sync test

* fix(test): add build

* fix(deploy): escape double dashes '--' correctly

* fix(test): remove unexpected --no-capture arg

error: Found argument '--nocapture' which wasn't expected, or isn't valid in this context

* refactor(docker): use default executable as entrypoint

* refactor(startup): add a custom entrypoint

* fix(test): add missing TEST_FULL_SYNC variable

* test(timeout): use the biggest machine

* fix

* fix(deploy): use latest successful image

* typo

* refactor(docker): generate config file at startup

* revert(build): changes were made to docker

* fix(docker): send variables correctly to the entrypoint

* test different conf file approach

* fix(env): add RUN_TEST env variable

* ref: use previous approach

* fix(color): use environment variable

* fix(resources): use our normal machine size

* fix(ci): double CPU and RAM for full sync test

* fix(test): check for zebrad test output in the correct order

The mempool is only activated once, so we must check for that log first.
After mempool activation, the stop regex is logged at least once.
(It might be logged before as well, but we can't rely on that.)

When checking that the mempool didn't activate,
wait for the `zebrad` command to exit,
then check the entire log.

* fix(ci): run full sync test with full compiler optimisations

* fix(tests): reintroduce tests and run full sync on approval

* fix(tests): reduce the changelog

Co-authored-by: teor <teor@riseup.net>

* fix(ci): update CI job path triggers (#3692)

* ci(test): re-run tests when snapshot data changes

* fix(ci): rebuild state when disk format changes

* fix(ci): rebuild rust docs when code or dependencies change

* doc(ci): explain why we run jobs when files change

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>

* fix(build): use the right multistage target (#3700)

* fix(review): only assign one reviewer to general Rust reviews (#3708)

If we assign two teams, GitHub assigns two reviewers.

* fix(ci): change the color-eyre ignore to a tracing-subscriber ignore

* fix(ci): ignore duplicate darling dependencies

* doc(ci): remove an alternative resolution doc

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants