Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After actix upgrade, panic are not always properly propagated, and tests that should_panic fail. #3901

Closed
bowenwang1996 opened this issue Feb 3, 2021 · 8 comments · Fixed by #4013
Assignees
Labels
A-testing Area: Unit testing / integration testing

Comments

@bowenwang1996
Copy link
Collaborator

After #3869, chunks_recovered_from_full_timeout_too_short fails, which is likely related to the block timeout we set and the speed up from actix version upgrade.

@bowenwang1996 bowenwang1996 added the A-testing Area: Unit testing / integration testing label Feb 3, 2021
@SkidanovAlex SkidanovAlex changed the title Investigate chunks_recovered_from_full_timeout_too_short failure After actix upgrade, tests cannot panic and always pass. Unless they are should_painc, in which case they always fail Feb 5, 2021
@SkidanovAlex SkidanovAlex assigned frol and unassigned SkidanovAlex Feb 5, 2021
@SkidanovAlex
Copy link
Collaborator

SkidanovAlex commented Feb 5, 2021

@frol after looking into it, I observe the following:

#3869 updates all the tests except for the ones in chunk_management.rs to use System::buider() instead of System::run(). I applied the same change, in the following way:

fn chunks_produced_and_distributed_common(
    validator_groups: u64,
    drop_from_1_to_4: bool,
    block_timeout: u64,
) {
    init_test_logger();
    System::builder().stop_on_panic(true).run(move || {

It is consistent with how you updated other tests.

With this, when I run the test chunks_recovered_from_full_timeout_too_short (that is should_panic), I can see in the output that it indeed panics

thread 'chunks_recovered_from_full_timeout_too_short' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `3`', chain/client/tests/chunks_management.rs:106:21
stack backtrace:

This is expected behavior. However, the test doesn't stop and continues running. The panic doesn't propagate, and doesn't cause the test to fail.

@SkidanovAlex SkidanovAlex changed the title After actix upgrade, tests cannot panic and always pass. Unless they are should_painc, in which case they always fail After actix upgrade, panic are not always properly propagated, and tests that should_panic fail. Feb 5, 2021
@bowenwang1996
Copy link
Collaborator Author

@matklad do you happen to know what may have caused this?

@SkidanovAlex
Copy link
Collaborator

This is not a new problem, and we had infrastructure to handle it, but it was (inadvertently?) removed in the linked PR:

https://github.com/near/nearcore/pull/3869/files#diff-6e234c46051e0a6a189a87fdf9c34622301856c8df033bb8d3eac07688f71904L217

@frol
Copy link
Collaborator

frol commented Feb 6, 2021

@SkidanovAlex the old way of setting up init_stop_on_panic was not working with the new version of actix, so I replaced it with what I assumed would work and which seemed to work (since CI passed even for should_panic tests). I wanted to investigate the issue with this specific test (#3869 (comment)), but after a brief chat with @bowenwang1996 I was under impression that the test itself is broken, and as he mentioned, the plan was to address this test in a separate issue.

@bowenwang1996 would you like me to take a look into it?

@SkidanovAlex
Copy link
Collaborator

SkidanovAlex commented Feb 6, 2021

@frol the panics are not propagated from child processes. E.g. if you do a [should_panic] test that just does assert!(false), it will panic (and pass), but if you add assert!(false) at the beginning of on_block_accepted in client.rs and run any test, the test will not stop on panic (they are still likely to fail because the presence of the assert is likely to break their logic, but the panic itself will not cause them to fail).

@frol
Copy link
Collaborator

frol commented Feb 7, 2021

@SkidanovAlex Thank you for digging into it and providing very specific repro examples. I totally understand the issue. My only question is whether @bowenwang1996 still wants to take care of the issue or should I finish it up.

@matklad
Copy link
Contributor

matklad commented Feb 8, 2021

@bowenwang1996 no, I don't. I do have a general observation that propagating panics across synchronization points in rust requires some explicit code.

One bit of Rust knowledge that might help debug this: Rust panic is a two-stage process. First, the panic hook is invoked at the stack frame where the panic occurs. Then the stack is unwound.

The assert message and stack trace are printed in the first phase, not after the stack is unwond. So it indeed might be the case that you see assertion failure, but the panic is swallowed.

@bowenwang1996
Copy link
Collaborator Author

My only question is whether @bowenwang1996 still wants to take care of the issue or should I finish it up

I was mistaken as to why the test was broken, so @frol please take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Area: Unit testing / integration testing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants