After actix upgrade, panic are not always properly propagated, and tests that `should_panic` fail. #3901

bowenwang1996 · 2021-02-03T20:27:31Z

After #3869, chunks_recovered_from_full_timeout_too_short fails, which is likely related to the block timeout we set and the speed up from actix version upgrade.

The text was updated successfully, but these errors were encountered:

SkidanovAlex · 2021-02-05T21:06:49Z

@frol after looking into it, I observe the following:

#3869 updates all the tests except for the ones in chunk_management.rs to use System::buider() instead of System::run(). I applied the same change, in the following way:

fn chunks_produced_and_distributed_common(
    validator_groups: u64,
    drop_from_1_to_4: bool,
    block_timeout: u64,
) {
    init_test_logger();
    System::builder().stop_on_panic(true).run(move || {

It is consistent with how you updated other tests.

With this, when I run the test chunks_recovered_from_full_timeout_too_short (that is should_panic), I can see in the output that it indeed panics

thread 'chunks_recovered_from_full_timeout_too_short' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `3`', chain/client/tests/chunks_management.rs:106:21
stack backtrace:

This is expected behavior. However, the test doesn't stop and continues running. The panic doesn't propagate, and doesn't cause the test to fail.

bowenwang1996 · 2021-02-05T21:17:50Z

@matklad do you happen to know what may have caused this?

SkidanovAlex · 2021-02-05T21:21:45Z

This is not a new problem, and we had infrastructure to handle it, but it was (inadvertently?) removed in the linked PR:

https://github.com/near/nearcore/pull/3869/files#diff-6e234c46051e0a6a189a87fdf9c34622301856c8df033bb8d3eac07688f71904L217

frol · 2021-02-06T08:23:11Z

@SkidanovAlex the old way of setting up init_stop_on_panic was not working with the new version of actix, so I replaced it with what I assumed would work and which seemed to work (since CI passed even for should_panic tests). I wanted to investigate the issue with this specific test (#3869 (comment)), but after a brief chat with @bowenwang1996 I was under impression that the test itself is broken, and as he mentioned, the plan was to address this test in a separate issue.

@bowenwang1996 would you like me to take a look into it?

SkidanovAlex · 2021-02-06T18:16:33Z

@frol the panics are not propagated from child processes. E.g. if you do a [should_panic] test that just does assert!(false), it will panic (and pass), but if you add assert!(false) at the beginning of on_block_accepted in client.rs and run any test, the test will not stop on panic (they are still likely to fail because the presence of the assert is likely to break their logic, but the panic itself will not cause them to fail).

frol · 2021-02-07T18:26:04Z

@SkidanovAlex Thank you for digging into it and providing very specific repro examples. I totally understand the issue. My only question is whether @bowenwang1996 still wants to take care of the issue or should I finish it up.

matklad · 2021-02-08T09:12:21Z

@bowenwang1996 no, I don't. I do have a general observation that propagating panics across synchronization points in rust requires some explicit code.

One bit of Rust knowledge that might help debug this: Rust panic is a two-stage process. First, the panic hook is invoked at the stack frame where the panic occurs. Then the stack is unwound.

The assert message and stack trace are printed in the first phase, not after the stack is unwond. So it indeed might be the case that you see assertion failure, but the panic is swallowed.

bowenwang1996 · 2021-02-08T15:21:29Z

My only question is whether @bowenwang1996 still wants to take care of the issue or should I finish it up

I was mistaken as to why the test was broken, so @frol please take a look

Resolves #3901.

bowenwang1996 assigned SkidanovAlex Feb 3, 2021

bowenwang1996 added the A-testing Area: Unit testing / integration testing label Feb 3, 2021

SkidanovAlex changed the title ~~Investigate chunks_recovered_from_full_timeout_too_short failure~~ After actix upgrade, tests cannot panic and always pass. Unless they are should_painc, in which case they always fail Feb 5, 2021

SkidanovAlex assigned frol and unassigned SkidanovAlex Feb 5, 2021

SkidanovAlex changed the title ~~After actix upgrade, tests cannot panic and always pass. Unless they are should_painc, in which case they always fail~~ After actix upgrade, panic are not always properly propagated, and tests that should_panic fail. Feb 5, 2021

frol mentioned this issue Feb 18, 2021

refactor(jsonrpc): Add structured errors to query method in ViewClient and RPC #3944

Merged

frol mentioned this issue Feb 25, 2021

fix: Fix tests unable to fail #4013

Merged

near-bulldozer bot closed this as completed in #4013 Mar 1, 2021

near-bulldozer bot pushed a commit that referenced this issue Mar 1, 2021

fix: Fix tests unable to fail (#4013)

69eddab

Resolves #3901.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

After actix upgrade, panic are not always properly propagated, and tests that `should_panic` fail. #3901

After actix upgrade, panic are not always properly propagated, and tests that `should_panic` fail. #3901

bowenwang1996 commented Feb 3, 2021

SkidanovAlex commented Feb 5, 2021 •

edited

Loading

bowenwang1996 commented Feb 5, 2021

SkidanovAlex commented Feb 5, 2021

frol commented Feb 6, 2021

SkidanovAlex commented Feb 6, 2021 •

edited

Loading

frol commented Feb 7, 2021

matklad commented Feb 8, 2021

bowenwang1996 commented Feb 8, 2021

After actix upgrade, panic are not always properly propagated, and tests that should_panic fail. #3901

After actix upgrade, panic are not always properly propagated, and tests that should_panic fail. #3901

Comments

bowenwang1996 commented Feb 3, 2021

SkidanovAlex commented Feb 5, 2021 • edited Loading

bowenwang1996 commented Feb 5, 2021

SkidanovAlex commented Feb 5, 2021

frol commented Feb 6, 2021

SkidanovAlex commented Feb 6, 2021 • edited Loading

frol commented Feb 7, 2021

matklad commented Feb 8, 2021

bowenwang1996 commented Feb 8, 2021

After actix upgrade, panic are not always properly propagated, and tests that `should_panic` fail. #3901

After actix upgrade, panic are not always properly propagated, and tests that `should_panic` fail. #3901

SkidanovAlex commented Feb 5, 2021 •

edited

Loading

SkidanovAlex commented Feb 6, 2021 •

edited

Loading