Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: error when actor panics directly #3697

Merged
merged 1 commit into from
Oct 6, 2020
Merged

Conversation

alanshaw
Copy link
Member

@alanshaw alanshaw commented Sep 9, 2020

Attempting to resolve filecoin-project/test-vectors#87, quoting @anorth:

If an actor implementation panics directly (rather than calling Abortf) then the evaluation is undefined. There is no exit code corresponding to this. The result should not go on chain. A panic (which could also come from some actor dependency) may indicate a transient state or error that cannot be replicated by other nodes and thus cannot form part of consensus. E.g. an out-of-memory.

I believe that returning a "fatal" ActorError instead of SysErrSenderInvalid(1) prevents the message from being recorded on chain since it's caught in ApplyMessage here.

@whyrusleeping
Copy link
Member

@alanshaw @anorth we explicitly are treating nothing as a fatal error (as far as I know). I'd rather have a miner create an invalid block than have the chain halt because everyone set block X as their head and evaluating it causes a fault.

@Kubuxu
Copy link
Contributor

Kubuxu commented Sep 9, 2020

Yeah, this should not be a fatal error but one of the Sys* errors that signifies actors doing something really bad.

@alanshaw
Copy link
Member Author

alanshaw commented Sep 10, 2020

Okay, well, just so you know, there are a bunch of places in the runtime where Fatalf is being used (18 in total) which we might want to check up on 😅:

Screenshot 2020-09-10 at 09 15 07

Just to reiterate, the case where this kind of error happens should be extremely rare and this change would cause the message to not make it into a block in the first place. If I understand correctly and we allow this kind of message into a block, then another node applying this message is likely to not reach the same state conclusion anyway.

I'm super new to the code base so I'm happy to take your advice on this. If you're still set on returning a Sys* code then we could use 4 - its unused, and we could call it...SysErrActorMalfunction?

@alanshaw alanshaw force-pushed the fix/actor-panic-fatal branch from 7519bb7 to 4d9f8c3 Compare September 11, 2020 11:51
alanshaw pushed a commit to filecoin-project/test-vectors that referenced this pull request Sep 11, 2020
This PR adds the `apply_message_failures` post condition to the schema. There's currently only 1 test that uses this `no-exit-code` but it currently does not error. If you want to see the new field in JSON output then you'll need to checkout the branch from filecoin-project/lotus#3697 and add a replace to go.mod `replace github.com/filecoin-project/lotus => ../lotus`.

This also updates to lotus 0.7 and specs-actors 0.9.8.

resolves #134
@alanshaw alanshaw mentioned this pull request Sep 16, 2020
21 tasks
@anorth
Copy link
Member

anorth commented Oct 1, 2020

I'd rather have a miner create an invalid block than have the chain halt because everyone set block X as their head and evaluating it causes a fault.

Ok, can we not use SysErrInvalidSender at least? Use one of the reserved exit codes, and we can rename it later.

@arajasek arajasek force-pushed the fix/actor-panic-fatal branch from 4d9f8c3 to 7b55625 Compare October 6, 2020 09:35
@alanshaw alanshaw requested a review from raulk as a code owner October 6, 2020 09:35
@arajasek arajasek changed the base branch from master to asr/spec-v1 October 6, 2020 09:35
Base automatically changed from asr/spec-v1 to next October 6, 2020 21:34
@arajasek arajasek merged commit dfaabb4 into next Oct 6, 2020
@arajasek arajasek deleted the fix/actor-panic-fatal branch October 6, 2020 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact/consensus Impact: Consensus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test that a panic in actor code results in undefined execution
6 participants