Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: FVM Debug Dual Execution #8841

Merged
merged 4 commits into from
Jun 29, 2022
Merged

PoC: FVM Debug Dual Execution #8841

merged 4 commits into from
Jun 29, 2022

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented Jun 10, 2022

See filecoin-project/ref-fvm#592

Depends on:

How it works

When LOTUS_FVM_DEBUG=1, dual execution is triggered with actor debugging for side effect. If also LOTUS_FVM_DEBUG_BUNDLE_V8 is also specified, then the bundle is loaded and execution is redirected from the canonical (consensus) actors to the actors in thr bundle during debug execution.

Some niceties to consider

  • Cache at init the debug bundle in an in memory overlay blockstore and reuse it in each debug execution. This will eliminate load iverhead.
  • Add an output syscall to fvm that captures output during debug execution and is noop otherwise.

@vyzo vyzo requested a review from a team as a code owner June 10, 2022 14:12
@vyzo vyzo marked this pull request as draft June 10, 2022 14:14
@vyzo vyzo changed the base branch from master to release/v1.16.0 June 10, 2022 14:14
@vyzo
Copy link
Contributor Author

vyzo commented Jun 10, 2022

CI will have to rerun once the ffi helper publishes the relevant build artifacts (not there yet, have been testing locally with a devnet)

@vyzo vyzo marked this pull request as ready for review June 10, 2022 17:42
Copy link
Contributor

@arajasek arajasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking this on! As a PoC, I think this looks good. I think using envvars to dictate where to look for potential overrides is good enough.

The biggest thing I'd like to see is some way to automatically make this happen for select state computations (eg. when estimating gas, or when StateReplay is called). That is not directly relevant to this work, but really needs to happen soon.

chain/vm/fvm.go Outdated Show resolved Hide resolved
chain/vm/fvm.go Outdated Show resolved Hide resolved
chain/vm/fvm.go Outdated Show resolved Hide resolved
chain/vm/fvm.go Outdated Show resolved Hide resolved
@jennijuju
Copy link
Member

@arajasek id prefer this PR to be base off master, let me know your thoughts!

@arajasek
Copy link
Contributor

@arajasek id prefer this PR to be base off master, let me know your thoughts!

I'm not sure -- there are users who want to use them, and ideally they would be able to do so in the minimum release.

But it is ultimately nice-to-have, and any change late in the release process is undesirable, so happy to go into master instead.

@vyzo
Copy link
Contributor Author

vyzo commented Jun 22, 2022

I'll leave the rebase up to you, updating the code now; I just have to move the util functions and it's done.

@arajasek arajasek force-pushed the feat/debug-execution branch from 8c0675d to 19c9e87 Compare June 23, 2022 03:58
@arajasek arajasek changed the base branch from release/v1.16.0 to master June 23, 2022 15:39
@arajasek arajasek force-pushed the feat/debug-execution branch 3 times, most recently from 865d5a3 to 330344f Compare June 28, 2022 20:54
@arajasek arajasek force-pushed the feat/debug-execution branch from 330344f to 8e4d42c Compare June 28, 2022 23:17
chain/vm/vmi.go Outdated
return NewFVM(ctx, opts)
}

// Remove after v16 upgrade, this is only to support testing and validation of the FVM
if useFvmForMainnetV15 && opts.NetworkVersion >= network.Version15 {
if os.Getenv("LOTUS_FVM_DEBUG") == "1" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay with this duplication cuz it's getting dropped soon

@arajasek arajasek force-pushed the feat/debug-execution branch from 8e4d42c to a52d584 Compare June 28, 2022 23:24
@codecov
Copy link

codecov bot commented Jun 28, 2022

Codecov Report

Merging #8841 (f9cf25f) into master (cafb110) will decrease coverage by 0.11%.
The diff coverage is 12.33%.

❗ Current head f9cf25f differs from pull request most recent head 906463b. Consider uploading reports for the commit 906463b to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8841      +/-   ##
==========================================
- Coverage   40.73%   40.62%   -0.12%     
==========================================
  Files         705      705              
  Lines       78574    78692     +118     
==========================================
- Hits        32009    31965      -44     
- Misses      41101    41239     +138     
- Partials     5464     5488      +24     
Impacted Files Coverage Δ
chain/actors/manifest.go 82.35% <0.00%> (-9.46%) ⬇️
chain/gen/genesis/genesis.go 46.29% <0.00%> (ø)
chain/vm/vmi.go 40.00% <0.00%> (-26.67%) ⬇️
chain/vm/fvm.go 29.61% <7.08%> (-14.93%) ⬇️
chain/consensus/filcns/upgrades.go 33.68% <66.66%> (-0.07%) ⬇️
chain/stmgr/utils.go 25.78% <66.66%> (+0.24%) ⬆️
storage/wdpost/wdpost_sched.go 75.49% <0.00%> (-5.89%) ⬇️
chain/consensus/filcns/weight.go 70.58% <0.00%> (-5.89%) ⬇️
storage/pipeline/currentdealinfo.go 71.42% <0.00%> (-4.77%) ⬇️
chain/events/events_called.go 83.90% <0.00%> (-4.40%) ⬇️
... and 21 more

chain/consensus/filcns/upgrades.go Outdated Show resolved Hide resolved
itests/lite_migration_test.go Outdated Show resolved Hide resolved
chain/vm/vmi.go Outdated Show resolved Hide resolved
chain/actors/manifest.go Outdated Show resolved Hide resolved
chain/vm/fvm.go Outdated Show resolved Hide resolved
chain/vm/fvm.go Outdated Show resolved Hide resolved
chain/vm/fvm.go Outdated Show resolved Hide resolved
Comment on lines +570 to +580
go func() {
defer wg.Done()
ret, err = vm.main.ApplyMessage(ctx, cmsg)
}()

go func() {
defer wg.Done()
if _, err := vm.debug.ApplyMessage(ctx, cmsg); err != nil {
log.Errorf("debug execution failed: %w", err)
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... this doesn't work as the debug messages will have different gas fees. That means:

  1. The balances will be different.
  2. Some messages may just fail.

This will lead to really annoying and hard to debug issues. We can merge this as a WIP_JUST_TESTING patch, but we need to make that clear.

The right way to do this would be to:

  1. Apply in debug mode (disabling gas accounting?). Balances will be correct because we charge for gas by charging for the gas limit up-front, then refunding any leftovers. So the balance will be correct for the duration of the message execution, just not after the message is done executing.
  2. Revert.
  3. Apply normally.
  4. Go back to 1 for the next message.

But:

  1. We can't currently perform that revert.
  2. That is even more likely to cause problems...

I think the answer here is to leave this as a super experimental feature, but be very clear that this is just for debugging and absolutely not to be relied on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll throw on a disclaimer.

Honestly, the biggest use I envision getting out of this is actually intentionally failing messages, and using the actor error string to convey debug information (eg. adding actor_error!("i reached this if branch because the miner power is non-zero"); to the miner_actor).

chain/consensus/filcns/upgrades.go Outdated Show resolved Hide resolved
Comment on lines +1496 to 1497
var newManifestData manifest.ManifestData
if err := store.Get(ctx, newManifest.Data, &newManifestData); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause us to load the ManifestData twice, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even use this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, solely to check the number of entries, which I can't do from newManifest directly because there's no way (yet) to count the number of entries in a Manifest...

chain/stmgr/utils.go Show resolved Hide resolved
@arajasek arajasek merged commit 709fe5c into master Jun 29, 2022
@arajasek arajasek deleted the feat/debug-execution branch June 29, 2022 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants