Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support chain lookbacks in test vectors #381

Open
raulk opened this issue Mar 10, 2022 · 10 comments
Open

support chain lookbacks in test vectors #381

raulk opened this issue Mar 10, 2022 · 10 comments

Comments

@raulk
Copy link
Member

raulk commented Mar 10, 2022

This doesn't really belong in this repo, but capturing here to track somewhere. We can move it elsewhere, or break into various issues in various repos for implementation.

Problem

Message-class test vectors don't support ReportConsensusFault messages. Verifying consensus faults relies on the "lookback getter" mechanism in Lotus. The implementation requires an base tipset, and it walks the chain backwards til the desired epoch. It then loads the state tree for that epoch.

If we use that logic as-is at message extraction, we would end up with a very heavy test vector, because in order to get from the head to the desired tipset, we'd have to walk every intermediate tipset.

We also don't have a place in the test vector schema to specify the base tipset. That data is also out of scope for message-class test vectors.

Potential solution

At extraction time

  1. When extracting the test vector, use a special LookbackGetter that records the input and the root CID of the state tree that was returned. Do this in a way similar to how we record Randomness (with a special randomness recorder).
  2. As the state tree is accessed, the blockstore tracer already records all blocks into the local blockstore.
  3. Enhance the test vector schema to store recorded lookback responses, in a way that we store recorded randomness. Potentially in a map epoch => state tree root.
  4. When forming the CAR, make sure to include all blocks that were recorded.

At replay time

  1. When replaying the test vector, use a special LookbackGetter that returns a loaded state tree with the root CID that was found in the vector.
@Stebalien
Copy link
Member

Could we memoize the expected result from the extern? I.e.:

  1. How much gas the operation takes internally.
  2. Whether there was or was not a fault.
  3. Who was at fault.

@raulk
Copy link
Member Author

raulk commented Mar 11, 2022

There's no gain in that:

  1. It would make us lose test coverage of VerifyConsensusFault, which is undesirable.
  2. We still have to modify the test vector schema.

@Stebalien
Copy link
Member

It would make us lose test coverage of VerifyConsensusFault, which is undesirable.

We don't currently have a verify consensus fault implementation inside the FVM.

@raulk
Copy link
Member Author

raulk commented Mar 11, 2022

But we're changing how it's wired on the client side: https://github.com/filecoin-project/lotus/pull/8293/files#diff-9c51c6db9b9f6bea9717d162a7f80db6e0e144032849d83281750e3226d17ccaR45

So I'd rather exercise that code in the fullest form possible than stub it out.

@Stebalien
Copy link
Member

Ah, wait, you're planning on executing these test vectors from lotus? Then yes, I agree. I was focused on test vectors for the FVM itself.

@arajasek
Copy link
Contributor

arajasek commented Mar 11, 2022

I think this proposal is reasonable. Really the answer is "do whatever we do for randomness". Randomness has a much further lookback than this particular chain lookback, so this is a strictly easier problem.

@raulk If I can interest you in a much hackier suggestion to immediately unblock this: Return the vector's base state. This getter we're missing is only used by the VM to determine what a faulted miner's worker key was at the lookback tipset. This lookback is also gas-less (it uses the state manager directly).

That means that unless any of the messages you're running into involve a miner that changed their worker key between the fault and the report, just returning the base state is totally fine. The likelihood of a change happening is also very low because our fault detectors work basically instantaneously (so the "confirm worker key" message would have like 1-2 epochs to land between the fault occurrence and the report).

It's a hack, but it's not worse than what we have right now.

@raulk
Copy link
Member Author

raulk commented Mar 11, 2022

That's an interesting idea, might be good enough for now, but rather dangerous for a system that's intended to verify correctness unequivocally.

Re: similarities to randomness, we do stub those calls with recorded final outputs, but it's a bit different to the verify consensus fault syscall:

  • in the case of beacon randomness, it is entirely externally sourced so there's little logic to test inside the client
  • in the case of chain randomness, even though it is internally sourced (i.e. the Filecoin chain), there's also little lock to test inside the client

I am inclined to record two things inside the vector:

  1. The final output from verify consensus fault (as @Stebalien suggests) -- for testers that do not have a full client at their disposal, e.g. the conformance tests in the FVM
  2. The result from the lookback (the root CID of the state tree) -- for testers that do have a full client at their disposal, e.g. Lotus.

@arajasek
Copy link
Contributor

Re: similarities to randomness, we do stub those calls with recorded final outputs, but it's a bit different to the verify consensus fault syscall:

I'm not sure I understand / agree. They're all fundamentally testing the same aspect of a chain -- "did you walk up the chain correctly and stop at the right point?" This is the case for the lookback tipset, chain randomness, and beacon randomness (which, within a Filecoin client, is sourced from the chain too). From my perspective, it's:

  • Beacon randomness: Walk up the chain, stop at the right point, load the block headers, and draw the right beacon out of them
  • Chain randomness: Walk up the chain, stop at the right point, load the block headers, and draw the right ticket out of them
  • Lookback state: Walk up the chain, stop at the right point, load the block headers, and return the right Parent State Root

And all of this is gas-less, so they seem completely analogous to me.

@raulk
Copy link
Member Author

raulk commented Mar 11, 2022

Re: similarities to randomness, we do stub those calls with recorded final outputs, but it's a bit different to the verify consensus fault syscall:

I was comparing to stubbing the verify consensus fault syscall (what Steb was suggesting). By recording and playing that call, we lose coverage of the logic of that call if the tester supports it.

@raulk
Copy link
Member Author

raulk commented Mar 11, 2022

@arajasek The workaround worked. Let's rely on that for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants