This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

grandpa: extend protocol for session key registration and usage #7398

Open
andresilva opened this issue Oct 26, 2020 · 3 comments
Labels
J0-enhancement An additional feature request. U3-nice_to_have Issue is worth doing eventually. Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase.

Comments

@andresilva
Contributor

andresilva commented Oct 26, 2020

In order to avoid "benign" equivocations that are caused by operational errors (e.g. restoring an old database while losing the grandpa voter state could lead the authority to vote twice for the same round) we should introduce a more robust protocol for key registration and usage, thus making sure that session keys aren't reused in the same context.

A protocol is suggested here https://hackmd.io/@rgbPIkIdTwSICPuAq67Jbw/BkCOQ8CvP but should still undergo further formalization.

@andresilva andresilva added J0-enhancement An additional feature request. U3-nice_to_have Issue is worth doing eventually. Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase. labels Oct 26, 2020
@burdges

burdges commented Nov 17, 2020

We think nodes could determine when they're fully synced using this, which benefits the relative time and approval assignment subprotocols. As discussed in w3f/polkadot-spec#168, we'll need to expand upon the above document for this use case:

Initially, a validator Vlad starts up and begins syncing the relay chain. At start, Vlad loads its session secret keys and their certificates signed by the node's controller key. I presume the controller already registered this controller certificate bundle on-chain, but if not then tell me. We'll punt doing any runtime updates to a controller certificate bundle for another year.

We now have two back/full cert modes: automatic counter mode and manual --force-back-cert=[counter] mode. In both, Vlad takes no action unless they know the secret keys for the controller certificate bundle registered on-chain, meaning they wait, but also maybe they resume waiting. In automatic mode, Vlad waits longer until they observe "mostly sensible timestamps", and then determines the old counter from their back/full cert on chain, or sets counter=0 if no back/full cert is registered on-chain.
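A minimal sketch of how the starting counter could be picked in the two modes. Only the `--force-back-cert` flag name comes from the text above; the function names are hypothetical, and treating the forced value as the old counter (to be incremented when the cert is created) is an assumption, since the comment does not say which it is.

```python
import argparse
from typing import Optional, Sequence


def parse_forced_counter(argv: Sequence[str]) -> Optional[int]:
    """Return the manual-mode counter, or None when running in automatic mode."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--force-back-cert", type=int, default=None, metavar="COUNTER")
    return parser.parse_args(argv).force_back_cert


def old_counter(on_chain_counter: Optional[int], forced_counter: Optional[int]) -> int:
    """Pick the counter that the next back/full cert will increment.

    on_chain_counter is None when no back/full cert is registered on-chain.
    """
    if forced_counter is not None:
        # Manual mode: the operator supplied the counter explicitly (assumed semantics).
        return forced_counter
    # Automatic mode: reuse the on-chain counter, or start from 0.
    return 0 if on_chain_counter is None else on_chain_counter
```

For example, `old_counter(None, parse_forced_counter([]))` would start a node with no on-chain cert from 0.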

In both modes, Vlad creates a fresh tag and a fresh grandpa back/full cert containing tag, counter=counter+1, a timestamp, its controller public key, and its controller certificates bundle hash. Vlad signs this back/full cert with all their grandpa keys, so Ed25519, ECDSA secp256k1, and BLS, and most others too, so BABE sr25519, Sassafras JubJub Ring-VRF, etc. Vlad gossips the signed back/full cert to other validators.
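The fields listed above can be sketched as a small record. Everything beyond the field list is an assumption made for illustration: the 32-byte random tag, the BLAKE2b-256 bundle hash, the byte encoding, and the `signers` mapping, which stands in for the real Ed25519, ECDSA, BLS, sr25519, etc. signing keys.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class BackCert:
    tag: bytes
    counter: int
    timestamp: int
    controller_public_key: bytes
    bundle_hash: bytes

    def encode(self) -> bytes:
        # Hypothetical fixed-order encoding; a real protocol would specify its own.
        return b"".join([
            self.tag,
            self.counter.to_bytes(8, "big"),
            self.timestamp.to_bytes(8, "big"),
            self.controller_public_key,
            self.bundle_hash,
        ])


def new_back_cert(old_counter: int, controller_public_key: bytes,
                  bundle: bytes, now: Optional[int] = None) -> BackCert:
    """Create a cert with a fresh tag and counter = old counter + 1."""
    return BackCert(
        tag=secrets.token_bytes(32),  # fresh random tag (length is an assumption)
        counter=old_counter + 1,
        timestamp=int(time.time()) if now is None else now,
        controller_public_key=controller_public_key,
        bundle_hash=hashlib.blake2b(bundle, digest_size=32).digest(),
    )


def sign_with_all(cert: BackCert,
                  signers: Dict[str, Callable[[bytes], bytes]]) -> Dict[str, bytes]:
    """Sign the encoded cert once per session key; signers are caller-supplied."""
    payload = cert.encode()
    return {scheme: sign(payload) for scheme, sign in signers.items()}
```

The resulting cert plus its signatures is what Vlad would gossip to the other validators.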

Any relay chain block producer should include a fresh back/full cert only if the tag changes, the counter increases, the timestamp is somewhat sensible, and if the controller certificates bundle hash matches that registered under its controller public key.
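The inclusion rule can be read as a single predicate. This is a sketch only: the `registered` mapping from controller public key to bundle hash, the one-hour skew window standing in for "somewhat sensible", and treating a missing previous cert as counter 0 are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BackCert:
    tag: bytes
    counter: int
    timestamp: int
    controller_public_key: bytes
    bundle_hash: bytes


MAX_SKEW_SECONDS = 3600  # assumption: what counts as a "somewhat sensible" timestamp


def should_include(cert: BackCert, previous: Optional[BackCert],
                   registered: Dict[bytes, bytes], now: int,
                   max_skew: int = MAX_SKEW_SECONDS) -> bool:
    """Return True if a block producer should include this back/full cert."""
    if previous is not None and cert.tag == previous.tag:
        return False  # the tag must change
    previous_counter = 0 if previous is None else previous.counter
    if cert.counter <= previous_counter:
        return False  # the counter must increase
    if abs(now - cert.timestamp) > max_skew:
        return False  # the timestamp must be roughly sensible
    # The bundle hash must match the one registered under the controller key.
    return registered.get(cert.controller_public_key) == cert.bundle_hash
```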

We could misjudge the chain sync slowing in automatic mode, due to the "mostly sensible timestamps" heuristic. We thus reissue back/full certs with larger counter values as those come in, except we reissue with exponential backoff. We permanently halt reissues with counter increases if any back/full cert with our tag ever gets finalized.
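A minimal sketch of the reissue schedule described above, assuming a doubling factor, a 30-second base delay, and a one-hour cap; none of these values are given in the comment. The `Reissuer` class and its method names are hypothetical.

```python
from typing import Iterator, Optional


def reissue_delays(base_seconds: float, factor: float = 2.0,
                   max_seconds: float = 3600.0) -> Iterator[float]:
    """Yield exponentially growing delays, capped at max_seconds."""
    delay = base_seconds
    while True:
        yield delay
        delay = min(delay * factor, max_seconds)


class Reissuer:
    """Tracks when to reissue a back/full cert with a larger counter."""

    def __init__(self, tag: bytes, base_seconds: float = 30.0) -> None:
        self.tag = tag
        self.halted = False
        self._delays = reissue_delays(base_seconds)

    def on_cert_finalized(self, finalized_tag: bytes) -> None:
        # Once any cert carrying our tag is finalized, stop reissuing for good.
        if finalized_tag == self.tag:
            self.halted = True

    def next_delay(self) -> Optional[float]:
        """Seconds to wait before the next reissue, or None once halted."""
        return None if self.halted else next(self._delays)
```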

At some point, Vlad witnesses live GRANDPA votes finalizing its back/full cert, so then Vlad knows it lags behind the chain head by less than the GRANDPA finality time, and by less than the time since it issued that back/full cert.
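Since both bounds hold at once, the tighter one applies. A tiny worked sketch, with hypothetical parameter names and all times in seconds:

```python
def lag_upper_bound(finality_time: float, issued_at: float, now: float) -> float:
    """Upper bound on how far the node lags behind the chain head.

    Valid only once the node has seen live GRANDPA votes finalize its cert.
    """
    time_since_issue = now - issued_at
    return min(finality_time, time_since_issue)
```

With a 30-second finality time and a cert issued 12 seconds ago, the bound would be 12 seconds.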

We address how various subsystems handle this information in automatic and manual mode:

  • Approval assignment always waits until GRANDPA finalizes our back/full cert.
  • GRANDPA, BABE/Sassafras, and relative time similarly wait in automatic mode. In fact, GRANDPA waits until three blocks after GRANDPA finalizes our back/full cert. Yet, they begin immediately using their system clock in manual mode. Actually, manual mode exists primarily to save the chain from locked states when too many validators drop out, etc.
  • Anything slashable for equivocations like GRANDPA and BABE/Sassafras never signs any block after which our tag changes, which then avoids node operators being slashed for equivocations.
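The start-up rules in the list above can be sketched as one gating function. This is one reading of the bullets, not a specification: it assumes that GRANDPA, BABE/Sassafras, and relative time all start immediately in manual mode, and that "three blocks after" means at least three blocks have passed since the finalizing block. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"


class Subsystem(Enum):
    APPROVAL_ASSIGNMENT = "approval-assignment"
    GRANDPA = "grandpa"
    BABE_SASSAFRAS = "babe-sassafras"
    RELATIVE_TIME = "relative-time"


GRANDPA_EXTRA_BLOCKS = 3  # from the text: three blocks after the cert is finalized


def may_start(subsystem: Subsystem, mode: Mode,
              blocks_since_cert_finalized: Optional[int]) -> bool:
    """Decide whether a subsystem may begin participating.

    blocks_since_cert_finalized is None until GRANDPA finalizes our cert.
    """
    finalized = blocks_since_cert_finalized is not None
    if subsystem is Subsystem.APPROVAL_ASSIGNMENT:
        return finalized  # always waits, in both modes
    if mode is Mode.MANUAL:
        return True  # starts immediately from the system clock
    if subsystem is Subsystem.GRANDPA:
        return finalized and blocks_since_cert_finalized >= GRANDPA_EXTRA_BLOCKS
    return finalized
```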

We should certify long-term transport keys from the controller, but they should implicitly provide their own back/full cert at the transport layer when opening connections, so they need not participate in this on-chain system really. I've no idea how much this exists but we can ask @tomaka

We have two niggling questions remaining:

  • We could make tag be a public key for a fresh sr25519 or ed25519 key that never leaves the node. Is this useful somewhere?

@burdges

burdges commented Dec 10, 2020

I think this supersedes paritytech/polkadot-sdk#93

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854/2


3 participants