IF: Update finalizer safety information and rules for how a finalizer signs #2070
Comments
When starting nodeos, if there are no entries for finalizer keys that are configured for nodeos, then nodeos should automatically create entries for them. The values in this new entry should be:
written by Areg

Instant Finality transition, finalizer safety information, and disaster recovery

IF transition

The transition to IF has a start block and an end block where the end block is a descendant of the start block. Within any given branch of the blockchain, if an IF transition exists within it, the start block is indicated by being the first block in that branch which has a finality header extension and it is defined as the first block in the branch in which a successful When the end block is processed and irreversibility advances forward to include the start block as an irreversible block, the Note that all of the finality header extensions in blocks up to and including the end block should have a

After the conversion, it is possible to receive votes on the start block or blocks descendant from it. But it is likely that the first votes will be received only on the new blocks generated after the conversion occurs, e.g. the block that has the end block as its parent.

Finalizer safety information

Initialization on Leap startup

When Leap starts up, it reads its configuration whether specified through the config.ini or through command line options. This configuration includes the path to the directory that may hold finalizer safety information and any finalizer keys provided to that Leap instance. Leap also initializes an in-memory associative map between each of the provided finalizer keys and their corresponding finalizer safety information (which may not be present). The initialization rules are discussed further below.

Leap must make no attempt to read or modify the finalizer safety information if there are no finalizer keys provided. However, if there is at least one finalizer key provided, Leap must attempt to read the finalizer safety information file from its appropriate path on startup. If the file exists, the values read from that file will be used to initialize the in-memory associative map.
If the file does not exist, the in-memory associative map initially consists of the provided finalizer keys mapping to

Regardless of whether the file exists or not, in the case where at least one finalizer key is provided, after the initialization of the in-memory associative map is completed, Leap should write out the contents of that associative map to the finalizer safety information file (creating it and the required directories as needed) before continuing with the rest of the startup process, unless it knows that there would be no changes that need to be written out.

The initialization of the in-memory associative map uses the values read from the finalizer safety information file to appropriately set the values mapped to by any provided finalizer keys that were represented in the file. Additionally, any finalizer keys in the file that are not provided through Leap's configuration as finalizer keys should also be loaded with their corresponding finalizer safety information values (which should never be Proper initialization of the finalizer safety information to replace the
Finally, before the end of the Leap startup process, the current wall-clock time should be captured and used to determine a startup time lock on voting. The calculation requires adding some small amount of time (maybe 1 second) to the captured wall-clock time, possibly rounding up to the nearest 0.5 second (to be compatible with the block timestamp), and comparing with the head block's timestamp to pick the larger of the two. This determines the time that is saved on startup and remains untouched for the rest of the lifetime of the node process.

The startup time lock on voting prevents Leap from using any finalizer key to vote on a block that has a timestamp earlier than that saved startup time, regardless of what the finalizer safety information for the finalizer may allow. Additionally, even if voting on a block is allowed, it prevents a strong vote if it would imply a voting time interval containing the startup time.

Changes to the finalizer safety information

There are two ways the finalizer safety information associated with a finalizer key can be modified.

The first way, also the typical way, is if that finalizer key is used for a vote (strong or weak).

The second way only applies during the IF transition. If the end block is processed and irreversibility advances forward to include the start block as an irreversible block, all of the entries in the in-memory associative map that have a
In the above,

Any time the finalizer safety information is modified, the changes should be persisted by writing it out to the finalizer safety information file before continuing. When the change results from a vote (the first way), the changes should be persisted before the vote is sent out to the network. When the change results from the IF transition (the second way), the changes should be persisted before continuing on after the end of the conversion process.

Finalizer voting during and shortly after transition

A few different scenarios can be explored to verify how the above rules for setting up and changing the finalizer safety information enable finalizer voting during and after the IF transition. In all of the scenarios below, let

A finalizer sets up a new Leap instance prior to the start of the IF transition

The finalizer safety information for the finalizer would not be initialized. Instead, initialization would be delayed until the IF transition, at which point it would be set to:
Here it is assumed that Until sufficient QCs were achieved to advance finality according to IF, the root of the fork database would remain After the finalizer votes strongly on block
A finalizer sets up a new Leap instance during the IF transition

The finalizer safety information for the finalizer would be initialized on startup to:
While the root of the fork database would be Until sufficient QCs were achieved to advance finality according to IF, the root of the fork database would remain Assume that eventually a linkable block After the finalizer votes weakly on block
A finalizer sets up a new Leap instance after the IF transition

The finalizer safety information for the finalizer would be initialized on startup to:
In the above, At that moment in time, it would be possible for the finalizer to vote on a linkable block that had a timestamp greater than Assume the last irreversible block advanced to Now consider a block However, the liveness condition is satisfied because So with monotonicity and liveness satisfied, the finalizer can vote on Assuming the finalizer votes weakly on block
On the other hand, assuming the finalizer votes strongly on block
In the above, Notice that the

A finalizer restarts a Leap instance that crashed during the IF transition from a snapshot prior to the IF transition and syncs

TBC

A finalizer restarts a Leap instance that crashed during the IF transition from a snapshot during the IF transition and syncs

TBC

A finalizer restarts a Leap instance that crashed during the IF transition from a snapshot after the IF transition and syncs

TBC

A finalizer restarts a post-IF Leap instance from a snapshot prior to the IF transition and syncs

TBC

A finalizer restarts a post-IF Leap instance from a snapshot during the IF transition and syncs

TBC

A finalizer restarts a post-IF Leap instance from a snapshot after the IF transition and syncs

TBC

Disaster recovery

There are several disaster recovery scenarios the IF consensus algorithm should be designed to support:
In all three scenarios there are cases A, B, C, and D to consider:

Leap should not start up with a corrupted finalizer safety information file if it has any finalizer keys provided. In this case (case C) it should force the node operator to make a decision: either they intentionally destroy the file so that Leap can start up (forcing it into case D) or they copy over an older finalizer safety information file that wasn't corrupted. If they make the latter decision, they are forcing it into one of the following:

case A (unlikely): the backup file they copied over happened to somehow have the exact state prior to the crash.
case B (likely): the backup file is relevant to the node but it is a little stale, so it does not have the most up-to-date information.
effectively case D (unlikely): they copy over the wrong file meant for other finalizer keys, and so for the finalizer keys provided to this Leap instance the file has no finalizer safety information relevant to it.

In all cases for scenarios 1 and 2, there is at least one live node in the network that can eventually provide enough blocks to either help the restarted node recover the fork database it had prior to the crash (perhaps with additional blocks it did not have before) or a different fork database with a later root block which causes some of the blocks held in the prior iteration of the fork database to be removed (either because they are orphaned or because they are ancestors of the new last irreversible block). We can refer to the first situation as the recovered fork database situation. We can refer to the second as the progressed fork database situation.

TODO: Classify the various scenarios, e.g. 1A, 2A, etc., into classes such as "theoretically safe automatic recovery scenarios", "practically safe automatic recovery scenarios", "possibly manual recovery scenarios", etc. and expand on the details in each case.
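The startup rules for the finalizer safety information file (no keys, file missing, file readable, file corrupted) can be summarized as a small decision function. This is only a sketch: `fsi_file_state`, `startup_action`, and `decide_startup` are hypothetical names introduced here for illustration and are not part of Leap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; not Leap's actual API.
enum class fsi_file_state { missing, readable, corrupted };

enum class startup_action {
   skip_file,           // no finalizer keys: never read or modify the file
   load_and_persist,    // file is readable: load it, then write back if anything changed
   create_and_persist,  // file is missing: build the map from the keys, then create the file
   refuse_to_start      // file is corrupted: the operator must delete or replace it
};

inline startup_action decide_startup(std::size_t num_finalizer_keys, fsi_file_state state) {
   // Without finalizer keys the file must not be touched, whatever its state.
   if (num_finalizer_keys == 0)
      return startup_action::skip_file;

   switch (state) {
      case fsi_file_state::readable:  return startup_action::load_and_persist;
      case fsi_file_state::missing:   return startup_action::create_and_persist;
      case fsi_file_state::corrupted: return startup_action::refuse_to_start;
   }
   return startup_action::refuse_to_start; // unreachable for valid enum values
}
```

Refusing to start in the corrupted case keeps the choice between destroying the file and restoring a backup with the node operator, as the text above requires.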
From Areg:

So the safety condition for a target block a finalizer is considering voting for could be checked using just the block header state. If lock.block_num() is less than target_block->core.last_final_block_num() or greater than or equal to target_block->core.current_block_num(), then the safety condition is not satisfied. Otherwise, use target_block->core.get_block_reference(lock.block_num()) to get the block_ref and compare it with the block ID (or finality digest) stored in lock to check whether you have the correct block. If you do, then the safety condition is satisfied. If not, then the safety condition is not satisfied. Similarly, the liveness condition can use core.get_block_reference to look up the block using the QC claim block number.
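A minimal sketch of the safety check described in this comment follows. `simple_core` is a simplified stand-in written for this example: it only mimics the three accessors named above (`last_final_block_num`, `current_block_num`, `get_block_reference`) and is not Leap's real finality core. Block IDs are plain strings here purely for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for illustration; the real types live in Leap.
struct block_ref {
   std::uint32_t block_num;
   std::string   block_id;
};

struct simple_core {
   std::uint32_t          last_final; // number of the last final block
   std::uint32_t          current;    // number of the block this core belongs to
   std::vector<block_ref> refs;       // refs[i] describes block last_final + i, for blocks in [last_final, current)

   std::uint32_t last_final_block_num() const { return last_final; }
   std::uint32_t current_block_num() const { return current; }

   const block_ref& get_block_reference(std::uint32_t n) const {
      assert(n >= last_final && n < current); // caller must stay inside the tracked range
      return refs.at(n - last_final);
   }
};

// Safety holds only if the locked block is an ancestor tracked by the target's core
// and the recorded reference matches the ID stored in the lock.
inline bool safety_satisfied(const simple_core& target_core, const block_ref& lock) {
   if (lock.block_num < target_core.last_final_block_num() ||
       lock.block_num >= target_core.current_block_num())
      return false;
   return target_core.get_block_reference(lock.block_num).block_id == lock.block_id;
}
```

The range check comes first so that `get_block_reference` is never called with a block number the core does not track.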
The finalizer safety information should be updated to track the following data for each finalizer:
Whenever a finalizer signs a new block, the finalizer safety information must be updated and durably committed before propagating the signature in a vote message.
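The ordering requirement above (commit first, then propagate) can be sketched as follows. The function name and the two callbacks are hypothetical and exist only for this example. A real implementation would also need the write to reach stable storage, for example by flushing and syncing the file, which this sketch leaves to the `persist_fsi` callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not Leap's actual API.
// persist_fsi: writes the updated finalizer safety information and returns true
//              only once the write is durably committed.
// send_vote:   propagates the signed vote message to the network.
inline bool vote_with_persistence(const std::function<bool()>& persist_fsi,
                                  const std::function<void()>& send_vote) {
   // The vote must never leave the node unless the safety information is on disk.
   if (!persist_fsi())
      return false;
   send_vote();
   return true;
}
```

If the node crashes between the two steps, the worst case is a recorded vote that was never sent, which costs liveness for that block but cannot lead to a conflicting vote after restart.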
Whenever a finalizer considers a new block to sign, it must consult its existing finalizer safety information to determine whether it should sign, and if so whether it should sign strongly or weakly. The rules for this are captured in the pseudo-code:
leap/libraries/chain/hotstuff/hs_pseudo
Lines 283 to 341 in 59532df
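In addition to the rules in the pseudo-code, the comments describe a startup time lock on voting. The sketch below illustrates that calculation under stated assumptions: timestamps are non-negative milliseconds since an epoch, the margin defaults to 1 second, and rounding is up to the next 0.5 second boundary. All function names are hypothetical, and the extra restriction on strong votes whose voting interval contains the startup time is not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using ms = std::chrono::milliseconds;

// Round a non-negative timestamp up to the next 500 ms boundary.
inline ms round_up_to_half_second(ms t) {
   const ms slot{500};
   const ms rem = t % slot;
   return rem == ms{0} ? t : t + (slot - rem);
}

// Startup time lock: the later of (wall clock + margin, rounded up) and the head block time.
// Computed once at startup and kept unchanged for the lifetime of the process.
inline ms startup_time_lock(ms wall_clock_now, ms head_block_time, ms margin = ms{1000}) {
   return std::max(round_up_to_half_second(wall_clock_now + margin), head_block_time);
}

// A block may be voted on only if its timestamp is not earlier than the lock.
inline bool passes_startup_lock(ms block_time, ms lock) {
   return block_time >= lock;
}
```

For example, a wall-clock reading of 10250 ms with a head block at 5000 ms gives 11250 ms after the margin, which rounds up to a lock of 11500 ms.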
Related to #2069.
For this issue, update the finalizer signing process to consider the changes described above and in the pseudo-code. When signing weakly, the digest to sign should be a hash of the concatenation of the finalizer_digest and the string WEAK.
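As a sketch of the weak-vote rule, the function below builds the bytes that would be hashed: the finalizer_digest followed by the ASCII characters of WEAK. The 32-byte digest size, the `digest_bytes` alias, and the function name are assumptions made for this example. The hash step itself is left out because the issue does not name the hash function and the C++ standard library does not provide a cryptographic one.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Assumption for illustration: the finalizer digest is 32 bytes long.
using digest_bytes = std::array<std::uint8_t, 32>;

// Returns finalizer_digest || "WEAK". The caller would hash this buffer
// and sign the resulting digest when casting a weak vote.
inline std::vector<std::uint8_t> weak_vote_preimage(const digest_bytes& finalizer_digest) {
   static const char suffix[] = {'W', 'E', 'A', 'K'}; // no trailing NUL byte
   std::vector<std::uint8_t> out(finalizer_digest.begin(), finalizer_digest.end());
   out.insert(out.end(), std::begin(suffix), std::end(suffix));
   return out;
}
```

Because a strong vote signs a different digest than a weak vote on the same block, a signature produced for one kind of vote cannot be replayed as the other.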