
IF: Disaster_recovery scenario 2 test #72

Merged: 16 commits merged on May 1, 2024

heifner (Member) commented Apr 25, 2024

Disaster Recovery Scenario 2

Integration test with 5 nodes (A, B, C, D, and P). Nodes A, B, C, and D each have one finalizer but no proposers. Node P has a proposer but no finalizers. The finalizer policy consists of the four finalizers with a threshold of 3. The proposer policy involves just the single proposer P.
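A minimal sketch of the finalizer policy described above, assuming each finalizer contributes a weight of 1 toward the threshold. `FinalizerPolicy` and `has_quorum` are hypothetical names used only for illustration. They are not code from the test or from the node software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinalizerPolicy:
    # Hypothetical model: each finalizer contributes weight 1 toward the threshold.
    finalizers: frozenset
    threshold: int

    def has_quorum(self, voters):
        # Count only votes from nodes that are finalizers in this policy.
        weight = len(self.finalizers & set(voters))
        return weight >= self.threshold


policy = FinalizerPolicy(finalizers=frozenset({"A", "B", "C", "D"}), threshold=3)

print(policy.has_quorum({"A", "B", "C"}))  # True: 3 of 4 votes meet the threshold
print(policy.has_quorum({"A", "B"}))       # False: 2 votes fall short
print(policy.has_quorum({"A", "B", "P"}))  # False: P is a proposer, not a finalizer
```

Any three of the four finalizers form a quorum. Because P holds no finalizer key, its own participation never counts toward the threshold.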

A, B, C, and D can be connected to each other in any topology, as long as blocks sent to node A can propagate to the other nodes B, C, and D. However, node P should be connected only to node A.
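The connectivity rule above can be checked mechanically. The sketch below assumes one hypothetical peering (a chain A-B-C-D plus A-P). It is not necessarily the shape defined in tests/disaster_recovery_2_test_shape.json, and `valid_topology` is an illustrative helper, not part of the test harness.

```python
from collections import deque

# Hypothetical topology: one valid choice, not necessarily the test's shape file.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "P")]


def build_adjacency(edges):
    # Peer connections are bidirectional, so record each edge both ways.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def reachable(adj, start, excluded=frozenset()):
    # Breadth-first search from start that never enters nodes in excluded.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in adj.get(node, ()):
            if peer not in seen and peer not in excluded:
                seen.add(peer)
                queue.append(peer)
    return seen


def valid_topology(edges):
    adj = build_adjacency(edges)
    # P must peer with A and with nobody else.
    if adj.get("P") != {"A"}:
        return False
    # Blocks arriving at A must reach B, C, and D without relaying through P.
    return {"B", "C", "D"} <= reachable(adj, "A", excluded={"P"})


print(valid_topology(EDGES))                 # True
print(valid_topology(EDGES + [("P", "B")]))  # False: P has a second peer
```

Because P's only peer is A, severing the P-A connection cuts P off from every finalizer at once, which is what lets the scenario strand P's latest blocks.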

At some point after the IF transition has completed and LIB is advancing, block production on node P should be paused. Enough time should be given to allow any in-flight votes on the latest produced blocks to be delivered to node P. Then the connection between node P and node A should be severed, and block production on node P resumed. The LIB on node P should advance to, but then stall at, block N. Shortly after that, node P should be cleanly shut down.

Verify that the LIB on A, B, C, and D has stalled and is less than block N. Then, nodes A, B, C, and D can all be cleanly shut down.
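The two stall conditions can be expressed as a check over LIB readings sampled over time. This is a sketch: `verify_stall` and the `lib_samples` mapping are hypothetical, and collecting the readings from the nodes is left out. Block N is taken to be the value at which P's LIB stalls.

```python
def lib_stalled(samples):
    # Stalled means at least two readings and no change between them.
    return len(samples) >= 2 and len(set(samples)) == 1


def verify_stall(lib_samples):
    """lib_samples maps node name -> list of LIB readings (hypothetical data source)."""
    p = lib_samples["P"]
    if not lib_stalled(p):
        return False
    n = p[-1]  # block N is where P's LIB stalls
    # Every finalizer node must also be stalled, strictly below block N.
    return all(
        lib_stalled(lib_samples[node]) and lib_samples[node][-1] < n
        for node in ("A", "B", "C", "D")
    )


# Made-up block numbers, for illustration only.
example = {"P": [100, 100, 100], "A": [98, 98], "B": [98, 98], "C": [98, 98], "D": [98, 98]}
print(verify_stall(example))  # True
```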

Then, reversible blocks from all nodes should be removed. All nodes are restarted from an earlier snapshot (prior to block N).

P is restarted and replays up to block N after restarting from snapshot. Blocks up to and including block N are sent to the other nodes A, B, C, and D after they are also started up again.

Verify that LIB advances and that A, B, C, and D are eventually voting strong on new blocks.
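The LIB half of the final check can be sketched the same way. `recovered` is a hypothetical helper over sampled LIB readings. It does not observe votes, so confirming that A, B, C, and D vote strong would need a separate check against the nodes themselves.

```python
def recovered(lib_samples, n, nodes=("A", "B", "C", "D")):
    # After restart, each node's LIB must move forward and pass block N.
    for node in nodes:
        samples = lib_samples[node]
        if len(samples) < 2 or samples[-1] <= samples[0] or samples[-1] <= n:
            return False
    return True


# Made-up block numbers, for illustration only.
after_restart = {node: [98, 105, 112] for node in ("A", "B", "C", "D")}
print(recovered(after_restart, n=100))  # True
```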

heifner requested review from greg7mdp and linh2931 on April 25, 2024, 18:05
heifner added the OCI label (Work exclusive to OCI team) on April 25, 2024
heifner linked an issue on April 25, 2024 that may be closed by this pull request
heifner (Member, Author) commented Apr 26, 2024

Observed test failure:
Node P is shut down with HEAD 166, LIB 127.
Node A is shut down with HEAD 130, LIB 127, locked on 128. Since block 128 is lost in this scenario, node A can't make progress after restart.

Base automatically changed from seperate_prod_fin to savanna April 28, 2024 16:57
Base automatically changed from savanna to main April 29, 2024 18:39
Four review threads on tests/disaster_recovery_2.py were resolved (one marked outdated).

###############################################################
# disaster_recovery - Scenario 2
#
Review comment (Member):

I like the detailed description here. It would be even better if each scenario had a one-sentence description of the test's purpose.

I think our new tests should follow this test's pattern.

Review threads on tests/disaster_recovery_2.py (two, one marked outdated) and tests/disaster_recovery_2_test_shape.json were resolved.
linh2931 self-requested a review on May 1, 2024, 01:15
heifner merged commit d2ec18f into main on May 1, 2024
71 checks passed
heifner deleted the GH-13-disaster-test2 branch on May 1, 2024, 10:37
ericpassmore (Contributor) commented:

Note:start
group: IF
category: TEST
summary: Add a test covering disaster recovery with 5 nodes (A, B, C, D, and P).
Note:end

Labels: OCI (Work exclusive to OCI team)
Linked issue (may be closed by this pull request): Disaster recovery integration test
4 participants