
Disaster recovery scenario 3 #153

Merged: heifner merged 10 commits into main from GH-13-scenario-3 on May 22, 2024
Conversation

heifner (Member) commented on May 17, 2024:

Scenario 3

Create an integration test with 4 nodes (A, B, C, and D), each of which has its own producer and finalizer. The finalizer policy consists of the four finalizers with a threshold of 3. The proposer policy involves all four proposers.

  • At least two of the four nodes should have a LIB of N and a finalizer safety information file that locks on a block after N. The other two nodes should have a LIB that is less than or equal to block N.

All nodes are shut down and the reversible blocks on all nodes are deleted. Then all nodes are restarted from an earlier snapshot.

All nodes eventually sync up to block N. Some nodes will consider block N to be LIB, but others may not.

Not enough finalizers should be voting because of the lock in their finalizer safety information files. Verify that LIB does not advance on any node.

Cleanly shut down all nodes and delete their finalizer safety information files. Then restart the nodes.

Verify that LIB advances on all nodes and that they all agree on the LIB. In particular, verify that block N has the same ID on all nodes as it did before the nodes were first shut down.

Resolves #13
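
For orientation, below is a minimal sketch of how this scenario could be driven with the repository's Python TestHarness. Only waitForLibToAdvance, verifyAlive, removeReversibleBlks, removeFinalizersSafetyFile, and Print appear in this PR; every other helper (cluster.launch, createSnapshot, kill, relaunch, getIrreversibleBlockNum, getBlock) and the snapshot path are assumptions about the harness API rather than the merged test.

#!/usr/bin/env python3
# Sketch only; see the note above about which helpers are assumptions.
import signal
from TestHarness import Cluster, Utils

Print = Utils.Print

cluster = Cluster()
# 4 nodes (A, B, C, D), each with its own producer and finalizer;
# the finalizer policy threshold is 3 of 4 (launch arguments are assumed).
assert cluster.launch(pnodes=4, totalNodes=4, totalProducers=4, activateIF=True)
nodes = [cluster.getNode(i) for i in range(4)]

# Take an earlier snapshot to restart from later (assumed helper).
for node in nodes:
    node.createSnapshot()

# Let LIB reach some block N and remember its ID.
assert nodes[0].waitForLibToAdvance()
libN = nodes[0].getIrreversibleBlockNum()
blockIdN = nodes[0].getBlock(libN)["id"]

# Shut down all nodes and delete their reversible blocks.
for node in nodes:
    node.kill(signal.SIGTERM)
for node in nodes:
    assert not node.verifyAlive(), "Node did not shutdown"
    node.removeReversibleBlks()

# Restart from the earlier snapshot. Finalizers are locked on a block after N,
# so fewer than the threshold of 3 can vote and LIB must not advance.
for node in nodes:
    assert node.relaunch(chainArg="--snapshot <snapshot-path>")
for node in nodes:
    assert not node.waitForLibToAdvance(), "Node advanced LIB after relaunch when it should not"

# Shut down again, remove the finalizer safety data, and relaunch.
Print("Shutdown all nodes to remove finalizer safety data")
for node in nodes:
    node.kill(signal.SIGTERM)
for node in nodes:
    assert not node.verifyAlive(), "Node did not shutdown"
    node.removeFinalizersSafetyFile()
for node in nodes:
    assert node.relaunch()

# LIB advances again and block N has the same ID as before the first shutdown.
for node in nodes:
    assert node.waitForLibToAdvance(), "LIB did not advance after removing safety data"
    assert node.getBlock(libN)["id"] == blockIdN, "block N ID changed across recovery"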

heifner requested review from linh2931 and arhag on May 17, 2024 12:44
heifner added the OCI (Work exclusive to OCI team) label on May 17, 2024
@@ -558,6 +563,11 @@ def removeReversibleBlks(self):
        reversibleBlks = os.path.join(dataDir, "blocks", "reversible")
        shutil.rmtree(reversibleBlks, ignore_errors=True)

    def removeFinalizersSafetyFile(self):
Member: Probably this should be named removeFinalizersSafetyDir.
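
A minimal sketch of the suggested rename, mirroring removeReversibleBlks above. The data-directory lookup and the "finalizers" subdirectory name are assumptions about where nodeos keeps its finalizer safety data, not something confirmed by this PR.

    def removeFinalizersSafetyDir(self):
        dataDir = Utils.getNodeDataDir(self.nodeId)          # assumed: same lookup the other remove* helpers use
        finalizersDir = os.path.join(dataDir, "finalizers")  # assumed on-disk location of the safety data
        shutil.rmtree(finalizersDir, ignore_errors=True)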

###############################################################
# disaster_recovery - Scenario 3
#
# Create integration test with 4 nodes (A, B, C, and D) which each have their own producer and finalizer. The finalizer
Member: Should "each have their" be "each has its"?

Member (Author): I rewrote the paragraph.

for node in [node0, node1, node2, node3]:
    assert not node.waitForLibToAdvance(), "Node advanced LIB after relaunch when it should not"

Print("Shutdown all nodes to remove finalizer safety data")
Member: You don't remove finalizer safety data here.

Member (Author): We do on line 112; that is why we are shutting down.

for node in [node0, node1, node2, node3]:
    assert not node.verifyAlive(), "Node did not shutdown"

for node in [node0, node1, node2, node3]:
Member: Print "remove finalizer safety data" here.
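
A sketch of what this suggestion could look like when combined with the snippets above; the relaunch call and the exact ordering are assumptions, not the merged code.

for node in [node0, node1, node2, node3]:
    assert not node.verifyAlive(), "Node did not shutdown"

Print("Remove finalizer safety data and relaunch nodes")
for node in [node0, node1, node2, node3]:
    node.removeFinalizersSafetyFile()  # removeFinalizersSafetyDir if renamed as suggested above
    assert node.relaunch(), "Node relaunch failed"  # assumed relaunch call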

# Create integration test with 4 nodes (A, B, C, and D) which each have their own producer and finalizer. The finalizer
# policy consists of the four finalizers with a threshold of 3. The proposer policy involves all four proposers.
#
# - At least two of the four nodes should have a LIB N and a finalizer safety information file that locks on a block
Member: Probably you should say something like "make a condition such that at least two of the four nodes should have ...".

ericpassmore (Contributor) commented:
Note:start
group: IF
category: TEST
summary: Four node disaster recovery tests with finalizer lock and recovery from previous state.
Note:end

tests/disaster_recovery_3.py: review threads resolved
heifner merged commit 78fbcf2 into main on May 22, 2024
36 checks passed
heifner deleted the GH-13-scenario-3 branch on May 22, 2024 at 20:48