Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for PVC label selector conflicts against the secondary cluster. - WIP(Need to test) #1855

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

GowthamShanmugam
Copy link

@GowthamShanmugam GowthamShanmugam commented Feb 20, 2025

Example Scenario: PVC Label Selector Conflict Check
Primary Cluster (Cluster 1)
PVC:

name: app-pvc-1
labels:
  app: my-app

VRG1 (Primary for VolumeReplicationGroup of Cluster 1):

pvcLabelSelector:
  matchExpressions:
    - key: "app"
      operator: "Exists"

This configuration ensures that any PVC with the label key app is selected.

Secondary Cluster (Cluster 2)
PVCs:

name: app-pvc-2
labels:
  app: mysql

name: app-pvc-3
labels:
  app: backup-service

VRG2 (Primary for VolumeReplicationGroup of Cluster 2):

pvcLabelSelector:
  matchExpressions:
    - key: "app"
      operator: "In"
      values: ["backup-service"]

This configuration selects only PVCs labeled app: backup-service.

Why is a Conflict Check Needed?

1️⃣ Unintended PVC Selection

  • After failover, if VRG1 incorrectly applies its label selector on Cluster 2, it might mistakenly include:
    • app-pvc-2 (since it is not yet protected by any VRG).
    • app-pvc-3 (though this is currently prevented by the addition of a VRG owner label).

2️⃣ Data Integrity & Isolation

  • Ensures that the correct application workload is replicated and protected.
  • Prevents accidental inclusion of PVCs from unrelated workloads during failover.
  • Detects potential conflicts at the earliest stage, minimizing risks in the failover process.

By implementing PVC label selector conflict checks, you can ensure a seamless and reliable failover strategy while avoiding unintended PVC protection. 🚀

@GowthamShanmugam GowthamShanmugam changed the title Check for PVC label selector conflicts against the secondary cluster. Check for PVC label selector conflicts against the secondary cluster. - WIP(Need to test) Feb 20, 2025
@GowthamShanmugam GowthamShanmugam force-pushed the secondary_pvc_conflict_check branch from d8f3da4 to 92a2325 Compare February 20, 2025 20:10
@nirs
Copy link
Member

nirs commented Feb 21, 2025

How do you plan to do the conflict check?

Labels can be added and removed at any time. Lets say we have a conflict check the time a drpc is added, there are no conflict.

Now the user add a label or add a new pvc machine another drpc label. The next time we look at the pvcs we will find a new pvc and protect it. How the conflict check helps?

The example you using Exist for label is a very bad way to select pvcs. The only way to protect from such bad label selector is to reject Exists selector and allow only In selector.

@GowthamShanmugam
Copy link
Author

How do you plan to do the conflict check?

Labels can be added and removed at any time. Lets say we have a conflict check the time a drpc is added, there are no conflict.

The conflict check is implemented within the VRG reconciliation process, which already watching PVCs for the addition of the VRG owner label. Therefore, updating labels on PVCs does not pose an issue.

Now the user add a label or add a new pvc machine another drpc label. The next time we look at the pvcs we will find a new pvc and protect it. How the conflict check helps?

I think you are explaining primary cluster behavior,

The conflict check applies specifically to the secondary cluster. If any PVC in the secondary cluster matches the primary VRG’s label selector, the check will trigger an error. It does not consider whether the PVC is owned by another VRG or remains unowned.

The example you using Exist for label is a very bad way to select pvcs. The only way to protect from such bad label selector is to reject Exists selector and allow only In selector.

The above example provides a straightforward, high-level overview to highlight the importance of performing a PVC conflict check in the secondary cluster. Since a PVC can have multiple labels, conflicts can arise in various ways.

@nirs
Copy link
Member

nirs commented Feb 21, 2025

The conflict check applies specifically to the secondary cluster. If any PVC in the secondary cluster matches the primary VRG’s label selector, the check will trigger an error. It does not consider whether the PVC is owned by another VRG or remains unowned.

The flow is not clear. Can yo describe the state in both clusters and explain when the conflicts check happens, and what is expected result of the conflict check? Does it only log an error, or prevent failover/relocate?

The behavior must be specified and discussed before we working on a solution. An issue is the right place to describe the problem and specify the solution.

@GowthamShanmugam
Copy link
Author

GowthamShanmugam commented Feb 21, 2025

The conflict check applies specifically to the secondary cluster. If any PVC in the secondary cluster matches the primary VRG’s label selector, the check will trigger an error. It does not consider whether the PVC is owned by another VRG or remains unowned.

The flow is not clear. Can yo describe the state in both clusters and explain when the conflicts check happens, and what is expected result of the conflict check? Does it only log an error, or prevent failover/relocate?

The behavior must be specified and discussed before we working on a solution. An issue is the right place to describe the problem and specify the solution.

This PR is part of the independent VM feature, and I have already discussed the issue with my sub-team. I raised the PR to facilitate discussions with other sub-teams, as I felt continuing the discussion under an issue would delay implementation.

I still need help deciding whether to simply log the issue, return an error in the secondary VRG reconciliation, or update a status condition.

Signed-off-by: Gowtham Shanmugasundaram <gshanmug@redhat.com>
@GowthamShanmugam GowthamShanmugam force-pushed the secondary_pvc_conflict_check branch from 92a2325 to 05da967 Compare February 21, 2025 11:34
@nirs
Copy link
Member

nirs commented Feb 21, 2025

This PR is part of the independent VM feature, and I have already discussed the issue with my sub-team. I raised the PR to facilitate discussions with other sub-teams, as I felt continuing the discussion under an issue would delay implementation.

There is no way to discuss this PR with other people if the problem and the suggested behavior is not specified in public.

@GowthamShanmugam
Copy link
Author

This PR is part of the independent VM feature, and I have already discussed the issue with my sub-team. I raised the PR to facilitate discussions with other sub-teams, as I felt continuing the discussion under an issue would delay implementation.

There is no way to discuss this PR with other people if the problem and the suggested behavior is not specified in public.

I still don’t understand why presenting the problem along with a clear example isn't sufficient for the PR review. While we can continue refining the solution by creating a well-documented issue, drafting extensive documentation, and holding multiple meetings, this approach will only prolong the process. The purpose of the PR is to initiate discussions and collaboratively work toward a solution rather than delaying implementation with excessive formalities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants