-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ClusterClaim deletion validator #4062
Add ClusterClaim deletion validator #4062
Conversation
/test-multicluster-e2e |
Codecov Report
@@ Coverage Diff @@
## main #4062 +/- ##
==========================================
+ Coverage 67.51% 68.00% +0.49%
==========================================
Files 297 310 +13
Lines 44988 45477 +489
==========================================
+ Hits 30373 30928 +555
+ Misses 12238 12125 -113
- Partials 2377 2424 +47
|
1584db0
to
a3a9658
Compare
defer r.mutex.Unlock() | ||
r.mutex.Lock() | ||
|
||
if leaderID == common.InvalidClusterID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The InvalidClusterID
will be set only when the ClusterSet is not created and common area is nil or the ClusterSet is deleted and common area is reset to nil. So there is no case to call updateLeaderStatus
when the leader ID is InvalidClusterID
.
/test-multicluster-e2e |
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
data.leaderStatus.reason = ReasonConnectedLeader | ||
} | ||
if !apierrors.IsNotFound(err) { | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to unlock. I suggest to set err to nil if apierrors.IsNotFound(err), and break and check err and return after line 135.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, defined a variable err
.
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
// TODO: Add ClusterClaim webhook to make sure it cannot be deleted while ClusterSet is present. | ||
// If it's not found error, ClusterClaims were probably deleted during the processing of MemberClusterAnnounce. | ||
// Nothing to handle in this case since it means no ClusterSet is referring to it or the ClusterSet is deleted | ||
// as well, no status need to be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"and no need to update cluster status"
I actually suggest to remove this sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the whole sentence.
// Check whether this local cluster is the leader for this member. | ||
clusterClaim := &multiclusterv1alpha2.ClusterClaim{} | ||
if err := r.Get(context.TODO(), types.NamespacedName{Name: multiclusterv1alpha2.WellKnownClusterClaimID, Namespace: req.Namespace}, clusterClaim); err == nil { | ||
if clusterClaim.Value == memberAnnounce.LeaderClusterID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When possible the leader clusterID does not match? Again, I suggest to remove laderClusterID from memberAnnounce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that user creates a member ClusterSet with right access but incorrect leader ClusterID, if we didn't validate the leader ID and allow it to create the member cluster announce, then member cluster will show a status with wrong cluster ID but connected status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually feel leader cluster ID in ClusterSet is not very useful. It is not used anywhere, except these validations. But anyway, we can keep the current way, if you prefer.
// If err != nil, probably ClusterClaims were deleted during the processing of MemberClusterAnnounce. | ||
// Nothing to handle in this case and MemberClusterAnnounce will also be deleted soon. | ||
// TODO: Add ClusterClaim webhook to make sure it cannot be deleted while ClusterSet is present. | ||
// If it's not found error, ClusterClaims were probably deleted during the processing of MemberClusterAnnounce. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should revisit the design of Add/RemoveMember() after ClusterSet changes. Why that indirect, but not start/stop memberAnnounce periodic check right after the memberAnnounce is created/deleted in MemberClusterAnnounceReconciler?
Pasted below my earlier comments too.
- Do we really need leader cluster ID in the MemberAnnounce?
- Do we still need all status conditions after we remove election?
- Add/RemoveMember() need not to be called by ClusterSet changes, after we change to auto-update ClusterSet, but we can start to maintain the timerdata once MemberClusterAnnounce is created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, from member cluster's perspective, I feel we don't have to provide leader's ClusterID as long as the access token is correct. We probably need to refactor the whole ClusterSet part including the ClusterSet spec vs ClusterSet status for joined member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can decide leader ClusterID later. But please consider 3). The current code is unnecessarily complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will double check this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jianjuns I have refined this part, the MemberClusterAnnounce is responsible to initialize the data now. Could you help to take another look? thanks.
c0241f1
to
5729c74
Compare
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Show resolved
Hide resolved
@@ -114,38 +108,34 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.V(2).InfoS("Reset lastUpdateTime", "cluster", memberAnnounce.ClusterID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove or rephrase the comment: "// If the member is not found in timerData, it means the member is not added to the ClusterSet or was deleted."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, changed it to If the member is not found in timerData, it means MemberClusterAnnounce is not created or it was deleted.
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
@@ -114,38 +108,34 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.V(2).InfoS("Reset lastUpdateTime", "cluster", memberAnnounce.ClusterID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the code more, I wonder should we do the following two:
- Merge timerData and memberStatus to one map (and call it memberStatusMap)
- Move this block to addMember() (and rename the func to sth like addUpdateMemberStatus())
What you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am thinking it should work, but If we'd like to include this PR to 1.8, I feel it's better to has a follow up PR to rethink about the whole status part, what's your suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine to do another PR, but I do feel we should clean up the code. The current implementation is unnecessarily complex, hard to follow and may include bugs due to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I feel we need rethink the design including ClusterClaim/ClusterSet and status.
5729c74
to
04a2198
Compare
} | ||
// If MemberClusterAnnounce is deleted, no need to process because RemoveMember must already | ||
// be called. | ||
// If MemberClusterAnnounce is deleted, no need to process because the Finalizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If MemberClusterAnnounce is deleted, no further processing is needed, as cleanup must have been done when the Finalizer was removed.
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
@@ -114,38 +108,34 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.V(2).InfoS("Reset lastUpdateTime", "cluster", memberAnnounce.ClusterID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine to do another PR, but I do feel we should clean up the code. The current implementation is unnecessarily complex, hard to follow and may include bugs due to that.
9f0db70
to
566c5fa
Compare
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
multicluster/cmd/multicluster-controller/clusterclaim_webhook.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/commonarea/remote_common_area.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
return ctrl.Result{}, client.IgnoreNotFound(err) | ||
} | ||
|
||
r.mapLock.Lock() | ||
// If the member is not found in timerData, it means the member is not added to the ClusterSet or was deleted. | ||
if data, ok := r.timerData[common.ClusterID(memberAnnounce.ClusterID)]; ok { | ||
// If the member is not found in memberStatus, it means MemberClusterAnnounce is not created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think this block (from line 94 to 119) should be moved to addMemberStatus (probably rename it to addOrUpdateMemberStatus). Could you check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID) | |||
return ctrl.Result{}, err | |||
} | |||
r.addMemberStatus(memberID, v1.ConditionTrue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move it to the end of the func after adding the Finalizer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to place it before adding Finalizer since member is actually connected and able to do export/import etc.
@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ignore "already exists" error in addMemberToClusterSet (in my mind better to add comments on addMemberToClusterSet to explain the behavior)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, return nil for already exists
error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I did not see that. Then we should move r.addOrUpdateMemberStatus() to line 113 before "if common.StringExistsInSlice(memberAnnounce.Finalizers, finalizer)"?
I assume we can call the func only once, and remove the call at line 95.
@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID) | |||
return ctrl.Result{}, err | |||
} | |||
r.addMemberStatus(memberID, v1.ConditionTrue) | |||
klog.InfoS("Adding finalizer to MemberClusterAnnounce", "MemberClusterAnnounce", klog.KObj(memberAnnounce)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will repeatedly log this message for every memberAnnounce update. Should should check if Finalizer is already added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the finalizer exists, it will return, code won't reach here.
if common.StringExistsInSlice(memberAnnounce.Finalizers, finalizer) {
return ctrl.Result{}, nil
}
566c5fa
to
aa20a88
Compare
/test-multicluster-e2e |
@@ -163,6 +142,7 @@ func (r *MemberClusterAnnounceReconciler) Reconcile(ctx context.Context, req ctr | |||
klog.ErrorS(err, "Failed to add member cluster to ClusterSet", "cluster", memberAnnounce.ClusterID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I did not see that. Then we should move r.addOrUpdateMemberStatus() to line 113 before "if common.StringExistsInSlice(memberAnnounce.Finalizers, finalizer)"?
I assume we can call the func only once, and remove the call at line 95.
// Initialize the member status with unknown status when controller restarts. | ||
// Let's background process to check and update the connection status. | ||
// v1.ConditionUnknown will be ignored if status already exists in the map. | ||
r.addOrUpdateMemberStatus(memberID, v1.ConditionUnknown) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this call, and keep only a single call of addOrUpdateMemberStatus(), e.g. at line 112?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
aa20a88
to
37a2105
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions and a few nits.
} | ||
} | ||
|
||
if apierrors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just do:
localClusterMemberAnnounceExists := err == nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return err | ||
} | ||
var localClusterMemberAnnounceExist bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
localClusterMemberAnnounceExist -> localClusterMemberAnnounceExists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
multicluster/controllers/multicluster/commonarea/remote_common_area.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Outdated
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Show resolved
Hide resolved
multicluster/controllers/multicluster/memberclusterannounce_controller.go
Show resolved
Hide resolved
LastTransitionTime: metav1.Now(), | ||
Message: "Member created", | ||
Reason: ReasonNeverConnected, | ||
Message: "Member Created", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - what is difference between "Member Created" and "Member Connected"? Should we just set the condition to "ConditionTrue + Member Connected"? Or there is a protocol that member should notify leader after it is ready by updating the Announce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Member Created"
will be used only when controller starts. It will become Member Connected"
when member cluster send a MemberClusterAnnounce update. The purpose is handle the case MemberClusterAnnounce
aleady exists on leader cluster, but it might be stale and will be cleaned up soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I meant what is the problem if we set the condition to "ConditionTrue + Member Connected" directly here? If the member cluster does not update the Announce, it will be changed to "Disconnected" anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will be changed to "Disconnected", I feel both way works. Do you recommend to use Member Connected
only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does member update the Announce right away after the creation? If yes, I am fine with the current way too; if not, I feel "ConditionTrue + Member Connected" works better, while "ConditionUnknown" is misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be updated every 5 seconds if member cluster works well. But you are right, I feel stale MemberClusterAnnounce case will be rare, let me change it to "ConditionTrue + Member Connected".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do you think 5 seconds may be too frequent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I feel it can be longer, how about 15s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was thinking about 10 seconds. 15 seconds sound good to me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I updated it to 10s in this PR, it can be quicker to detect failure.
37a2105
to
da508d5
Compare
/test-multicluster-e2e |
da508d5
to
22a172b
Compare
|
||
localClusterMemberAnnounceExists := !apierrors.IsNotFound(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to pass nil to apierrors.IsNotFound()? I would suggest:
localClusterMemberAnnounceExists := err == nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
1. Refine ClusterClaim webhook, add deletion validator to deny deletion if there is any ClusterSet referring to it. 2. Move status data initialization from leader ClusterSet controller to MemberClusterAnnounce controller and simplify the status: * When the MemberClusterAnnounce is created, the `Ready` status will be `True` and `Reason` will be `Connected`. * When the MemberClusterAnnounce is not updated in the agreed period, the `Ready` status will be `false` and `Reason` will be `Disconnected`. * When the MemberClusterAnnounce is updated after Disconnected, the `Ready` status will be reset to `true` and `Reason` will be `Connected`, lastUpdateTime will also be reset. * When the MemberCLusterAnnounce is deleted, the corresponding status will be removed. Signed-off-by: Lan Luo <luola@vmware.com>
22a172b
to
25c10a9
Compare
/test-multicluster-e2e |
/skip-all |
if there is any ClusterSet referring to the ClusterClaim.
MemberClusterAnnounce controller and simplify the status:
Ready
status will beTrue
andReason
will beConnected
.the
Ready
status will befalse
andReason
will beDisconnected
.Ready
status will be reset to
true
andReason
will beConnected
, andlastUpdateTime
will also be reset.will be removed.
Signed-off-by: Lan Luo luola@vmware.com