Fix nil pointer issue when clusterset is deleted in leader #3915
/test-multicluster-dataplane-e2e
Codecov Report
@@ Coverage Diff @@
## main #3915 +/- ##
==========================================
- Coverage 63.91% 54.35% -9.56%
==========================================
Files 293 290 -3
Lines 43227 43166 -61
==========================================
- Hits 27628 23463 -4165
- Misses 13363 17780 +4417
+ Partials 2236 1923 -313
In the commit message, change "," to "." or ", and" between two sentences.
Please pay attention to such basic grammar mistakes (I have often seen this in your PRs and have reminded you about the same one a few times).
@@ -232,10 +233,10 @@ func (r *MemberClusterAnnounceReconciler) processMCSStatus() {

/******************************* MemberClusterStatusManager methods *******************************/

func (r *MemberClusterAnnounceReconciler) AddMember(MemberId common.ClusterID) {
	defer r.mapLock.Unlock()
func (r *MemberClusterAnnounceReconciler) AddMember(memberId common.ClusterID) {
Change memberId to memberID
@luolanzone what will happen when the issue occurs? Will the MC Controller crash? Please still create an issue to track it. We should do more tests to detect such basic issues.
I will pay attention to it.
Sure, thanks for the reminder. I missed this kind of issue, and I will double-check for it next time.
5e0c5b1 to 67636f7
Conditions: conditions}

r.timerData[MemberId] = &timerData{connected: false, lastUpdateTime: time.Time{}}
r.timerData[memberID] = &timerData{connected: false, lastUpdateTime: time.Time{}}
klog.InfoS("new member is added", "member", memberID)
Please capitalize log messages.
Done
if _, ok := r.timerData[memberID]; ok {
	klog.InfoS("remove members", "member", memberID)
	delete(r.timerData, memberID)
I think the previous code doesn't have a data race. The issue was caused by data inconsistency, and it's still there. `RemoveMember` removed the item from `r.memberStatus` but not from `r.timerData`, while `processMCSStatus` expects every item in `r.timerData` to also exist in `r.memberStatus`, so accessing `r.memberStatus[member].Condition` panics.
I think the right fix is removing the two `return` statements in this function. We should add a unit test in this PR, not in a separate PR. One of the points of tests is to exercise the situation that has the issue and to prove that this patch fixes it. I don't think an e2e test would be very helpful here; it is unlikely to reproduce the scenario.
Fixed, thanks.
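To make the inconsistency described in this thread concrete, here is a minimal, self-contained sketch. It is not the Antrea code: `reconciler`, `removeMemberBuggy`, `countConditions`, and `panics` are hypothetical names, the struct fields are simplified, and the shape of the old `RemoveMember` (an early `return` after the first delete) is an assumption based on the review comment.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ClusterID, memberStatus and timerData are simplified stand-ins for the
// types in the PR; their fields are assumptions for illustration only.
type ClusterID string

type memberStatus struct{ Conditions []string }

type timerData struct {
	connected      bool
	lastUpdateTime time.Time
}

type reconciler struct {
	mapLock      sync.Mutex
	memberStatus map[ClusterID]*memberStatus
	timerData    map[ClusterID]*timerData
}

func newReconciler() *reconciler {
	return &reconciler{
		memberStatus: map[ClusterID]*memberStatus{},
		timerData:    map[ClusterID]*timerData{},
	}
}

func (r *reconciler) AddMember(id ClusterID) {
	r.mapLock.Lock()
	defer r.mapLock.Unlock()
	r.memberStatus[id] = &memberStatus{Conditions: []string{"Unknown"}}
	r.timerData[id] = &timerData{}
}

// removeMemberBuggy mimics the assumed old behavior: the early return after
// the first delete leaves a stale entry behind in timerData.
func (r *reconciler) removeMemberBuggy(id ClusterID) {
	r.mapLock.Lock()
	defer r.mapLock.Unlock()
	if _, ok := r.memberStatus[id]; ok {
		delete(r.memberStatus, id)
		return
	}
	if _, ok := r.timerData[id]; ok {
		delete(r.timerData, id)
		return
	}
}

// RemoveMember deletes from both maps unconditionally, so they stay in sync.
// delete on a missing key is a no-op, so no existence check is needed.
func (r *reconciler) RemoveMember(id ClusterID) {
	r.mapLock.Lock()
	defer r.mapLock.Unlock()
	delete(r.memberStatus, id)
	delete(r.timerData, id)
}

// countConditions walks timerData and assumes a matching memberStatus entry,
// which is the same assumption the review attributes to processMCSStatus.
func (r *reconciler) countConditions() (n int) {
	r.mapLock.Lock()
	defer r.mapLock.Unlock()
	for id := range r.timerData {
		n += len(r.memberStatus[id].Conditions) // nil dereference if the entry is missing
	}
	return n
}

// panics reports whether f panicked.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	buggy := newReconciler()
	buggy.AddMember("member-a")
	buggy.removeMemberBuggy("member-a")
	fmt.Println("buggy panics:", panics(func() { buggy.countConditions() }))

	fixed := newReconciler()
	fixed.AddMember("member-a")
	fixed.RemoveMember("member-a")
	fmt.Println("fixed panics:", panics(func() { fixed.countConditions() }))
}
```

In this sketch the stale `timerData` entry, not a race, triggers the nil pointer dereference, which matches the reviewer's reasoning that removing the early returns is the relevant fix.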
if _, ok := r.timerData[MemberId]; ok {
	delete(r.timerData, MemberId)
if _, ok := r.timerData[memberID]; ok {
	klog.InfoS("remove members", "member", memberID)
"remove members" does not make clear whether the member is about to be removed or has already been removed. Please move the log to the bottom of the method and change it to "Removed member" to indicate that the removal is done.
Perhaps change the log in `AddMember` to "Added member" to be consistent.
Done
@@ -60,5 +59,8 @@ $GOPATH/bin/informer-gen \
--output-package "${ANTREA_PKG}/multicluster/pkg/client/informers" \
--go-header-file hack/boilerplate.go.txt

go get sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.0
I would suggest not including unrelated changes in bug fix PRs. One PR should focus on one purpose, especially when it's a bug fix. Imagine how we would handle it if this needs to be backported: either the side effects of bumping libraries would be backported too, or more effort would be spent on dividing the patches.
The other changes have been moved to PR #3919.
3327c3a to 7bf0f27

GetMemberClusterStatuses() []multiclusterv1alpha1.ClusterStatus
// This is for unit test only
GetTimerData() map[common.ClusterID]*timerData
I had similar comments before: does the test code really need this interface method to access the data?
Oh, yes, it's unnecessary. Removed it.
7bf0f27 to 7e66a35
/test-multicluster-dataplane-e2e
s/due to data inconsistent/due to data inconsistency/. Also, perhaps remove the code line and stack trace from the commit message; they don't look neat in the git log history and are better suited to the issue.
7e66a35 to b3d9312
Done
b3d9312 to d5895e8
/test-multicluster-dataplane-e2e
status := memberClusterAnnounceReconcilerUnderTest.GetMemberClusterStatuses()
expectedTimeData := memberClusterAnnounceReconcilerUnderTest.timerData
This is not expected data but actual data.
Also, it should be timerData, not timeData, to be consistent.
Use either actualTimerData or just timerData, but it is better to add "actual" to both "status" and "timerData" or to neither of them.
Done, changed to actualTimerData
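The testing points raised in this review (prove the fix with a unit-level check, read unexported state directly instead of adding a test-only getter, and name observed values "actual") can be sketched as follows. This is an illustrative stand-in, not the PR's test: the `reconciler` type, its simplified fields, and `checkRemoveMember` are hypothetical. In a real `_test.go` file in the same package, the check would be written as `func TestRemoveMember(t *testing.T)`.

```go
package main

import (
	"errors"
	"fmt"
)

type ClusterID string

// reconciler is a simplified, hypothetical stand-in for the type under test.
type reconciler struct {
	memberStatus map[ClusterID]string
	timerData    map[ClusterID]bool
}

func (r *reconciler) AddMember(id ClusterID) {
	r.memberStatus[id] = "Unknown"
	r.timerData[id] = false
}

func (r *reconciler) RemoveMember(id ClusterID) {
	delete(r.memberStatus, id)
	delete(r.timerData, id)
}

func (r *reconciler) GetMemberClusterStatuses() []string {
	statuses := make([]string, 0, len(r.memberStatus))
	for _, s := range r.memberStatus {
		statuses = append(statuses, s)
	}
	return statuses
}

// checkRemoveMember plays the role of a unit test body. Code in the same
// package can read the unexported timerData field directly, so no getter
// needs to be added to the public interface just for tests.
func checkRemoveMember() error {
	r := &reconciler{
		memberStatus: map[ClusterID]string{},
		timerData:    map[ClusterID]bool{},
	}
	r.AddMember("member-a")
	r.AddMember("member-b")
	r.RemoveMember("member-a")

	// Both observed values carry the "actual" prefix, for consistency.
	actualStatus := r.GetMemberClusterStatuses()
	actualTimerData := r.timerData

	if len(actualStatus) != 1 {
		return fmt.Errorf("expected 1 status, got %d", len(actualStatus))
	}
	if _, ok := actualTimerData["member-a"]; ok {
		return errors.New("stale timerData entry for removed member")
	}
	if len(actualTimerData) != len(actualStatus) {
		return errors.New("timerData and memberStatus are out of sync")
	}
	return nil
}

func main() {
	if err := checkRemoveMember(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS")
}
```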
When the ClusterSet is deleted in leader Namespace, there will be a nil pointer issue due to data inconsistency. This issue will cause leader controller crash and restart. Refined codes to make data consistent. Signed-off-by: Lan Luo <luola@vmware.com>
d5895e8 to c7577f5
/test-multicluster-dataplane-e2e
LGTM
/skip-all
When the ClusterSet is deleted in leader Namespace, there will be
a nil pointer issue due to data inconsistency. This issue will cause
leader controller crash and restart. Refined codes to make data
consistent.
```
leader_clusterset_controller.go:72] "Received ClusterSet delete" config=""
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x188b9b1]
```
Fixes #3918
Signed-off-by: Lan Luo <luola@vmware.com>