Fix: pool-coordinator #1126
// createInsecureRestClientConfig create insecure rest client config.
func createInsecureRestClientConfig(remoteServer *url.URL) (*restclient.Config, error) {
// CreateInsecureRestClientConfig create insecure rest client config.
func CreateInsecureRestClientConfig(remoteServer *url.URL) (*restclient.Config, error) {
Why do we export this function?
I can't find any caller that uses it.
I'll fix it.
pkg/yurthub/filter/approver_test.go
Outdated
@@ -43,6 +43,7 @@ var supportedResourceAndVerbsForFilter = map[string]map[string]sets.String{
    },
    ServiceTopologyFilterName: {
        "endpoints": sets.NewString("list", "watch"),
        "pods": sets.NewString("list", "watch"),
Why add this entry?
I'll fix it.
@@ -203,10 +201,9 @@ func (coordinator *coordinator) Run() {
    needUploadLocalCache = true
    needCancelEtcdStorage = true
    isPoolCacheSynced = false
    etcdStorage = nil
    poolCacheManager = nil
    coordinator.poolCacheManager = nil
Do not update the coordinator's fields without holding the lock, since these fields are read by the IsReady and IsHealthy functions, which may be called concurrently.
For example, we assume that the coordinator is healthy when coordinator.electStatus != PendingHub, at which point coordinator.poolCacheManager should also be non-nil. Thus coordinator.electStatus != PendingHub && coordinator.poolCacheManager != nil is a well-defined state, and a caller of IsHealthy will always get a non-nil poolCacheManager when the returned bool value is true.
However, if we update coordinator.poolCacheManager here, a caller of IsHealthy may find that the pool-coordinator is healthy but get a nil poolCacheManager, which is an undefined state.
I'll add a lock here.
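A minimal sketch of the invariant discussed above, assuming a simplified coordinator type: the struct layout, the cacheManager stand-in, the constant values, and the setState helper are hypothetical and are not the actual OpenYurt code. It updates electStatus and poolCacheManager inside one critical section, so a concurrent IsHealthy caller should never see a non-pending status paired with a nil manager.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheManager is a hypothetical stand-in for cachemanager.CacheManager.
type cacheManager struct{ name string }

// The constant names come from the PR; their values are assumptions.
const (
	PendingHub int32 = iota
	LeaderHub
	FollowerHub
)

// coordinator is a simplified, hypothetical model of the real struct.
type coordinator struct {
	sync.Mutex
	electStatus      int32
	poolCacheManager *cacheManager
}

// setState (hypothetical helper) updates both fields under the lock so
// readers always observe them as a consistent pair.
func (c *coordinator) setState(status int32, m *cacheManager) {
	c.Lock()
	defer c.Unlock()
	c.electStatus = status
	c.poolCacheManager = m
}

// IsHealthy returns the cache manager and true only when the hub is not pending.
func (c *coordinator) IsHealthy() (*cacheManager, bool) {
	c.Lock()
	defer c.Unlock()
	if c.electStatus != PendingHub {
		return c.poolCacheManager, true
	}
	return nil, false
}

func main() {
	c := &coordinator{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Flip between leader and pending while the reader below polls.
		for i := 0; i < 1000; i++ {
			c.setState(LeaderHub, &cacheManager{name: "pool"})
			c.setState(PendingHub, nil)
		}
	}()
	violations := 0
	for i := 0; i < 1000; i++ {
		if m, ok := c.IsHealthy(); ok && m == nil {
			violations++
		}
	}
	wg.Wait()
	fmt.Println("violations:", violations)
}
```

Because both fields change together under the same mutex that IsHealthy takes, the violation counter stays at zero in this sketch. Assigning coordinator.poolCacheManager outside the lock would remove that guarantee.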
case LeaderHub:
    poolCacheManager, etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
    coordinator.poolCacheManager, coordinator.etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
ditto
        klog.Errorf("failed to upload local cache when yurthub becomes leader, %v", err)
    } else {
        needUploadLocalCache = false
    }
}
case FollowerHub:
    poolCacheManager, etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
    coordinator.poolCacheManager, coordinator.etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
ditto
@@ -316,7 +311,8 @@ func (coordinator *coordinator) IsReady() (cachemanager.CacheManager, bool) {
    // If electStatus is not PendingHub, it means pool-coordinator is healthy.
    coordinator.Lock()
    defer coordinator.Unlock()
    if coordinator.electStatus != PendingHub && coordinator.isPoolCacheSynced && !coordinator.needUploadLocalCache {
    // fixme: coordinator.isPoolCacheSynced now is not considered
    if coordinator.electStatus != PendingHub && !coordinator.needUploadLocalCache {
Why remove the isPoolCacheSynced check?
Now isPoolCacheSynced is never set to 'true'.
Actually, we do need isPoolCacheSynced. I'll submit a PR to fix it.
fixed by #1131
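For reference, the readiness condition debated in this thread can be modelled on its own. This is only a sketch of the three-flag check from the line the diff removes. The readiness struct and its isReady method are hypothetical, the constant values are assumptions, and the sketch says nothing about how #1131 actually sets isPoolCacheSynced.

```go
package main

import "fmt"

// The constant names come from the PR; their values are assumptions.
const (
	PendingHub int32 = iota
	LeaderHub
	FollowerHub
)

// readiness is a hypothetical container for the three flags used by IsReady.
type readiness struct {
	electStatus          int32
	isPoolCacheSynced    bool
	needUploadLocalCache bool
}

// isReady mirrors the original three-part condition: the hub has been
// elected, the pool cache is synced, and no local cache upload is pending.
func (r readiness) isReady() bool {
	return r.electStatus != PendingHub && r.isPoolCacheSynced && !r.needUploadLocalCache
}

func main() {
	r := readiness{electStatus: LeaderHub}
	fmt.Println(r.isReady()) // false: the pool cache is not synced yet
	r.isPoolCacheSynced = true
	fmt.Println(r.isReady()) // true
}
```

With this condition, a flag that is never set to true keeps the coordinator permanently not ready, which matches the reason given for dropping the check temporarily.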
@@ -55,7 +55,7 @@ func NewHubElector(
    coordinatorClient: coordinatorClient,
    coordinatorHealthChecker: coordinatorHealthChecker,
    cloudAPIServerHealthChecker: cloudAPIServerHealthyChecker,
    electorStatus: make(chan int32),
    electorStatus: make(chan int32, 1),
Why set the capacity of electorStatus to 1?
A capacity of 0 would cause a deadlock at this line: https://github.com/openyurtio/openyurt/blob/pool-coordinator-dev/pkg/yurthub/poolcoordinator/leader_election.go#L91
@Congrool
How about moving he.electorStatus <- PendingHub to the beginning of Run()?
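The difference between the two channel capacities can be shown in isolation. This is a minimal sketch, not the elector code: trySend is a hypothetical helper and the value of PendingHub is an assumption. A send on an unbuffered channel cannot complete until a receiver is waiting, while a channel with capacity 1 accepts one value without any receiver.

```go
package main

import "fmt"

// PendingHub is named in the PR; its value here is an assumption.
const PendingHub int32 = 0

// trySend (hypothetical helper) reports whether a send on ch can complete
// right now without blocking.
func trySend(ch chan int32, v int32) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int32)
	buffered := make(chan int32, 1)

	fmt.Println(trySend(unbuffered, PendingHub)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, PendingHub))   // true: the buffer has room
	fmt.Println(trySend(buffered, PendingHub))   // false: the buffer is now full
}
```

A plain blocking send in place of trySend would hang on the unbuffered channel until a receiver appears. A buffer of 1 absorbs only the first value, so the ordering of the send and the start of the receiver still matters for later sends.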
cmd/yurthub/app/start.go
Outdated
    if err != nil {
        return fmt.Errorf("failed to wait for coordinator to run, %v", err)
    }
}

// wait for async coordinator informer registry
time.Sleep(time.Second * 5)
Using time.Sleep to synchronize doesn't seem like a good approach.
You are right. I'll change it to use a notification instead.
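One common way to replace a fixed sleep with a notification is to have the asynchronous work close a channel when it finishes and let the caller wait on that channel with a timeout. The sketch below illustrates that pattern only. registerInformers and waitReady are hypothetical names, and this is not necessarily how the PR ended up implementing it.

```go
package main

import (
	"fmt"
	"time"
)

// registerInformers is a hypothetical stand-in for the asynchronous informer
// registration. It closes the returned channel once registration is done.
func registerInformers() <-chan struct{} {
	ready := make(chan struct{})
	go func() {
		defer close(ready)
		time.Sleep(10 * time.Millisecond) // simulated registration work
	}()
	return ready
}

// waitReady blocks until ready is closed or the timeout elapses.
func waitReady(ready <-chan struct{}, timeout time.Duration) error {
	select {
	case <-ready:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("timed out after %v waiting for informer registration", timeout)
	}
}

func main() {
	if err := waitReady(registerInformers(), 5*time.Second); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("informers registered")
}
```

Compared with time.Sleep(time.Second * 5), the caller proceeds as soon as registration completes and gets an explicit error if it never does.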
@@ -120,48 +120,49 @@ func (c *componentKeyCache) Recover() error {
}

func (c *componentKeyCache) getPoolScopedKeyset() (*keySet, error) {
    client := c.getEtcdClient()
It seems the function is uninitialized. I'll fix it.
Will you fix it in my PR or create a new one? @Congrool
I'll create a new PR to fix it.
fixed by #1137
BTW, could you please split your PR into several commits, each of which fixes one bug? It would be helpful for review, thanks!
Sure, I'll do that later.
… into pool-coordinator-fix
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: LaurenceLiZhixin, rambohe-ch
* Fix: pool-coordinator
Fix some logic bugs and a nil pointer error.