Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: pool-coordinator #1126

Conversation

LaurenceLiZhixin
Copy link
Member

fix some logical bugs and nil pointer.

@openyurt-bot openyurt-bot added the size/L size/L: 100-499 label Jan 6, 2023
// createInsecureRestClientConfig create insecure rest client config.
func createInsecureRestClientConfig(remoteServer *url.URL) (*restclient.Config, error) {
// CreateInsecureRestClientConfig create insecure rest client config.
func CreateInsecureRestClientConfig(remoteServer *url.URL) (*restclient.Config, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we expose the function?
I don't find anywhere that uses it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix it.

@@ -43,6 +43,7 @@ var supportedResourceAndVerbsForFilter = map[string]map[string]sets.String{
},
ServiceTopologyFilterName: {
"endpoints": sets.NewString("list", "watch"),
"pods": sets.NewString("list", "watch"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to add this entry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix it.

@@ -203,10 +201,9 @@ func (coordinator *coordinator) Run() {
needUploadLocalCache = true
needCancelEtcdStorage = true
isPoolCacheSynced = false
etcdStorage = nil
poolCacheManager = nil
coordinator.poolCacheManager = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not update the coordinator field if we do not acquire lock, since these fields are used and exposed by IsReady and IsHealthy function, which may be called concurrently.

For example, we assume that coordinator is healthy when coordinator.electStatus != PendingHub, at which time the coordinator.poolCacheManager should also not be nil. Thus the condition coordinator.electStatus != PendingHub && coordinator.poolCacheManager != nil is a defined situation. And the caller of IsHealthy will always get non-nil poolCacheManager when it find the returned bool value of IsHealthy is true.

However, if we update coordinator.poolCacheManager here. The caller of IsHealthy may find that the poolcoordinator is healthy, but got a nil poolCacheManager, which situation is undefined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add lock here

case LeaderHub:
poolCacheManager, etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
coordinator.poolCacheManager, coordinator.etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

klog.Errorf("failed to upload local cache when yurthub becomes leader, %v", err)
} else {
needUploadLocalCache = false
}
}
case FollowerHub:
poolCacheManager, etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
coordinator.poolCacheManager, coordinator.etcdStorage, cancelEtcdStorage, err = coordinator.buildPoolCacheStore()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -316,7 +311,8 @@ func (coordinator *coordinator) IsReady() (cachemanager.CacheManager, bool) {
// If electStatus is not PendingHub, it means pool-coordinator is healthy.
coordinator.Lock()
defer coordinator.Unlock()
if coordinator.electStatus != PendingHub && coordinator.isPoolCacheSynced && !coordinator.needUploadLocalCache {
// fixme: coordinator.isPoolCacheSynced now is not considered
if coordinator.electStatus != PendingHub && !coordinator.needUploadLocalCache {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to remove the isPoolCacheSynced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now isPoolCacheSynced is never set to 'true'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acctually we need isPoolCacheSynced. I'll submit a pr to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by #1131

@@ -55,7 +55,7 @@ func NewHubElector(
coordinatorClient: coordinatorClient,
coordinatorHealthChecker: coordinatorHealthChecker,
cloudAPIServerHealthChecker: cloudAPIServerHealthyChecker,
electorStatus: make(chan int32),
electorStatus: make(chan int32, 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to set the capability of electorStatus as 1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving he.electorStatus <- PendingHub to the begining of Run()?

if err != nil {
return fmt.Errorf("failed to wait for coordinator to run, %v", err)
}
}

// wait for async coordinator informer registry
time.Sleep(time.Second * 5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems not a good way to use time.Sleep to sync.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I'll change it to notification.

@@ -120,48 +120,49 @@ func (c *componentKeyCache) Recover() error {
}

func (c *componentKeyCache) getPoolScopedKeyset() (*keySet, error) {
client := c.getEtcdClient()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the function is uninitialized. I'll fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you fix it into my pr or create a new one ? @Congrool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a new pr to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by #1137

@Congrool
Copy link
Member

Congrool commented Jan 8, 2023

BTW, could you please split your PR into several commits each of which fixes one bug? It will be helpful for review, thanks!

@LaurenceLiZhixin
Copy link
Member Author

BTW, could you please split your PR into several commits each of which fixes one bug? It will be helpful for review, thanks!

Sure, I'll do that later

@openyurt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LaurenceLiZhixin, rambohe-ch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openyurt-bot openyurt-bot added the approved approved label Jan 11, 2023
@rambohe-ch rambohe-ch merged commit 07fc159 into openyurtio:pool-coordinator-dev Jan 11, 2023
Congrool pushed a commit to Congrool/openyurt that referenced this pull request Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved approved size/L size/L: 100-499
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants