-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sync of multi-cluster ingress #183
Conversation
Thanks for sending this @nicksardo |
/assign |
Items: []extensions.Ingress{*ing}, | ||
igs, err := lbc.CloudClusterManager.EnsureInstanceGroupsAndPorts(nodeNames, allNodePorts) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set syncError = err as at other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. As I said, WIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be retErr now?
ing.Annotations = map[string]string{} | ||
} | ||
if err = setInstanceGroupsAnnotation(ing.Annotations, igs); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. set syncError = err?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted syncError
pkg/controller/controller.go
Outdated
if err = setInstanceGroupsAnnotation(ing.Annotations, igs); err != nil { | ||
return err | ||
} | ||
return updateAnnotations(lbc.client, ing.Name, ing.Namespace, ing.Annotations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the returned error should be set as SyncError?
// for the Ingress associated with "key" is ready for a UrlMap update. | ||
// If this encounters an error, eg for quota reasons, we want to invoke | ||
// Phase 2 right away and retry checkpointing. | ||
// * Phase 2 performs GC by refcounting shared resources. This needs to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, I think this comment was useful specially this explanation of why we always run phase 2, even when phase 1 (creating LB fails).
This helps understand why we have the defer func and tricky syncError tracking code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will replace with a better comment elsewhere.
@nicksardo Were you able to find the PR that caused this regression? Will help in comparing how this was working before. |
It was either the PR that stopped syncing multiple load balancers or stopped syncing multiple backend services. I don't think knowing which one caused the regression is important. You can examine the code at v0.9.7 to see how MCI was separated from GCE-ingress. I think the division was too subtle and was bound to cause this break. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly lg.
One comment about how to handle errors.
Earlier, we were raising an event for any error returned by Checkpoint. We seem to be ignoring some of them now (not generating event). Is that intentional?
Items: []extensions.Ingress{*ing}, | ||
igs, err := lbc.CloudClusterManager.EnsureInstanceGroupsAndPorts(nodeNames, allNodePorts) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be retErr now?
} | ||
// Create the backend services and higher-level LB resources. | ||
if err = lbc.CloudClusterManager.EnsureLoadBalancer(lb, lbSvcPorts, igs); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set to retErr?
@@ -344,20 +351,24 @@ func (lbc *LoadBalancerController) sync(key string) (err error) { | |||
// Update the UrlMap of the single loadbalancer that came through the watch. | |||
l7, err := lbc.CloudClusterManager.l7Pool.Get(key) | |||
if err != nil { | |||
syncError = fmt.Errorf("%v, unable to get loadbalancer: %v", syncError, err) | |||
return syncError | |||
return fmt.Errorf("unable to get loadbalancer: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was being set to syncError earlier and hence we were generating event for this. Is the change to not generate event intentional?
As discussed with @nicksardo, turns out I was understanding named returns wrong. /lgtm |
There is one function difference in terms of syncError. Previously, if we failed "Checkpoint", it would continue on to update the URL map. I don't think that's behavior we want to keep. Since the controller now only syncs GCP resources belonging to the ingress, updating the URL map seems pointless if there's an error earlier. |
sgtm |
/approve |
Fixes #182
Also fixes an pre-existing issue of keeping around backend services if MCI and normal ingresses shared backend services. If the normal ingress was deleted, the backend service wouldn't be GC'd until MCI was deleted.
The logic for syncing MCI vs GCE-ingress was not easily distinguishable. I've re-ordered the sync logic and broke down
Checkpoint
into something I think is more clear.Checkpoint is broken up into three separate funcs and are called in the following order:
EnsureInstanceGroupsAndPorts()
EnsureLoadBalancer()
EnsureFirewall()
cc @nikhiljindal @csbell @bowei @MrHohn @rramkumar1