Controller fails to sync config when a service is not found #646

Closed
ashi009 opened this issue Feb 26, 2019 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ashi009

ashi009 commented Feb 26, 2019

The ingress controller fails to sync the entire config when a backend service is missing.

We ran into this last week. We deployed the ingress config for a new service, along with other changes to the cluster, before actually deploying the new service. None of the changes took effect until the new service was deployed. In the meantime, the ingress controller only reported a warning that said nothing about the gravity of the issue. We only realized what had happened later, when some of the other changes were not taking effect, even though all checks against the ingress looked sane.

The GKE services console also failed to surface this kind of issue as an error; it appeared only as a minor warning (screenshot omitted).

IMHO, the ingress controller should do a partial sync and create some dummy backend resources that report as unhealthy. (I think the major issue here is the naming, as the current implementation uses the node port number to name backends.)

@rramkumar1
Contributor

@ashi009 What are you hoping to gain from the load balancer being partially created with dummy backends? Even if we were to do this, you still couldn't send traffic to the LB.

Is the error message in the GKE console not clear enough?

@ashi009
Author

ashi009 commented Feb 27, 2019

Sorry that I didn't make the request clear.

With dummy backends, the config push would succeed and only the service with the error would not work. In our case, we use ingress config sharding to allow different teams to manage their own ingress in their own namespaces without having to manage SSL cert and DNS setups (due to #369). Hence, letting a single shard fail the entire batch is particularly annoying. At the moment, there is no easy way to validate a config before pushing it to the cluster.

The error message shows up in the GKE console as a warning instead of an error (screenshots omitted), which sends a mixed signal.

@bowei
Member

bowei commented Feb 27, 2019

Can you describe what "ingress config sharding" means? Is this an automated system or a way you have set up your config?

@ashi009
Author

ashi009 commented Mar 2, 2019

It's an automated pipeline we have. Each team puts custom ingress resources in their namespace, and a server monitors those custom resources and produces combined ingress resources.
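
For illustration, a minimal sketch of what that combining step might look like, assuming the per-team rule shards have already been read from the custom resources and using the networking.k8s.io/v1 types (the names, namespaces, and inputs here are hypothetical, not the actual pipeline):

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// combineIngress merges per-namespace rule shards into a single Ingress.
// In a real pipeline the shards would come from watching the per-team
// custom resources; here they are passed in directly for illustration.
func combineIngress(shards map[string][]networkingv1.IngressRule) *networkingv1.Ingress {
	combined := &networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{Name: "combined-ingress", Namespace: "ingress-system"},
	}
	for ns, rules := range shards {
		// Append each team's rules as-is; duplicate hosts/paths would need
		// to be detected and rejected before the combined resource is applied.
		fmt.Printf("merging %d rule(s) from namespace %s\n", len(rules), ns)
		combined.Spec.Rules = append(combined.Spec.Rules, rules...)
	}
	return combined
}

func main() {
	shards := map[string][]networkingv1.IngressRule{
		"team-a": {{Host: "a.example.com"}},
		"team-b": {{Host: "b.example.com"}},
	}
	fmt.Printf("combined ingress has %d rule(s)\n", len(combineIngress(shards).Spec.Rules))
}
```

With this setup, one team's shard referencing a not-yet-deployed Service is enough to make the controller reject the whole combined resource, which is the failure mode described above.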

@rramkumar1
Contributor

@ashi009 Why not simply wait for the services to be deployed before creating the Ingress? Then, if your concern is knowing when to deploy the Ingress, I suppose you could periodically check if all of the Services referenced in the Ingress specification exist. Forgive me if what I said doesn't make sense, just trying to understand your use case better :)
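
For what it's worth, a pre-flight check along those lines could look roughly like the sketch below. It assumes client-go and the networking.k8s.io/v1 Ingress API; the ingress name, namespace, and kubeconfig handling are placeholders, not part of any existing tooling:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig handling; BuildConfigFromFlags falls back to
	// in-cluster config when neither a master URL nor a kubeconfig is given.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	ns, name := "default", "my-ingress" // placeholders for the Ingress to check

	ing, err := client.NetworkingV1().Ingresses(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	missing := 0
	for _, rule := range ing.Spec.Rules {
		if rule.HTTP == nil {
			continue
		}
		for _, p := range rule.HTTP.Paths {
			if p.Backend.Service == nil {
				continue // resource backends are out of scope for this check
			}
			svcName := p.Backend.Service.Name
			if _, err := client.CoreV1().Services(ns).Get(ctx, svcName, metav1.GetOptions{}); err != nil {
				fmt.Printf("backend Service %q for path %q is not available: %v\n", svcName, p.Path, err)
				missing++
			}
		}
	}
	// spec.defaultBackend, if set, could be checked the same way.
	if missing > 0 {
		os.Exit(1)
	}
	fmt.Println("all referenced Services exist")
}
```

Running something like this as a gate before applying the combined Ingress would catch the missing-Service case described earlier in the thread.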

@ashi009
Author

ashi009 commented Mar 12, 2019

@rramkumar1 You are right, technically. Ideally, no bad config will ever be pushed to production, but...

I might have oversimplified this and made it sound like a very specific issue. The broader problem is that the interface for configuring an ingress controller is the Ingress resource, and a user may not know the implementation details of the controller under the hood. That means a user may not know the criteria needed for ingress controller X to act on a given Ingress resource, so there is no way to perform such checks manually beforehand to guarantee a successful config push.

@rramkumar1
Contributor

@ashi009 If the services referenced by an Ingress do not exist, it really is an invalid config.

I think the major issue here is about the naming, as current implementation uses node port number to name backends.

Can you elaborate? We have a couple of open issues related to naming, and it's high on our list to fix.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 11, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
