Controller fails to sync config when a service is not found #646

Closed
ashi009 opened this issue Feb 26, 2019 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@ashi009

ashi009 commented Feb 26, 2019

The ingress controller fails to sync the entire config when a backend service is missing.

We ran into this last week. We deployed the ingress config for a new service, along with other changes to the cluster, before actually deploying the new service. None of the changes took effect until the new service was deployed. In the meantime, the ingress controller only reported a warning that said nothing about the gravity of the issue. We only realized what had happened later, when some of the other changes were not taking effect, even though all checks against the ingress looked sane.

The GKE services console also failed to surface this kind of issue as an error; it appeared only as a minor warning (screenshot omitted).

IMHO, the ingress controller should do a partial sync and create some dummy backend resources that report as unhealthy. (I think the major issue here is the naming, as the current implementation uses the node port number to name backends.)

@rramkumar1
Contributor

@ashi009 What are you hoping to gain from the load balancer being partially created with dummy backends? Even if we were to do this, you still couldn't send traffic to the LB.

Is the error message in the GKE console not clear enough?

@ashi009
Author

ashi009 commented Feb 27, 2019

Sorry that I didn't make the request clear.

With dummy backends, the config push would succeed and only the service with the error would not work. In our case, we use ingress config sharding to allow different teams to manage their own ingress in their own namespaces without having to manage SSL cert and DNS setups (due to #369). Hence, letting a single shard fail the entire batch is particularly annoying. At the moment, there is no easy way to validate a config before pushing it to the cluster.

The error message shows up in the GKE console as a warning instead of an error (screenshots omitted), which sends a mixed signal.

@bowei
Member

bowei commented Feb 27, 2019

Can you describe what "ingress config sharding" means? Is this an automated system or a way you have set up your config?

@ashi009
Author

ashi009 commented Mar 2, 2019

It's an automated pipeline we have. Each team puts custom ingress resources in their namespace, and a server monitors those custom resources and produces combined ingress resources.
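
For illustration, a minimal sketch of what that combining step might look like, assuming the per-team rule shards have already been read from the custom resources and using the networking.k8s.io/v1 types (the names, namespaces, and inputs here are hypothetical, not the actual pipeline):

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// combineIngress merges per-namespace rule shards into a single Ingress.
// In a real pipeline the shards would come from watching the per-team
// custom resources; here they are passed in directly for illustration.
func combineIngress(shards map[string][]networkingv1.IngressRule) *networkingv1.Ingress {
	combined := &networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{Name: "combined-ingress", Namespace: "ingress-system"},
	}
	for ns, rules := range shards {
		// Append each team's rules as-is; duplicate hosts/paths would need
		// to be detected and rejected before the combined resource is applied.
		fmt.Printf("merging %d rule(s) from namespace %s\n", len(rules), ns)
		combined.Spec.Rules = append(combined.Spec.Rules, rules...)
	}
	return combined
}

func main() {
	shards := map[string][]networkingv1.IngressRule{
		"team-a": {{Host: "a.example.com"}},
		"team-b": {{Host: "b.example.com"}},
	}
	fmt.Printf("combined ingress has %d rule(s)\n", len(combineIngress(shards).Spec.Rules))
}
```

With this setup, one team's shard referencing a not-yet-deployed Service is enough to make the controller reject the whole combined resource, which is the failure mode described above.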

@rramkumar1
Contributor

@ashi009 Why not simply wait for the services to be deployed before creating the Ingress? Then, if your concern is knowing when to deploy the Ingress, I suppose you could periodically check if all of the Services referenced in the Ingress specification exist. Forgive me if what I said doesn't make sense, just trying to understand your use case better :)
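
For what it's worth, a pre-flight check along those lines could look roughly like the sketch below. It assumes client-go and the networking.k8s.io/v1 Ingress API; the ingress name, namespace, and kubeconfig handling are placeholders, not part of any existing tooling:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig handling; BuildConfigFromFlags falls back to
	// in-cluster config when neither a master URL nor a kubeconfig is given.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	ns, name := "default", "my-ingress" // placeholders for the Ingress to check

	ing, err := client.NetworkingV1().Ingresses(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	missing := 0
	for _, rule := range ing.Spec.Rules {
		if rule.HTTP == nil {
			continue
		}
		for _, p := range rule.HTTP.Paths {
			if p.Backend.Service == nil {
				continue // resource backends are out of scope for this check
			}
			svcName := p.Backend.Service.Name
			if _, err := client.CoreV1().Services(ns).Get(ctx, svcName, metav1.GetOptions{}); err != nil {
				fmt.Printf("backend Service %q for path %q is not available: %v\n", svcName, p.Path, err)
				missing++
			}
		}
	}
	// spec.defaultBackend, if set, could be checked the same way.
	if missing > 0 {
		os.Exit(1)
	}
	fmt.Println("all referenced Services exist")
}
```

Running something like this as a gate before applying the combined Ingress would catch the missing-Service case described earlier in the thread.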

@ashi009
Author

ashi009 commented Mar 12, 2019

@rramkumar1 You are right, technically. Ideally, no bad config will ever be pushed to production, but...

I might have oversimplified this and made it sound like a very specific issue. The broader problem is that the interface for configuring an ingress controller is the Ingress resource, and a user may not know the implementation details of the controller under the hood. That means a user may not know the criteria needed for ingress controller X to act on a given Ingress resource, so there is no way to perform such checks manually beforehand to guarantee a successful config push.

@rramkumar1
Contributor

@ashi009 If the services referenced by an Ingress do not exist, it really is an invalid config.

I think the major issue here is about the naming, as current implementation uses node port number to name backends.

Can you elaborate? We have a couple of open issues related to naming, and it's high on our list to fix.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 11, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
