
ingress-nginx controller lose ssl certificate #5337

Closed
wu105 opened this issue Apr 8, 2020 · 4 comments

Comments

@wu105

wu105 commented Apr 8, 2020

NGINX Ingress controller version:

Output of kubectl logs pod/nginx-ingress-controller-78465dcf9d-gvg7m -n nginx-ingress | head:
NGINX Ingress controller
Release: 0.14.0
Build: git-734361d
Repository: https://github.com/kubernetes/ingress-nginx

Kubernetes version (use kubectl version):

v1.12.7+1.2.3.el7

Environment:

  • Cloud provider or hardware configuration:
    VMware instance

  • OS (e.g. from /etc/os-release):
    Oracle Linux Server 7.6

  • Kernel (e.g. uname -a):
    4.14.35-1902.7.3.1.el7uek.x86_64

  • Install tools:
    Oracle tools for HA kubernetes cluster, 2019 release

  • Others:

What happened:

After working fine for many weeks, the ingress controller started logging the following for one ingress:

W0408 01:40:21.454530       8 controller.go:1020] ssl certificate "devops/ht-harbor-ingress" does not exist in local store

The ingress URL stopped working; it appeared to serve the default backend's certificate instead.

Restarting the ingress controller by deleting its pod did not help.

However, editing the secret in the Kubernetes dashboard made the ingress controller notice it again, and the controller logged the following:

I0408 01:57:47.530039       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...
10.244.5.1 - [10.244.5.1] - - [08/Apr/2020:01:57:47 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 32106 0.122 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 23 0.047 201
I0408 01:57:47.531839       8 backend_ssl.go:67] adding secret devops/ht-harbor-ingress to the local store

What you expected to happen:
The ingress controller should not lose track of the SSL secret.

Issue #1004 might be related.
The haproxy-ingress issue jcmoraisjr/haproxy-ingress#78 may also hint at the cause.

How to reproduce it:

This happened spontaneously after running fine for weeks. We have no idea how to reproduce it, but similar incidents have happened a few other times.

Anything else we need to know:

/kind bug

@wu105 wu105 added the kind/bug Categorizes issue or PR as related to a bug. label Apr 8, 2020
@aledbf aledbf removed the kind/bug Categorizes issue or PR as related to a bug. label Apr 8, 2020
@aledbf
Member

aledbf commented Apr 8, 2020

Release: 0.14.0

Please update to 0.30.0. The version you are using is almost two years old: https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.14.0

@aledbf aledbf closed this as completed Apr 8, 2020
@wu105
Author

wu105 commented Apr 8, 2020

We will consider upgrading. We stayed on 0.14.0 because, up to at least 0.22, we had trouble with ingress TLS certificates and their certificate chains. I did search the existing issues to see whether this had been reported, but found none.

@aledbf
Member

aledbf commented Apr 8, 2020

will consider upgrade. we stayed at 0.14.0 because upto at least 0.22 we had troubles with ingress tls certs regarding certificate chains

You should upgrade, and if this is still an issue, open a new one describing how to reproduce it, so we can fix it and make the fix available in the next release.

@wu105
Author

wu105 commented Apr 9, 2020

We ran 'helm delete' and then 'helm install' for ingress-nginx 0.14.0, and the missing SSL certificate issue returned. Again, modifying the secret from the Kubernetes dashboard made the controller notice it.

The ingress TLS secret involved has the following members:
  • tls.crt: the entire certificate chain (certificate, intermediate CA certificate, and root CA certificate)
  • tls.key
  • ca.crt: the root CA certificate

When nginx ingress is reinstalled, the log shows the following:

I0409 08:59:18.281152       8 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"devops", Name:"ht-harbor-ingress", UID:"8d777e29-e08d-11e9-8c7b-005056b1e9fb", APIVersion:"extensions", ResourceVersion:"62509221", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress devops/ht-harbor-ingress
W0409 08:59:18.282039       8 backend_ssl.go:48] error obtaining PEM from secret devops/ht-harbor-ingress: unexpected error creating pem file: failed to verify certificate chain: 
	x509: certificate signed by unknown authority

When the ca.crt member was renamed to caca.crt, the log showed:

2020/04/09 16:37:04 [warn] 317#317: *175 a client request body is buffered to a temporary file /var/lib/nginx/body/0000000001, client: 10.244.5.1, server: dashboard.k8s.nonprod.avaya.com, request: "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0", host: "dashboard.k8s.nonprod.avaya.com", referrer: "https://dashboard.k8s.nonprod.avaya.com/"
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 32093 0.105 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 23 0.030 201
I0409 16:37:04.268479       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...
I0409 16:37:04.270241       8 backend_ssl.go:67] adding secret devops/ht-harbor-ingress to the local store
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "GET /api/v1/login/status HTTP/2.0" 200 93 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3930 0.002 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 93 0.002 200
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "GET /api/v1/csrftoken/token HTTP/2.0" 200 87 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3933 0.002 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 87 0.002 200
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "POST /api/v1/token/refresh HTTP/2.0" 200 1535 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 5952 0.011 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 1535 0.007 200
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:05 +0000] "GET /api/v1/secret/devops/ht-harbor-ingress HTTP/2.0" 200 10654 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3961 0.026 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 10701 0.027 200
I0409 16:37:05.662028       8 controller.go:168] backend reload required
I0409 16:37:05.804628       8 controller.go:177] ingress backend successfully reloaded...
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:11 +0000] "GET /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 200 10753 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3975 0.041 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 10800 0.040 200

The ingress in question then received its certificate and started working.

After renaming caca.crt back to ca.crt, the log showed:

2020/04/09 16:37:22 [warn] 590#590: *203 a client request body is buffered to a temporary file /var/lib/nginx/body/0000000002, client: 10.244.5.1, server: dashboard.k8s.nonprod.avaya.com, request: "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0", host: "dashboard.k8s.nonprod.avaya.com", referrer: "https://dashboard.k8s.nonprod.avaya.com/"
I0409 16:37:22.232206       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...
W0409 16:37:22.234063       8 backend_ssl.go:48] error obtaining PEM from secret devops/ht-harbor-ingress: unexpected error creating pem file: failed to verify certificate chain: 
	x509: certificate signed by unknown authority
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:22 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 32108 0.135 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 23 0.048 201

Other ingress TLS secrets in the same cluster are all fine: their tls.crt contains the certificate and the intermediate CA certificate but no root CA certificate, and they have no ca.crt member.

On other clusters, we have ingress secrets whose tls.crt contains the entire certificate chain (3 certificates) but no ca.crt, and they load fine. The other clusters run the same Kubernetes, Helm, and nginx ingress versions.

We hope the above is sufficient to recreate the issue, perhaps with the new nginx ingress version.
It is still possible, but unlikely, that Kubernetes itself is losing track of the secret.

FYI:

We did try upgrading from chart version 0.18.1 to the latest, and the upgrade failed with Helm displaying the following:

UPGRADE FAILED
Error: Service "nginx-ingress-controller" is invalid: spec.clusterIP: Invalid value: "": field is immutable && Service "nginx-ingress-default-backend" is invalid: spec.clusterIP: Invalid value: "": field is immutable
Error: UPGRADE FAILED: Service "nginx-ingress-controller" is invalid: spec.clusterIP: Invalid value: "": field is immutable && Service "nginx-ingress-default-backend" is invalid: spec.clusterIP: Invalid value: "": field is immutable

The nginx ingress controller seems to be running, but the apiserver ingress is not working, and probably none of the other ingresses are either.

Rolling back succeeded, but according to the log the nginx controller stayed at the new app version, and the ingresses still did not work.

We then deleted nginx ingress and ran 'helm install' for the latest version, which looks OK except that the apiserver ingress does not work, and probably neither do the other ingresses.

Finally, we reinstalled chart version 0.18.1 to restore service and deferred the upgrade pending further testing.
