ingress-nginx controller lose ssl certificate #5337

wu105 · 2020-04-08T06:18:34Z

NGINX Ingress controller version:

kubectl logs pod/nginx-ingress-controller-78465dcf9d-gvg7m -n nginx-ingress | head :
NGINX Ingress controller
Release: 0.14.0
Build: git-734361d
Repository: https://github.com/kubernetes/ingress-nginx

Kubernetes version (use kubectl version):

v1.12.7+1.2.3.el7

Environment:

Cloud provider or hardware configuration:

VMWare instance

OS (e.g. from /etc/os-release):
Oracle Linux Server 7.6
Kernel (e.g. uname -a):
4.14.35-1902.7.3.1.el7uek.x86_64
Install tools:
Oracle tools for HA kubernetes cluster, 2019 release
Others:

What happened:

The ingress controller started logging the following for one ingress after working fine for many weeks:

W0408 01:40:21.454530       8 controller.go:1020] ssl certificate "devops/ht-harbor-ingress" does not exist in local store

The ingress URL stopped working; it was apparently serving the certificate of the default backend instead.

Restarting the ingress controller by deleting the pod does not help.

However, editing the secret in the Kubernetes dashboard made the ingress controller notice it again, and the controller then logged the following:

I0408 01:57:47.530039       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...
10.244.5.1 - [10.244.5.1] - - [08/Apr/2020:01:57:47 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 32106 0.122 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 23 0.047 201
I0408 01:57:47.531839       8 backend_ssl.go:67] adding secret devops/ht-harbor-ingress to the local store
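
Since the controller messages above are interleaved with nginx access-log lines, a small filter can help isolate what the controller said about one secret. The following is only a sketch: it assumes the glog/klog-style header seen in the excerpts (severity letter, mmdd, time, thread id, file:line), and the function name `controller_events` is made up for illustration.

```python
import re

# glog/klog-style header: severity letter, mmdd, time, thread id, file:line], message.
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})"
    r"\s+(?P<thread>\d+) (?P<source>[^\s\]]+:\d+)\] (?P<message>.*)$"
)


def controller_events(lines, secret):
    """Yield (severity, time, source, message) for controller lines naming `secret`."""
    for line in lines:
        match = KLOG_LINE.match(line)
        # Access-log lines do not start with a klog header, so they are skipped.
        if match and secret in match["message"]:
            yield (match["severity"], match["time"], match["source"], match["message"])


# Lines taken from the excerpts above; the access-log line is shortened here.
sample = [
    'W0408 01:40:21.454530       8 controller.go:1020] ssl certificate "devops/ht-harbor-ingress" does not exist in local store',
    'I0408 01:57:47.530039       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...',
    '10.244.5.1 - [10.244.5.1] - - [08/Apr/2020:01:57:47 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23',
    'I0408 01:57:47.531839       8 backend_ssl.go:67] adding secret devops/ht-harbor-ingress to the local store',
]

for severity, time, source, _ in controller_events(sample, "devops/ht-harbor-ingress"):
    print(severity, time, source)
```

Feeding it the output of `kubectl logs` for the controller pod (for example via `sys.stdin`) should give a timeline of when the secret was dropped from and re-added to the local store.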

What you expected to happen:
The ingress controller should not lose track of the SSL secret.

Issue #1004 might be related.
A haproxy-ingress issue also seems to hint at the cause: jcmoraisjr/haproxy-ingress#78

How to reproduce it:

This happened spontaneously after running fine for weeks. We have no idea how to reproduce it, but similar incidents have happened before.

Anything else we need to know:

/kind bug

aledbf · 2020-04-08T12:33:46Z

Release: 0.14.0

Please update to 0.30.0. The version you are using is almost two years old https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.14.0

wu105 · 2020-04-08T14:43:19Z

We will consider upgrading. We stayed at 0.14.0 because up to at least 0.22 we had trouble with ingress TLS certificates regarding certificate chains. I did search the issues to see whether this had been reported but found none.

aledbf · 2020-04-08T14:56:13Z

We will consider upgrading. We stayed at 0.14.0 because up to at least 0.22 we had trouble with ingress TLS certificates regarding certificate chains.

You should upgrade, and if this is still an issue, open a new one indicating how to reproduce it, so we can fix it and make the fix available in the next release.

wu105 · 2020-04-09T21:48:49Z

We ran 'helm delete' and then 'helm install' for ingress nginx 0.14.0, and the missing SSL certificate issue returned. Again, modifying the secret from the k8s dashboard made the controller notice it.

The ingress TLS secret involved has the following members:
tls.crt: contains the entire certificate chain (certificate, intermediate CA certificate, and root CA certificate)
tls.key
ca.crt: the root CA certificate

In the log we have the following when nginx ingress is reinstalled:

I0409 08:59:18.281152       8 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"devops", Name:"ht-harbor-ingress", UID:"8d777e29-e08d-11e9-8c7b-005056b1e9fb", APIVersion:"extensions", ResourceVersion:"62509221", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress devops/ht-harbor-ingress
W0409 08:59:18.282039       8 backend_ssl.go:48] error obtaining PEM from secret devops/ht-harbor-ingress: unexpected error creating pem file: failed to verify certificate chain: 
	x509: certificate signed by unknown authority

When the ca.crt member was renamed to caca.crt, the log showed:

2020/04/09 16:37:04 [warn] 317#317: *175 a client request body is buffered to a temporary file /var/lib/nginx/body/0000000001, client: 10.244.5.1, server: dashboard.k8s.nonprod.avaya.com, request: "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0", host: "dashboard.k8s.nonprod.avaya.com", referrer: "https://dashboard.k8s.nonprod.avaya.com/"
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 32093 0.105 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 23 0.030 201
I0409 16:37:04.268479       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...
I0409 16:37:04.270241       8 backend_ssl.go:67] adding secret devops/ht-harbor-ingress to the local store
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "GET /api/v1/login/status HTTP/2.0" 200 93 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3930 0.002 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 93 0.002 200
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "GET /api/v1/csrftoken/token HTTP/2.0" 200 87 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3933 0.002 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 87 0.002 200
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:04 +0000] "POST /api/v1/token/refresh HTTP/2.0" 200 1535 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 5952 0.011 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 1535 0.007 200
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:05 +0000] "GET /api/v1/secret/devops/ht-harbor-ingress HTTP/2.0" 200 10654 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3961 0.026 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 10701 0.027 200
I0409 16:37:05.662028       8 controller.go:168] backend reload required
I0409 16:37:05.804628       8 controller.go:177] ingress backend successfully reloaded...
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:11 +0000] "GET /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 200 10753 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 3975 0.041 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 10800 0.040 200

The ingress in question then gets its certificate and starts working.
After renaming caca.crt back to ca.crt, the log showed:

2020/04/09 16:37:22 [warn] 590#590: *203 a client request body is buffered to a temporary file /var/lib/nginx/body/0000000002, client: 10.244.5.1, server: dashboard.k8s.nonprod.avaya.com, request: "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0", host: "dashboard.k8s.nonprod.avaya.com", referrer: "https://dashboard.k8s.nonprod.avaya.com/"
I0409 16:37:22.232206       8 store.go:375] secret devops/ht-harbor-ingress was updated and it is used in ingress annotations. Parsing...
W0409 16:37:22.234063       8 backend_ssl.go:48] error obtaining PEM from secret devops/ht-harbor-ingress: unexpected error creating pem file: failed to verify certificate chain: 
	x509: certificate signed by unknown authority
10.244.5.1 - [10.244.5.1] - - [09/Apr/2020:16:37:22 +0000] "PUT /api/v1/_raw/secret/namespace/devops/name/ht-harbor-ingress HTTP/2.0" 201 23 "https://dashboard.k8s.nonprod.avaya.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36" 32108 0.135 [kube-system-kubernetes-dashboard-443] 10.244.1.18:8443 23 0.048 201

Other ingress TLS secrets in the same cluster are all OK; their tls.crt contains the certificate and the intermediate CA certificate but no root CA certificate, and they have no ca.crt.

On other clusters, we have ingress tls.crt members with the entire certificate chain (3 certificates) but no ca.crt, and they load OK. The other clusters have the same k8s, helm, and nginx.

Hopefully the above is sufficient to recreate the issue, maybe with the new nginx ingress version.
It is still possible, but unlikely, that k8s is losing track of the secret.
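
To compare a failing secret with a working one along the lines described above, it can help to list each member and how many PEM certificates it holds. This is a rough sketch only: `summarize_tls_secret` and `_fake_pem` are hypothetical names, the input is assumed to be the dict parsed from `kubectl get secret NAME -o json`, and the placeholder blocks are not real certificates. It counts PEM blocks and does not verify the chain.

```python
import base64
import re

# Matches one PEM-encoded certificate block.
PEM_CERT = re.compile(
    rb"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def summarize_tls_secret(secret):
    """Return {member name: number of PEM certificates} for a Secret object.

    Values under "data" are base64-encoded, as in the Kubernetes API output.
    """
    summary = {}
    for name, encoded in secret.get("data", {}).items():
        raw = base64.b64decode(encoded)
        summary[name] = len(PEM_CERT.findall(raw))
    return summary


def _fake_pem(count):
    # Placeholder blocks for illustration; only the PEM framing matters here.
    block = b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    return base64.b64encode(block * count).decode()


# Shape of the failing secret described in this comment: a 3-certificate chain plus ca.crt.
failing = {"data": {"tls.crt": _fake_pem(3), "ca.crt": _fake_pem(1)}}
# Shape of the secrets that load fine in the same cluster: 2 certificates, no ca.crt.
working = {"data": {"tls.crt": _fake_pem(2)}}

print(summarize_tls_secret(failing))
print(summarize_tls_secret(working))
```

With real data, the dict would come from something like `json.load(sys.stdin)` on the `kubectl` output; a tls.key member would simply report 0 certificates.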

FYI:

We did try to upgrade from chart version 0.18.1 to the latest, and the upgrade failed, with helm displaying the following:

UPGRADE FAILED
Error: Service "nginx-ingress-controller" is invalid: spec.clusterIP: Invalid value: "": field is immutable && Service "nginx-ingress-default-backend" is invalid: spec.clusterIP: Invalid value: "": field is immutable
Error: UPGRADE FAILED: Service "nginx-ingress-controller" is invalid: spec.clusterIP: Invalid value: "": field is immutable && Service "nginx-ingress-default-backend" is invalid: spec.clusterIP: Invalid value: "": field is immutable

The nginx ingress seemed to be running, but the ingress for the apiserver was not working, and probably none of the other ingresses were working either.

Rolling back was successful, but the nginx controller stayed at the new app version according to the log, and the ingresses were not working.

We then deleted the nginx ingress and helm installed the latest, which looked OK except that the apiserver ingress did not work, and probably neither did the other ingresses.

Finally we reinstalled chart version 0.18.1 to restore the service and left the upgrade for further testing.

wu105 added the kind/bug label Apr 8, 2020

aledbf removed the kind/bug label Apr 8, 2020

aledbf closed this as completed Apr 8, 2020
