Latest ingress-gce does not work with a new ingress resource #606

Closed
gingerwizard opened this issue Jun 1, 2018 · 36 comments
Labels
area/acme Indicates a PR directly modifies the ACME Issuer code kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@gingerwizard

gingerwizard commented Jun 1, 2018

/kind bug

cert-manager-v0.3.1

**Ingress isn't modified, and thus the challenge fails when it's routed to the proxy behind the ingress**:

Expected: the ingress is modified, so that the challenge is intercepted.

How to reproduce it (as minimally and precisely as possible):

My stack is ingress->service->nginx-proxy

The Ingress is as follows. It exists and is bound to a static global IP, and the DNS entry resolves.

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: ${STATIC_IP_NAME}
    certmanager.k8s.io/acme-http01-edit-in-place: "true"
    certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"
    certmanager.k8s.io/acme-challenge-type: "http01"
  labels:
    app: kibana
spec:
  tls:
  - hosts:
    - ${DOMAIN_NAME}
    secretName: demo-elastic-co
  backend:
    serviceName: nginx-service
    servicePort: 80
  rules:
  - host: ${DOMAIN_NAME}

Service:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
spec:
  type: NodePort
  ports:
  - port: 80
    name: kibana
  selector:
    app: nginx

The site is available over http and resolves.

Issuer

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-issuer
  namespace: default
spec:
  acme:
    # The ACME server URL
    server: "https://acme-staging-v02.api.letsencrypt.org/directory"
    # Email address used for ACME registration
    email: "${EMAIL}"
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-key
    # Enable the HTTP-01 challenge provider
    http01: {}

Cert

---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: demo-elastic-co
  namespace: default
spec:
  secretName: demo-elastic-co
  issuerRef:
    name: letsencrypt-issuer
  commonName: ${DOMAIN_NAME}
  dnsNames:
  - ${DOMAIN_NAME}
  acme:
    config:
    - http01:
        ingress: kibana
      domains:
      - ${DOMAIN_NAME}

Creating the issuer and then cert, logs redacted:

I0601 17:01:14.034442       1 controller.go:177] certificates controller: syncing item 'default/test-domain-co'
I0601 17:01:14.034567       1 sync.go:239] Preparing certificate default/test-domain-co with issuer
I0601 17:01:14.034582       1 acme.go:159] getting private key (letsencrypt-key->tls.key) for acme issuer default/letsencrypt-issuer
I0601 17:01:14.034925       1 logger.go:27] Calling GetOrder
I0601 17:01:14.187634       1 logger.go:52] Calling GetAuthorization
I0601 17:01:14.259376       1 logger.go:72] Calling HTTP01ChallengeResponse
I0601 17:01:14.259425       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/test-domain-co
I0601 17:01:14.259447       1 logger.go:47] Calling GetChallenge
I0601 17:01:14.416321       1 helpers.go:162] Found status change for Certificate "test-domain-co" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-01 17:01:14.416307907 +0000 UTC m=+7713.231874001
I0601 17:01:14.416360       1 sync.go:241] Error preparing issuer for certificate default/test-domain-co: http-01 self check failed for domain "test-domain.example.co"
E0601 17:01:14.423710       1 sync.go:168] [default/test-domain-co] Error getting certificate 'test-domain-co': secret "test-domain-co" not found
E0601 17:01:14.423762       1 controller.go:186] certificates controller: Re-queuing item "default/test-domain-co" due to error processing: http-01 self check failed for domain "test-domain.example.co"
I0601 17:02:14.424648       1 controller.go:177] certificates controller: syncing item 'default/test-domain-co'
I0601 17:02:14.425671       1 sync.go:239] Preparing certificate default/test-domain-co with issuer
I0601 17:02:14.425781       1 acme.go:159] getting private key (letsencrypt-key->tls.key) for acme issuer default/letsencrypt-issuer
I0601 17:02:14.427512       1 logger.go:27] Calling GetOrder
I0601 17:02:14.610652       1 logger.go:52] Calling GetAuthorization
I0601 17:02:14.713524       1 logger.go:72] Calling HTTP01ChallengeResponse
I0601 17:02:14.713564       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/test-domain-co
I0601 17:02:14.713582       1 logger.go:47] Calling GetChallenge
I0601 17:02:14.817513       1 helpers.go:162] Found status change for Certificate "test-domain-co" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-01 17:02:14.817499697 +0000 UTC m=+7773.633065800
I0601 17:02:14.817550       1 sync.go:241] Error preparing issuer for certificate default/test-domain-co: http-01 self check failed for domain "test-domain.example.co"
E0601 17:02:14.823613       1 sync.go:168] [default/test-domain-co] Error getting certificate 'test-domain-co': secret "test-domain-co" not found
E0601 17:02:14.823647       1 controller.go:186] certificates controller: Re-queuing item "default/test-domain-co" due to error processing: http-01 self check failed for domain "test-domain.example.co"

In the nginx logs for the app I see the challenge. Tweaking the proxy to return 404, 200 or 301 makes no difference.

35.202.242.207, 35.201.81.158 - - [01/Jun/2018:17:01:14 +0000] "GET /.well-known/acme-challenge/5f_2k1u87-xJ1h4xMjNZN7q9nPlVVSfHVwKH9M58UCw HTTP/1.1" 301 610 "" "Go-http-client/1.1"
35.202.242.207, 35.201.81.158 - - [01/Jun/2018:17:02:14 +0000] "GET /.well-known/acme-challenge/5f_2k1u87-xJ1h4xMjNZN7q9nPlVVSfHVwKH9M58UCw HTTP/1.1" 301 610 "" "Go-http-client/1.1"

Anything else we need to know?:

GCE:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-05-12T04:12:12Z", GoVersion:"go1.9.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.2-gke.3", GitCommit:"d2c7a2bd41036f9474287579a725dc54c904e92d", GitTreeState:"clean", BuildDate:"2018-05-23T00:19:39Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: GCE on Ubuntu
  • Install tools: helm
@gingerwizard
Author

I should add that I see no modifications applied to the ingress at any point.

@gingerwizard
Author

Adding that I have the following role for tiller:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller-default
  namespace: default
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: tiller-cluster-admin-binding
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

and installed cert-manager with

helm install --name cert-manager --namespace default stable/cert-manager

@gingerwizard
Author

gingerwizard commented Jun 1, 2018

Also, despite the annotations on the ingress, cert-manager seems to spawn an HTTP solver (another pod, I assume) to intercept the calls rather than modifying the ingress:

cm-acme-http-solver-jn2xh 1/1 Running 0 11s

If i install via

helm install --name cert-manager --namespace default stable/cert-manager --set ingressShim.defaultIssuerName=letsencrypt-issuer --set ingressShim.defaultIssuerKind=ClusterIssuer

and deploy just the issuer, i get


I0601 19:35:07.909076       1 controller.go:177] certificates controller: syncing item 'default/demo-elastic-co'
I0601 19:35:07.909183       1 sync.go:239] Preparing certificate default/demo-elastic-co with issuer
I0601 19:35:07.909199       1 acme.go:159] getting private key (letsencrypt-key->tls.key) for acme issuer default/letsencrypt-issuer
I0601 19:35:07.909600       1 logger.go:27] Calling GetOrder
I0601 19:35:08.076256       1 logger.go:52] Calling GetAuthorization
I0601 19:35:08.126714       1 logger.go:72] Calling HTTP01ChallengeResponse
I0601 19:35:08.126773       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/demo-elastic-co
I0601 19:35:08.126797       1 logger.go:47] Calling GetChallenge
I0601 19:35:08.195021       1 helpers.go:162] Found status change for Certificate "demo-elastic-co" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-01 19:35:08.195006168 +0000 UTC m=+109.606525986
I0601 19:35:08.195060       1 sync.go:241] Error preparing issuer for certificate default/demo-elastic-co: http-01 self check failed for domain "demo.elastic.co"
I0601 19:35:08.199715       1 controller.go:152] ingress-shim controller: syncing item 'default/kibana'
I0601 19:35:08.199753       1 sync.go:123] Certificate "demo-elastic-co" for ingress "kibana" already exists
I0601 19:35:08.199765       1 sync.go:126] Certificate "demo-elastic-co" for ingress "kibana" is up to date
I0601 19:35:08.199788       1 controller.go:166] ingress-shim controller: Finished processing work item "default/kibana"
E0601 19:35:08.200398       1 sync.go:168] [default/demo-elastic-co] Error getting certificate 'demo-elastic-co': secret "demo-elastic-co" not found
E0601 19:35:08.200431       1 controller.go:186] certificates controller: Re-queuing item "default/demo-elastic-co" due to error processing: http-01 self check failed for domain "demo.elastic.co"

@chengji77

I suspect it's caused by a recent change in k8s:
kubernetes/ingress-gce@d2559d2?utf8=%E2%9C%93&diff=split#diff-3c862eb54a8e0e161b534e0c67e5379eR414

Previously, the loadbalancer-controller logged the non-existent secret and then proceeded to create the /.well-known/acme-challenge/... path rule in GCLB. Once this rule was created, the self check passed, and cert-manager then obtained the new cert and created the secret.

But now the loadbalancer-controller just gets stuck waiting for the secret to be created, without creating the path rules in GCLB.

The workaround is to manually create the secret first (it must be in a valid format; we just copied the secret from our staging cluster). After that, cert-manager will do the normal ACME workflow and update the k8s secret and GCLB.

@thebigredgeek

Also running into this :(

@munnerz
Member

munnerz commented Jun 5, 2018

@chengji77 thanks very much for digging that out - I had not seen this change 😬

I have made a comment here: kubernetes/ingress-gce#112 (comment)

It's a real shame to see yet another divergence in how ingress controllers behave 😢 This will indeed break both kube-lego and cert-manager when used with GCLB, unless some form of certificate already exists (and it must be expired or nearing expiry in order for cert-manager to trigger renewal).

@thebigredgeek

I followed the workaround mentioned above and while my ingress is able to pull the manually created (and bogus) secret, I am still getting failed self checks immediately following the pull of the secret.

@munnerz
Member

munnerz commented Jun 5, 2018

It's also worth noting that the ACME server will refuse to validate domain ownership over HTTPS (regardless of whether the certificate is valid or invalid). You must make sure the challenge endpoint is accessible over HTTP on port 80 (you can test this using curl -vv http://challenge-endpoint......)
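A minimal sketch of that port-80 check, assuming curl is installed. The function names and the domain and token arguments are placeholders for illustration, not part of cert-manager.

```shell
# Build the plain-HTTP URL that is requested for an HTTP-01 challenge.
# $1 = domain, $2 = challenge token (both supplied by the caller).
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Request the challenge URL over port 80 and print only the HTTP status code.
# Redirects are not followed, so a 301 from a proxy shows up as 301.
check_challenge() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 \
    "$(challenge_url "$1" "$2")"
}
```

For example, `check_challenge subdomain.example.com SOME_TOKEN` prints the status code the endpoint returned; a 301 or 404 here would suggest the request is still reaching the application rather than the solver pod.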

@thebigredgeek

So what I am hearing is that the ingress must come completely online before the self check will stop failing (and the secret gets re-generated)?

@munnerz munnerz added kind/bug Categorizes issue or PR as related to a bug. area/acme Indicates a PR directly modifies the ACME Issuer code priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jun 5, 2018
@munnerz
Member

munnerz commented Jun 5, 2018

Yes, it will take 10 minutes or so for any changes to your load balancers to be performed anyway.

You will need to supply a certificate that is nearing expiry (within 30d of expiry) the first time you issue a certificate, from what I understand of the problem. It doesn't matter which CA signed the cert, and it can be self signed too.

This will then:

a) cause ingress-gce to serve with the provided, nearing expiry cert.
b) so long as you have not disabled HTTP traffic on your ingress, also cause ingress-gce to serve over port 80
c) cert-manager will see a certificate referencing that secret, and determine it needs to be renewed as it is nearing expiry, and trigger HTTP01 validation
d) cert-manager will edit the ingress resource to include the challenge path
e) ingress-gce will update the LB accordingly
f) after ~10m, the change will be reflected in the GCLB and the self check should pass (as well as the LE validation attempt)
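Whether a placeholder certificate falls inside the 30-day window mentioned above can be checked locally before creating the secret. This is a rough sketch assuming openssl is installed; the function names are hypothetical, and the 30-day figure is taken from the comment above rather than from cert-manager's code.

```shell
# Convert a number of days to seconds.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Succeeds (exit 0) when the PEM certificate in file $1 expires within $2 days.
# `openssl x509 -checkend N` exits 0 when the certificate is still valid
# N seconds from now, so the result is negated here.
# Note: an unreadable file also makes openssl fail, which this reports as
# "expires soon".
expires_within_days() {
  ! openssl x509 -checkend "$(days_to_seconds "$2")" -noout -in "$1" >/dev/null
}
```

With the self-signed workaround, `expires_within_days /tmp/tls.crt 30 && echo "should trigger renewal"` is one way to confirm the certificate is short-lived enough.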

@thebigredgeek

Must the ingress controller be completely online before this will pass? I completely nuked the deployment and reapplied (cert, issuer, ingress, etc.. everything except the manually created cert noted as a workaround ) and I am still getting a failed selfcheck

@munnerz
Member

munnerz commented Jun 5, 2018

@thebigredgeek you need to be very patient when using GCLB's - they are extremely slow to update.

@thebigredgeek

Ok, so if the cert is set to expire in 365 days this won't work. So it sounds like I need to create a closer-to-expiry cert.

@munnerz
Member

munnerz commented Jun 5, 2018

Ok, so if the cert is set to expire in 365 days this won't work. So it sounds like I need to create a closer-to-expiry cert.

Correct - FWIW, this is not how we expect users to use cert-manager with GCLB ingresses, and is a regression caused by kubernetes/ingress-gce#112 that we need to fix 😄

@thebigredgeek

Yeah I saw that. No worries, and thanks for your help so far!

@thebigredgeek

If i create a cert that is already expired, that should work too yeah?

@chengji77

@thebigredgeek What I did was to use a cert from our staging environment, which doesn't contain the domain for prod. This also triggered cert-manager to renew the cert.

@thebigredgeek

@munnerz seems to still be happening 30 minutes later :(. It shouldn't take this long to self correct, should it?

@munnerz
Member

munnerz commented Jun 5, 2018

@thebigredgeek nope - are you on Kubernetes slack? Can you send over your full cert-manager logs, as well as the output of kubectl describe issuer,clusterissuer,certificate --all-namespaces?

We can then update this issue if we come to a resolution that's relevant 😄

@thebigredgeek

Sure, I’ll hop on tomorrow (I’m US pacific)

@munnerz munnerz changed the title http self check fails as ingress is not modified Latest ingress-gce does not work with a new ingress resource Jun 5, 2018
@thebigredgeek

Just slacked you on k8s

@thebigredgeek

Any traction here? Still trying to figure out how to make this work

@paolomainardi

same problem here, we are using the manually created tls secret for now.

@santinoncs

Same problem! Working with a 1.9.7 GKE cluster.

@paolomainardi

are there any updates here ?

@sjahreis

got the same problem! we're using GKE 1.10.2-gke.3

@kjarri

kjarri commented Jun 13, 2018

I had this same issue, or it at least behaved the same - secret was not being created.

What ended up working for me was the same as mentioned above. I generated a self-signed certificate expiring in one day and manually created the secret.

openssl req -x509 -nodes -days 1 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=subdomain.example.com"

kubectl create secret tls my-secret --key /tmp/tls.key --cert /tmp/tls.crt

After this, cert-manager successfully issued a new certificate.
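To confirm that the placeholder has actually been replaced, the certificate stored in the secret can be inspected. A small sketch, assuming kubectl, openssl and a `base64` that accepts `-d` (GNU coreutils; some platforms use a different flag); the function name is hypothetical.

```shell
# Print the issuer and expiry date of the certificate stored in TLS secret $1.
# The dot in the tls.crt key has to be escaped inside the jsonpath expression.
show_secret_cert() {
  kubectl get secret "$1" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -issuer -enddate
}
```

Running `show_secret_cert my-secret` before and after issuance should show the issuer change from the self-signed subject to the ACME CA, and the expiry date move out from one day.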

@RobinUS2

We're having the same issue on a GKE cluster at 1.10.4-gke.2. Generating the certificate as described above worked, although it's far from ideal for obvious reasons.

@farahabdi

having these problems also

@munnerz
Member

munnerz commented Jul 4, 2018

I have opened kubernetes/ingress-gce#388 which will fix this issue.

As another alternative, for now, ingress-gce users can manually specify a Certificate resource. You will need to exclude the TLS section from your Ingress whilst this is provisioning, but once done, you should be good to add it back in, referencing the newly created Secret.
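One way to script this alternative is with JSON patches, assuming the TLS section in question is the Ingress `spec.tls` block. This is only a sketch: the function names are hypothetical, and the ingress name, host and secret name are whatever your manifests use.

```shell
# Emit a JSON patch that adds a spec.tls section for host $1 and secret $2.
tls_add_patch() {
  printf '[{"op":"add","path":"/spec/tls","value":[{"hosts":["%s"],"secretName":"%s"}]}]\n' "$1" "$2"
}

# Remove spec.tls from ingress $1 while the Certificate is provisioning.
remove_tls() {
  kubectl patch ingress "$1" --type=json \
    -p '[{"op":"remove","path":"/spec/tls"}]'
}

# Add spec.tls back to ingress $1 once secret $3 exists for host $2.
restore_tls() {
  kubectl patch ingress "$1" --type=json -p "$(tls_add_patch "$2" "$3")"
}
```

For the manifests earlier in this thread that would be `remove_tls kibana`, then, once the Certificate is ready, `restore_tls kibana "$DOMAIN_NAME" demo-elastic-co`.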

@marekr

marekr commented Jul 14, 2018

@munnerz

I just encountered this issue using the latest stable/ingress-nginx and stable/cert-manager charts as I write this.

I had to pre-create the cert, just like for GCE above, to make it work with ingress-nginx; otherwise nginx spent all day whining that the secret didn't exist, which prevented cert-manager from doing its job.

Did ingress-nginx copy GCE?

W0714 01:16:02.171068       5 controller.go:1020] ssl certificate "dev/web-api-tls-secret" does not exist in local store
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:16:09 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 404 62 "-" "Go-http-client/1.1" 176 0.001 [dev-web-api-service-80] 10.1.0.26:80 31 0.000 404 c0fe1ea8344e36c93c89e6a23cdf9f30
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:16:13 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 404 62 "-" "Go-http-client/1.1" 176 0.001 [dev-web-api-service-80] 10.1.0.25:80 31 0.000 404 f9b367bdfa4df7aa4a2f5f6876f924df
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:16:30 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 404 62 "-" "Go-http-client/1.1" 176 0.001 [dev-web-api-service-80] 10.1.0.7:80 31 0.004 404 51223ce2ad35371671d30a247602259

The moment I created the secret

I0714 01:17:56.903217       5 store.go:348] secret dev/web-api-tls-secret was added and it is used in ingress annotations. Parsing...
I0714 01:17:56.904870       5 backend_ssl.go:69] adding secret dev/web-api-tls-secret to the local store
I0714 01:17:57.226319       5 controller.go:177] ingress backend successfully reloaded...
I0714 01:17:58.864434       5 backend_ssl.go:181] updating local copy of ssl certificate dev/web-api-tls-secret with missing intermediate CA certs
I0714 01:18:00.192565       5 controller.go:168] backend reload required
I0714 01:18:00.302296       5 controller.go:177] ingress backend successfully reloaded...
10.1.0.4 - [10.1.0.4] - - [14/Jul/2018:01:18:14 +0000] "GET /.well-known/acme-challenge/XXXXXXXXXXXXXXXXXXXXXX-7mmcN-FqwuKc HTTP/1.1" 200 87 "-" "Go-http-client/1.1" 176 0.026 [dev-cm-acme-http-solver-h49ts-8089] 10.1.0.15:8089 87 0.024 200 489ee18372ea0fe577641c8ace44565b

ingress-nginx was, however, generating a self-signed cert; it just seems to have completely dropped routing the ACME challenge while it was busy throwing a fit over the missing secret.
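To see at a glance whether challenge requests are reaching the solver, access-log lines like the ones above can be tallied by status code. A small sketch, assuming a log format in which the request line is the first double-quoted field and the status code follows it, as in the excerpts in this thread; the function name is hypothetical.

```shell
# Read access-log lines on stdin and count ACME challenge requests per
# HTTP status code. Splitting on double quotes puts the request line in $2
# and " <status> <bytes> " in $3.
tally_challenge_statuses() {
  awk -F'"' '
    $2 ~ /\/\.well-known\/acme-challenge\// {
      split($3, parts, " ")
      count[parts[1]]++
    }
    END { for (s in count) print s, count[s] }
  ' | sort
}
```

Piping the controller logs through it, for example `kubectl logs <ingress-nginx-pod> | tally_challenge_statuses`, gives one line per status code; a switch from only 404s to 200s indicates the challenge path is now routed to the solver.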

@Mexxerio

Mexxerio commented Aug 24, 2018

I tried creating the secret manually like mentioned above but it still doesn't work for me.

All I'm getting is:

I0824 15:17:47.531456       1 controller.go:181] certificates controller: syncing item 'default/domain-production-tls-ipv4'
I0824 15:17:47.531976       1 sync.go:242] Preparing certificate default/domain-production-tls-ipv4 with issuer
I0824 15:17:47.532002       1 acme.go:169] getting private key (letsencrypt-prod->tls.key) for acme issuer kube-system/letsencrypt-prod
I0824 15:17:47.532607       1 logger.go:27] Calling GetOrder
I0824 15:17:47.777548       1 logger.go:57] Calling GetAuthorization
I0824 15:17:47.953649       1 logger.go:77] Calling HTTP01ChallengeResponse
I0824 15:17:47.953793       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/domain-production-tls-ipv4
I0824 15:17:47.953839       1 logger.go:52] Calling GetChallenge
I0824 15:17:48.148411       1 helpers.go:188] Found status change for Certificate "domain-production-tls-ipv4" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-24 15:17:48.148395361 +0000 UTC m=+1936.846873368
I0824 15:17:48.148908       1 sync.go:244] Error preparing issuer for certificate default/domain-production-tls-ipv4: http-01 self check failed for domain "domain.com"
I0824 15:17:48.149264       1 sync.go:174] Certificate default/domain-production-tls-ipv4 scheduled for renewal in -696 hours
E0824 15:17:48.156534       1 controller.go:190] certificates controller: Re-queuing item "default/domain-production-tls-ipv4" due to error processing: http-01 self check failed for domain "domain.com"

EDIT:
After letting it run for a few hours I received the certificate. Phew, thank god. I almost gave up.

@brettcurtis

This is working now on 1.10.7-gke.2. You'll see a warning now in your service:

Could not find TLS certificates. Continuing setup for the load balancer to serve HTTP. Note: this behavior is deprecated and will be removed in a future version of ingress-gce

@munnerz
Member

munnerz commented Sep 25, 2018 via email

@jetstack-bot
Contributor

@munnerz: Closing this issue.

In response to this:

Awesome, thanks for confirming it has rolled out!

I'm going to close this issue now then as the issue is resolved.

We'll soon be in a better position to work around this limitation from our end too, to avoid the deprecated behaviour warning.

/close


@atb00ker

I am seeing the warning as well, is there any open issue for this that I can subscribe for updates on it? 😄
