AWS binder failing to launch sessions due to SSL certificate problem #183

Closed
scottyhq opened this issue Nov 13, 2020 · 3 comments


@scottyhq
Member

@rsignell-usgs pointed out that no sessions can currently be launched on the AWS binder. A user launching from https://gallery.pangeo.io/repos/pangeo-data/landsat-8-tutorial-gallery/ sees:

Found built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
Failed to create temporary user for pangeoaccess/binder-pangeo-2ddata-2dlandsat-2d8-2dtutorial-2dgallery-3da528:47a0db9694d764bc7be38401dc8b4e3470a2271d

The binder pod log (kubectl logs -n prod binder-554f44cf89-cd6pk) shows:

[E 201113 21:03:13 launcher:101] Error accessing Hub API (using https://hub.aws-uswest2-binder.pangeo.io/hub/api/users/pangeo-data-lan-utorial-gallery-wy4o8c4e): HTTP 599: SSL certificate problem: certificate has expired
[E 201113 21:03:13 launcher:171] Error creating user pangeo-data-lan-utorial-gallery-wy4o8c4e: HTTP 599: SSL certificate problem: certificate has expired

I have received emails from the Let's Encrypt Expiry Bot, such as:

Your certificate (or certificates) for the names listed below will expire in 10 days (on 12 Nov 20 18:17 +0000). Please make sure to renew your certificate before then, or visitors to your website will encounter errors.

We recommend renewing certificates automatically when they have a third of their
total lifetime left. For Let's Encrypt's current 90-day certificates, that means
renewing 30 days before expiration. See
https://letsencrypt.org/docs/integration-guide/ for details.

hub.aws-uswest2-binder.pangeo.io
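The renewal rule in the email (renew once a third of the 90-day lifetime remains, i.e. 30 days before expiry) can be checked from outside the cluster. Below is a minimal Python sketch, not part of the deployment: `fetch_not_after`, `days_left` and `renewal_due` are hypothetical helper names, and the example assumes the email was sent exactly 10 days before the expiry date it states.

```python
import socket
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the served certificate's notAfter field, e.g. 'Nov 12 18:17:00 2020 GMT'.

    Needs network access. Once the certificate has expired, the verified
    handshake itself raises ssl.SSLCertVerificationError, the same kind of
    failure the binder pod logged.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

def days_left(not_after: str, now: float) -> float:
    """Days between the Unix timestamp `now` and the certificate's notAfter."""
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY

def renewal_due(not_after: str, now: float, lifetime_days: float = 90) -> bool:
    """True once at most a third of the lifetime remains (30 of 90 days)."""
    return days_left(not_after, now) <= lifetime_days / 3

# Expiry taken from the email; "now" is assumed to be exactly 10 days earlier.
not_after = "Nov 12 18:17:00 2020 GMT"
email_time = datetime(2020, 11, 2, 18, 17, tzinfo=timezone.utc).timestamp()
print(days_left(not_after, email_time))    # 10.0
print(renewal_due(not_after, email_time))  # True

# Hypothetical live check (needs network):
# renewal_due(fetch_not_after("hub.aws-uswest2-binder.pangeo.io"),
#             datetime.now(timezone.utc).timestamp())
```

Running something like this on a schedule would only flag a stalled renewal; it does not renew anything itself.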

I thought these certificates were renewed automatically, but either that isn't true or some of our config is missing.

Perhaps we need to restart the core node periodically (a t3a.xlarge SPOT instance that has been running for 207 days)? Or we could just delete the cert-manager pods, which have also been running for 207 days, so that they get re-created and issue new certificates. Not sure where things stand on unifying HTTPS for BinderHub and JupyterHub, @consideRatio (ref pangeo-data/jupyter-earth#3).

@scottyhq
Member Author

Well, I don't know why these resources disappeared, but kubectl get crds did not show anything cert-manager related.

So I simply re-ran kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.11.0/cert-manager.yaml --validate=false, followed by kubectl apply -f k8s-aws/binderhub-issuer-prod.yaml, and things seem to be working again.

Now kubectl get crds shows:

NAME                                   CREATED AT
certificaterequests.cert-manager.io    2020-11-13T22:41:52Z
certificates.cert-manager.io           2020-11-13T22:41:52Z
challenges.acme.cert-manager.io        2020-11-13T22:41:52Z
clusterissuers.cert-manager.io         2020-11-13T22:41:53Z
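Since the missing CRDs were only noticed once sessions stopped launching, a periodic check that they still exist could catch this earlier. A minimal Python sketch, assuming the default table output of `kubectl get crds` shown above; `cert_manager_crds` and `live_crd_table` are hypothetical helper names, and matching on the `.cert-manager.io` suffix is an assumption based on the names in this listing.

```python
import subprocess

# Output pasted from the comment above (`kubectl get crds` after the fix).
SAMPLE = """\
NAME                                   CREATED AT
certificaterequests.cert-manager.io    2020-11-13T22:41:52Z
certificates.cert-manager.io           2020-11-13T22:41:52Z
challenges.acme.cert-manager.io        2020-11-13T22:41:52Z
clusterissuers.cert-manager.io         2020-11-13T22:41:53Z
"""

def cert_manager_crds(table: str) -> list[str]:
    """Return the cert-manager CRD names found in `kubectl get crds` table output."""
    names = []
    for line in table.splitlines()[1:]:  # skip the NAME / CREATED AT header row
        fields = line.split()
        if fields and fields[0].endswith(".cert-manager.io"):
            names.append(fields[0])
    return names

def live_crd_table() -> str:
    """Run `kubectl get crds` against the current kube context (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "crds"], capture_output=True, text=True, check=True
    )
    return result.stdout

found = cert_manager_crds(SAMPLE)
print(len(found))  # 4
if not found:
    print("no cert-manager CRDs found; the cert-manager manifest may need re-applying")
```

Before the fix, `cert_manager_crds(live_crd_table())` would presumably have returned an empty list.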

@consideRatio
Member

@scottyhq nice work getting it functioning. This seems like a bug in cert-manager, since it was able to recognize that you wanted a certificate with the given configuration but then failed to renew it. Perhaps an upgrade to a newer version of cert-manager is needed. A warning, though: cert-manager upgrades have been a bit painful in the past, because changes to the CRDs required deleting and re-creating them manually.
