AWS binder failing to launch sessions due to SSL certificate problem #183

scottyhq · 2020-11-13T21:36:29Z

@rsignell-usgs pointed out that you currently can't launch any sessions on the AWS binder. For example, launching from https://gallery.pangeo.io/repos/pangeo-data/landsat-8-tutorial-gallery/, a user sees:

Found built image, launching...
Launching server...
Launch attempt 1 failed, retrying...
Launch attempt 2 failed, retrying...
Launch attempt 3 failed, retrying...
Failed to create temporary user for pangeoaccess/binder-pangeo-2ddata-2dlandsat-2d8-2dtutorial-2dgallery-3da528:47a0db9694d764bc7be38401dc8b4e3470a2271d

The binder pod log (kubectl logs -n prod binder-554f44cf89-cd6pk) shows:

[E 201113 21:03:13 launcher:101] Error accessing Hub API (using https://hub.aws-uswest2-binder.pangeo.io/hub/api/users/pangeo-data-lan-utorial-gallery-wy4o8c4e): HTTP 599: SSL certificate problem: certificate has expired
[E 201113 21:03:13 launcher:171] Error creating user pangeo-data-lan-utorial-gallery-wy4o8c4e: HTTP 599: SSL certificate problem: certificate has expired
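The expired certificate can also be confirmed from outside the cluster. Below is a minimal Python sketch that uses only the standard-library ssl and socket modules. The helper names days_until_expiry and check_host_certificate are hypothetical and are not part of BinderHub or cert-manager. The sketch also assumes that the machine running it trusts Let's Encrypt through its default CA bundle.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    not_after uses the format returned by SSLSocket.getpeercert(),
    e.g. "Nov 12 18:17:00 2020 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def check_host_certificate(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and summarize the certificate's validity.

    DNS or connection errors (e.g. socket.gaierror, TimeoutError) are not caught.
    """
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as err:
        # An expired certificate fails verification during the handshake, so
        # getpeercert() never runs; verify_message holds OpenSSL's reason.
        return f"{host}: verification failed ({err.verify_message})"
    remaining = days_until_expiry(cert["notAfter"])
    return f"{host}: certificate valid for another {remaining:.1f} days"
```

While the certificate was expired, calling check_host_certificate("hub.aws-uswest2-binder.pangeo.io") would be expected to return a verification failure that mentions "certificate has expired". That is the same OpenSSL reason that appears in the binder log.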

I have received emails from the Let's Encrypt Expiry Bot, such as:

Your certificate (or certificates) for the names listed below will expire in 10 days (on 12 Nov 20 18:17 +0000). Please make sure to renew your certificate before then, or visitors to your website will encounter errors.

We recommend renewing certificates automatically when they have a third of their
total lifetime left. For Let's Encrypt's current 90-day certificates, that means
renewing 30 days before expiration. See
https://letsencrypt.org/docs/integration-guide/ for details.

hub.aws-uswest2-binder.pangeo.io
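The one-third rule in the email is simple arithmetic: the renewal point is notAfter minus lifetime/3. For a 90-day certificate, that is 30 days before expiry. The following is a small Python sketch of that rule. The function names are hypothetical, and the 90-day issue date is assumed from Let's Encrypt's standard lifetime rather than read from the actual certificate.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Point at which one third of the certificate's lifetime remains."""
    lifetime = not_after - not_before
    return not_after - lifetime / 3


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """True once the one-third-of-lifetime renewal point has been reached."""
    return now >= renewal_time(not_before, not_after)


# Expiry date taken from the email; the issue date assumes a 90-day lifetime.
expiry = datetime(2020, 11, 12, 18, 17, tzinfo=timezone.utc)
issued = expiry - timedelta(days=90)
```

With these dates, the renewal point works out to 13 Oct 2020 18:17 UTC. By the time the 10-day warning arrived, the certificate was therefore already well past the point where automatic renewal should have happened.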

I thought these certificates were renewed automatically, but either that isn't true or part of our configuration is missing or wrong.

Perhaps we need to restart the core node periodically (it is a t3a.xlarge SPOT instance that has been running for 207 days)? Or could we just delete the cert-manager pods, which have also been running for 207 days, so that they get re-created and issue new certificates? @consideRatio, I'm not sure where things stand on unifying HTTPS for BinderHub and JupyterHub (ref pangeo-data/jupyter-earth#3).


scottyhq · 2020-11-13T21:42:44Z

The initial cert-manager setup is documented here: https://github.com/pangeo-data/pangeo-binder/tree/staging/k8s-aws#set-up-https-httpsbinderhubreadthedocsioenlatesthttpshtml

scottyhq · 2020-11-13T22:49:31Z

Well, I don't know why these resources disappeared, but kubectl get crds did not show anything related to cert-manager.

So I simply re-ran:

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.11.0/cert-manager.yaml --validate=false

followed by:

kubectl apply -f k8s-aws/binderhub-issuer-prod.yaml

and things seem to be working again.

And kubectl get crds now shows:

NAME                                   CREATED AT
certificaterequests.cert-manager.io    2020-11-13T22:41:52Z
certificates.cert-manager.io           2020-11-13T22:41:52Z
challenges.acme.cert-manager.io        2020-11-13T22:41:52Z
clusterissuers.cert-manager.io         2020-11-13T22:41:53Z
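The problem here turned out to be that the cert-manager CRDs were missing, so a quick check for them can be scripted. The following is a minimal Python sketch that assumes kubectl is on the PATH and pointed at the right cluster. The helper names are hypothetical. The expected set is just the four CRDs listed above; a complete cert-manager install also defines others, such as issuers.cert-manager.io.

```python
import subprocess
from typing import Iterable, Set

# Assumption: only the four CRDs shown in this thread; extend for your version.
EXPECTED_CERT_MANAGER_CRDS = {
    "certificaterequests.cert-manager.io",
    "certificates.cert-manager.io",
    "challenges.acme.cert-manager.io",
    "clusterissuers.cert-manager.io",
}


def parse_crd_names(kubectl_output: str) -> Set[str]:
    """Extract CRD names from `kubectl get crds -o name` output.

    Each line looks like
    customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io
    """
    names = set()
    for line in kubectl_output.splitlines():
        line = line.strip()
        if line:
            # Drop the resource-type prefix before the last "/".
            names.add(line.rsplit("/", 1)[-1])
    return names


def missing_crds(installed: Iterable[str],
                 expected: Iterable[str] = EXPECTED_CERT_MANAGER_CRDS) -> Set[str]:
    """Expected CRD names that are not among the installed ones."""
    return set(expected) - set(installed)


def check_cluster() -> Set[str]:
    """Query the current kubectl context and return any missing CRDs."""
    result = subprocess.run(
        ["kubectl", "get", "crds", "-o", "name"],
        capture_output=True, text=True, check=True,
    )
    return missing_crds(parse_crd_names(result.stdout))
```

A non-empty set from check_cluster() corresponds to the state described above, where kubectl get crds showed nothing related to cert-manager.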

consideRatio · 2020-11-13T23:08:17Z

@scottyhq nice work getting it working again. This seems like a bug in cert-manager: it recognized that you wanted a certificate with the given configuration but then failed to renew it. Perhaps an upgrade to a newer version of cert-manager is needed. A warning, though: upgrading cert-manager has been a bit painful in the past, because changes to its CRDs required deleting and re-creating them manually.

scottyhq closed this as completed Nov 13, 2020
