
[documentation] Document deployment on existing AWS EKS cluster #942

Closed
iameskild opened this issue Nov 24, 2021 · 7 comments · Fixed by #944
Labels
area: documentation 📖 (Improvements or additions to documentation)
needs: discussion 💬 (Needs discussion with the rest of the team)

Comments

@iameskild
Member

iameskild commented Nov 24, 2021

Related to #935.

To test and document how to deploy to an existing ("local") EKS cluster, I ran through the following steps:

Use (create) base EKS cluster

To get a functioning EKS cluster up and running quickly, I created a cluster and web app based on this tutorial. This cluster runs in its own VPC with 3 subnets (each in its own AZ) and has no node groups. A scenario like this seemed like a good place to start from the perspective of an incoming user.

Once this EKS cluster is up, there are still a handful of steps that seem to be required before QHub can be deployed to it:

  • Ensure that the subnets are allowed to "automatically assign public IP addresses to instances launched into it", otherwise node groups can't be launched
  • Create general, user, and worker node groups (a rough CLI sketch follows this list)
    • Attach a Node IAM Role with the required permissions (I copied these from the role created by a previous QHub deployment)
    • Configure each node group, being mindful of instance size, attached block-storage size, and auto-scaling settings
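
For reference, here is a rough sketch of creating one of these node groups with the AWS CLI; the node role ARN, subnet IDs, instance type, and disk size are placeholders, and doing the same through the EKS console works just as well:

# Hypothetical sketch: create the "general" node group on the existing cluster.
# Repeat for "user" and "worker", adjusting the instance type and scaling bounds.
aws eks create-nodegroup \
  --cluster-name eaeeks \
  --nodegroup-name general \
  --node-role arn:aws:iam::<account-id>:role/<node-instance-role> \
  --subnets subnet-aaaa subnet-bbbb subnet-cccc \
  --instance-types m5.xlarge \
  --disk-size 50 \
  --scaling-config minSize=1,maxSize=1,desiredSize=1

Managed node groups created this way automatically carry the eks.amazonaws.com/nodegroup=<name> label, which is what the node_selectors in the qhub-config.yaml below rely on.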

I'm sure there are scenarios where node groups already exist and can be repurposed, but more broadly it would be nice to make this process a lot more streamlined. Did I overcomplicate this, or are there other ways of handling the QHub deployment without having to add these node groups explicitly?

Deploy QHub to Existing EKS Cluster

Ensure that you are using the existing cluster's kubectl context.
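
For example (the cluster name and region here are the ones used in this walkthrough; adjust them to your own):

# Add or refresh the kubeconfig entry for the existing cluster, then confirm the active context.
aws eks update-kubeconfig --name eaeeks --region us-east-2
kubectl config current-context

The context name reported here (the cluster ARN) is what goes into local.kube_context in the qhub-config.yaml shown below.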

Initialize in the usual manner:

python -m qhub init aws --project eaeexisting --domain eaeexisting.qhub.dev --ci-provider github-actions --auth-provider github --auth-auto-provision --repository github.com/iameskild/eaeaws

Then update the qhub-config.yaml file. The important keys to update are:

  • Replace provider: aws with provider: local
  • Replace amazon_web_services with local
    • And update the node_selectors and kube_context appropriately

Once updated, deploy in the usual manner:

python -m qhub deploy --config qhub-config.yaml --disable-prompt --dns-provider cloudflare --dns-auto-provision

The deployment completes successfully and all the pods appear to be running (alongside the existing pods from the web app). The issue is that I can't access the cluster from the browser:

404 page not found

Examining the deployment output more closely, you can see that the ingress doesn't have an IP address:

[terraform]: ingress_jupyter = {
[terraform]:   "hostname" = "aea1abf087211438cbf9e44ef5fb64c3-197330438.us-east-2.elb.amazonaws.com"
[terraform]:   "ip" = ""
[terraform]:
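
To see what the ingress LoadBalancer actually exposes, you can check the service directly (assuming the dev namespace from the config below); on AWS the ELB reports a DNS hostname in the EXTERNAL-IP column rather than an IP address:

# The LoadBalancer service shows the ELB hostname under EXTERNAL-IP, not an IP.
kubectl get svc --namespace dev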

qhub-config.yaml

project_name: eaeexisting
provider: local
domain: eaeexisting.qhub.dev
certificate:
  type: self-signed
security:
  authentication:
    type: GitHub
    config:
      client_id: 
      client_secret:
      oauth_callback_url: https://eaeexisting.qhub.dev/hub/oauth_callback
  users:
    iameskild:
      uid: 1000
      primary_group: admin
      secondary_groups:
      - users
  groups:
    users:
      gid: 100
    admin:
      gid: 101
default_images:
  jupyterhub: quansight/qhub-jupyterhub:v0.3.13
  jupyterlab: quansight/qhub-jupyterlab:v0.3.13
  dask_worker: quansight/qhub-dask-worker:v0.3.13
  dask_gateway: quansight/qhub-dask-gateway:v0.3.13
  conda_store: quansight/qhub-conda-store:v0.3.13
storage:
  conda_store: 60Gi
  shared_filesystem: 100Gi
theme:
  jupyterhub:
    hub_title: QHub - eaeexisting
    hub_subtitle: Autoscaling Compute Environment on Amazon Web Services
    welcome: Welcome to eaeexisting.qhub.dev. It is maintained by <a href="http://quansight.com">Quansight
      staff</a>. The hub's configuration is stored in a github repository based on
      <a href="https://github.com/Quansight/qhub/">https://github.com/Quansight/qhub/</a>.
      To provide feedback and report any technical problems, please use the <a href="https://github.com/Quansight/qhub/issues">github
      issue tracker</a>.
    logo: /hub/custom/images/jupyter_qhub_logo.svg
    primary_color: '#4f4173'
    secondary_color: '#957da6'
    accent_color: '#32C574'
    text_color: '#111111'
    h1_color: '#652e8e'
    h2_color: '#652e8e'
monitoring:
  enabled: true
cdsdashboards:
  enabled: true
  cds_hide_user_named_servers: true
  cds_hide_user_dashboard_servers: false
ci_cd:
  type: github-actions
  branch: main
terraform_state:
  type: remote
namespace: dev
local:
  kube_context: arn:aws:eks:us-east-2:892486800165:cluster/eaeeks
  node_selectors:
    general:
      key: eks.amazonaws.com/nodegroup
      value: general
    user:
      key: eks.amazonaws.com/nodegroup
      value: user
    worker:
      key: eks.amazonaws.com/nodegroup
      value: worker
profiles:
  jupyterlab:
  - display_name: Small Instance
    description: Stable environment with 1 cpu / 4 GB ram
    default: true
    kubespawner_override:
      cpu_limit: 1
      cpu_guarantee: 0.75
      mem_limit: 4G
      mem_guarantee: 2.5G
      image: quansight/qhub-jupyterlab:v0.3.13
  - display_name: Medium Instance
    description: Stable environment with 2 cpu / 8 GB ram
    kubespawner_override:
      cpu_limit: 2
      cpu_guarantee: 1.5
      mem_limit: 8G
      mem_guarantee: 5G
      image: quansight/qhub-jupyterlab:v0.3.13
  dask_worker:
    Small Worker:
      worker_cores_limit: 1
      worker_cores: 0.75
      worker_memory_limit: 4G
      worker_memory: 2.5G
      worker_threads: 1
      image: quansight/qhub-dask-worker:v0.3.13
    Medium Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
      image: quansight/qhub-dask-worker:v0.3.13
environments:
  environment-dask.yaml:
    name: dask
    channels:
    - conda-forge
    dependencies:
    - python
    - ipykernel
    - ipywidgets
    - qhub-dask ==0.3.13
    - python-graphviz
    - numpy
    - numba
    - pandas
  environment-dashboard.yaml:
    name: dashboard
    channels:
    - conda-forge
    dependencies:
    - python==3.9.7
    - ipykernel==6.4.1
    - ipywidgets==7.6.5
    - qhub-dask==0.3.13
    - param==1.11.1
    - python-graphviz==0.17
    - matplotlib==3.4.3
    - panel==0.12.4
    - voila==0.2.16
    - streamlit==1.0.0
    - dash==2.0.0
    - cdsdashboards-singleuser==0.5.7

@iameskild
Member Author

iameskild commented Nov 24, 2021

@viniciusdc would you mind taking a look at this to see if I missed anything? And could you share any qhub-config.yaml that successfully deployed on an existing cluster? Thanks a lot :)

@iameskild added the "area: documentation 📖" and "needs: discussion 💬" labels Nov 24, 2021
@iameskild
Member Author

Now that I think of it, this is most likely caused by the fact that this existing web app already has an EXTERNAL-IP set. I will attempt this again with an existing cluster that doesn't already have a public-facing IP/ingress.

@viniciusdc
Contributor

@viniciusdc would you mind taking a look at this to see if I missed anything? And could you share any qhub-config.yaml that successfully deployed on an existing cluster? Thanks a lot :)

Hi @iameskild, the only qhub-config that I have is for a GCP deployment. The only difference from yours (besides the provider) is that we needed to set the load-balancer configuration to an internal one, but that's because of some security policies.

@iameskild
Member Author

Hey @viniciusdc, how did you provision the DNS? From reading through the code base, it appears that when deploying to a local (existing) cluster, the update_record for Cloudflare is skipped altogether:
https://github.com/Quansight/qhub/blob/c0d08bbcc08816475bf26466e2d64f9daf03164e/qhub/deploy.py#L108-L119

And that's what I see when I deploy:

INFO:qhub.deploy:Couldn't update the DNS record for cloud provider: local

This explains why I can't access the cluster.

@iameskild
Member Author

I was able to get around this by updating the DNS record manually in the CloudFlare portal 👍
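
For anyone doing the same: the manual record is just a CNAME pointing the QHub domain at the ELB hostname from the ingress_jupyter output above. A rough sketch with the Cloudflare API (the zone ID and API token are placeholders; creating the record in the portal is equivalent):

# Create a CNAME record pointing the QHub domain at the ELB hostname.
# CF_ZONE_ID and CF_API_TOKEN are placeholders for your Cloudflare zone and credentials.
curl -X POST "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"type": "CNAME", "name": "eaeexisting.qhub.dev", "content": "aea1abf087211438cbf9e44ef5fb64c3-197330438.us-east-2.elb.amazonaws.com", "proxied": false}'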

@viniciusdc
Contributor

Hey @viniciusdc, how did you provision the DNS? From reading through the code base, it appears that when deploying to a local (existing) cluster, the update_record for Cloudflare is skipped altogether:

https://github.com/Quansight/qhub/blob/c0d08bbcc08816475bf26466e2d64f9daf03164e/qhub/deploy.py#L108-L119

And that's what I see when I deploy:

INFO:qhub.deploy:Couldn't update the DNS record for cloud provider: local

This explains why I can't access the cluster.

You can work around that by providing the DNS records manually, right? And by providing the certificate's secrets in the namespace... (I am not sure)

@iameskild
Member Author

iameskild commented Nov 26, 2021

I noticed that a few minutes after posting this 😆 Thanks @viniciusdc

In the future, it might be nice if users with existing clusters could have their DNS records auto-provisioned as well. Some changes to this part of the code could include a check for which cloud provider they are using and then calling update_record appropriately.
