
[BUG] - Timeout waiting for module.kubernetes-keycloak-helm.helm_release.keycloak #1491

Closed
abarciauskas-bgse opened this issue Oct 12, 2022 · 5 comments
Labels
area: integration/keycloak · area: terraform 💾 · needs: follow-up 📫 · type: bug 🐛

Comments

@abarciauskas-bgse

OS and architecture in which you are running QHub

macOS Catalina 10.15.7

Expected behavior

Successful deployment of stage 05-kubernetes-keycloak

Actual behavior

On repeated deployments, I get this error:

[terraform]: │ Error: timed out waiting for the condition
[terraform]: │ 
[terraform]: │   with module.kubernetes-keycloak-helm.helm_release.keycloak,
[terraform]: │   on modules/kubernetes/keycloak-helm/main.tf line 1, in resource "helm_release" "keycloak":
[terraform]: │    1: resource "helm_release" "keycloak" {

The error occurs while applying this change:

[terraform]:   # module.kubernetes-keycloak-helm.helm_release.keycloak will be updated in-place
[terraform]:   ~ resource "helm_release" "keycloak" {
[terraform]:         id                         = "keycloak"
[terraform]:         name                       = "keycloak"
[terraform]:       ~ status                     = "failed" -> "deployed"
[terraform]:         # (26 unchanged attributes hidden)
[terraform]: 
[terraform]:         set {
[terraform]:           # At least one attribute in this block is (or was) sensitive,
[terraform]:           # so its contents will not be displayed.
[terraform]:         }
[terraform]:     }
[terraform]: 
[terraform]: Plan: 0 to add, 1 to change, 0 to destroy.
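
For what it's worth, the message matches the helm provider's wait behavior: helm_release blocks until the release's resources report ready and gives up once its timeout elapses. A minimal sketch of how such a resource could widen that window (hypothetical values, not the actual module code):

resource "helm_release" "keycloak" {
  name = "keycloak"
  # ... chart, repository, and values as in modules/kubernetes/keycloak-helm/main.tf ...

  wait    = true  # provider default: block until all resources are ready
  timeout = 600   # seconds before "timed out waiting for the condition"; provider default is 300
}

Raising the timeout only helps if the release eventually converges; if a pod never becomes ready, the apply still fails.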

How to Reproduce the problem?

This is my config:

project_name: aimee-qhub
provider: aws
domain: eo-analytics.delta-backend.com
certificate:
  type: lets-encrypt
  acme_email: aimee@developmentseed.org
  acme_server: https://acme-v02.api.letsencrypt.org/directory
security:
  authentication:
    type: GitHub
    config:
      client_id: XXX
      client_secret: XXX
  keycloak:
    initial_root_password: XXX
default_images:
  jupyterhub: quansight/qhub-jupyterhub:v0.4.3
  jupyterlab: quansight/qhub-jupyterlab:v0.4.3
  dask_worker: quansight/qhub-dask-worker:v0.4.3
storage:
  conda_store: 200Gi
  shared_filesystem: 200Gi
theme:
  jupyterhub:
    hub_title: VEDA QHub
    hub_subtitle: NASA VEDA
    welcome: Welcome to the VEDA Analytics QHub.
    logo: https://cdn.cdnlogo.com/logos/n/66/nasa.png
    primary_color: '#5d7fb9'
    secondary_color: '#000000'
    accent_color: '#32C574'
    text_color: '#5d7fb9'
    h1_color: '#5d7fb9'
    h2_color: '#5d7fb9'
    version: v0.4.3
helm_extensions: []
monitoring:
  enabled: true
argo_workflows:
  enabled: true
kbatch:
  enabled: true
cdsdashboards:
  enabled: true
  cds_hide_user_named_servers: true
  cds_hide_user_dashboard_servers: false
ci_cd:
  type: github-actions
  branch: main
  commit_render: true
terraform_state:
  type: remote
namespace: dev
qhub_version: 0.4.3
amazon_web_services:
  region: us-west-2
  kubernetes_version: '1.23'
  node_groups:
    general:
      instance: m5.2xlarge
      min_nodes: 1
      max_nodes: 1
    user:
      instance: m5.xlarge
      min_nodes: 1
      max_nodes: 5
    worker:
      instance: m5.xlarge
      min_nodes: 1
      max_nodes: 5
jupyterhub:
  overrides:
    singleuser:
      lifecycleHooks:
        postStart:
          exec:
            command:
              [
                "gitpuller",
                "https://github.com/NASA-IMPACT/veda-documentation",
                "master",
                "docs",
              ]
profiles:
  jupyterlab:
  - display_name: Small Instance
    description: Stable environment with 2 cpu / 8 GB ram
    default: true
    kubespawner_override:
      cpu_limit: 2
      cpu_guarantee: 1.5
      mem_limit: 8G
      mem_guarantee: 5G
  - display_name: Medium Instance
    description: Stable environment with 4 cpu / 16 GB ram
    kubespawner_override:
      cpu_limit: 4
      cpu_guarantee: 3
      mem_limit: 16G
      mem_guarantee: 10G
  dask_worker:
    Small Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
    Medium Worker:
      worker_cores_limit: 4
      worker_cores: 3
      worker_memory_limit: 16G
      worker_memory: 10G
      worker_threads: 4
environments:
  environment-dask.yaml:
    name: dask
    channels:
    - conda-forge
    dependencies:
    - nbgitpuller
    - python
    - ipykernel
    - ipywidgets
    - qhub-dask ==0.4.3
    - python-graphviz
    - numpy
    - numba
    - pandas
    - pip:
      - kbatch
  environment-dashboard.yaml:
    name: dashboard
    channels:
    - conda-forge
    dependencies:
    - nbgitpuller
    - python==3.9.7
    - ipykernel==6.4.1
    - ipywidgets==7.6.5
    - qhub-dask==0.4.3
    - param==1.11.1
    - python-graphviz==0.17
    - matplotlib==3.4.3
    - panel==0.12.7
    - voila==0.3.5
    - streamlit==1.0.0
    - dash==2.0.0
    - cdsdashboards-singleuser==0.6.1

So with GITHUB_TOKEN and my AWS credentials set, I run qhub deploy -c aimee-config.yaml

Command output

No response

Versions and dependencies used.

$ conda --version
conda 4.13.0

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}

$ qhub --version
0.4.3

Others that might be relevant:

  • terraform 1.2.9
  • hashicorp/kubernetes v2.7.1
  • hashicorp/random v3.4.3
  • hashicorp/aws v4.34.0
  • hashicorp/helm v2.1.2

Compute environment

AWS

Integrations

Keycloak

Anything else?

No response

@abarciauskas-bgse added the type: bug 🐛 label Oct 12, 2022
@viniciusdc
Contributor

@abarciauskas-bgse if you have kubectl installed, could you run the following:

  • kubectl get pods --namespace=dev keycloak-0, which will tell us whether Keycloak is running
  • kubectl describe pod -n dev keycloak-0, which will show the pod's events
  • kubectl logs -n dev keycloak-0 --tail 100, which will give us the recent logs
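
If those all look healthy, it may also be worth checking whether the chart's volume claims ever bound and what the recent namespace events say (assuming the dev namespace from the config above):

  • kubectl get pvc -n dev, to confirm any claims the chart created are Bound rather than Pending
  • kubectl get events -n dev --sort-by=.lastTimestamp, to surface scheduling or volume-provisioning failures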

@viniciusdc
Contributor

Also, have you tried running the deploy command again a few minutes after the first timeout error?

@iameskild
Member

Hi @abarciauskas-bgse, we released v0.4.5 on Friday. This latest release should resolve the issues you're experiencing above. Thank you for your interest in QHub, and let us know if your deployment was successful :)
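
A sketch of the usual upgrade path (assuming a pip-installed qhub and the config filename from above):

pip install --upgrade qhub==0.4.5
qhub upgrade -c aimee-config.yaml   # rewrites the config file for the new version
qhub deploy -c aimee-config.yaml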

@abarciauskas-bgse
Author

Adding @tracetechnical to this thread; he has been troubleshooting our deployment and found and fixed the issue with the EBS CSI drivers and the updated Kubernetes version, so we do believe the 0.4.5 upgrade should fix this problem. Thanks @iameskild
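
For anyone who hits the same symptom: on EKS with Kubernetes 1.23, the in-tree EBS plugin no longer provisions volumes, so without the EBS CSI driver any PersistentVolumeClaim stays Pending and the helm release eventually times out waiting. A quick check that the driver is present (assuming it was installed as the EKS managed add-on or via its Helm chart, which apply the label below):

kubectl get csidriver ebs.csi.aws.com
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver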

@iameskild
Member

@abarciauskas-bgse @tracetechnical wonderful! We also fixed the EBS CSI driver issue in this latest release, so it sounds like we're in good shape. Unless there is anything else, I think we can close this issue.
