[BUG] - Nebari deployment fails setup keycloak service #2163

Closed
pt247 opened this issue Dec 22, 2023 · 3 comments
Labels
area: dependencies 📦 · type: bug 🐛 · type: release 🏷

@pt247
Contributor

pt247 commented Dec 22, 2023

Describe the bug

The Keycloak pod fails to start during a Nebari deployment; it is stuck in Init:CrashLoopBackOff:

│ dev          keycloak-0                                                  ●  0/1   Init:CrashLoopBackOff         4 10.10.35.173   │

Looking into the pod:

┌───────────────────────────────────────────────── Containers(dev/keycloak-0)[3] ──────────────────────────────────────────────────┐
│ NAME↑                       PF IMAGE                            READY  STATE             INIT   RESTARTS PROBES(L:R) CPU/R:L R:L │
│ initialize-spi-metrics-jar  ●  busybox:1.31                     false  CrashLoopBackOff  true          5 off:off         0:0 0:0 │
│ keycloak                    ●  docker.io/jboss/keycloak:15.0.2  false  PodInitializing   false         0 on:on           0:0 0:0 │
│ pgchecker                   ●  docker.io/busybox:1.32           true   Completed         true          0 off:off       20:20 :32 │
│                                                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Logs of initialize-spi-metrics-jar:

┌───────────────────────────────────── Logs(dev/keycloak-0:initialize-spi-metrics-jar)[tail] ──────────────────────────────────────┐
│                                Autoscroll:On      FullScreen:Off     Timestamps:Off     Wrap:Off                                 │
│ Connecting to github.com (140.82.121.4:443)                                                                                      │
│ wget: note: TLS certificate validation not implemented                                                                           │
│ wget: TLS error from peer (alert code 80): 80                                                                                    │
│ wget: error getting response: Connection reset by peer                                                                           │
│ Stream closed EOF for dev/keycloak-0 (initialize-spi-metrics-jar)                                                                │
│                                                                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
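For anyone reproducing this without k9s, roughly the same logs can be pulled with plain kubectl. This is only a minimal sketch, assuming the namespace, pod, and container names shown above; the helper name `init_container_logs` is hypothetical and not part of Nebari.

```shell
# Hypothetical helper: print the recent logs of one container in a pod.
# Arguments: namespace, pod name, container name.
init_container_logs() {
  kubectl logs "$2" --namespace "$1" --container "$3" --tail=50
}

# Example with the names from this report (requires access to the cluster):
# init_container_logs dev keycloak-0 initialize-spi-metrics-jar
```

Passing the container name explicitly matters here, because the pod has several containers and the failing one is an init container.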

Expected behavior

Deployment should finish successfully.

OS and architecture in which you are running Nebari

AWS

How to Reproduce the problem?

Config used (with secrets and domains removed)

provider: aws
namespace: dev
nebari_version: 2023.11.2.dev1+gb0c451d1
project_name: nebaridemo
domain: nebari.aaa.com
ci_cd:
  type: none
terraform_state:
  type: remote
security:
  keycloak:
    initial_root_password: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
  authentication:
    type: password
theme:
  jupyterhub:
    hub_title: Nebari - nebaridemo
    welcome: Welcome! Learn about Nebari's features and configurations in <a href="https://www.nebari.dev/docs/welcome">the
      documentation</a>. If you have any questions or feedback, reach the team on
      <a href="https://www.nebari.dev/docs/community#getting-support">Nebari's support
      forums</a>.
    hub_subtitle: Your open source data science platform, hosted on Amazon Web Services
amazon_web_services:
  kubernetes_version: '1.26'
  region: eu-west-1
certificate:
  type: lets-encrypt
  acme_email: aaa@gmail.com

Command output

[terraform]: module.kubernetes-keycloak-helm.helm_release.keycloak: Still modifying... [id=keycloak, 4m40s elapsed]
[terraform]: module.kubernetes-keycloak-helm.helm_release.keycloak: Still modifying... [id=keycloak, 4m50s elapsed]
[terraform]: module.kubernetes-keycloak-helm.helm_release.keycloak: Still modifying... [id=keycloak, 5m0s elapsed]
[terraform]: ╷
[terraform]: │ Error: timed out waiting for the condition
[terraform]: │ 
[terraform]: │   with module.kubernetes-keycloak-helm.helm_release.keycloak,
[terraform]: │   on modules/kubernetes/keycloak-helm/main.tf line 1, in resource "helm_release" "keycloak":
[terraform]: │    1: resource "helm_release" "keycloak" {
[terraform]: │ 
[terraform]: ╵
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/prashant/work/nebari/src/_nebari/subcommands/deploy.py:87 in deploy                       │
│                                                                                                  │
│   84 │   │   │   │   │   stages.remove(stage)                                                    │
│   85 │   │   │   rich.print("Skipping remote state provision")                                   │
│   86 │   │                                                                                       │
│ ❱ 87 │   │   deploy_configuration(                                                               │
│   88 │   │   │   config,                                                                         │
│   89 │   │   │   stages,                                                                         │
│   90 │   │   │   disable_prompt=disable_prompt,                                                  │
│                                                                                                  │
│ /Users/prashant/work/nebari/src/_nebari/deploy.py:53 in deploy_configuration                     │
│                                                                                                  │
│   50 │   │   with contextlib.ExitStack() as stack:                                               │
│   51 │   │   │   for stage in stages:                                                            │
│   52 │   │   │   │   s = stage(output_directory=pathlib.Path.cwd(), config=config)               │
│ ❱ 53 │   │   │   │   stack.enter_context(s.deploy(stage_outputs, disable_prompt))                │
│   54 │   │   │   │                                                                               │
│   55 │   │   │   │   if not disable_checks:                                                      │
│   56 │   │   │   │   │   s.check(stage_outputs, disable_prompt)                                  │
│                                                                                                  │
│ /Users/prashant/miniconda3/envs/nebari/lib/python3.8/contextlib.py:425 in enter_context          │
│                                                                                                  │
│   422 │   │   # statement.                                                                       │
│   423 │   │   _cm_type = type(cm)                                                                │
│   424 │   │   _exit = _cm_type.__exit__                                                          │
│ ❱ 425 │   │   result = _cm_type.__enter__(cm)                                                    │
│   426 │   │   self._push_cm_exit(cm, _exit)                                                      │
│   427 │   │   return result                                                                      │
│   428                                                                                            │
│                                                                                                  │
│ /Users/prashant/miniconda3/envs/nebari/lib/python3.8/contextlib.py:113 in __enter__              │
│                                                                                                  │
│   110 │   │   # they are only needed for recreation, which is not possible anymore               │
│   111 │   │   del self.args, self.kwds, self.func                                                │
│   112 │   │   try:                                                                               │
│ ❱ 113 │   │   │   return next(self.gen)                                                          │
│   114 │   │   except StopIteration:                                                              │
│   115 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   116                                                                                            │
│                                                                                                  │
│ /Users/prashant/work/nebari/src/_nebari/stages/kubernetes_keycloak/__init__.py:295 in deploy     │
│                                                                                                  │
│   292 │   def deploy(                                                                            │
│   293 │   │   self, stage_outputs: Dict[str, Dict[str, Any]], disable_prompt: bool = False       │
│   294 │   ):                                                                                     │
│ ❱ 295 │   │   with super().deploy(stage_outputs, disable_prompt):                                │
│   296 │   │   │   with keycloak_provider_context(                                                │
│   297 │   │   │   │   stage_outputs["stages/" + self.name]["keycloak_credentials"]["value"]      │
│   298 │   │   │   ):                                                                             │
│                                                                                                  │
│ /Users/prashant/miniconda3/envs/nebari/lib/python3.8/contextlib.py:113 in __enter__              │
│                                                                                                  │
│   110 │   │   # they are only needed for recreation, which is not possible anymore               │
│   111 │   │   del self.args, self.kwds, self.func                                                │
│   112 │   │   try:                                                                               │
│ ❱ 113 │   │   │   return next(self.gen)                                                          │
│   114 │   │   except StopIteration:                                                              │
│   115 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   116                                                                                            │
│                                                                                                  │
│ /Users/prashant/work/nebari/src/_nebari/stages/base.py:72 in deploy                              │
│                                                                                                  │
│    69 │   │   │   deploy_config["terraform_import"] = True                                       │
│    70 │   │   │   deploy_config["state_imports"] = state_imports                                 │
│    71 │   │                                                                                      │
│ ❱  72 │   │   self.set_outputs(stage_outputs, terraform.deploy(**deploy_config))                 │
│    73 │   │   self.post_deploy(stage_outputs, disable_prompt)                                    │
│    74 │   │   yield                                                                              │
│    75                                                                                            │
│                                                                                                  │
│ /Users/prashant/work/nebari/src/_nebari/provider/terraform.py:71 in deploy                       │
│                                                                                                  │
│    68 │   │   │   │   )                                                                          │
│    69 │   │                                                                                      │
│    70 │   │   if terraform_apply:                                                                │
│ ❱  71 │   │   │   apply(directory, var_files=[f.name])                                           │
│    72 │   │                                                                                      │
│    73 │   │   if terraform_destroy:                                                              │
│    74 │   │   │   destroy(directory, var_files=[f.name])                                         │
│                                                                                                  │
│ /Users/prashant/work/nebari/src/_nebari/provider/terraform.py:151 in apply                       │
│                                                                                                  │
│   148 │   │   + ["-var-file=" + _ for _ in var_files]                                            │
│   149 │   )                                                                                      │
│   150 │   with timer(logger, "terraform apply"):                                                 │
│ ❱ 151 │   │   run_terraform_subprocess(command, cwd=directory, prefix="terraform")               │
│   152                                                                                            │
│   153                                                                                            │
│   154 def output(directory=None):                                                                │
│                                                                                                  │
│ /Users/prashant/work/nebari/src/_nebari/provider/terraform.py:118 in run_terraform_subprocess    │
│                                                                                                  │
│   115 │   terraform_path = download_terraform_binary()                                           │
│   116 │   logger.info(f" terraform at {terraform_path}")                                         │
│   117 │   if run_subprocess_cmd([terraform_path] + processargs, **kwargs):                       │
│ ❱ 118 │   │   raise TerraformException("Terraform returned an error")                            │
│   119                                                                                            │
│   120                                                                                            │
│   121 def version():                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TerraformException: Terraform returned an error

Versions and dependencies used.

(base)  ➜ nebari-aws (2154-aws-set-minimum-notes-to-0) ✏️  conda --version 
conda 4.8.2
(base)  ➜ nebari-aws (2154-aws-set-minimum-notes-to-0) ✏️  kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.11-eks-8cb36c9", GitCommit:"3970d4cb523c57bbb0780125446d60f9aca260ad", GitTreeState:"clean", BuildDate:"2023-11-22T21:52:34Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.22) and server (1.26) exceeds the supported minor version skew of +/-1
(base)  ➜ nebari-aws (2154-aws-set-minimum-notes-to-0) ✏️  

nebari_version: 2023.11.2.dev1+gb0c451d1

Compute environment

AWS

Integrations

Keycloak

Anything else?

The issue happens at line:

      wget https://github.com/aerogear/keycloak-metrics-spi/releases/download/2.5.3/keycloak-metrics-spi-2.5.3.jar -P /data/ &&

For some reason, the TLS handshake with github.com fails (alert code 80 in the logs above), so the download is rejected and the init container crashes.

We are currently using an old, unsupported version of BusyBox, 1.31 (line)

All the currently supported tags for BusyBox can be found at:
https://hub.docker.com/_/busybox

Upgrading the version of BusyBox to 1.36.1 fixes this issue.
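To confirm which BusyBox tag a running pod actually uses, before or after such an upgrade, the init container images can be listed with kubectl. A minimal sketch, assuming the `dev` namespace and `keycloak-0` pod from this report; the helper name `init_container_images` is hypothetical.

```shell
# Hypothetical helper: list the images used by a pod's init containers.
# Arguments: namespace, pod name.
init_container_images() {
  kubectl get pod "$2" --namespace "$1" \
    --output jsonpath='{.spec.initContainers[*].image}'
}

# Example with the names from this report (requires access to the cluster):
# init_container_images dev keycloak-0
```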

@pt247 added the needs: triage 🚦 and type: bug 🐛 labels Dec 22, 2023
@pt247
Contributor Author

pt247 commented Dec 22, 2023

If you agree with the findings, I would be happy to open a small PR for review.

@pt247
Contributor Author

pt247 commented Dec 22, 2023

Checking locally to verify the findings:

  • With: busybox:1.31

    $ docker run -it busybox:1.31 wget https://github.com/ 
    Connecting to github.com (140.82.121.3:443)
    wget: note: TLS certificate validation not implemented
    wget: TLS error from peer (alert code 80): 80
    wget: error getting response: Connection reset by peer
  • And with busybox:1.36.1:

    $ docker run -it busybox:1.36.1 wget https://github.com                   
    Connecting to github.com (140.82.121.3:443)
    wget: note: TLS certificate validation not implemented
    saving to 'index.html'
    index.html           100% |************************************************************************************|  203k  0:00:00 ETA
    'index.html' saved
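The two manual runs above could also be wrapped in a loop to compare several tags at once. This is a sketch under assumptions: it needs a local Docker daemon, the helper name `check_busybox_tls` is hypothetical, and it relies only on the exit status of BusyBox `wget`.

```shell
# Hypothetical helper: attempt an HTTPS fetch from several BusyBox image tags
# and report which tags succeed. Requires a local Docker daemon.
# Arguments: URL, then one or more busybox tags.
check_busybox_tls() {
  url="$1"
  shift
  for tag in "$@"; do
    # -q: quiet, -O /dev/null: discard the downloaded body
    if docker run --rm "busybox:$tag" wget -q -O /dev/null "$url"; then
      echo "busybox:$tag OK"
    else
      echo "busybox:$tag FAILED"
    fi
  done
}

# Example:
# check_busybox_tls https://github.com 1.31 1.36.1
```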

@dcmcand
Contributor

dcmcand commented Dec 22, 2023

Hi @pt247,
Thanks so much for the report and for the diagnosis work. We are going to reproduce your bug, verify your solution, and hopefully get this fix into the next release.

@dcmcand added the area: dependencies 📦 and type: release 🏷 labels and removed the needs: triage 🚦 label Dec 22, 2023
@dcmcand dcmcand added this to the Next release milestone Dec 22, 2023
@pt247 pt247 closed this as completed Dec 24, 2023
@github-project-automation github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management Dec 24, 2023