
[BUG] - Installing nebari locally (local with kind) fails. #1703

Closed
twaclaw opened this issue Apr 8, 2023 · 3 comments
Labels
type: bug 🐛 Something isn't working

Comments

@twaclaw

twaclaw commented Apr 8, 2023

Describe the bug

Deploying nebari locally fails.

The health check for https://domain/argo/ fails.
Other endpoints work fine, and Argo Workflows itself seems to be installed properly.

log3.8.txt

Expected behavior

Installation should go through.

OS and architecture in which you are running Nebari

Archlinux: 6.2.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 22 Mar 2023 22:52:35 +0000 x86_64 GNU/Linux. I tried with both Python 3.8 and 3.11.

How to Reproduce the problem?

Follow the steps in:
https://www.nebari.dev/docs/how-tos/nebari-local

Command output

nebari init local  --project projectname  --domain domain  --auth-provider password  --terraform-state=local

nebari deploy -c nebari-config.yaml --disable-prompt |tee log3.8.txt


[terraform]:   "keycloak" = {
[terraform]:     "health_url" = "https://domain/auth/realms/master"
[terraform]:     "url" = "https://domain/auth/"
[terraform]:   }
[terraform]:   "monitoring" = {
[terraform]:     "health_url" = "https://domain/monitoring/api/health"
[terraform]:     "url" = "https://domain/monitoring/"
[terraform]:   }
[terraform]: }
Attempt 1 health check failed for url=https://domain/argo/
Attempt 2 health check failed for url=https://domain/argo/
Attempt 3 health check failed for url=https://domain/argo/
Attempt 4 health check failed for url=https://domain/argo/
Attempt 5 health check failed for url=https://domain/argo/
Attempt 6 health check failed for url=https://domain/argo/
Attempt 7 health check failed for url=https://domain/argo/
Attempt 8 health check failed for url=https://domain/argo/
Attempt 9 health check failed for url=https://domain/argo/
Attempt 10 health check failed for url=https://domain/argo/
ERROR: Service argo-workflows DOWN when checking url=https://domain/argo/
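For context, the output above is what a simple retry-based health check produces when an endpoint never comes up. A minimal sketch of that pattern (illustrative only, not Nebari's actual code; the function name and defaults are assumptions):

```python
import urllib.request
import urllib.error


def health_check(url, attempts=10, timeout=5):
    """Return True as soon as the URL answers with HTTP 2xx, else False.

    Prints one failure line per attempt, mirroring the log output above.
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or DNS failure all count as
            # a failed attempt.
            pass
        print(f"Attempt {attempt} health check failed for url={url}")
    return False
```

If all attempts fail, the deploy reports the service as DOWN, as seen in the log.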

Versions and dependencies used.

kind: 0.18.0

kubectl:
Client Version: v1.26.3
Kustomize Version: v4.5.7
Server Version: v1.21.10

nebari: 2023.1.1

Compute environment

kind

Integrations

Argo

Anything else?

No response

@twaclaw twaclaw added needs: triage 🚦 Someone needs to have a look at this issue and triage type: bug 🐛 Something isn't working labels Apr 8, 2023
@twaclaw twaclaw changed the title [BUG] - <title> [BUG] - Installing nebari locally (local with kind) fails. Apr 8, 2023
@dharhas
Member

dharhas commented Apr 10, 2023

@pmeier is this the same issue you faced recently?

@pmeier
Member

pmeier commented Apr 11, 2023

Most likely. Just to confirm @twaclaw: if you create a new user and try to log in, are you also seeing an HTTP 500 error?

This is happening because we only patch /etc/hosts on the host machine, so the browser there knows how to resolve the domain. That change does not propagate to the pods inside the cluster, so they fail to resolve the URL. We don't see this in CI because we have a permanent DNS entry for

sudo echo "172.18.1.100 github-actions.nebari.dev" | sudo tee -a /etc/hosts

As discussed in our last sync, we need to eliminate this requirement, since other users might not have access to a domain and, more importantly, shouldn't need one to deploy Nebari locally.
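To see the asymmetry described above, you can compare name resolution on the host versus inside a pod. A small sketch (hypothetical helper, not part of Nebari) that checks whether the local resolver can resolve a hostname; run on the host it sees /etc/hosts entries, run inside a pod it uses the pod's own resolver and they are invisible:

```python
import socket


def resolves(hostname):
    """Check whether this machine's resolver can resolve a hostname.

    Inside a Kubernetes pod this consults the pod's resolver (kube-dns
    plus the pod's own /etc/hosts), so entries added only to the host's
    /etc/hosts are invisible here.
    """
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Running this from the host after patching /etc/hosts would return True for the configured domain, while the same call from inside a pod would return False, which is exactly why the in-cluster health check fails.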

@twaclaw As a quick workaround, destroy the cluster with nebari destroy -c nebari-config.yaml and rerun nebari init local with --domain 172.18.1.100. This should work here, but won't in 100% of the cases (see #1707).

If you drop the --disable-prompt flag from nebari deploy, you should see this message roughly halfway through the deploy:

Take IP Address 172.18.1.100 and update DNS to point to "172.18.1.100" [Press Enter when Complete]

If the IPs match, just confirm and your cluster should start up without issues. You can also remove the entry from /etc/hosts now and access the web UI through the IP directly.

@twaclaw
Author

twaclaw commented Apr 11, 2023

@pmeier, thanks for the workaround. I confirm that passing --domain IP solves the issue.

This was referenced Apr 14, 2023
@Adam-D-Lewis Adam-D-Lewis removed the needs: triage 🚦 Someone needs to have a look at this issue and triage label Sep 3, 2024