[ENH] - Set minimum nodes to 0 for AWS deployment #2154
To replicate this, I created a Nebari cluster and checked which nodes were created. Default settings create a single node, and all the pods are deployed on the same node.
Thanks @aktech for pointing out that k9s is not actually showing all the hosts. In the AWS EC2 console we see 3 instances. After setting the worker and user nodes to 0, then destroying and recreating the cluster, I can verify that there is only one general instance.
Status

Current Progress

To get nodes to scale from 0 in AWS EKS, we need to do the following:

The PR successfully does the first two steps.

Issue

We have the following stages that deal with AWS:

Now, terraform refuses to create tags for autoscaling groups, which are only available once the node groups are formed. This means we need to do this in the next stage.

Possible solutions
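As background on why those autoscaling-group tags matter at all (context, not one of the solutions above): when a node group is sitting at zero nodes, cluster-autoscaler cannot inspect a live node to learn its labels, so it reads node-template tags on the ASG to decide whether a pending pod would fit there. A rough sketch of the kinds of tags involved; the cluster name and label value here are purely illustrative:

```yaml
# Illustrative ASG tags read by cluster-autoscaler (names/values are examples).
# Auto-discovery tags identify the ASG as belonging to this cluster:
k8s.io/cluster-autoscaler/enabled: "true"
k8s.io/cluster-autoscaler/my-nebari-cluster: "owned"
# Scale-from-zero node-template tag: tells the autoscaler which labels a new
# node would carry, so pods selecting dedicated=user can trigger a scale-up.
k8s.io/cluster-autoscaler/node-template/label/dedicated: "user"
```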
Please note: I had to move the user scheduler to the general node, as it kept the user node alive even when there was no user activity. If needed, I can raise a separate mini-PR to do just this.
ASG tagging has been moved to the next stage, but the deployment now fails with the following error:

Error

[terraform]: ╷
[terraform]: │ Error: Invalid provider configuration
[terraform]: │
[terraform]: │ Provider "registry.terraform.io/hashicorp/aws" requires explicit
[terraform]: │ configuration. Add a provider block to the root module and configure the
[terraform]: │ provider's required arguments as described in the provider documentation.
[terraform]: │
[terraform]: ╵
[terraform]: ╷
[terraform]: │ Error: No valid credential sources found
[terraform]: │
[terraform]: │ with provider["registry.terraform.io/hashicorp/aws"],
[terraform]: │ on <empty> line 0:
[terraform]: │ (source code not available)
[terraform]: │
[terraform]: │ Please see https://registry.terraform.io/providers/hashicorp/aws
[terraform]: │ for more information about providing credentials.
[terraform]: │
[terraform]: │ Error: failed to refresh cached credentials, no EC2 IMDS role found,
[terraform]: │ operation error ec2imds: GetMetadata, http response error StatusCode: 404,
[terraform]: │ request to EC2 IMDS failed

At this point, I could use some pointers on how to resolve it. For some reason, it's expecting AWS credentials to be set. Link to logs with the error: https://github.com/nebari-dev/nebari/actions/runs/7463265924/job/20307526950?pr=2168

After removing this block (not shown here), the run passes the deploy nebari phase. Logs: https://github.com/nebari-dev/nebari/actions/runs/7463764148/job/20309179650?pr=2168
To replicate the issue with a local deployment, I am trying to run the same config on my laptop. It's an Intel-based Mac.

Config

Nebari config

I got the following config from the CI logs:

$ cat nebari-config.yaml
provider: local
namespace: dev
nebari_version: 2024.1.1rc2.dev82+g99d4445c
project_name: thisisatest
domain: github-actions.nebari.dev
ci_cd:
  type: none
terraform_state:
  type: remote
security:
  keycloak:
    initial_root_password: foad9omyohtfc7hfanwbem8zhahaup3s
  authentication:
    type: password
theme:
  jupyterhub:
    hub_title: Nebari - thisisatest
    welcome: Welcome! Learn about Nebari's features and configurations in <a href="https://www.nebari.dev/docs/welcome">the
      documentation</a>. If you have any questions or feedback, reach the team on
      <a href="https://www.nebari.dev/docs/community#getting-support">Nebari's support
      forums</a>.
    hub_subtitle: Your open source data science platform, hosted

Hosts file

$ cat /etc/hosts | grep 172.18.1.100
172.18.1.100 github-actions.nebari.dev

Error

[terraform]: Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
[terraform]:
[terraform]: Outputs:
[terraform]:
[terraform]: load_balancer_address = {
[terraform]: "hostname" = ""
[terraform]: "ip" = "172.20.1.100"
[terraform]: }
Attempt 1 failed to connect to tcp tcp://172.20.1.100:80
Attempt 2 failed to connect to tcp tcp://172.20.1.100:80
Attempt 3 failed to connect to tcp tcp://172.20.1.100:80
Attempt 4 failed to connect to tcp tcp://172.20.1.100:80
Attempt 5 failed to connect to tcp tcp://172.20.1.100:80
Attempt 6 failed to connect to tcp tcp://172.20.1.100:80
Attempt 7 failed to connect to tcp tcp://172.20.1.100:80
Attempt 8 failed to connect to tcp tcp://172.20.1.100:80
Attempt 9 failed to connect to tcp tcp://172.20.1.100:80
Attempt 10 failed to connect to tcp tcp://172.20.1.100:80
ERROR: After stage=04-kubernetes-ingress unable to connect to ingress host=172.20.1.100 port=80

Issue

Nothing is running on port 80 on my laptop:

$ sudo lsof -i -P | grep LISTEN | grep :80
Password:
$

Next step

The documentation clearly says it doesn't work on a Mac, so I will try this on an EC2 machine instead.
Reopening this as a release blocker since we've discovered an issue. From @kenafoster:

He is currently on PTO until next week, so we'll wait until then to discuss.
@kcpevey, glad you are on the case! Really hoping to see this land, as it would make Nebari a lot easier to sell!
@pt247 shared this with me. I tested a similar configuration in AWS and it works: you can scale from 0 -> 1 using the "dedicated" selector to target a node pool by name. So, the way to target a specific node group from a JupyterLab profile is by using the "dedicated" node-selector key.
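For illustration, here is a minimal sketch of what such a profile could look like in nebari-config.yaml, assuming the usual profiles/kubespawner_override syntax; the profile name, resource limits, and node group name are examples rather than what was actually tested:

```yaml
profiles:
  jupyterlab:
    - display_name: Example profile          # illustrative name
      description: Runs on the worker node group
      kubespawner_override:
        cpu_limit: 2
        mem_limit: 4G
        # "dedicated" is the node label used to target a node group by name;
        # the value here ("worker") is an example node group.
        node_selector:
          dedicated: worker
```

With min_nodes set to 0 on that node group, launching this profile should trigger a scale-up from zero, and the node can scale back down once the session ends.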
I have created a ticket in nebari-docs to document this. Since no changes are needed in this PR to support this, is it okay to:
Feature description
We need to set the min_nodes for AWS to 0 for the user and worker nodes. We already have 0 for GCP. Otherwise this makes Nebari quite expensive (~$625/month) for someone trying it out with the default configuration.

Cost: Nebari base cost on AWS is about $625 per month with the default config.

This was not supported by AWS in the past, but it has been supported for a while now.
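For context, this is roughly the part of nebari-config.yaml the request is about, with the proposed minimums; the region, instance types, and max_nodes values are only illustrative, not the actual Nebari defaults:

```yaml
amazon_web_services:
  region: us-east-1            # illustrative
  node_groups:
    general:
      instance: m5.2xlarge     # instance types shown are examples
      min_nodes: 1
      max_nodes: 1
    user:
      instance: m5.xlarge
      min_nodes: 0             # proposed: allow user nodes to scale to zero
      max_nodes: 5
    worker:
      instance: m5.xlarge
      min_nodes: 0             # proposed: allow worker nodes to scale to zero
      max_nodes: 5
```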
Value and/or benefit
Reduction in base cost for Nebari on AWS with the default configuration.
Anything else?
No response