
set default resources for all dagster repos #324

Merged

Conversation

Contributor

@pietercusters pietercusters commented Dec 6, 2023

See here how resources of user code deployments should be set: https://github.com/dagster-io/dagster/blob/master/helm/dagster/charts/dagster-user-deployments/values.yaml#L114

This PR attempts to implement this.
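For reference, the linked values.yaml configures resources per user code deployment; a minimal sketch of that structure (the name and numbers here are illustrative placeholders, not the defaults this PR sets):

```yaml
# dagster-user-deployments values (sketch; values are illustrative)
deployments:
  - name: "example-dagster-user-code"
    # ...
    resources:
      requests:
        cpu: 50m       # 0.05 CPU
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```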

Check any pod starting with udc in Grafana (e.g. this one) and conclude with me that the resources I set are sensible.


📕 DATA-1159 As a developer I don't want to waste money by overcommitting my app's resource requests in Kubernetes pietercusters@vandebron.nl

– This ticket is a reminder for Pieter to follow-up with Pim on resources –

Story
Every app we make in-house will be deployed to Kubernetes. Kubernetes uses so-called resource requests (and limits) to make sure your app gets the resources it needs, but doesn't use too much and break other apps or the underlying nodes. These are manual configurations, which can be found in the project.yaml and must be refined before deployment to production and each time you modify the app. To use them correctly, please read/watch one of the articles below. In short:

  • Kubernetes defines *Limits* as the maximum amount of a resource to be used by a container. This means that the container can never consume more than the indicated memory or CPU amount. It will either be throttled (CPU) or OOM-killed (RAM) after hitting the limits.
  • *Requests*, on the other hand, are the minimum guaranteed amount of a resource that is reserved for a container. CPU or RAM requested by one pod can't be requested/used by another.
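Both concepts map directly onto a container spec; a minimal sketch (names and values are examples only):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: example/app:latest
      resources:
        requests:
          cpu: 50m        # ~0.05 CPU reserved for this container at scheduling time
          memory: 128Mi
        limits:
          cpu: "2"        # throttled when trying to use more than 2 CPUs
          memory: 512Mi   # OOM-killed when exceeding 512Mi
```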

For this item we will focus on the CPU requests, not the memory requests or the CPU/memory limits. Why?
Because we reserve 75% of the CPU while we're mostly only using 10%. See the image below. This is expensive, but it also sometimes makes it impossible to deploy, because the cluster seems "full".

How?
For each of the apps/deployments (not jobs, for now) your team is responsible for, verify its actual CPU use here: GRAFANA PROD.
You can filter on namespace and app name. In the CPU Usage graph you can see the "normal" CPU usage of the app. Don't look at the spikes, just try to find the trend of normal usage.
In the URL I added as an example to GRAFANA PROD above, you can see that the app:

  • is using around 0.033 CPU
  • requested 0.4 CPU
  • limits the CPU at 2 full CPU's

The preferred modification here would be reducing the CPU requests from 0.4 to roughly 0.05. We'll keep the limits untouched for now. Redeploy.
Please also have a quick look at whether the CPU requests could be lower in acce and test in comparison to prod. Those clusters are smaller and the apps are usually used less!

Resources

  • Watch Kubernetes Resource Requests explained in 5m: watch?v=*nknHwTKlh8
  • Read (until Namespace settings): kubernetes-best-practices-resource-requests-and-limits
  • Grafana dashboards to determine CPU usage: [acce|https://grafana.acce-backend.vdbinfra.nl/k8s/clusters/c-6mkzg/api/v1/namespaces/cattle-monitoring-system/services/http:rancher-monitoring-grafana:80/proxy/d/6581e46e4e5c7ba40a07646395ef7b23/kubernetes-compute-resources-pod?orgId=1&refresh=10s&var-datasource=Prometheus&var-cluster=&var-namespace=batterypack&var-pod=batterypackapi-5686477f9d-lqmfl&from=now-2d&to=now], [test|https://grafana.test-backend.vdbinfra.nl/k8s/clusters/c-z8wzm/api/v1/namespaces/cattle-monitoring-system/services/http:rancher-monitoring-grafana:80/proxy/d/6581e46e4e5c7ba40a07646395ef7b23/kubernetes-compute-resources-pod?orgId=1&refresh=10s&var-datasource=Prometheus&var-cluster=&var-namespace=batterypack&var-pod=batterypackapi-5b76b64bb5-f85vk&from=now-2d&to=now], prod
  • project.yaml example: https://github.com/Vandebron/Vandebron/blob/master/apps/customer-dashboard/deployment/project.yml#L33
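Besides the Grafana dashboards, actual usage versus configured requests can be spot-checked from the command line. This assumes kubectl access to the cluster and a running metrics-server; the namespace and deployment names are illustrative:

```shell
# Current CPU/memory usage of pods in a namespace (requires metrics-server)
kubectl top pod -n batterypack

# Configured requests/limits for a deployment, for comparison
kubectl get deploy batterypackapi -n batterypack \
  -o jsonpath='{.spec.template.spec.containers[*].resources}'
```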

🏗️ Build 3 ✅ Successful, started by Pieter Custers
🚀 cloudfront-service, example-dagster-user-code, job, kong-sync, nodeservice, ocpp-first, ocpp-second, sbtservice, sparkJob-first, sparkJob-second


@timvandruenen timvandruenen left a comment


LGTM


github-actions bot commented Dec 7, 2023

Code Coverage

Package Line Rate
src.mpyl 93%
src.mpyl.artifacts 64%
src.mpyl.cli 60%
src.mpyl.projects 92%
src.mpyl.reporting 100%
src.mpyl.reporting.formatting 100%
src.mpyl.stages 75%
src.mpyl.steps 92%
src.mpyl.steps.build 83%
src.mpyl.steps.deploy 80%
src.mpyl.steps.deploy.k8s 81%
src.mpyl.steps.deploy.k8s.resources 97%
src.mpyl.steps.postdeploy 100%
src.mpyl.steps.test 93%
src.mpyl.utilities 100%
src.mpyl.utilities.cypress 93%
src.mpyl.utilities.dagster 71%
src.mpyl.utilities.github 71%
src.mpyl.utilities.helm 100%
src.mpyl.utilities.jenkins 69%
src.mpyl.utilities.junit 97%
src.mpyl.utilities.logging 78%
src.mpyl.utilities.pyaml_env 100%
src.mpyl.utilities.repo 74%
src.mpyl.utilities.s3 47%
src.mpyl.utilities.sbt 96%
src.mpyl.utilities.subprocess 81%
src.mpyl.utilities.yaml 100%
Summary 82% (2675 / 3268)

Minimum allowed line rate is 79%

Contributor

@kgrunert kgrunert left a comment


This PR should create a test version of MPyL.

I recommend building and deploying a dagster user deployment with this test version and then checking with
helm get values or helm get manifest that the values you expect are being set in the release!

And then also run the MPyL Test Pipeline on Jenkins to verify the build of this mpyl version :-)
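The verification suggested above could look like this; the release and namespace names are placeholders, and the mpyl version would be the test build published for this PR:

```shell
# Install the test build of mpyl from Test PyPI (pin the PR's test version in practice)
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple mpyl

# After deploying a dagster user deployment with it, inspect the release:
helm get values <release-name> -n <namespace>      # user-supplied values
helm get manifest <release-name> -n <namespace> | grep -A 4 resources
```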


github-actions bot commented Dec 7, 2023

New release 324.3432 deployed at Test Pypi.
Install with

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple mpyl==324.3432

Contributor

@kgrunert kgrunert left a comment


And please write a 1.4.11 readme entry with the release notes of the changes you are implementing.

@pietercusters pietercusters force-pushed the task/data-1159-set-default-resources-dagster-repos branch from ba5a4ea to a123fff Compare December 13, 2023 09:06
@pietercusters pietercusters marked this pull request as ready for review December 13, 2023 09:09
@pietercusters pietercusters merged commit 9c67ede into main Dec 13, 2023
9 checks passed
@pietercusters pietercusters deleted the task/data-1159-set-default-resources-dagster-repos branch December 13, 2023 09:22
3 participants