[Dashboard] Add memory graphs optimized for OOM debugging #47007

Closed
alanwguo opened this issue Aug 8, 2024 · 2 comments · Fixed by #48530
Labels
enhancement Request for new feature and/or capability good-first-issue Great starter issue for someone just starting to contribute to Ray observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling triage Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@alanwguo
Contributor

alanwguo commented Aug 8, 2024

Description

The current graph shows the memory usage of each node alongside the MAX memory across the cluster.

For OOM detection, we probably care more about the percentage of memory used per node, and should put extra emphasis on nodes above 80% (or some other threshold) memory usage.
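To make the idea concrete, here is a minimal sketch of the per-node percentage check. It is not taken from the Ray codebase: `NodeMemory`, `nodes_over_threshold`, and the field names are hypothetical, and the 80% default is only the example threshold mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NodeMemory:
    """Hypothetical per-node memory sample (names are illustrative only)."""
    node: str          # node identifier, e.g. an IP address
    used_bytes: int
    total_bytes: int

    @property
    def percent_used(self) -> float:
        # Guard against a missing or zero total so we never divide by zero.
        if self.total_bytes <= 0:
            return 0.0
        return 100.0 * self.used_bytes / self.total_bytes


def nodes_over_threshold(
    nodes: Iterable[NodeMemory], threshold_pct: float = 80.0
) -> List[NodeMemory]:
    """Return nodes strictly above the threshold, highest usage first."""
    hot = [n for n in nodes if n.percent_used > threshold_pct]
    return sorted(hot, key=lambda n: n.percent_used, reverse=True)
```

Sorting by usage puts the nodes most at risk of OOM first, which is the ordering a debugging-focused graph or legend would likely want.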

Use case

Debugging OOM as the reason my workload crashed.

@alanwguo alanwguo added good-first-issue Great starter issue for someone just starting to contribute to Ray enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Aug 8, 2024
@alanwguo
Contributor Author

alanwguo commented Aug 8, 2024

Another thing we can try is emphasizing the head node in the Grafana dashboards, for example by making its line double thick.
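One possible way to express this, sketched under the assumption that the panel is a Grafana time series panel whose JSON model accepts field overrides. The helper names and the `"head node"` series name are hypothetical, and the dashboards Ray generates may use a different panel type or configuration format.

```python
import copy


def head_node_override(head_series_name: str, base_line_width: int = 1) -> dict:
    """Build a field override that doubles the line width of one series.

    Assumes the Grafana time series panel override format
    (matcher by series name, `custom.lineWidth` property).
    """
    return {
        "matcher": {"id": "byName", "options": head_series_name},
        "properties": [
            {"id": "custom.lineWidth", "value": 2 * base_line_width},
        ],
    }


def emphasize_head_node(panel: dict, head_series_name: str) -> dict:
    """Return a copy of a panel dict with the head-node override appended."""
    updated = copy.deepcopy(panel)  # leave the caller's panel untouched
    field_config = updated.setdefault("fieldConfig", {})
    overrides = field_config.setdefault("overrides", [])
    overrides.append(head_node_override(head_series_name))
    return updated
```

For example, `emphasize_head_node({"type": "timeseries"}, "head node")` returns a new panel dict with a single override, leaving the input unchanged.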

@Bye-legumes
Contributor

Let me try to solve this. I prefer emphasizing the head node with a double-thick line.

Projects
None yet
2 participants