[Dashboard] Add memory graphs optimized for OOM debugging #47007
Labels
enhancement
Request for new feature and/or capability
good-first-issue
Great starter issue for someone just starting to contribute to Ray
observability
Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
triage
Needs triage (eg: priority, bug/not-bug, and owning component)
Description
The current graph shows the memory usage of each node alongside the MAX memory across the cluster.
For OOM detection, we probably care more about the percentage of memory used per node, and the graph should put extra emphasis on nodes above 80% memory usage (or some other threshold).
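As a rough illustration of the per-node view described above, here is a minimal sketch that computes percent memory usage per node and flags nodes above a threshold. The `NodeMemory` type, its field names, and the `nodes_over_threshold` helper are hypothetical and not part of the Ray Dashboard; the 80% default simply mirrors the number suggested above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeMemory:
    """Hypothetical per-node memory sample (not a Ray API type)."""
    node_id: str
    used_bytes: int
    total_bytes: int


def memory_percent(node: NodeMemory) -> float:
    """Return memory usage as a percentage of the node's total memory."""
    if node.total_bytes <= 0:
        # Avoid division by zero for nodes that report no total memory.
        return 0.0
    return 100.0 * node.used_bytes / node.total_bytes


def nodes_over_threshold(
    nodes: List[NodeMemory], threshold_pct: float = 80.0
) -> List[Tuple[str, float]]:
    """Return (node_id, percent) pairs above the threshold, highest first."""
    flagged = [
        (n.node_id, memory_percent(n))
        for n in nodes
        if memory_percent(n) > threshold_pct
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        NodeMemory("node-a", used_bytes=90, total_bytes=100),
        NodeMemory("node-b", used_bytes=4, total_bytes=16),
        NodeMemory("node-c", used_bytes=85, total_bytes=100),
    ]
    for node_id, pct in nodes_over_threshold(sample):
        print(f"{node_id}: {pct:.1f}% memory used")
```

Sorting the flagged nodes by percentage puts the nodes closest to OOM first, which is the ordering a graph or table aimed at OOM debugging would likely want to emphasize.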
Use case
Debug OOM as the reason my workload crashed.