-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core][Node Labels 3/n]Add node labels to node resources and publish to all node #36009
Conversation
if (it == nodes_.end()) { | ||
NodeResources node_resources; | ||
it = nodes_.emplace(node_id, node_resources).first; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit weird that we can have node with only labels but empty resources. We should be able to set labels and resources together when we add a node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reconsidered and still think it's better to keep this interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually wondering when we may not have total resources. Should it be a check instead of warning? Let me check with the team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually change if (it == nodes_.end()) {
to CHECK which makes code easier to reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeResources node_resources; | ||
node_resources.total = ResourceMapToResourceRequest(resource_map_total, false); | ||
node_resources.available = ResourceMapToResourceRequest(resource_map_available, false); | ||
node_resources.labels = node_labels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After we have labels, the current naming (NodeResources) is no longer that accurate. We should probably have something like
class Node {
NodeResources resources;
map<string, string> labels;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Labels can also be regarded as a resource of node. So it is very suitable to put it in NodeResources, and it reduces a lot of code changes.
- NodeResources do not need to be computable. The ResourceRequest in NodeResources is Computable. Adding labels to NodeResources does not affect the original available/tatal, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Got the design.
NodeResources node_resources; | ||
node_resources.total = ResourceMapToResourceRequest(resource_map_total, false); | ||
node_resources.available = ResourceMapToResourceRequest(resource_map_available, false); | ||
node_resources.labels = node_labels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Got the design.
if (it == nodes_.end()) { | ||
NodeResources node_resources; | ||
it = nodes_.emplace(node_id, node_resources).first; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually change if (it == nodes_.end()) {
to CHECK which makes code easier to reason about.
const scheduling::NodeID &node_id, | ||
const absl::flat_hash_map<std::string, std::string> &labels) { | ||
if (labels.empty()) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should early return here. If labels are empty then we should just set node labels to empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
I've thought about it, and for the scenario where nodes are added, the semantics should be 'reset' instead of 'update', since the labels are static. Therefore, I have changed the interface to 'ResetNodeLabels()'. If dynamic labels are added in the future, I will create a new interface called 'UpdateNodeLabels()'.
3781ee3
to
3942b14
Compare
|
||
absl::flat_hash_map<std::string, std::string> labels(node.labels().begin(), | ||
node.labels().end()); | ||
cluster_resource_manager_.ResetNodeLabels(scheduling_node_id, labels); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SetNodeLabels()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/ray/raylet/node_manager.h
Outdated
@@ -237,6 +237,7 @@ class NodeManager : public rpc::NodeManagerServiceHandler, | |||
/// Handler for the addition or updation of a resource in the GCS | |||
/// \param node_id ID of the node that created or updated resources. | |||
/// \param createUpdatedResources Created or updated resources. | |||
/// \param labels Created or updated labels of this node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -44,10 +44,11 @@ ClusterResourceScheduler::ClusterResourceScheduler( | |||
const absl::flat_hash_map<std::string, double> &local_node_resources, | |||
std::function<bool(scheduling::NodeID)> is_node_available_fn, | |||
std::function<int64_t(void)> get_used_object_store_memory, | |||
std::function<bool(void)> get_pull_manager_at_capacity) | |||
std::function<bool(void)> get_pull_manager_at_capacity, | |||
const absl::flat_hash_map<std::string, std::string> &local_node_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the change that creates the ClusterResourceScheduler and pass in the labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all node Signed-off-by: LarryLian <554538252@qq.com>
…to all node(ray-project#36009) Signed-off-by: LarryLian <554538252@qq.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Why are these changes needed?
Add node labels to node resources and publish to all node
Related issue number
Enhancing node affinity scheduling feature through node labels #34894
(P1)Parse the configuration parameters for node labels and save them in the NodeInfo data structure.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.