Fix resource bookkeeping bug with acquiring unknown resource. #4945
Changes from all commits
d13e889
0abcd07
bb48572
28d73f6
b3277ec
@@ -76,7 +76,11 @@ ResourceSet::ResourceSet() {}

 ResourceSet::ResourceSet(
     const std::unordered_map<std::string, FractionalResourceQuantity> &resource_map)
-    : resource_capacity_(resource_map) {}
+    : resource_capacity_(resource_map) {
+  for (auto const &resource_pair : resource_map) {
+    RAY_CHECK(resource_pair.second > 0);
This prevents the ability to create a resource set with any zero capacity resource. One of the use cases we supported in the past is a local laptop with zero CPU allocation to prevent work assignment/force all work assignment to the remote cluster.

So, this change would break backward compatibility.

That's actually still supported. This PR shouldn't have any API implications. The resources with quantity 0 are filtered out before getting to this constructor.

I'm just not comfortable with a constructor that breaks the whole raylet if you pass it a fairly reasonable zero resource capacity attribute. This restricts the programmatic expressivity of the ResourceSet. This should also be extensively tested with dynamic custom resources (ESCHER), while allowing some custom resource to drop down to zero capacity. There's a possibility that somewhere in the code we make a copy of the ResourceSet that's dynamically updated to \vec{r} = [..., 0, ...] and trigger this check. I.e., I'm not as worried about the static resource case as I am about the dynamic resource case.
We maintain the invariant that all quantities in a ResourceSet are positive. This change makes sure to enforce those semantics more clearly (and the absence of this change allowed the bug to creep in). Note that the constructor below has the same check.

I think as long as other methods of ResourceSet that update capacity are correctly implemented, dynamic custom resources should be fine even when capacity drops to zero. As @robertnishihara pointed out, any illegal updates must be filtered out before this code is invoked. If they're not, raising a check fail here seems helpful for isolating bugs.

@robertnishihara, I see. I suppose it makes sense to be consistent with the second constructor, which is already performing this check. Longer term, not having the ability to represent a zero in a system seems broken to me. There was this major breakthrough in mathematics when people invented zero...
+  }
+}

 ResourceSet::ResourceSet(const std::unordered_map<std::string, double> &resource_map) {
   for (auto const &resource_pair : resource_map) {
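To make the invariant from the review thread on this hunk concrete, here is a minimal sketch of "filter zero quantities first, then construct". It is not the raylet code: `ToyResourceSet`, `Contains`, and `FilterZeroQuantities` are hypothetical names, quantities are plain `double` instead of `FractionalResourceQuantity`, and a thrown `std::invalid_argument` stands in for `RAY_CHECK` so the failure can be observed without aborting.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the raylet's ResourceSet (assumption: plain
// doubles instead of FractionalResourceQuantity).
class ToyResourceSet {
 public:
  explicit ToyResourceSet(const std::unordered_map<std::string, double> &resource_map)
      : resource_capacity_(resource_map) {
    for (const auto &resource_pair : resource_map) {
      // Stand-in for RAY_CHECK(resource_pair.second > 0): throw, don't abort.
      if (!(resource_pair.second > 0)) {
        throw std::invalid_argument("non-positive quantity for " +
                                    resource_pair.first);
      }
    }
  }

  // True if the label is present, which under the invariant means it has a
  // strictly positive quantity.
  bool Contains(const std::string &label) const {
    return resource_capacity_.count(label) == 1;
  }

 private:
  std::unordered_map<std::string, double> resource_capacity_;
};

// Hypothetical helper: drop zero (or negative) entries before construction,
// so "zero CPUs" is represented by the absence of the "CPU" key.
std::unordered_map<std::string, double> FilterZeroQuantities(
    const std::unordered_map<std::string, double> &requested) {
  std::unordered_map<std::string, double> filtered;
  for (const auto &resource_pair : requested) {
    if (resource_pair.second > 0) {
      filtered.insert(resource_pair);
    }
  }
  return filtered;
}
```

Under this sketch a laptop configured with zero CPUs still works: the zero entry is dropped by the caller, and only an unfiltered map reaching the constructor trips the check.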
@@ -169,7 +173,8 @@ void ResourceSet::SubtractResourcesStrict(const ResourceSet &other) {
     const std::string &resource_label = resource_pair.first;
     const FractionalResourceQuantity &resource_capacity = resource_pair.second;
     RAY_CHECK(resource_capacity_.count(resource_label) == 1)
-        << "Attempt to acquire unknown resource: " << resource_label;
+        << "Attempt to acquire unknown resource: " << resource_label << " capacity "
+        << resource_capacity.ToDouble();
     resource_capacity_[resource_label] -= resource_capacity;

     // Ensure that quantity is positive. Note, we have to have the check before
@@ -233,8 +238,10 @@ FractionalResourceQuantity ResourceSet::GetResource(

 const ResourceSet ResourceSet::GetNumCpus() const {
   ResourceSet cpu_resource_set;
-  cpu_resource_set.resource_capacity_[kCPU_ResourceLabel] =
-      GetResource(kCPU_ResourceLabel);
+  const FractionalResourceQuantity cpu_quantity = GetResource(kCPU_ResourceLabel);
+  if (cpu_quantity > 0) {
+    cpu_resource_set.resource_capacity_[kCPU_ResourceLabel] = cpu_quantity;
+  }
   return cpu_resource_set;
 }

This change looks like it would allow placing zero-CPU actor creation tasks on a node with zero CPUs. The current behavior is to disallow it. Is this what we want?
Well, the current behavior is to crash according to #4892. I view this more as a bug fix and not a change in behavior. The requirement of needing a CPU for placing the actor only applies to actors that do not acquire lifetime resources. The actors that do acquire lifetime resources can be placed as long as they can acquire those resources.
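One plausible reading of how the old `GetNumCpus` could lead to the "Attempt to acquire unknown resource" failure is sketched below. This is a toy model over raw maps, not the raylet implementation: `ResourceMap`, `SubtractStrict`, `CpuDemandOld`, and `CpuDemandNew` are hypothetical names, a missing label is assumed to read as quantity 0, entries are assumed to be erased when they reach zero, and exceptions stand in for `RAY_CHECK`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

using ResourceMap = std::unordered_map<std::string, double>;

// Toy stand-in for SubtractResourcesStrict: every demanded label must already
// exist in `available`; entries that reach zero are erased (assumption).
void SubtractStrict(ResourceMap &available, const ResourceMap &demand) {
  for (const auto &resource_pair : demand) {
    const std::string &label = resource_pair.first;
    if (available.count(label) != 1) {
      throw std::runtime_error("Attempt to acquire unknown resource: " + label);
    }
    if (available[label] < resource_pair.second) {
      throw std::runtime_error("Insufficient capacity for: " + label);
    }
    available[label] -= resource_pair.second;
    if (available[label] == 0) {
      available.erase(label);
    }
  }
}

// Pre-fix shape of GetNumCpus: always emits a "CPU" entry, even when the
// quantity is zero (a missing label is assumed to read as 0).
ResourceMap CpuDemandOld(const ResourceMap &required) {
  ResourceMap cpu_only;
  auto it = required.find("CPU");
  cpu_only["CPU"] = (it == required.end()) ? 0.0 : it->second;
  return cpu_only;
}

// Post-fix shape: emit the "CPU" entry only when the quantity is positive.
ResourceMap CpuDemandNew(const ResourceMap &required) {
  ResourceMap cpu_only;
  auto it = required.find("CPU");
  if (it != required.end() && it->second > 0) {
    cpu_only["CPU"] = it->second;
  }
  return cpu_only;
}
```

In this model, a zero-CPU task on a node that has no "CPU" entry throws with `CpuDemandOld` (the demand map contains `CPU: 0`, which the node does not know about) and is a no-op with `CpuDemandNew`, which matches the "bug fix, not behavior change" framing in the reply above.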