[Cherry-pick][Streaming Generator] Fix a reference leak when pinning requests are … #35794
…received after refs are consumed. (#35712)
When we put a new object or an object is spilled, the raylet sends an RPC to the owner. For example, it asks the owner to pin the plasma object until the ref goes out of scope.
For non-generator tasks, we always create the return references ahead of time, so we can handle these RPCs properly. However, when a generator task is used, we don't know the references ahead of time, so the owner is not guaranteed to own a reference when the RPC is received. In this case, we own the reference before it is reported from the executor. See the code below for more details.
ray/src/ray/core_worker/core_worker.cc
Line 3292 in 5acf41e
reference_counter_->OwnDynamicStreamingTaskReturnRef(object_id, generator_id);
However, this code is error-prone and causes reference leaks. Here are some examples. "WRITE 0" means the generator task return is written to stream index 0. "PinObjectRPCReceived" means the raylet RPC is received. "READ 0" means we read index 0 from the stream.
Example 1
WRITE 0
READ 0
PinObjectRPCReceived
-> The reference is already consumed when PinObjectRPCReceived arrives. In this case, we shouldn't add a reference to the object; otherwise it will leak, because we cannot read this ref anymore (it is already consumed).
Example 2
PinObjectRPCReceived
In this case, WRITE 0 fails. So once the object is owned because of PinObjectRPCReceived, it will never be cleaned up.
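Example 1 can be reproduced with a small toy model. This is only a sketch: `NaiveStream` and all of its members are hypothetical names made up for illustration, not Ray's reference counter, and the model assumes that a pin RPC owns the ref unconditionally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical toy model of a generator stream (not Ray's ReferenceCounter).
struct NaiveStream {
  std::set<int64_t> owned;      // stream indices whose refs are currently owned
  int64_t next_read_index = 0;  // next index the consumer will read

  // "WRITE i": the executor reports the return at index i.
  void Write(int64_t index) {
    assert(index >= 0);
    owned.insert(index);
  }

  // "READ i": the consumer reads the next index and releases its ref.
  void Read() {
    owned.erase(next_read_index);
    ++next_read_index;
  }

  // Pin RPC received: naively own the ref without checking stream state.
  void OnPinRpc(int64_t index) {
    assert(index >= 0);
    owned.insert(index);
  }

  // Owned refs below the read cursor can never be read again, so they leak.
  std::size_t LeakedCount() const {
    std::size_t leaked = 0;
    for (int64_t index : owned) {
      if (index < next_read_index) ++leaked;
    }
    return leaked;
  }
};
```

Calling `Write(0)`, `Read()`, then `OnPinRpc(0)` on this model leaves index 0 owned behind the read cursor, which is the leak described in Example 1.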
To handle all these cases, we introduce a new API, TemporarilyInsertToStreamIfNeeded. This API owns the object only when
the corresponding ref was never consumed
the stream has not been deleted.
It then adds the object ref to the temporary refs until it is reported. If the report fails, all the references are removed when the stream is deleted.
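The two conditions and the cleanup on stream deletion can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: `GuardedStream`, its members, and the method `TemporarilyInsertIfNeeded` are hypothetical stand-ins, and the real signature of TemporarilyInsertToStreamIfNeeded is not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical toy model of a stream with temporary refs (not Ray's code).
struct GuardedStream {
  std::set<int64_t> temporary;  // owned on pin RPC, not yet reported
  std::set<int64_t> reported;   // reported by the executor, not yet read
  int64_t next_read_index = 0;  // next index the consumer will read
  bool deleted = false;

  // Own the ref only if it was never consumed and the stream still exists.
  // Returns true when a temporary ref was added.
  bool TemporarilyInsertIfNeeded(int64_t index) {
    assert(index >= 0);
    if (deleted) return false;                    // stream already deleted
    if (index < next_read_index) return false;    // ref already consumed
    if (reported.count(index) > 0) return false;  // already owned via report
    return temporary.insert(index).second;
  }

  // The executor reports index i: the temporary ref becomes a regular one.
  void Write(int64_t index) {
    assert(index >= 0);
    if (deleted) return;
    temporary.erase(index);
    reported.insert(index);
  }

  // The consumer reads the next index and releases its ref.
  void Read() {
    reported.erase(next_read_index);
    ++next_read_index;
  }

  // Deleting the stream drops every remaining ref, including temporary ones.
  void Delete() {
    temporary.clear();
    reported.clear();
    deleted = true;
  }

  std::size_t OwnedCount() const { return temporary.size() + reported.size(); }
};
```

In this model, Example 1 (write, read, then pin RPC) adds nothing because index 0 is behind the read cursor, and Example 2 (pin RPC with a failed write) is cleaned up when `Delete()` runs.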
Why are these changes needed?
Related issue number
#35634