Safe(r) minting of `CapabilityRef`s #429
Merged: frankmcsherry merged 1 commit into TimelyDataflow:master from petrosagg:safe-capabilityref on Sep 7, 2022
petrosagg force-pushed the safe-capabilityref branch 4 times, most recently from 668b490 to 0929592 (November 3, 2021 00:01)
petrosagg force-pushed the safe-capabilityref branch from 0929592 to 2862f9d (November 10, 2021 13:47)
I just built materialize with this change to get further confidence that it is correct. Tests passed :) MaterializeInc/materialize#9023
petrosagg force-pushed the safe-capabilityref branch from 2862f9d to a7bf67f (December 1, 2021 17:36)
CapabilityRefs are valid to exist as long as the data in the input are not marked as consumed. This change makes sure that this is the case by including an extra drop guard in the capability ref. Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
petrosagg force-pushed the safe-capabilityref branch from a7bf67f to e49dccb (September 6, 2022 14:09)
Looks good; thanks!
petrosagg added a commit to petrosagg/materialize that referenced this pull request on Sep 12, 2022:
After TimelyDataflow/timely-dataflow#429 holding onto CapabilityRefs across await points is safe
Signed-off-by: Petros Angelatos <petrosagg@gmail.com>

petrosagg added a commit to petrosagg/materialize that referenced this pull request on Sep 13, 2022:
After TimelyDataflow/timely-dataflow#429 holding onto CapabilityRefs across await points is safe
Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
petrosagg added a commit to petrosagg/timely-dataflow that referenced this pull request on Dec 22, 2022:

Since the merge of TimelyDataflow#429, `CapabilityRef`s have been made safe to hold onto across operator invocations because that PR made sure that they only decremented their progress counts on `Drop`. While this allowed `async`/`await` based operators to freely hold on to them, it was still very difficult for synchronous based operators to do the same thing, due to the lifetime attached to the `CapabilityRef`.

Since `CapabilityRef`s can now be held arbitrarily long, the lifetime constraint on them can be lifted and therefore made into a `'static` value. No extra clones of the timestamps were needed for this change.

After making this change, the name `CapabilityRef` felt wrong, since there is no reference to anything anymore. Instead, the main distinction between `CapabilityRef`s and `Capabilities` are that the former is associated with an input port and the latter is associated with an output port.

As such, I have renamed `CapabilityRef` to `InputCapability` to signal to users that holding onto one of them represents holding onto a timestamp at the input for which we have not yet determined the output port that it should flow to. This nicely ties up the semantics of the `InputCapability::retain_for_output` and `InputCapability::delayed_for_output` methods, which make it clear by their name and signature that this is what "transfers" the capability from input ports to output ports.

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
petrosagg added a commit to petrosagg/timely-dataflow that referenced this pull request on Dec 22, 2022:

Since the merge of TimelyDataflow#429, `CapabilityRef`s have been made safe to hold onto across operator invocations because that PR made sure that they only decremented their progress counts on `Drop`. While this allowed `async`/`await` based operators to freely hold on to them, it was still very difficult for synchronous based operators to do the same thing, due to the lifetime attached to the `CapabilityRef`.

We can observe that the lifetime no longer provides any benefits, which means it can be removed and turn `CapabilityRef`s into fully owned values. This allows any style of operator to easily hold on to them. The benefit of that isn't just performance (by avoiding the `retain()` dance), but also about deferring the decision of the output port a given input should flow to to a later time.

After making this change, the name `CapabilityRef` felt wrong, since there is no reference to anything anymore. Instead, the main distinction between `CapabilityRef`s and `Capabilities` are that the former is associated with an input port and the latter is associated with an output port.

As such, I have renamed `CapabilityRef` to `InputCapability` to signal to users that holding onto one of them represents holding onto a timestamp at the input for which we have not yet determined the output port that it should flow to. This nicely ties up the semantics of the `InputCapability::retain_for_output` and `InputCapability::delayed_for_output` methods, which make it clear by their name and signature that this is what "transfers" the capability from input ports to output ports.

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
petrosagg added a commit to petrosagg/timely-dataflow that referenced this pull request on Jan 11, 2023:
frankmcsherry pushed a commit that referenced this pull request on Jan 29, 2023:

Since the merge of #429, `CapabilityRef`s have been made safe to hold onto across operator invocations because that PR made sure that they only decremented their progress counts on `Drop`. While this allowed `async`/`await` based operators to freely hold on to them, it was still very difficult for synchronous based operators to do the same thing, due to the lifetime attached to the `CapabilityRef`.

We can observe that the lifetime no longer provides any benefits, which means it can be removed and turn `CapabilityRef`s into fully owned values. This allows any style of operator to easily hold on to them. The benefit of that isn't just performance (by avoiding the `retain()` dance), but also about deferring the decision of the output port a given input should flow to to a later time.

After making this change, the name `CapabilityRef` felt wrong, since there is no reference to anything anymore. Instead, the main distinction between `CapabilityRef`s and `Capabilities` are that the former is associated with an input port and the latter is associated with an output port.

As such, I have renamed `CapabilityRef` to `InputCapability` to signal to users that holding onto one of them represents holding onto a timestamp at the input for which we have not yet determined the output port that it should flow to. This nicely ties up the semantics of the `InputCapability::retain_for_output` and `InputCapability::delayed_for_output` methods, which make it clear by their name and signature that this is what "transfers" the capability from input ports to output ports.

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
Background

As part of progress tracking, timely keeps track of three kinds of timestamp frequencies for each operator: timestamps at the inputs (the `consumeds` part of `SharedProgress`), internal timestamps (the `internals` part of `SharedProgress`), and timestamps at the outputs (the `produceds` part of `SharedProgress`). A timely operator must carefully manipulate those frequencies in order to let timely know at what times data might be produced from the operator. An operator can misuse these frequencies if, for example, it clears a `consumed` timestamp `t1` without adding it to its `internal` timestamps and later decides that it wants to produce data at `t1`.

With the exception of the raw operator builder, timely provides a safe API that ensures timestamp frequency counts are correct via the `CapabilityRef` and `Capability` types. These act as a witness of the existence of a `consumed` or `internal` timestamp respectively.

CapabilityRef misuse

The `Capability` type, which represents an `internal` timestamp, contains within it a shared reference to the internal timestamp frequency counts, and it automatically subtracts its timestamp from the counts on `Drop`. This allows users to keep a `Capability` around in whatever way they wish and have the internal operator timestamp counts follow accordingly.

However, the same is not true for the `CapabilityRef` type. When a `CapabilityRef` is minted, its timestamp is immediately subtracted from `consumed` and provided to the user. This is a problem because if the operator logic held onto the `CapabilityRef`, it could later produce data that is behind the frontier.

In practice, holding onto a `CapabilityRef` across operator invocations is quite difficult, as the user is tasked with storing an input handle and the `CapabilityRef` it produced (which carries a lifetime) at the same time. Self-referential structs are notoriously difficult (but neither impossible nor unsound) to build in Rust, and so synchronous operators don't usually face this problem.

This situation becomes trivially possible with an async operator. In async Rust, where the compiler is tasked with generating a struct that captures the stack of a future at any given yield point, making this self-referential is easy. A user simply needs to obtain a `CapabilityRef` from an input handle and then `await` on something. When this yield point is reached, the current stack is preserved, keeping the `CapabilityRef` alive across timely invocations.

Solution in this PR

The solution this PR implements is to make `CapabilityRef` behave in the same way as a `Capability`. Instead of relying on the difficulty of self-referential structs in Rust, this PR adds an extra guard field in `CapabilityRef` that, when dropped, updates the `consumed` timestamps of the operator accordingly. This allows users to keep `CapabilityRef`s around for as long as they wish.
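
The drop-guard idea can be illustrated with a small standalone model. This is only a sketch using the standard library, not the code in this PR: `ConsumedLog`, `DropGuard`, `SketchCapabilityRef`, and `mint` are hypothetical names, and a plain `HashMap` stands in for the `consumeds` bookkeeping. The point it shows is that the update to the shared counts happens when the guard is dropped rather than when the reference is minted.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for the shared `consumeds` bookkeeping:
/// a map from timestamp to the count reported for that timestamp.
type ConsumedLog = Rc<RefCell<HashMap<u64, i64>>>;

/// Hypothetical guard: it records its update only when it is dropped.
struct DropGuard {
    consumed: ConsumedLog,
    time: u64,
    count: i64,
}

impl Drop for DropGuard {
    fn drop(&mut self) {
        // The shared counts are touched here, not at minting time.
        *self.consumed.borrow_mut().entry(self.time).or_insert(0) += self.count;
    }
}

/// Simplified capability reference: a timestamp plus the guard field.
struct SketchCapabilityRef {
    time: u64,
    _guard: DropGuard,
}

impl SketchCapabilityRef {
    fn time(&self) -> u64 {
        self.time
    }
}

/// Mints a reference for `count` records at `time` without updating the log.
fn mint(consumed: &ConsumedLog, time: u64, count: i64) -> SketchCapabilityRef {
    SketchCapabilityRef {
        time,
        _guard: DropGuard { consumed: Rc::clone(consumed), time, count },
    }
}

fn main() {
    let consumed: ConsumedLog = Rc::new(RefCell::new(HashMap::new()));

    let cap = mint(&consumed, 7, 3);
    // While the reference is alive, nothing has been reported yet.
    assert!(consumed.borrow().get(&7).is_none());
    assert_eq!(cap.time(), 7);

    // Dropping the reference runs the guard, which records the update.
    drop(cap);
    assert_eq!(consumed.borrow().get(&7).copied(), Some(3));
}
```

Because the guard owns a shared handle to the counts, the reference can be stored or moved freely in this model, including into the state of a suspended future, and the bookkeeping still follows its actual lifetime.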