In #327 the pointstamp validation logic was made stricter, requiring either that an input message be consumed or that a capability already exist. This is too strict, owing to the hierarchical nature of progress tracking.
Specifically, running differential's `bfs` example with two processes results in a crash.
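As a rough illustration of the stricter rule, here is a minimal sketch of one operator's local view, assuming totally ordered integer timestamps. `LocalView` and `strictly_justified` are hypothetical names, not timely's API. A newly announced capability at time `t` is accepted only if the operator consumed an input message, or already held a capability, at some time `<= t`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LocalView:
    """Hypothetical, simplified local progress state of one operator.

    Timestamps are plain integers here; real timely timestamps are
    partially ordered, which this sketch ignores.
    """
    held: Counter = field(default_factory=Counter)  # capability time -> count
    consumed: list = field(default_factory=list)    # times of consumed input messages

    def strictly_justified(self, time: int) -> bool:
        """Approximation of the stricter rule: a new capability at `time`
        must be backed by a consumed message or a held capability at or
        before `time`."""
        from_input = any(t <= time for t in self.consumed)
        from_capability = any(t <= time and n > 0 for t, n in self.held.items())
        return from_input or from_capability


view = LocalView()
print(view.strictly_justified(5))  # False: nothing backs the new capability
view.consumed.append(3)
print(view.strictly_justified(5))  # True: a message at time 3 was consumed
```

The check consults only state local to the operator, which is exactly what makes it cheap to run and also what makes it fragile when the justification lives elsewhere.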
The reason here is (I think) that a subgraph, which presents upward as an operator, may internally receive a message from another peer. The subgraph presents the existence of that message upward as an internal capability. However, the subgraph operator may not yet have received notice of any incoming message, nor hold any other capability. Nonetheless, nothing is actually wrong (other than that the protocol is hard to verify locally).
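The scenario above can be sketched as a toy model, assuming integer timestamps and that the subgraph reports an internally received message upward as a capability at the message's time. `SubgraphOnWorker`, `receive_from_peer`, and the rest are hypothetical names for illustration, not how timely is actually structured.

```python
from dataclasses import dataclass, field


@dataclass
class SubgraphOnWorker:
    """Hypothetical model of one worker's copy of a subgraph, as the
    enclosing scope sees it: an operator whose outer capabilities
    summarize its internal pointstamps."""
    outer_consumed: list = field(default_factory=list)  # outer input messages consumed
    outer_held: list = field(default_factory=list)      # outer capabilities already held
    inner_messages: list = field(default_factory=list)  # internal messages from peers

    def receive_from_peer(self, time: int) -> int:
        """An internal message arrives from a peer's copy of the subgraph;
        the subgraph reports it upward as a new capability at `time`."""
        self.inner_messages.append(time)
        return time

    def strictly_justified(self, time: int) -> bool:
        """Local-only check: looks at outer inputs and capabilities only."""
        return any(t <= time for t in self.outer_consumed + self.outer_held)


# The peer's copy held a capability at time 4 and sent a message to this worker.
peer_held_at = 4
worker0 = SubgraphOnWorker()
announced = worker0.receive_from_peer(peer_held_at)
print(worker0.strictly_justified(announced))  # False: rejected locally
print(peer_held_at <= announced)              # True: justified by the peer's capability
```

The local check fails only because the capability that justifies the announcement is held on another worker, which matches the point that nothing is actually wrong.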
I think the summary is: prior to #327 the `validate_progress` test could pass progress information that was invalid (as the issue notes, races could mean that the progress information is about to change). After #327 the `validate_progress` test can fail progress information that is valid.

There is a legit complaint that progress traffic is hard to verify locally, which I 100% agree with. I think there is a dopey answer, "operators should buffer progress information that does not yet make sense", which is permitted because (unless there are bugs) there are no safety requirements that progress traffic move at any particular speed.
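A minimal sketch of that buffering idea, assuming integer timestamps and treating any consumed message or held capability at or before `t` as justification for an announcement at `t`. `BufferingValidator`, `announce`, and `justify` are hypothetical names; this is one possible shape, not a proposed timely implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BufferingValidator:
    """Hypothetical sketch: unjustified capability announcements are held
    back rather than rejected, and released once something justifies them.

    Delaying updates is treated as safe, following the argument that
    progress traffic has no speed requirement.
    """
    justifying: list = field(default_factory=list)  # times of consumed messages / held capabilities
    pending: list = field(default_factory=list)     # buffered capability announcements
    accepted: list = field(default_factory=list)    # announcements passed along

    def _justified(self, time: int) -> bool:
        return any(t <= time for t in self.justifying)

    def announce(self, time: int) -> None:
        """Accept the announcement now if it is justified, otherwise buffer it."""
        if self._justified(time):
            self.accepted.append(time)
        else:
            self.pending.append(time)

    def justify(self, time: int) -> None:
        """Record new justifying information and release any buffered
        announcements it now covers, keeping the rest buffered."""
        self.justifying.append(time)
        still_pending = []
        for t in self.pending:
            (self.accepted if self._justified(t) else still_pending).append(t)
        self.pending = still_pending


v = BufferingValidator()
v.announce(5)      # arrives before its justification
print(v.pending)   # [5]
v.justify(3)       # e.g. the matching message or capability shows up
print(v.accepted)  # [5]
```

The trade-off is latency rather than correctness: an announcement that arrives early simply waits, so the local check never has to reject traffic that is valid globally.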