Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax pointstamp validation #331

Merged

Conversation

frankmcsherry
Copy link
Member

In #327 the pointstamp validation logic was made more strict, requiring either that an input message be consumed or that a capability existed. This is too strict, owing to the hierarchical nature of progress tracking.

Specifically, running differential's bfs the following way, with two processes

cargo run --example bfs -- 10 10 10 100000000 no -w1 -n2 -p0
cargo run --example bfs -- 10 10 10 100000000 no -w1 -n2 -p1

results in a crash.

The reason here is (I think) that a subgraph, which presents upward as an operator, may internally receive a message from another peer. The subgraph presents the existence of that message upward as an internal capability. However, the subgraph operator may not have yet received notice of any incoming message, nor hold any other capability. Nonetheless, nothing is actually wrong (other than that the protocol is hard to verify locally).

I think the summary is: prior to #327 the validate_progress test could pass progress information that could be invalid (as the issue notes, because races could mean that the progress information is about to change). After #327 the validate_progress test can fail progress information that is valid.

There is a legit complaint that the progress traffic is hard to locally verify, which I 100% agree with. I think there is a dopey answer that is "operators should buffer progress information that does not yet make sense" which is permitted because (unless there are bugs) there no safety requirements that progress traffic move at any speed.

@frankmcsherry
Copy link
Member Author

Landing this, but with an eye towards more effective validation. Issue #107 seems like a good place to conduct further discussion, and I'll record some notes there!

@frankmcsherry frankmcsherry merged commit 888c81f into TimelyDataflow:master Jun 24, 2020
@github-actions github-actions bot mentioned this pull request Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant