async_backing: Candidate timeouts on group rotation boundary in versi #3165
Comments
The backing time should be irrelevant for this. As I described (I think on the parathreads ticket), to determine the backing group we have to use the relay parent of the candidate. If this is not the case, we need to fix that. I don't see how anything else can work.
Consequences in this comment.
I confirmed that is not the case, reproducibility steps:
Nice find, and thanks! The linked PR fixes the off-by-one (OBO) correctly. The backing group is determined based on the relay-parent and not the number of the block the candidate was backed in.
Fixes: #3165 --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/async-backing-development-updates/6176/1
When running tick and glutton parachains on versi with async-backing, every 2-3 minutes a candidate times out on backing, and that blocks the core for about 2 min.
Some logs regarding 1 candidate: https://grafana.teleport.parity.io/goto/Nke60BpSR?orgId=1.
Root cause
It seems that at a group rotation boundary there is a problem in the way group assignment works. The collator and the backing subsystem end up using the group assignment from before the rotation, but since the candidate is backed in a block after the rotation, availability uses a different group for fetching the chunk, which results in the candidate timing out.
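To make the mismatch concrete, here is a minimal sketch, assuming a simplified rotation rule where the group for a core is (core + number of rotations so far) modulo the number of cores. RotationInfo and group_for_core are hypothetical names used for illustration only and are not the actual runtime API; the block numbers are made up.

```rust
/// Hypothetical, simplified stand-in for the runtime's group rotation state.
#[derive(Clone, Copy)]
struct RotationInfo {
    /// Relay-chain block at which the session started.
    session_start_block: u32,
    /// Number of blocks between two group rotations.
    rotation_frequency: u32,
}

/// Group assigned to `core` when evaluated at relay-chain block `at`.
/// Assumption: one group per core, rotated by one every `rotation_frequency` blocks.
fn group_for_core(info: RotationInfo, core: u32, n_cores: u32, at: u32) -> u32 {
    if info.rotation_frequency == 0 || n_cores == 0 {
        // No rotation configured: the assignment is static.
        return core;
    }
    let blocks_since_start = at.saturating_sub(info.session_start_block);
    let rotations = blocks_since_start / info.rotation_frequency;
    // Widen to u64 so the addition cannot overflow.
    ((core as u64 + rotations as u64) % n_cores as u64) as u32
}

fn main() {
    let info = RotationInfo { session_start_block: 0, rotation_frequency: 10 };
    let (core, n_cores) = (0, 5);

    // The candidate's relay parent is the last block before the rotation,
    // but the candidate is backed in the first block after it.
    let relay_parent = 9;
    let backed_in = 10;

    let group_at_relay_parent = group_for_core(info, core, n_cores, relay_parent);
    let group_at_backed_in = group_for_core(info, core, n_cores, backed_in);

    println!("group at relay parent: {group_at_relay_parent}");
    println!("group at backed-in block: {group_at_backed_in}");
    println!("mismatch: {}", group_at_relay_parent != group_at_backed_in);
}
```

If one side evaluates the assignment at the relay parent and the other at the block the candidate was backed in, the two sides disagree exactly at the rotation boundary, which matches the symptom described here.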
The main culprit for this problem seems to be the backed assumption in the runtime function validator_groups (polkadot-sdk/polkadot/runtime/parachains/src/runtime_api_impl/v7.rs, line 46 in 8a8f6f9).
Hence the usage of validator_groups in the backing subsystem and of group_responsible (polkadot-sdk/polkadot/runtime/parachains/src/runtime_api_impl/v7.rs, line 108 in 8a8f6f9) in availability-distribution will give us different groups, so the candidate never passes the availability part.
@rphmeier: Thoughts on how to fix this?