Use higher priority for PVF preparation in dispute/approval context #4172
So, we never used the priority logic.
yup
(commenting here because I can't comment higher)
Please correct me if I'm wrong, but from a quick reading of this code, it seems to me that the priority is not for PVF execution, but for PVF compilation (prepare).
If that's the case then, first, the comments here are misleading. And second, these changes would only help if nodes restart and have no artifacts for old PVFs that are relevant for unapproved blocks, or maybe if the PVF artifact pruning is too aggressive?
Normally, I would expect them to have the artifacts even for new PVFs due to pre-checking. What would probably help if preparation is slow is having more workers for PVF compilation (currently, it's 1).
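To make the distinction concrete, here is a minimal sketch (made-up names, not the actual polkadot-node-core-pvf types) of what "the priority applies to preparation" means: the priority only decides which PVF gets compiled next when a preparation worker frees up; execution requests just wait for the artifact.

```rust
use std::collections::BinaryHeap;

// Hypothetical names; the real PVF host has its own priority type.
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum PrepPriority {
    Normal,
    Critical, // e.g. approval-voting / dispute context
}

// A pending *compilation* job; execution requests never enter this queue,
// they only wait for the corresponding artifact to appear.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct PrepJob {
    // First field, so the derived `Ord` sorts by priority first.
    priority: PrepPriority,
    pvf_hash: [u8; 32],
}

struct PrepQueue {
    pending: BinaryHeap<PrepJob>,
}

impl PrepQueue {
    // Picks the highest-priority PVF to compile when a prep worker frees up.
    fn next_job(&mut self) -> Option<PrepJob> {
        self.pending.pop()
    }
}
```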
Yes, this is currently used for preparation only. You are right, with pre-checking the artifact should already be there. It helps mostly with missing artifacts due to pruning or executor parameter changes.
I wouldn't say it is too aggressive, we prune an artifact if it hasn't been used for 24 hours, which seems like a fair measure to prevent artifact cache bloating.
Generally, yes, but if a validator pre-checked a PVF and didn't use it for 24 hours, then again, it has to re-compile it.
AFAIU, pruning after 24 hours only applies if we haven't used the artifact. For example, if we were out of the active (parachain) validator set. Otherwise, they should not be pruned. Assuming minimal churn in validator set changes, this was not the case in the scenario we observed.
For old artifacts, we should have them compiled unless we just started the node (in which case, yes, we should prioritize compiling PVFs of the unapproved blocks first). If we have them compiled already, then we should actually prioritize compiling new PVFs as soon as we see them (backing). So again, this change would not have helped in this case. Bumping preparation workers would though.
Regarding the executor-parameters-changed case: we are more often assigned to validate candidates than to back candidates for which the PVF needs to be recompiled, so I don't understand why backing priority would help here. If we prioritize backing, we'd basically no-show on more of our assignments across many blocks, triggering further tranches and needlessly increasing the work of the approval subsystems as the finality lag grows. Bumping the preparation workers hard cap might be a better choice for the time being, along with this change.
Exactly. But it's still not the best behavior possible. I'd rather limit the artifact cache size and evict the most stale artifact on overflow.
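A rough sketch of that idea (the names, the 24h TTL and the cap are placeholders, not the actual artifacts module): keep the time-based pruning we have today, but additionally cap the number of cached artifacts and evict the most stale one when the cap is exceeded.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical bounded artifact cache; the real pruning lives in the
// PVF host's artifacts module and works on on-disk artifacts.
struct ArtifactCache {
    max_artifacts: usize,
    ttl: Duration, // e.g. 24h, as today
    last_used: HashMap<[u8; 32], Instant>, // artifact id -> last use
}

impl ArtifactCache {
    /// Returns the artifact ids that should be removed from disk.
    fn to_prune(&mut self, now: Instant) -> Vec<[u8; 32]> {
        // 1. Time-based pruning, as done today: unused for longer than `ttl`.
        let mut pruned: Vec<_> = self
            .last_used
            .iter()
            .filter(|(_, used)| now.duration_since(**used) > self.ttl)
            .map(|(id, _)| *id)
            .collect();
        for id in &pruned {
            self.last_used.remove(id);
        }
        // 2. Size-based eviction: drop the most stale artifacts until we are
        //    back under the cap, so the cache cannot grow without bound.
        while self.last_used.len() > self.max_artifacts {
            let stalest = self
                .last_used
                .iter()
                .min_by_key(|(_, used)| **used)
                .map(|(id, _)| *id)
                .expect("len > max_artifacts >= 0, so the map is non-empty");
            self.last_used.remove(&stalest);
            pruned.push(stalest);
        }
        pruned
    }
}
```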
I was describing a case when we have PVFs already compiled for unapproved (old) blocks. In that case, we would not no-show. What I meant is that we should put a new PVF up for preparation as soon as possible.
I see what you mean, but in this case, the preparation backlog should be empty and we would prepare new PVFs as soon as we see them anyway.
Backing was deemed more important than approvals and disputes? 😨
If I understand it correctly, that is an awesome find and fix. I was not aware of it. It also aligns much better with our backpressure plans - the first thing you should stop doing when overworked is backing, NOT approvals.
But despite the comments indicating that... I can't find anything in the original code that sets anything to `Priority::Critical`. Huh.
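For context, the change under discussion is roughly of this shape (hypothetical and heavily simplified, not the actual candidate-validation subsystem code): the priority passed to the preparation queue would be derived from why we are validating, so approval and dispute work no longer queues behind backing.

```rust
// Hypothetical, simplified types; the real code lives in the
// candidate-validation subsystem and the PVF host.
enum Priority {
    Normal,
    Critical,
}

/// Why this validation was requested.
enum ValidationContext {
    Backing,
    Approval,
    Dispute,
}

// Before the PR everything was effectively prepared at the same priority;
// with the change, approval and dispute work jumps the preparation queue.
fn prep_priority(ctx: &ValidationContext) -> Priority {
    match ctx {
        ValidationContext::Backing => Priority::Normal,
        ValidationContext::Approval | ValidationContext::Dispute => Priority::Critical,
    }
}
```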
I am looking at the execution path as well, but that seems to require a lot more changes.
However, for backing execution backpressure to kick in at the network level, at least the backing group needs to be overloaded, and that also depends on the validator hardware: some might be faster or provision more CPUs.
I've checked the original paritytech/polkadot#2710 and it seems it was like that from the very beginning 🤷
Maybe @pepyakin could remember why it was implemented like that
I think there is a good argument for backing coming first; bear in mind that pre-async-backing, backing a candidate had a really tight deadline. With approvals and disputes you must not fall too far behind on work, but it does not have to happen under a tight deadline.
At a very high level, our scheduling policy needs to answer the following question: do we have a backlog of (approval & dispute) work we can't handle, or is it just a temporary spike? If it is a temporary spike, then the right call is to actually prioritise backing first and get back to approvals and disputes afterwards.
So you actually want to handle a backing request as soon as possible, fill in the gaps with approvals and disputes, and only throttle backing if our backlog of approvals and disputes gets out of hand; otherwise we might accidentally throttle backing too often.
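A sketch of such a heuristic (thresholds and names are made up purely for illustration, this is not existing scheduler code): backing keeps jumping the queue unless the approval/dispute backlog stays above a limit for long enough to look like a real backlog rather than a spike.

```rust
// Hypothetical scheduling heuristic, just to make the trade-off concrete.
struct QueueState {
    /// Pending approval + dispute jobs.
    approval_dispute_backlog: usize,
    /// How many consecutive scheduling ticks the backlog exceeded the limit.
    ticks_over_limit: u32,
}

const BACKLOG_LIMIT: usize = 10; // made-up numbers
const SPIKE_TOLERANCE_TICKS: u32 = 5;

/// Should backing keep jumping the queue right now?
fn prioritise_backing(state: &QueueState) -> bool {
    // A short spike is fine: serve backing first and catch up afterwards.
    // Only a *sustained* approval/dispute backlog throttles backing.
    !(state.approval_dispute_backlog > BACKLOG_LIMIT
        && state.ticks_over_limit > SPIKE_TOLERANCE_TICKS)
}
```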
Yeah, you might be right that this was far worse before async backing, but even now it is not great either. Depending on the actual PVF, preparation can take 10-20s, so the relay parent could go out of scope. Given that we have a single worker doing this right now, finality can be delayed for quite a long time, and that affects finality on all parachains vs just one parachain not being able to get its blocks included for a while.
Actually, we could also bump `prepare_workers_hard_max_num` from 1 to 2. This would be used only for preparing PVFs tagged with `Critical` priority. Or maybe leave it for changing in #4126. WDYT?
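Roughly what that would mean (a sketch with made-up names, not the actual prepare queue implementation): the pool normally spawns workers only up to the soft cap, and the extra headroom up to the hard cap is used solely for Critical-priority preparation jobs.

```rust
// Hypothetical worker-pool sizing logic to illustrate the proposal above.
struct PrepPool {
    running_workers: usize,
    soft_max_num: usize, // e.g. 1
    hard_max_num: usize, // e.g. 2 after the proposed bump
}

enum Priority {
    Normal,
    Critical,
}

impl PrepPool {
    /// Can we spawn another preparation worker for a job of this priority?
    fn can_spawn(&self, priority: &Priority) -> bool {
        match priority {
            // Normal jobs never exceed the soft cap.
            Priority::Normal => self.running_workers < self.soft_max_num,
            // Critical jobs (disputes/approvals) may use the extra headroom.
            Priority::Critical => self.running_workers < self.hard_max_num,
        }
    }
}
```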
Sounds like a really good idea, but again, it would be good to have a number of CPU cores reserved in the hardware requirements for the maximum number of workers that may run in parallel.