Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not force collators to update after enabling async backing #1920

Merged
merged 11 commits into from
Oct 19, 2023

Conversation

bkchr
Copy link
Member

@bkchr bkchr commented Oct 17, 2023

The validators are checking if async backing is enabled by checking the version of the runtime api. If the runtime api is upgraded by a runtime upgrade, the validators start to also enable the async backing logic. However, just because async backing is enabled, it doesn't mean that all collators and parachain runtimes have upgraded. This pull request fixes an issue about advertising collations to the relay chain when it has async backing enabled, but the collator is still using the old networking protocol. The implementation is actually backwards compatible as we can not expect that everyone directly upgrades. However, the collation advertisement logic was requiring V2 networking messages after async backing was enabled, which was wrong. This is now fixed by this pull request.

Closes: #1923

Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think this should actually work. For @rphmeier comment, I don't think we need to check the parent here for synchronous backing as it is checked in candidate validation anyway. This requires fetching the collation, but adversaries also have other means for letting us fetch a collation that turns out invalid, so checking this beforehand does not help that much.

(Nothing gets worse than it was before.)

@Polkadot-Forum
Copy link

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-digest-18-oct-2023/4318/1

@bkchr bkchr added the T0-node This PR/Issue is related to the topic “node”. label Oct 18, 2023
@ordian ordian requested a review from slumber October 18, 2023 09:34
if !per_relay_parent.collations.is_seconded_limit_reached(relay_parent_mode) {
return Err(AdvertisementError::SecondedLimitReached)
}

if let Some((candidate_hash, parent_head_data_hash)) = prospective_candidate {
let is_seconding_allowed = !relay_parent_mode.is_enabled() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!relay_parent_mode.is_enabled() || is there in case two nodes communicate with V2 but async backing is still off, meaning we don't need to check for can_second -- there's only one slot for candidate per relay parent anyway.

Also, seems like this branch won't be executed at all even with async backing enabled if collator chooses to communicate with v1, which opens some room for spamming the node with validation work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1920 (comment)

Also, seems like this branch won't be executed at all even with async backing enabled if collator chooses to communicate with v1, which opens some room for spamming the node with validation work.

You mean because we would add the collation without further checks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1920 (comment)

In the previous version of this PR use_non_prospective_chains_mode was defined based on what collator told us, in the original code it relies on the runtime request from validators.

You mean because we would add the collation without further checks?

The idea is to only request collation once the parent parachain block becomes backable. Until then, we don't do any unnecessary work. With this code collator can

  1. Choose to use v1 networking
  2. Send maximum number of sequentially based collations and validators will start requesting & validating without waiting for the "backable parent" part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in the v1 networking version it should only accept parent_head_data_hash == current_head_hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in the v1 networking version it should only accept parent_head_data_hash == current_head_hash.

Should be done now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this line should be reverted:

} else {
// Relay parent is unknown or async backing is disabled.
false

If validator and collator both use v2 while async backing is disabled this will block all advertisements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done @slumber

@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 1/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4010046

@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 1/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4010048

Copy link
Contributor

@slumber slumber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it works based on insert_advertisement call in handle_advertisement.

if !per_relay_parent.collations.is_seconded_limit_reached(relay_parent_mode) {
return Err(AdvertisementError::SecondedLimitReached)
}

if let Some((candidate_hash, parent_head_data_hash)) = prospective_candidate {
let is_seconding_allowed = !relay_parent_mode.is_enabled() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this line should be reverted:

} else {
// Relay parent is unknown or async backing is disabled.
false

If validator and collator both use v2 while async backing is disabled this will block all advertisements.

@bkchr bkchr enabled auto-merge (squash) October 19, 2023 11:01
Comment on lines 1828 to 1834
(CollationVersion::V1, _) =>
request_persisted_validation_data(
ctx.sender(),
candidate_receipt.descriptor().relay_parent,
candidate_receipt.descriptor().para_id,
)
.await?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we would've requested pvd from chain even for V2 protocol in case async backing is disabled.
This is correct, because prospective parachains subsystem doesn't do any job otherwise.

With new version this is not the case anymore, and here it implies that V2 -> async backing is enabled, which is wrong assumption.

So it should be

(CollationVersion::V2, Some(ProspectiveCandidate { parent_head_data_hash, .. })) 
    if relay_parent_mode.is_enabled() => // <-- this
				request_prospective_validation_data(
					ctx.sender(),
					relay_parent,
					parent_head_data_hash,
					pending_collation.para_id,
				)
				.await?,

@bkchr bkchr merged commit b967ba5 into master Oct 19, 2023
8 checks passed
@bkchr bkchr deleted the bkchr-not-force-collator-update branch October 19, 2023 19:45
ordian added a commit that referenced this pull request Oct 20, 2023
* tsv-disabling: (39 commits)
  Handling of disabled validators in backing subsystem (#1259)
  Switch trie cache random seed (#1935)
  Expose prometheus metrics for minimal-relay-chain node in collators (#1942)
  Do not force collators to update after enabling async backing (#1920)
  Pin PRDoc image to v0.0.5 until we are ready for v0.0.6 (#1947)
  [prdoc] Start BEEFY gadget by default for Polkadot nodes (#1945)
  Update bridges subtree  (#1944)
  bump zombienet version (#1931)
  [FRAME] Message Queue use proper overweight limit (#1873)
  Cumulus: Allow aura to use initialized collation request receiver (#1911)
  Use prebuilt try-runtime binary in CI (#1898)
  Update kusama/polkadot bootnodes (#1895)
  Introduce XcmFeesToAccount fee manager (#1234)
  upgraded review bot to v2.1.0 (#1908)
  Trading trait and deal with metadata in Mutate trait for nonfungibles_v2 (#1561)
  Add Runtime Missing Crate Descriptions (#1909)
  Switch to the release env (#1910)
  Bump paritytech/review-bot from 2.0.1 to 2.1.0 (#1924)
  Bump actions/checkout from 4.1.0 to 4.1.1 (#1925)
  Start BEEFY client by default for Polkadot nodes (#1913)
  ...
tdimitrov pushed a commit that referenced this pull request Oct 23, 2023
The validators are checking if async backing is enabled by checking the
version of the runtime api. If the runtime api is upgraded by a runtime
upgrade, the validators start to also enable the async backing logic.
However, just because async backing is enabled, it doesn't mean that all
collators and parachain runtimes have upgraded. This pull request fixes
an issue about advertising collations to the relay chain when it has
async backing enabled, but the collator is still using the old
networking protocol. The implementation is actually backwards compatible
as we can not expect that everyone directly upgrades. However, the
collation advertisement logic was requiring V2 networking messages after
async backing was enabled, which was wrong. This is now fixed by this
pull request.

Closes: #1923

---------

Co-authored-by: eskimor <eskimor@users.noreply.github.com>
bkchr pushed a commit that referenced this pull request Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some Rococo Parachains Stuck
7 participants