This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Collator execution environment parametrization #6497

Closed
s0me0ne-unkn0wn opened this issue Jan 3, 2023 · 15 comments
Labels
I3-bug Fails to follow expected behavior. T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes.

Comments

@s0me0ne-unkn0wn
Contributor

s0me0ne-unkn0wn commented Jan 3, 2023

See: paritytech/polkadot-sdk#917, PR #6161

After adopting #6161, we should make collators aware of execution environment parameters, too. Otherwise, they could submit a PVF that they checked to be valid, but that would be invalid for any validator because of different execution environment parameters in the relay chain runtime.

It's not an urgent concern as, right now, execution environment parameters are set to default values, and collators share the same defaults, so the situation described cannot happen. It will only become problematic if we change any execution environment parameters in the relay chain runtime.
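As a toy illustration of this hazard, consider a heavily simplified sketch (the struct, field names, and default values below are invented for illustration; the real `ExecutorParams` in the Polkadot runtime is a versioned set of typed entries, not a plain struct):

```rust
// Hypothetical, simplified stand-in for the relay chain's execution
// environment parameters. All fields and values are invented.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ExecutorParams {
    stack_logical_max: u32,   // wasm logical stack depth limit
    max_memory_pages: u32,    // linear memory limit, in 64 KiB pages
    precheck_timeout_ms: u64, // preparation timeout used by pre-checkers
}

impl Default for ExecutorParams {
    fn default() -> Self {
        Self {
            stack_logical_max: 65_536,
            max_memory_pages: 2_048,
            precheck_timeout_ms: 60_000,
        }
    }
}

// A collator that only knows the defaults silently disagrees with the
// relay chain as soon as governance changes any parameter.
fn params_match(collator: &ExecutorParams, relay_chain: &ExecutorParams) -> bool {
    collator == relay_chain
}

fn main() {
    let collator = ExecutorParams::default();
    let mut relay_chain = ExecutorParams::default();
    // Today: both sides use the defaults, so nothing can go wrong.
    assert!(params_match(&collator, &relay_chain));

    // A runtime upgrade changes a limit on the relay chain only.
    relay_chain.max_memory_pages = 4_096;
    // The collator is now out of sync without noticing.
    assert!(!params_match(&collator, &relay_chain));
}
```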

@s0me0ne-unkn0wn s0me0ne-unkn0wn added the I3-bug Fails to follow expected behavior. label Jan 3, 2023
@eskimor eskimor added the T5-parachains_protocol This PR/Issue is related to Parachains features and protocol changes. label Jan 5, 2023
@eskimor eskimor moved this from To do to In progress in Parachains-core Jan 5, 2023
@s0me0ne-unkn0wn
Contributor Author

After some research (thanks @ordian for the consultations), I didn't find anything that has to be done here.

A PVF is never prepared nor executed on the collator side. A collator submits the PVF to the relay chain, where it is included in a block and picked up by the PVF pre-checker, which is already parametrized. Pre-checkers vote on the validation code hash, and if a supermajority confirms it can produce a valid artifact, it is considered valid. That is, all the checking is done on the relay chain side.
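The supermajority rule can be sketched as follows (the strict two-thirds threshold is an assumption for illustration, not a quote of the actual pre-checking rule):

```rust
// Sketch of a supermajority check: a PVF is accepted once more than
// two thirds of the pre-checking validators vote for it. Integer
// arithmetic avoids floating-point comparison issues.
fn supermajority_reached(ayes: usize, validators: usize) -> bool {
    ayes * 3 > validators * 2
}

fn main() {
    assert!(!supermajority_reached(6, 10)); // 6/10 is not more than 2/3
    assert!(supermajority_reached(7, 10));  // 7/10 is more than 2/3
    assert!(!supermajority_reached(2, 3));  // exactly 2/3 is not enough
}
```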

So I'm closing it for now, feel free to reopen if I'm missing something.

@eskimor
Member

eskimor commented Mar 3, 2023

They could submit a block/PoV that would be considered invalid. The PVF is less of a problem: all that could happen is that a runtime upgrade gets rejected.

@s0me0ne-unkn0wn
Contributor Author

They could, but how to avoid it? I mean, without bringing the whole executor infrastructure into the collator code?

@eskimor
Member

eskimor commented Mar 3, 2023

As long as we are only using them for limits which honest code should never exceed anyway, it does not matter too much, but once we also use them for upgrading/switching compilers, it starts to matter. I mean, if the risk of incompatibilities is low enough, we might still be able to punt on it for quite a while. The worst thing that can happen is that a parachain is stalled until they upgrade their nodes - bad enough for a production parachain, though.

Considering that we have Kusama, this might still be acceptable for quite a while: if things break, they break on Kusama first (where chaos is expected) and collators can upgrade their nodes before we hit Polkadot.

Also, I just found this comment: This is actually an interesting idea: if collators demand a particular execution environment in the candidate receipt, this would work as well - at the cost of complicating validators.

Anyhow, resolving this can easily wait until we have versioning - we might make creating these tickets a todo on that ticket, so we have less clutter until the time comes when we should start thinking about it.

@bkchr
Member

bkchr commented Mar 3, 2023

They could, but how to avoid it? I mean, without bringing the whole executor infrastructure into the collator code?

The whole executor (substrate) infrastructure is already in the collator code. At some point we will need to fetch these parameters as well for configuring the block building environment.

@s0me0ne-unkn0wn
Contributor Author

The whole executor (substrate) infrastructure is already in the collator code

That's true, but currently the collator doesn't feature any interface to the executor. We could create one if need be, but do we really have to? The collator always runs a full node nearby; is it possible to use its already available executor interface (probably via RPC calls?) to check what we want to check, rather than incorporating that logic into the collator itself?

My concern is that we'll have to support two different yet homogeneous executor interfaces.

@bkchr
Member

bkchr commented Mar 3, 2023

I have later seen that @eskimor already created issues for these things: paritytech/polkadot-sdk#2339

My concern is that we'll have to support two different yet homogeneous executor interfaces.

I don't know exactly what you mean by this.

@s0me0ne-unkn0wn
Contributor Author

I don't know exactly what you mean by this.

Well, we have quite a bit of infrastructure on the node side to run PVFs and validate candidates: the PVF host with preparation and execution pools/queues/workers, the executor interface itself, which implements the validation backend that wraps the Substrate executor and serves as an intermediary between execution requests and the execution itself, and so on. If we want similar functionality in collators, we should either abstract part of that infrastructure and reuse it in collators (which would be overkill unless we put some effort into extensively tearing the abstractions apart), or implement a similar but simpler infrastructure in collators (which would result in some functionality duplication).

We can go either way, but I believe we should first evaluate whether we can reuse something that already exists; that's why I'm asking if we can make use of the node running alongside the collator. All those things are already there in the node software. If we implement an RPC call that accepts a candidate, validates it off-chain, and provides the collator with an answer like "valid/not valid", that would be enough. Do you think that makes sense?

@bkchr
Member

bkchr commented Mar 4, 2023

If we want similar functionality in collators, we should either abstract part of that infrastructure and reuse it in collators (which would be overkill unless we put some effort into extensively tearing the abstractions apart), or implement a similar but simpler infrastructure in collators (which would result in some functionality duplication).

Ahh okay, so you really meant the entire executor infrastructure. So no, we don't need this. From the collator POV the runtime is trusted. However, we will need to run block building under certain constraints, as is done during validation on the relay chain. For sure, we need to run with less memory than we have available while validating a block, because validation stores more stuff in memory. We need to respect the maximum stack depth. We need to build a block while ensuring that it doesn't take more time than we give backers to validate it (#6793). And whatever other things come up over time. Stuff like the preparation time of the wasm binary isn't really important to check: wasm binaries that don't fit into the timeout will not be accepted as a runtime upgrade, and the collator doesn't need to check this.
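The time-budget constraint mentioned here could, in a very rough sketch, look like this (the `build_block` function, the budget value, and the `u32` stand-in for extrinsics are all hypothetical):

```rust
use std::time::{Duration, Instant};

// Sketch: stop pulling extrinsics into the block once the authoring
// deadline approaches, so the block stays within the time backers are
// given to validate it. Executing an extrinsic is only simulated here.
fn build_block(budget: Duration, extrinsics: &[u32]) -> Vec<u32> {
    let deadline = Instant::now() + budget;
    let mut included = Vec::new();
    for &xt in extrinsics {
        if Instant::now() >= deadline {
            break; // leave the rest for the next block
        }
        included.push(xt); // stand-in for actually executing the extrinsic
    }
    included
}

fn main() {
    // With a generous budget everything fits.
    let all = build_block(Duration::from_millis(50), &[1, 2, 3]);
    assert_eq!(all, vec![1, 2, 3]);
    // With a zero budget nothing is included.
    let none = build_block(Duration::ZERO, &[1, 2, 3]);
    assert!(none.is_empty());
}
```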

@s0me0ne-unkn0wn
Contributor Author

So no, we don't need this. From the collator POV the runtime is trusted. However, we will need to run block building under certain constraints, as is done during validation on the relay chain.

And to validate anything, we need at least the following:

  • A preparation worker running as a separate process, which prepares a runnable binary artifact out of the Wasm PVF and is time-limited so as not to cause a timeout during pre-checking on the relay chain;
  • An execution worker running as a separate process, because it may die during execution, and we want to be able to kill it on timeout;
  • An artifact storage, because we don't want to prepare the artifact anew every time we want to validate;
  • Probably some other things I forgot.

So it seems to me like we still need pretty much the same infrastructure, just vastly scaled down: it doesn't need to support parallel preparation and execution and other bells and whistles, but neither does it look like something trivial. Am I correct?
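For instance, the artifact-storage point could be sketched like this (the types, the `u64` code hash, and the "preparation" step are stand-ins for illustration, not the real PVF host API):

```rust
use std::collections::HashMap;

// Sketch of an artifact cache: compile (prepare) a PVF once per code
// hash and reuse the artifact on later validations.
struct ArtifactCache {
    artifacts: HashMap<u64, Vec<u8>>, // code hash -> prepared artifact
    prepares: usize,                  // how many times we actually compiled
}

impl ArtifactCache {
    fn new() -> Self {
        Self { artifacts: HashMap::new(), prepares: 0 }
    }

    fn get_or_prepare(&mut self, code_hash: u64, wasm: &[u8]) -> &[u8] {
        if !self.artifacts.contains_key(&code_hash) {
            self.prepares += 1;
            // Stand-in "preparation": in reality this is a sandboxed,
            // time-limited compilation in a separate worker process.
            self.artifacts.insert(code_hash, wasm.to_vec());
        }
        &self.artifacts[&code_hash]
    }
}

fn main() {
    let mut cache = ArtifactCache::new();
    cache.get_or_prepare(42, b"\0asm");
    cache.get_or_prepare(42, b"\0asm"); // cache hit, no recompilation
    assert_eq!(cache.prepares, 1);
}
```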

@bkchr
Member

bkchr commented Mar 6, 2023

So it seems to me like we still need pretty much the same infrastructure, just vastly scaled down: it doesn't need to support parallel preparation and execution and other bells and whistles, but neither does it look like something trivial. Am I correct?

We already share the same infrastructure by using the sc-executor as basis for the validation.

@s0me0ne-unkn0wn
Contributor Author

We already share the same infrastructure by using the sc-executor as basis for the validation.

Sure, but I'm talking about the part between sc-executor and a subsystem that has a PVF as a raw Wasm binary and a candidate, and that wants to validate the candidate using the PVF. In validator software, it consists of the PVF host at the higher level and the executor interface at the lower level. We still need all that stuff in the collator code if we want the collator to be able to use sc-executor to validate anything, right?

@rphmeier
Contributor

rphmeier commented Mar 8, 2023

In validator software, it consists of the PVF host at the higher level and the executor interface at the lower level. We still need all that stuff in the collator code if we want the collator to be able to use sc-executor to validate anything, right?

No, collators don't use the PVF mechanism. Their job is to make something that passes the PVF, and they do this with a Substrate runtime & client. It's also executed as WebAssembly, yes, so some things like stack depth and memory limits might need to be shared. But pre-checking or other execution worker logic is unnecessary. For a collator the runtime is trusted, for a validator the runtime is not.

@s0me0ne-unkn0wn
Contributor Author

I feel like I'm losing the thread 😅 Let's try from another end. I'll make a numbered list to address points easily.

  1. What are we trying to achieve? IIUC, we want the collator to be able to validate a candidate it's just produced, and we want it to be done by the collator itself before it submits the candidate to the relay chain, right?
  2. To validate a candidate, we must run the PVF on that candidate, right?
  3. To run the PVF, we have to prepare a binary artifact from the Wasm code, in other words, compile it, right?
  4. Then we have to run that binary artifact, providing it with the candidate as input, and if validation goes awry (timeout, crash, etc.), we don't want the whole collator process to go down, so we need to execute the binary in a separate process, right?
  5. And that separate process, that is, the execution worker, should provide some glue between the collator code and the Substrate executor, parametrizing the Substrate executor in such a way that it can execute the PVF properly, right?

Am I missing something? Because if I am not, it still sounds to me like "we need a somewhat scaled-down implementation of PVF host for collators" 🤷
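The kill-on-timeout part of this pipeline might be sketched as follows. Note that the design discussed above uses a separate process that can be killed outright; a thread plus `recv_timeout` is used here only to keep the example self-contained and runnable:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Sketch: run a validation job with a time budget. `None` means the
// worker overran its budget and (in the real design) would be killed.
fn execute_with_timeout<F>(job: F, timeout: Duration) -> Option<bool>
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(job()); // worker reports its verdict
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // A fast "validation" finishes within the budget.
    assert_eq!(execute_with_timeout(|| true, Duration::from_secs(1)), Some(true));
    // A hung worker hits the timeout and is reported as such.
    let hung = execute_with_timeout(
        || {
            thread::sleep(Duration::from_millis(200));
            true
        },
        Duration::from_millis(10),
    );
    assert_eq!(hung, None);
}
```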

@bkchr
Member

bkchr commented Mar 11, 2023

  1. IIUC, we want the collator to be able to validate a candidate it's just produced, and we want it to be done by the collator itself before it submits the candidate to the relay chain, right?

No, we don't want this. The normal mode is that you produce a PoV and send it to the relay chain. You are clearly trusting yourself to build a proper candidate ;)
