Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow runtime-wide TryState checks #235

Open
ggwpez opened this issue Jan 4, 2023 · 30 comments · May be fixed by #4631
Open

Allow runtime-wide TryState checks #235

ggwpez opened this issue Jan 4, 2023 · 30 comments · May be fixed by #4631
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework.

Comments

@ggwpez
Copy link
Member

ggwpez commented Jan 4, 2023

Currently the TryState can only be utilized on a per-pallet level by implementing the try_state hook.
This limits the usage tor multi-pallet try_state functionality. Executive could be extended to accept an optional TryState which will then be executed by the try-runtime CLI on a runtime-wide level.

cc @kianenigma

@kianenigma
Copy link
Contributor

Yup, should be an easy one relatively (cc-ing @gpestana @Ank4n).

@bkchr
Copy link
Member

bkchr commented Feb 14, 2023

Executive could be extended to accept an optional TryState which will then be executed by the try-runtime CLI on a runtime-wide level.

We don't really need to forward this into Executive. We have all pallets as tuple in the runtime and just calling AllPalletsWithSystem::try_state should be enough. Before that we would need to change:
https://github.com/paritytech/substrate/blob/05d11c213b8e097a5368a887b6d055db61ce05db/frame/support/procedural/src/pallet/expand/hooks.rs#L210-L226

to:

		#frame_support::try_runtime_enabled! {
			impl<#type_impl_gen>
				#frame_support::traits::TryState<<T as #frame_system::Config>::BlockNumber>
				for #pallet_ident<#type_use_gen> #where_clause
			{
				fn try_state(
					n: <T as #frame_system::Config>::BlockNumber,
					_s: #frame_support::traits::TryStateSelect
				) -> Result<(), &'static str> {
					#log_try_state
					<
						Self as #frame_support::traits::Hooks<
							<T as #frame_system::Config>::BlockNumber
						>
					>::try_state(n)
				}
			}
		}

@kianenigma
Copy link
Contributor

@bkchr I am not sure why you are against this. The main goal is to enable a try-state check that will look into multiple pallets. The only way to achieve this would be through doing it at the runtime (probably via executive)

@bkchr
Copy link
Member

bkchr commented Mar 10, 2023

I did not say that I'm against this. I just said that we don't need to move this to frame-executive, because we just can call AllPalletsWithSystem::try_state in the runtime. If you now call Executive::try_state or what I have written before, there would be no difference. (just less code in frame-executive)

@liamaharon
Copy link
Contributor

So we can already write and execute TryState checks in Executive? Then we can close this issue?

@juangirini juangirini transferred this issue from paritytech/substrate Aug 24, 2023
@liamaharon
Copy link
Contributor

liamaharon commented Aug 25, 2023

So we can already write and execute TryState checks in Executive? Then we can close this issue?

On further thought I don't this is sufficient. Most runtimes use the Executive pallet from FRAME rather than defining their own. I think we'd want some way to pass try-state hooks into the executive pallet, like we do with migrations, so that chain-specific try-state checks can be written

@the-right-joyce the-right-joyce added I5-enhancement An additional feature request. C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. T1-FRAME This PR/Issue is related to core FRAME, the framework. and removed J0-enhancement labels Aug 25, 2023
@ggwpez ggwpez added the D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. label Feb 4, 2024
@ggwpez
Copy link
Member Author

ggwpez commented Feb 28, 2024

I just said that we don't need to move this to frame-executive, because we just can call AllPalletsWithSystem::try_state in the runtime

You mean we copy&paste the same try-state check into each runtime where we want to use it?
I think we could for example have a multi-pallet check for Staking that takes a few generics and then checks that all the staking pallets have consistent overall state.
This can then be re-used, but yes it not necessarily be in Executive.

@kianenigma
Copy link
Contributor

Sensing some confusion while reading this.

So, looking at the issue again afresh, we need to do exactly what we do with 'COnRuntimeUpgrade': Pass a new 'CTryState = ()' to executive, and run all checks on '(AllPallets, CTryState)' instead. Sounds right?

jonathanudd pushed a commit to jonathanudd/polkadot-sdk that referenced this issue Apr 10, 2024
@ntn-x2
Copy link

ntn-x2 commented Apr 23, 2024

Any update on this issue? I just stumbled across a similar need, and the only way around it that I found is feature-gating an additional type parameter on each pallet's Config, to allow the runtime to be injected in there.

@ntn-x2
Copy link

ntn-x2 commented May 20, 2024

@kianenigma @ggwpez I've been going over this issue and this code again, and I'm willing to put together a PR if we can agree on a design for the solution.

As far as I can see, the same checks are repeated both in try_runtime_upgrade and in try_execute_block for the frame Executive. Currently, the frame_try_runtime::TryRuntime trait implementation for a runtime, or at least our runtime, simply calls Executive::try_runtime_upgrade and Executive::try_execute_block, which in turns, as I said, delegates those calls to various sub-components, including the specified AllPalletsWithSystem, which right now it's an implementation details and cannot be changed.

The easiest option would be to include a new generic that implements the TryState trait and simply call (AllPalletsWithSystem, TryStateType)::try_state... everywhere where needed. I don't really understand the solution proposed by @bkchr, on which maybe you can shed a bit more light? 😊 From what I've found, as long as we're not able to "override" the implementation details of the executive, we can't implement custom runtime-wide try-state checks in a reliable way, but I'm not very deep into the technicalities. On a side note, I also find it "weird" that the OnRuntimeUpgrade generic for the executive has to only implement the OnRuntimeUpgrade logic and not also the TryState logic, which would fix this issue.

@ntn-x2
Copy link

ntn-x2 commented May 20, 2024

So we could replace

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <AllPalletsWithSystem as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
	frame_try_runtime::TryStateSelect::All,
    )?;
}

with

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <(COnRuntimeUpgrade, AllPalletsWithSystem) as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
	frame_try_runtime::TryStateSelect::All,
    )?;
}

everywhere where relevant (this is taken from substrate v1.0, but it seems it still applies here and here as of today).

@liamaharon
Copy link
Contributor

So we could replace

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <AllPalletsWithSystem as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
	frame_try_runtime::TryStateSelect::All,
    )?;
}

with

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <(COnRuntimeUpgrade, AllPalletsWithSystem) as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
	frame_try_runtime::TryStateSelect::All,
    )?;
}

everywhere where relevant (this is taken from substrate v1.0, but it seems it still applies here and here as of today).

This sounds reasonable to me, similar API as runtime upgrades.

@ntn-x2
Copy link

ntn-x2 commented May 21, 2024

This sounds reasonable to me, similar API as runtime upgrades.

It would nevertheless be a breaking change since COnRuntimeUpgrade is now required to implement the TryState trait as well. Having a second generic with a default value of () would not be a breaking change, on the other hand, but I feel this approach would be less scalable in the long-term.

@liamaharon
Copy link
Contributor

Yeah I wouldn't add it to the existing trait. Maybe a new one like COnTryState?

@kianenigma
Copy link
Contributor

Sensing some confusion while reading this.

So, looking at the issue again afresh, we need to do exactly what we do with 'COnRuntimeUpgrade': Pass a new 'CTryState = ()' to executive, and run all checks on '(AllPallets, CTryState)' instead. Sounds right?

have you tried this?

@ntn-x2
Copy link

ntn-x2 commented May 21, 2024

@kianenigma that solution would work, for sure. I am wondering whether it's the best we can do here. If there's yet a new testing function, that would most likely mean a new generic to account for those? But if you guys think that's fine, I can go ahead and work up a little PR that adds the new feature.

@kianenigma
Copy link
Contributor

If there's yet a new testing function, that would most likely mean a new generic to account for those

The nice thing about generics is that they can have defaults, so this is not a breaking change, as long as you add it to the end of the list.

This is not pretty or elegant, but it is backwards compat, and near zero effort, I would do this.

Longer term, I have shared some ideas around improving the development surface of executive with @gupnik or in other issues, but I am not sure if they are formalized or worth doing at this point.

@liamaharon
Copy link
Contributor

I agree with @kianenigma this approach is worth trying. It shouldn't be breaking, COnRuntimeUpgrade type is actually optional on Executive so CTryState should be able to be as well.

@ntn-x2
Copy link

ntn-x2 commented May 21, 2024

Yes, I agree this approach would not be breaking and relatively straightforward. I was just concerned it might not be the best, but I'm happy to hear there's discussion going on about improving DX on that side.

I'll try to work a PR shortly, and would be great to get this backported to at least as back as 1.6.

@ntn-x2
Copy link

ntn-x2 commented May 22, 2024

The matter is less trivial than I was expecting. The TryState trait is implemented for tuples, but only where all the elements implement the PalletInfoAccess trait as well, which is clearly not required for a runtime-wide test:

for_tuples!( where #( Tuple: crate::traits::PalletInfoAccess )* );
.

Also, I was hoping for a solution where whatever type wants to implement TryState, would not have to deal with the targets: Select argument of the trait, but rather implement something closer to the try_state function exposed as part of the Hooks trait.

I am wondering if declaring another trait that does not have the second argument, have a blanket impl of that trait for everything that implements the TryState trait except for AllPallets* types (there's no native way to exclude types from a blanket impl), and manually implement the new trait for AllPallets*. Otherwise, I see no other way to solve this issue.

Cc @kianenigma since I see he was the one working on this feature last.

Or alternatively require the new type to also implement PalletInfoAccess, making it clear that it would not be on the same level as an actual pallet since it would only be used in the TryState filtering logic, but this feels dirty to me as well.

@liamaharon
Copy link
Contributor

Do we know what does it use PalletInfoAccess for?

@ntn-x2
Copy link

ntn-x2 commented May 22, 2024

@liamaharon yes, you can filter which pallets you want to run the tests for:

#( (<Tuple as crate::traits::PalletInfoAccess>::name(), Tuple::try_state) ),*
.

@liamaharon
Copy link
Contributor

liamaharon commented May 22, 2024

How about we add a new method id and remove the PalletInfoAccess requirement?

Something like

pub trait TryState<BlockNumber> {
	/// Execute the state checks.
	fn try_state(_: BlockNumber, _: Select) -> Result<(), TryRuntimeError>;
        fn id() -> String
}

When pallet implements it, it can use its name. This might be a bit tricky, let me know if u get stuck.

@ntn-x2
Copy link

ntn-x2 commented May 22, 2024

This would be a breaking change, not sure if that's fine? I think we were trying to solve the issue in a way that does not introduce breaking changes.

@ntn-x2
Copy link

ntn-x2 commented May 24, 2024

@liamaharon I am not sure that would be an ideal design. The trait already requires the implementor to perform some try_state depending on the checks argument that it is passed. Having an id() -> String method on the trait would add unnecessary duplication; also, this is a requirement that Executive seems to have, but it would not necessarily apply to other trait implementors. Rather, I would go with adding a new trait that contains the id() -> String function and require that in pallet executive rather. Then we can implement the trait for PalletInfoAccess implementing types as well. I am not sure there is a method within Substrate codebase that allows to return a specific type and that also takes &self as a parameter, something like a GetWithArg<T> which has a fn get(&self) -> T method.

What's your opinion on that?

@liamaharon
Copy link
Contributor

Hm yes you are right. I missed that id would need to be manually specified, I initially thought it could be done in the pallet macros.

Your approach sounds good to try, let's see how it looks

@ntn-x2
Copy link

ntn-x2 commented May 27, 2024

Your approach sounds good to try, let's see how it looks

@liamaharon what approach are you referring to? Adding a new trait that has the id() -> String function? Or adding a GetWithArg<Arg, Output> trait?

@ntn-x2
Copy link

ntn-x2 commented May 27, 2024

Ok I have to pause work on this again right now, but I found that returning a String ID does not work either, as it would not be implementable by the AllPalletsWithSystem and AllPalletsWithoutSystem types (what's the return value when iterating over a tuple of elements each with its own ID?).
I found an alternative way, which is for the tuple to simply implement an Eq<&[u8]> logic, and the specific try_state is executed if any of the provided names matches what was provided as argument.

@kianenigma
Copy link
Contributor

How about we add a new method id and remove the PalletInfoAccess requirement?

Something like

pub trait TryState<BlockNumber> {
	/// Execute the state checks.
	fn try_state(_: BlockNumber, _: Select) -> Result<(), TryRuntimeError>;
        fn id() -> String
}

When pallet implements it, it can use its name. This might be a bit tricky, let me know if u get stuck.

Maybe it is not a breaking change if we provide an auto-implementation of fn id() when this is being generated from within a pallet? also, if this is used just for logging, providing something like this is reasonable to me:

pub trait TryState<BlockNumber> {
	/// Execute the state checks.
	fn try_state(_: BlockNumber, _: Select) -> Result<(), TryRuntimeError>;
        fn id() -> String { "PLEASE-PROVIDE-ID" }
}

@ntn-x2 ntn-x2 linked a pull request May 29, 2024 that will close this issue
@ntn-x2
Copy link

ntn-x2 commented May 29, 2024

@kianenigma @liamaharon I opened a PR that fixes this: #4631. I did not find a way to make it work by adding an id() -> String function to the TryState trait, which must be implemented for tuples (since AllPalletsWithSystem must implement it), but I found a nice solution that is backwards compatible, and allows developer to hook their try-state logic without having to implement any logic regarding the Select parameter of the try_state function. Eager to get your feedback on it.

github-merge-queue bot pushed a commit that referenced this issue Nov 5, 2024
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later


### Added

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

### Changed

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

### Fixed

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
lexnv added a commit that referenced this issue Nov 15, 2024
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix word typo
- `KademliaEvent::GetProvidersSuccess` and
`KademliaEvent::IncomingProvider` are needed for bootnodes on DHT work
and will be utilized later

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests
([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses &
known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing
([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query
([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers
([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing)
([#234](paritytech/litep2p#234))

- transport_service: Improve connection stability by downgrading
connections on substream inactivity
([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic
([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames
([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for
`RoutingTable::closest`
([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions
([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm
([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg
([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods
([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro
([#254](paritytech/litep2p#254))

- tcp/websocket/quic: Fix cancel memory leak
([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak
([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens`
([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens`
([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records
([#221](paritytech/litep2p#221))

See release changelog for more details:
https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework.
Projects
Status: Backlog
Development

Successfully merging a pull request may close this issue.

7 participants