Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Replace RollingSessionWindow in approval-voting with RuntimeInfo #7123

Merged
merged 26 commits into from
May 12, 2023

Conversation

tdimitrov
Copy link
Contributor

@tdimitrov tdimitrov commented Apr 24, 2023

closes #6144

This is the 2nd change for #6812. It's similar to #6968 but addresses approval-voting.

Please refer to the description in #6968 for some context.

This PR introduces SessionInfoProvider which wraps RuntimeInfo and some additional fields for book keeping. Unlike dispute-coordinator, approval-voting doesn't wait for the first leaf to initialize its session information and directly starts its main loop. For this reason RollingSessionWindow was wrapped in Option. It is None on startup and gets initialized on first leaf update. I can't see a reason for this to be honest but it probably handles some edge case so I kept this behaviour. If this is not the case - please leave a comment. Removing this might make the overall code a bit nicer.

RollingSessionWindow was part of State. However RuntimeInforequires&mut selffor each operation which created some complications so I decided to separate them. Furthermore this allowed me to remove some mut borrows ofState` throughout the code.

RuntimeInfo is async which required to convert some of the functions in approval-voting to async ones but this change is small and self-explanatory.

Another change worth mentioning is that session_info (renamed to get_session_info) and cache_session_info_for_head are now free standing functions, but more or less their behaviour is the same. cache_session_info_for_head pre-caches SessionInfo in a fashion similar to the one used in dispute-coordinator.

@tdimitrov tdimitrov added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance. labels Apr 24, 2023
@tdimitrov tdimitrov requested review from eskimor and ordian April 24, 2023 11:47
@ordian
Copy link
Member

ordian commented Apr 24, 2023

While it's ok for dispute-coordinator to only store the past dispute_period sessions, for approval-voting, we want to go as deep as the last finalized block. See also: #6165

node/core/approval-voting/src/import.rs Outdated Show resolved Hide resolved
node/core/approval-voting/src/import.rs Outdated Show resolved Hide resolved
node/core/approval-voting/src/lib.rs Outdated Show resolved Hide resolved
node/core/approval-voting/src/lib.rs Show resolved Hide resolved
node/core/approval-voting/src/lib.rs Outdated Show resolved Hide resolved
@tdimitrov
Copy link
Contributor Author

While it's ok for dispute-coordinator to only store the past dispute_period sessions, for approval-voting, we want to go as deep as the last finalized block. See also: #6165

I see what you mean but with RuntimeInfo we are just caching, so if we don't cache something we can look it up later as long as it's not pruned. And if approval-voting works on unfinalized data - this should not be the case, at least generally speaking. So I guess the biggest problem is the case when we work on a candidate, its corresponding SessionInfo is not cached for some reason, it gets finalized meanwhile (and pruned).

Am I thinking in the right direction?

@sandreim
Copy link
Contributor

The runtime keeps only 6 sessions in state. In the case of a long finality stall we would need to query session infos from older blocks. See #7127

@tdimitrov
Copy link
Contributor Author

tdimitrov commented Apr 25, 2023

The runtime keeps only 6 sessions in state. In the case of a long finality stall we would need to query session infos from older blocks. See #7127

You mean the cache in runtime api subsystem or the actual runtime?

If the former - we are good. All get_session_info() calls are made with a corresponding block hash, not the latest seen head.

@sandreim
Copy link
Contributor

The runtime keeps only 6 sessions in state. In the case of a long finality stall we would need to query session infos from older blocks. See #7127

You mean the cache in runtime api subsystem or the actual runtime?

https://github.com/paritytech/polkadot/blob/master/runtime/parachains/src/session_info.rs#L150

If the former - we are good. All get_session_info() calls are made with a corresponding block hash, not the latest seen head.

That sounds good then!

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misreading the code, but I think we should make sure #6165 is addressed properly here.

node/core/approval-voting/src/lib.rs Outdated Show resolved Hide resolved
node/core/approval-voting/src/import.rs Outdated Show resolved Hide resolved
@tdimitrov
Copy link
Contributor Author

@ordian just to be sure we are on the same page:

  1. earliset_session() implementation is bad, because it uses dispute window, instead of the last finalized block's session.
  2. Caching is not good, because it goes six blocks back. In case of stalled finality it will miss sessions (potentially a lot of sessions on testnets).
  3. This is my opinion and I want to know if you agree - get_session_info() usage is fine because it uses block's hash (or candidate's parent hash) to query SessionInfo. Even if this session is not cached we will be able to fetch it from the runtime.

Do you agree (especially with 3)?

@ordian
Copy link
Member

ordian commented Apr 26, 2023

@ordian just to be sure we are on the same page:

1. `earliset_session()` implementation is bad, because it uses dispute window, instead of the last finalized block's session.

2. Caching is not good, because it goes six blocks back. In case of stalled finality it will miss sessions (potentially a lot of sessions on testnets).

3. This is my opinion and I want to know if you agree - `get_session_info()` usage is fine because it uses block's hash (or candidate's parent hash) to query SessionInfo. Even if this session is not cached we will be able to fetch it from the runtime.

Do you agree (especially with 3)?

In principle yes, I agree. But I'm not sure exactly what is the purpose of RuntimeInfo in this case. If we rely on runtime-api caching to be enough for all unfinalized blocks, what does this abstraction bring? In other words, why not simply query session info with the appropriate relay parent from the runtime-api subsystem?

@tdimitrov
Copy link
Contributor Author

Okay, what you wrote above is sort of correct. We already agreed it is wrong. earliest_session is getting fixed/removed no matter what.

But the scenario you described is only valid for the session caching. The actual session fetching in the rest of the code (in this PR) is made via parent hash so it should be fine.

I think I understand the concerns of both of you - let me rework the PR and we'll continue discussing the final solution.

@tdimitrov
Copy link
Contributor Author

tdimitrov commented Apr 28, 2023

I've removed the check for earliest session and check if the block is finalized instead.

Also I removed the pre-caching. I think it won't help much because of the reasons @ordian and @sandreim - in case of a finality lag we won't be able to fill in the cache. During normal operation - the cache will be populated the first time when a session is requested. Any thoughts on this @eskimor?

The last two commits are work in progress and need polishing and fixing the tests. Please ignore (for now) things like commented out code blocks in the tests, misleading variable names and broken tests. I want to get some feedback first because adjusting the tests will be time consuming and I want to do it only once :)

@eskimor
Copy link
Member

eskimor commented Apr 29, 2023

I've removed the check for earliest session and check if the block is finalized instead.

Also I removed the pre-caching. I think it won't help much because of the reasons @ordian and @sandreim - in case of a finality lag we won't be able to fill in the cache. During normal operation - the cache will be populated the first time when a session is requested. Any thoughts on this @eskimor?

The last two commits are work in progress and need polishing and fixing the tests. Please ignore (for now) things like commented out code blocks in the tests, misleading variable names and broken tests. I want to get some feedback first because adjusting the tests will be time consuming and I want to do it only once :)

Sounds sensible.

@tdimitrov tdimitrov requested review from eskimor and ordian May 3, 2023 06:16
Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice except for the log levels which need double checking.

node/core/approval-voting/src/import.rs Show resolved Hide resolved
node/core/approval-voting/src/import.rs Outdated Show resolved Hide resolved
Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Actual removal of RollingSessionWindow and db migration will be a separate PR?

node/core/approval-voting/src/import.rs Outdated Show resolved Hide resolved
node/core/approval-voting/src/import.rs Outdated Show resolved Hide resolved
@tdimitrov
Copy link
Contributor Author

I've fixed all outstanding issues. I'll do a burnin on versi and this can be merged.

Actual removal of RollingSessionWindow and db migration will be a separate PR?

My plan was to remove RollingSessionWindow as a separate PR but I forgot about the storage migration. I'll create a new PR on top of this one (to keep the diffs clean) and will merge them together after the burnin.

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

@tdimitrov tdimitrov mentioned this pull request May 9, 2023
4 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor RollingSessionWindow into a subsystem
4 participants