Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not merge: don't endorse if has chunk extra #12597

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Longarithm
Copy link
Member

No description provided.

github-merge-queue bot pushed a commit that referenced this pull request Dec 13, 2024
I'm not doing it separately because for my test setups partial fixes
didn't work, and we are shipping soon... So I'll explain everything
here.

1. Filtering receipts

On the shard split boundary, we must filter outgoing receipts from
parent chunks, because not all of them actually target the first chunk
in our shard. **But the chunk proof we send to chunk validators must
include all receipts.** That's why we need to add awful parameter
`filter_receipts` to skip filtering when we generate proof, and enable
filtering when we apply chunk.
Also filtering was completely missing on chunk validation; added that.

2. State dumper fix

`verify_path` assertion was failing. 

<details>
<summary>[OUTDATED]</summary>
`compute_state_response_header` wasn't fully reflecting resharding logic
after moving sync hash to current epoch.
First, last chunk' shard layout corresponds to the same epoch id, not
the previous one.
Second, on aggregating incoming receipts, shard layout actually **may**
change, so we need to maintain target shard id properly.
</details>

CurrentEpochStateSync provides sync_hash for which there are 2 chunks
for each shard. However, state sync code goes 1 block back so it gives
only 1 chunk. To get incoming receipts for this chunk, we will cross
epoch and resharding boundary. So I move sync_hash one block forward for
now.

These two cases are caught by
`test_resharding_v3_shard_shuffling_intense`, which creates nontrivial
outgoing receipt distributions on resharding boundary. The next two
cases are more insidious and caught by patch #12597.
`test_resharding_v3_outgoing_receipts_from_splitted_shard` and similar
tests were useful.
I tried to make a proper test but I think I need to enable single shard
tracking everywhere. For that I tried to add rpc node tracking all
shards but it wasn't simple because we have assumptions on what is rpc
node account id.

3. Reassignment

Once I understood what
`ChainStore::reassign_outgoing_receipts_for_resharding` is for, I
understood as well that it must be called for chunk validators. It's
better to read its definition on review. Without it I was getting
`InvalidReceiptsProof` for **one of the parent shards**.

4. Cache by target shard id

`MainStateTransitionCache` must cache results by child shard id, not
parent, otherwise there is a collision. For shard merging we probably
need to key entries by both shard ids.

Maybe we can un-ignore some of tests after that, I didn't check.
It also uses #12586.
github-merge-queue bot pushed a commit that referenced this pull request Jan 2, 2025
Partially applies the patch: #12597

It was failing for delayed receipt tests with
`MissingTrieValue(TrieStorage)` inside
`check_state_shard_uid_mapping_after_resharding`.
The reason is that trie access interface returns `None` for a key with
an empty value (which is the case for delayed receipts and negative
refcounts).
That was not caught up before the patch, because the
`check_state_shard_uid_mapping_after_resharding` was called after
resharding, but not in the end of the test (after gc kicked in).

**Changes**
This PR modifies the `check_state_shard_uid_mapping_after_resharding` to
not check entries with negative refcount.
The test is refactored so that the main loop does not finish an epoch
after resharding, but keeps running until gc kicks in.
For that, the code responsible for testing temporary account is moved to
a loop action,
and all loop actions are moved to a separate file for readability.

**Note**
`test_resharding_v3_yield_resume` is still failing with the original
patch: #12597
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant