Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/state, eth, trie: stabilize memory use, fix memory leak #21491

Merged
merged 1 commit into from
Aug 27, 2020

Conversation

karalabe
Copy link
Member

@karalabe karalabe commented Aug 26, 2020

We've seen for a while that the bootnodes go OOM while fast syncing. Lately this became more and more pronounced, so it was clear that we have some memory issues in state sync. This PR addresses 3 independent issues:

  • State sync expands the account and storage tries depth first. This ensures that we keep as little of it in memory as possible and flush completed subtries to disk quickly. In the case of the bootnodes however, they are running with 250 full node peers. Even though the trie.Sync is depth first, with so many peers available to retrieve from, the trie got expanded too heavily breadth wise too. This resulted in a notable memory usage.
    • The PR fixes this by limiting each state trie depth to a maximum of 16K active fetches. This way at worse we can have 128 (64 account + 64 storage) depths * 16K requests. In practice the tries are saturated only till depth 8-9, so the depths below will be missing altogether. On mainnet currently this limits pending trie retrieval tasks to about 600K, which is a reasonable amount.
  • State sync previously expanded same-depth trie nodes randomly. This was a deliberate decision so that we scatter the state randomly across the network, so even if a seed node goes down, there's a chance that others pulled enough to reconstruct between each other. This isn't makes state sync a bit non-deterministic.
    • This PR changes the undefined ordering to ~lexicographic one, by modifying the trie.Sync priority queue to not only take into account the depth of a node, but also the path prefix in the trie (only first 14 nibbles fit into the int64 priority beside the depth). This will allow us a further optimization where we can detect if a peer we're syncing from doesn't have the state we need yet, so we can throttle requests going to them (and coming back with an empty response).
  • Every time the pivot moved, we cancelled the currently running state sync and restarted it from scratch with a new head. Unfortunately, we also called a defer sync.Close() on the new object. This resulted in all these partially completed and discarded sync objects to be kept referenced in memory until sync fully completed or failed. This was leaking out memory and leading to crashes,
    • The PR fixes this by removing the defer from within the loop and doing just 1 lazy defer at the top where the original sync object is created. If the object is recreated, the loop will close the old one before replacing it, so there's no leak there.

A quirk in the PR is that I needed to modify the LeafCallback to pass the path too, but that was shared by both the syncer as well as the committer. I hacked a nil into the committer, but maybe it would be nice to split the callback to have a different version for SyncLeafCallback and CommitLeafCallback (I don't want to track the path for commit since it's useless burden).

Screenshot from 2020-08-26 15-09-20

@karalabe karalabe added this to the 1.9.21 milestone Aug 26, 2020
@karalabe karalabe requested a review from holiman August 26, 2020 10:19
@karalabe karalabe requested a review from rjl493456442 as a code owner August 26, 2020 10:19
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +306 to +310
prio := int64(len(req.path)) << 56 // depth >= 128 will never happen, storage leaves will be included in their parents
for i := 0; i < 14 && i < len(req.path); i++ {
prio |= int64(15-req.path[i]) << (52 - i*4) // 15-nibble => lexicographic order
}
s.queue.Push(req.hash, prio)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a little playground gist to check how this worked: https://play.golang.org/p/QKezpVe3ZX7 . Might be worth taking the gist and making a test out of it, to verify that paths are indeed prioritized correctly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to postpone adding this into a followup PR. I want to expose the path and code/trie node separation into the Missing and that will make this test a lot simpler. Otherwise we'd need to mess around with hard coding hashes into the tester, which we could definitely do, but it will be kind of like a black magic test with random values nobody knows where they originate from.

@holiman
Copy link
Contributor

holiman commented Aug 26, 2020

This PR is great. Just look at that mem consumption:
Screenshot_2020-08-26 Dual Geth - Grafana(2)

And that steady trickle of state

Screenshot_2020-08-26 Dual Geth - Grafana(3)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants