Warp sync triggers OOM on Astar #1110

Open
bLd75 opened this issue Dec 19, 2023 · 7 comments
Labels
bug Something isn't working project Issue is part of an ongoing project

Comments

@bLd75
Contributor

bLd75 commented Dec 19, 2023

Description

Warp sync is not operational on Astar in the latest versions. After the state download (5.3+ GB), the state import triggers an OOM on the server.

Steps to Reproduce

Start an Astar node with the --sync warp option.

Environment

Quite similar to this issue, but on the parachain side.
The issue will be solved after uplifting to Polkadot v1.0.0.

@bLd75 bLd75 added the bug Something isn't working label Dec 19, 2023
@Dinonard Dinonard added the project Issue is part of an ongoing project label Dec 19, 2023
@ashutoshvarma
Member

This is resolved, right?

@bLd75
Contributor Author

bLd75 commented Jun 12, 2024

@ashutoshvarma we currently have an ongoing OOM case on the latest Astar version.
@paradox-tt will provide details here.

@Dinonard
Member

We're still uplifting & catching up to the latest version.

But please do provide the command, environment, and logs if you have them.

@paradox-tt

Hey team,

Here are my flags, with the public address hidden:

ExecStart=/usr/local/bin/astar-collator \
  --validator \
  --rpc-cors all \
  --name Dox-Astar-01 \
  --execution wasm \
  --state-cache-size 1 \
  --chain astar \
  --public-addr=/ip4/x.x.x.x/tcp/30330 \
  --listen-addr=/ip4/172.19.12.15/tcp/30330 \
  --bootnodes /ip4/20.93.150.146/tcp/30330/p2p/12D3KooWKZwcaofXPmXWHSSfnh34VFJ8zSRJScnNu9UA75x8kNXi \
  --allow-private-ipv4 \
  --discover-local \
  --rpc-port=9110 \
  --prometheus-external \
  --prometheus-port=9702 \
  --rpc-methods=Unsafe \
#  --sync=warp \
  --blocks-pruning=1000 \
  --state-pruning=1000 \
  --telemetry-url 'wss://telemetry-backend.w3f.community/submit/ 1' \
  --telemetry-url 'wss://telemetry.polkadot.io/submit/ 1' \
#  --relay-chain-rpc-urls "wss://rpc.ibp.network/polkadot" \

There are no errors in the logs; warping simply continues until the server runs out of memory or the instance reboots:

Jun 12 11:23:56 doxastar astar-collator[52243]: 2024-06-12 11:23:56 [Parachain] ⏩ Warping, Downloading state, 406.43 Mib (22 peers), best: #0 (0x9eb7…29c6), finalized #0 (0x9eb7…29c6), ⬇ 0.7kiB/s ⬆ 0.4kiB/s
Jun 12 11:24:00 doxastar astar-collator[52243]: 2024-06-12 11:24:00 [Relaychain] ✨ Imported #21184086 (0x5200…14a5)
Jun 12 11:24:01 doxastar astar-collator[52243]: 2024-06-12 11:24:01 [Relaychain] 💤 Idle (15 peers), best: #21184086 (0x5200…14a5), finalized #21184083 (0xb341…3109), ⬇ 145.2kiB/s ⬆ 192.6kiB/s
Jun 12 11:24:02 doxastar astar-collator[52243]: 2024-06-12 11:24:01 [Parachain] ⏩ Warping, Downloading state, 409.57 Mib (22 peers), best: #0 (0x9eb7…29c6), finalized #0 (0x9eb7…29c6), ⬇ 272.5kiB/s ⬆ 0.9kiB/s
Jun 12 11:24:06 doxastar astar-collator[52243]: 2024-06-12 11:24:06 [Relaychain] ✨ Imported #21184087 (0x6345…6470)
-- Boot 5e8c89c8388a471daea298612802f1e0 --
Jun 12 11:27:01 doxastar systemd[1]: Started Astar Node.
Jun 12 11:27:01 doxastar astar-collator[738]: `--state-cache-size` was deprecated. Please switch to `--trie-cache-size`.
Jun 12 11:27:01 doxastar astar-collator[738]: CLI parameter `--execution` has no effect anymore and will be removed in the future!
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 Astar Collator
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 ✌️  version 5.39.1-111d18fbfba
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 ❤️  by Stake Technologies <devops@stake.co.jp>, 2019-2024
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 📋 Chain specification: Astar
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 🏷  Node name: Dox-Astar-01
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 👤 Role: AUTHORITY
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 💾 Database: RocksDb at /home/astar_1/.local/share/astar-collator/ch
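To see how far the state download gets before the reboot, the progress lines above can be pulled out of the journal and turned into a time series. The following is only a minimal sketch: the regular expression is modeled on the "Downloading state" lines quoted in this comment, the function names are hypothetical, and other sync phases may be logged in a format it does not cover.

```python
import re
from datetime import datetime

# Pattern modeled on the "[Parachain] ... Downloading state, N Mib (P peers)"
# lines quoted above. This is an assumption about the log format, not a
# stable interface of the node.
STATE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[Parachain\] .*?"
    r"Downloading state, ([\d.]+) Mib \((\d+) peers\)"
)


def parse_state_progress(lines):
    """Return (timestamp, size_mib, peers) for each matching progress line."""
    samples = []
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, float(match.group(2)), int(match.group(3))))
    return samples


def growth_rate_mib_per_s(samples):
    """Average download rate between the first and last sample, or None."""
    if len(samples) < 2:
        return None
    (t0, s0, _), (t1, s1, _) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (s1 - s0) / elapsed if elapsed > 0 else None
```

Feeding it the journal output for the node's service (one line per list element) gives the last state size reported before each reboot, which can then be compared across runs.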

@bLd75
Contributor Author

bLd75 commented Jul 16, 2024

Update on the pre-v5.42.0 client test: the issue is still the same.
In my tests with 32 GB of RAM, the node always hits OOM at the same point: importing state at 5762.42 Mib.
Once it reaches this state size, memory usage suddenly spikes to 100% in less than 2 minutes.
image
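Because the spike happens in under 2 minutes, per-second sampling of the node's resident memory makes it easier to line up with the log. Below is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line; the function names, interval, and sample count are illustrative choices, not anything taken from the node itself.

```python
import time
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (kB, as reported by the kernel) or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:\t 1048576 kB"; the number is field 1.
            return int(line.split()[1])
    return None


def sample_rss(pid, interval_s=1.0, count=120):
    """Yield (unix_time, rss_kb) every interval_s seconds, count times.

    Assumes the process identified by pid stays alive while sampling.
    """
    status = Path(f"/proc/{pid}/status")
    for _ in range(count):
        yield time.time(), parse_vm_rss_kb(status.read_text())
        time.sleep(interval_s)
```

Writing the samples to a file alongside the node log timestamps would show whether the jump coincides with the state import reaching the size mentioned above.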

@bLd75
Contributor Author

bLd75 commented Jul 16, 2024

I can't see any significant correlation with disk usage, meaning the problem is specific to RAM usage by warp sync.
image

@bLd75
Contributor Author

bLd75 commented Jul 16, 2024

More insight into memory usage over a short time frame:
[9 images]
