Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node stops syncing for a while with err="shutting down" spam in logs #24623

Closed
stickx opened this issue Apr 1, 2022 · 22 comments · Fixed by #24652
Closed

Node stops syncing for a while with err="shutting down" spam in logs #24623

stickx opened this issue Apr 1, 2022 · 22 comments · Fixed by #24652
Labels

Comments

@stickx
Copy link

stickx commented Apr 1, 2022

System information

Geth version: 1.10.16-stable
OS & Version: Linux

Expected behaviour

Node to stay synced with the latest block

Actual behaviour

The node will be syncing happily.
Then, regularly, something is causing my node to go crazy, spamming Synchronisation failed, retrying err="shutting down" 1-3000 times, then fall behind.

to reduce the spam in the example log below, i've change the spammed lines to ... REPEAT X TIMES ...

INFO [03-31|22:28:35.079] Unindexing transactions                  blocks=1   txs=216   total=1  elapsed=2.922s
INFO [03-31|22:28:47.688] Unindexed transactions                   blocks=1   txs=216   tail=12,146,619 elapsed=15.531s
INFO [03-31|22:28:51.865] Imported new chain segment               blocks=1   txs=403   mgas=6.647   elapsed=19.706s     mgasps=0.337   number=14,496,619 hash=b209c9..f6a164 dirty=387.95MiB  ignored=2
INFO [03-31|22:28:53.147] Unindexed transactions                   blocks=1   txs=267   tail=12,146,620 elapsed=1.256s
WARN [03-31|22:28:58.556] Synchronisation failed, dropping peer    peer=28bec7d0a2e477e13492106cfc45e2a953ebc19b2203fb83d701f38590a63047 err=timeout
INFO [03-31|22:28:58.909] Imported new chain segment               blocks=1   txs=86    mgas=6.898   elapsed=6.969s      mgasps=0.990   number=14,496,620 hash=5de4b6..7236cd dirty=386.86MiB
INFO [03-31|22:28:59.647] Unindexed transactions                   blocks=1   txs=241   tail=12,146,621 elapsed=107.758ms
WARN [03-31|22:29:01.504] Synchronisation failed, retrying         err="write tcp 10.0.2.51:57462->35.173.195.85:30303: use of closed network connection"
INFO [03-31|22:29:22.732] Writing clean trie cache to disk         path=/data/gethdata/geth/triecache threads=1
WARN [03-31|22:29:23.439] Synchronisation failed, dropping peer    peer=366ba3a49b528e8b041b85fd8c1a33364b629ede18d1a4218bc046b3ffaed4bb err=timeout
WARN [03-31|22:29:23.458] Synchronisation failed, retrying         err="shutting down"
WARN [03-31|22:29:23.660] Synchronisation failed, retrying         err="write tcp 10.0.2.51:44564->135.181.218.249:30303: use of closed network connection"
WARN [03-31|22:29:23.694] Synchronisation failed, retrying         err="shutting down"
... REPEAT 587 TIMES ...
WARN [03-31|22:29:24.261] Synchronisation failed, retrying         err="shutting down"
INFO [03-31|22:29:24.327] Regenerated local transaction journal    transactions=2  accounts=1
WARN [03-31|22:29:24.331] Synchronisation failed, retrying         err="shutting down"
WARN [03-31|22:29:24.364] Synchronisation failed, retrying         err="shutting down"
WARN [03-31|22:29:24.428] Synchronisation failed, retrying         err="shutting down"
WARN [03-31|22:29:24.448] Synchronisation failed, retrying         err="shutting down"
INFO [03-31|22:29:25.149] Imported new chain segment               blocks=1   txs=250   mgas=26.190  elapsed=26.233s     mgasps=0.998   number=14,496,621 hash=ef6425..f44957 age=1m10s    dirty=389.08MiB
INFO [03-31|22:29:29.071] Unindexing transactions                  blocks=1   txs=271   total=1  elapsed=3.902s
INFO [03-31|22:29:32.474] Unindexed transactions                   blocks=1   txs=271   tail=12,146,622 elapsed=7.305s
INFO [03-31|22:29:33.508] Imported new chain segment               blocks=1   txs=76    mgas=8.471   elapsed=8.351s      mgasps=1.014   number=14,496,622 hash=8e1a74..ad6edf dirty=389.39MiB
INFO [03-31|22:29:33.511] Deep froze chain segment                 blocks=4   elapsed=1.278s      number=14,406,621 hash=aa8681..aa0725
INFO [03-31|22:29:34.433] Unindexed transactions                   blocks=1   txs=201   tail=12,146,623 elapsed=919.394ms
WARN [03-31|22:29:37.833] Synchronisation failed, dropping peer    peer=360eb84884ae4e2d7499a66b91a2e2097dcabe66bef6fab097002a0067e8c9d9 err=timeout
WARN [03-31|22:29:44.043] Synchronisation failed, dropping peer    peer=360eb84884ae4e2d7499a66b91a2e2097dcabe66bef6fab097002a0067e8c9d9 err=timeout
INFO [03-31|22:29:45.763] Imported new chain segment               blocks=1   txs=177   mgas=12.228  elapsed=11.299s     mgasps=1.082   number=14,496,623 hash=239f9b..002ce0 dirty=388.29MiB
INFO [03-31|22:29:45.768] Unindexed transactions                   blocks=1   txs=135   tail=12,146,624 elapsed=1.991ms
INFO [03-31|22:29:45.868] Imported new chain segment               blocks=1   txs=18    mgas=1.681   elapsed=98.543ms    mgasps=17.063  number=14,496,624 hash=2df861..37f705 dirty=386.40MiB
INFO [03-31|22:29:45.900] Unindexed transactions                   blocks=1   txs=239   tail=12,146,625 elapsed=28.830ms
INFO [03-31|22:29:53.908] Imported new chain segment               blocks=1   txs=39    mgas=4.020   elapsed=8.028s      mgasps=0.501   number=14,496,625 hash=6c2ae1..b3fd5a dirty=385.93MiB
INFO [03-31|22:29:55.362] Persisted the clean trie cache           path=/data/gethdata/geth/triecache elapsed=32.630s
INFO [03-31|22:29:57.865] Unindexed transactions                   blocks=1   txs=209   tail=12,146,626 elapsed=3.195s
INFO [03-31|22:29:58.164] Submitted transaction                    
INFO [03-31|22:29:59.799] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=81.81KiB  throttle=3205
INFO [03-31|22:30:06.257] Imported new chain segment               blocks=1   txs=316   mgas=28.785  elapsed=10.966s     mgasps=2.625   number=14,496,626 hash=c54210..6bdaec dirty=388.43MiB
INFO [03-31|22:30:06.401] Unindexed transactions                   blocks=1   txs=0     tail=12,146,627 elapsed=61.060ms
INFO [03-31|22:30:14.169] Imported new chain segment               blocks=1   txs=283   mgas=26.098  elapsed=7.817s      mgasps=3.338   number=14,496,626 hash=abfd2c..488d9e age=1m7s     dirty=388.67MiB
INFO [03-31|22:30:16.808] Submitted transaction                    
INFO [03-31|22:30:34.636] Deep froze chain segment                 blocks=5   elapsed=667.379ms   number=14,406,626 hash=8df878..8219a3
INFO [03-31|22:30:35.373] Chain reorg detected                     number=14,496,625 hash=6c2ae1..b3fd5a drop=1 dropfrom=c54210..6bdaec add=1 addfrom=abfd2c..488d9e
INFO [03-31|22:30:39.304] Imported new chain segment               blocks=1   txs=322   mgas=26.098  elapsed=25.042s     mgasps=1.042   number=14,496,626 hash=abfd2c..488d9e age=1m32s    dirty=388.67MiB  ignored=1
INFO [03-31|22:30:48.406] Imported new chain segment               blocks=1   txs=192   mgas=18.567  elapsed=9.102s      mgasps=2.040   number=14,496,627 hash=03fa85..aaa344 age=1m23s    dirty=389.38MiB
INFO [03-31|22:30:50.857] Unindexed transactions                   blocks=1   txs=155   tail=12,146,628 elapsed=2.313s
INFO [03-31|22:31:04.929] Imported new chain segment               blocks=1   txs=208   mgas=18.686  elapsed=7.084s      mgasps=2.638   number=14,496,628 hash=8adabd..ad824c age=1m22s    dirty=388.96MiB
INFO [03-31|22:31:04.933] Unindexed transactions                   blocks=1   txs=214   tail=12,146,629 elapsed=4.646ms
INFO [03-31|22:31:05.150] Imported new chain segment               blocks=1   txs=135   mgas=9.788   elapsed=214.895ms   mgasps=45.549  number=14,496,629 hash=b9bf41..bef833 age=1m2s     dirty=389.07MiB
INFO [03-31|22:31:06.404] Unindexed transactions                   blocks=1   txs=107   tail=12,146,630 elapsed=1.252s
INFO [03-31|22:31:11.050] Imported new chain segment               blocks=1   txs=375   mgas=29.962  elapsed=5.893s      mgasps=5.084   number=14,496,630 hash=f4fd03..7dbacc age=1m2s     dirty=391.50MiB
INFO [03-31|22:31:11.837] Unindexed transactions                   blocks=1   txs=132   tail=12,146,631 elapsed=41.625ms
INFO [03-31|22:31:15.458] Imported new chain segment               blocks=1   txs=194   mgas=15.442  elapsed=3.675s      mgasps=4.201   number=14,496,631 hash=5c5696..4ae1ac dirty=392.56MiB
INFO [03-31|22:31:15.646] Unindexed transactions                   blocks=1   txs=158   tail=12,146,632 elapsed=185.094ms
INFO [03-31|22:31:22.648] Imported new chain segment               blocks=1   txs=114   mgas=8.958   elapsed=4.565s      mgasps=1.962   number=14,496,632 hash=1663e5..7ca1df dirty=393.29MiB
INFO [03-31|22:31:22.651] Unindexed transactions                   blocks=1   txs=89    tail=12,146,633 elapsed=1.870ms
INFO [03-31|22:31:34.259] Imported new chain segment               blocks=1   txs=91    mgas=9.979   elapsed=5.898s      mgasps=1.692   number=14,496,633 hash=2ea7ca..90c8f4 dirty=392.18MiB
INFO [03-31|22:31:34.670] Deep froze chain segment                 blocks=7   elapsed=30.268ms    number=14,406,633 hash=f731c9..53b991
INFO [03-31|22:31:35.943] Unindexed transactions                   blocks=1   txs=256   tail=12,146,634 elapsed=1.295s
INFO [03-31|22:31:36.143] Submitted transaction                    
INFO [03-31|22:31:53.838] Submitted transaction                    
INFO [03-31|22:31:59.360] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=78.89KiB  throttle=3323
INFO [03-31|22:32:13.738] Imported new chain segment               blocks=1   txs=164   mgas=12.744  elapsed=14.378s     mgasps=0.886   number=14,496,634 hash=a48305..367e0f dirty=391.74MiB
INFO [03-31|22:32:37.396] Imported new chain segment               blocks=1   txs=78    mgas=5.961   elapsed=23.657s     mgasps=0.252   number=14,496,635 hash=b19f77..5a267f dirty=391.01MiB
INFO [03-31|22:32:37.506] Unindexed transactions                   blocks=2   txs=452   tail=12,146,636 elapsed=109.653ms
INFO [03-31|22:32:43.046] Imported new chain segment               blocks=1   txs=91    mgas=11.533  elapsed=5.634s      mgasps=2.047   number=14,496,636 hash=e3b4c7..5adfba dirty=390.93MiB
WARN [03-31|22:32:43.402] Synchronisation failed, dropping peer    peer=355b39725ef096fa59d339a6f15ecc17465ea7689ba3855965bf7c70d82365fb err=timeout
INFO [03-31|22:32:47.739] Unindexed transactions                   blocks=1   txs=260   tail=12,146,637 elapsed=4.339s
WARN [03-31|22:32:50.045] Synchronisation failed, dropping peer    peer=355b39725ef096fa59d339a6f15ecc17465ea7689ba3855965bf7c70d82365fb err=timeout
WARN [03-31|22:32:50.454] Synchronisation failed, retrying         err="shutting down"
... REPEAT 190 TIMES ...
WARN [03-31|22:32:50.536] Synchronisation failed, retrying         err="shutting down"
WARN [03-31|22:32:57.469] Synchronisation failed, retrying         err="peer is unknown or unhealthy"
... REPEAT 64 TIMES ...
WARN [03-31|22:32:59.003] Synchronisation failed, retrying         err="peer is unknown or unhealthy"
INFO [03-31|22:33:01.594] Imported new chain segment               blocks=1   txs=159   mgas=11.721  elapsed=18.129s     mgasps=0.646   number=14,496,637 hash=29fae3..39c437 age=1m3s     dirty=390.70MiB
INFO [03-31|22:33:01.956] Unindexed transactions                   blocks=1   txs=187   tail=12,146,638 elapsed=122.076ms
INFO [03-31|22:33:08.680] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=83.40KiB  throttle=3144
INFO [03-31|22:33:12.008] Imported new chain segment               blocks=1   txs=192   mgas=15.487  elapsed=5.909s      mgasps=2.621   number=14,496,638 hash=efe708..54859c age=1m1s     dirty=392.18MiB
INFO [03-31|22:33:12.200] Unindexed transactions                   blocks=1   txs=164   tail=12,146,639 elapsed=190.257ms
INFO [03-31|22:33:20.809] Imported new chain segment               blocks=3   txs=806   mgas=38.897  elapsed=8.799s      mgasps=4.420   number=14,496,641 hash=8a8f4f..f6d3d9 dirty=392.62MiB  ignored=2
INFO [03-31|22:33:20.838] Unindexed transactions                   blocks=3   txs=523   tail=12,146,642 elapsed=26.423ms
INFO [03-31|22:33:29.886] Imported new chain segment               blocks=1   txs=274   mgas=27.064  elapsed=9.048s      mgasps=2.991   number=14,496,642 hash=4f28a1..e7f37c dirty=393.39MiB
INFO [03-31|22:33:31.103] Unindexed transactions                   blocks=1   txs=228   tail=12,146,643 elapsed=1.215s
INFO [03-31|22:33:40.263] Deep froze chain segment                 blocks=9   elapsed=5.574s      number=14,406,642 hash=ed368c..0d909d
INFO [03-31|22:34:33.207] Imported new chain segment               blocks=1   txs=434   mgas=30.074  elapsed=23.861s     mgasps=1.260   number=14,496,643 hash=b8a1d6..d4596f age=1m28s    dirty=395.35MiB
INFO [03-31|22:34:33.290] Unindexed transactions                   blocks=1   txs=90    tail=12,146,644 elapsed=23.551ms
INFO [03-31|22:34:34.436] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=91.22KiB  throttle=2874
INFO [03-31|22:34:36.947] Chain reorg detected                     number=14,496,642 hash=4f28a1..e7f37c drop=1 dropfrom=b8a1d6..d4596f add=1 addfrom=51e5f4..5bde2a
INFO [03-31|22:34:48.336] Imported new chain segment               blocks=1   txs=32    mgas=3.545   elapsed=15.068s     mgasps=0.235   number=14,496,643 hash=51e5f4..5bde2a age=1m43s    dirty=395.48MiB
INFO [03-31|22:34:53.643] Chain reorg detected                     number=14,496,642 hash=4f28a1..e7f37c drop=1 dropfrom=51e5f4..5bde2a add=2 addfrom=c4e0b2..1ecba3
INFO [03-31|22:35:02.879] Imported new chain segment               blocks=1   txs=905   mgas=22.506  elapsed=14.533s     mgasps=1.548   number=14,496,644 hash=c4e0b2..1ecba3 dirty=396.18MiB  ignored=2
INFO [03-31|22:35:03.231] Imported new chain segment               blocks=1   txs=13    mgas=1.098   elapsed=351.699ms   mgasps=3.121   number=14,496,645 hash=e04b7e..476160 dirty=395.44MiB
INFO [03-31|22:35:05.021] Unindexed transactions                   blocks=2   txs=357   tail=12,146,646 elapsed=1.617s

Steps to reproduce the behaviour

Synced 1.10.16-stable on a fresh aws2 instance.

@stickx stickx added the type:bug label Apr 1, 2022
@stickx
Copy link
Author

stickx commented Apr 4, 2022

thanks for the help. moving to other software.

@MariusVanDerWijden
Copy link
Member

You opened this issue on a Friday, today is Tuesday.
We are working very hard on getting the merge done which has high priority over other issues at the moment.
As you can see there are > 220 issues in our issue tracker at the moment so we can not jump into every single one to fix it.

Your node was synced before the log spam, thus shut down the downloader. But your nodes seems to be falling behind with block processing so it starts the downloader again, which sees that the downloader is shutting down thus printing the error.
Judging from your logs, the issue does self correct after a couple of seconds, so not much to do for us here other than maybe reduce the amount of log spam.

@stickx
Copy link
Author

stickx commented Apr 5, 2022

Apologies, I worked all weekend, so it felt like a long time.

The logs I pasted are on the mild side, sometimes it gets 10 minutes behind.
This is constantly happening, multiple times per hour.

Do you think this could be a lack of processing power? If there a recommended AWS instance type?

@MariusVanDerWijden
Copy link
Member

We usually use t3.xl afaik

@stickx
Copy link
Author

stickx commented Apr 5, 2022

I'm on an m5.2xl ...

@holiman
Copy link
Contributor

holiman commented Apr 5, 2022

What are the exact command line parameters used for starting the node?

@holiman
Copy link
Contributor

holiman commented Apr 5, 2022

We fixed a similar issue earlier, but that doesn't appear to be the same cause: #24202 (comment) . If you can obtain a stacktrace from when it is in this state (e.g. debug.stacks()), that would be helpful, because then we might be able to see what those 600 goroutines are waiting for.

@stickx
Copy link
Author

stickx commented Apr 5, 2022

./go-ethereum/build/bin/geth --datadir=/data --http --http.addr=x.x.x.x --http.port=xx --http.api=eth,web3,net,debug,trace,txpool

will work on getting the stacks to you asap

@stickx
Copy link
Author

stickx commented Apr 5, 2022

CPU usage

This seems troubling too, yes?

@stickx
Copy link
Author

stickx commented Apr 5, 2022

gethlogs.zip
stack logs. hopefully i captured it...

@holiman
Copy link
Contributor

holiman commented Apr 5, 2022

gethlogs1 has 190 goroutines stuck here https://github.com/ethereum/go-ethereum/blob/master/eth/fetcher/block_fetcher.go#L480 . gethlogs2 has 189 stuck in the same place.

One example:

goroutine 383702607 [chan receive, 1006 minutes]:
github.com/ethereum/go-ethereum/eth/fetcher.(*BlockFetcher).loop.func1.1(0xc187f6aab0, 0xc00ee56d20, 0xc0aedb6c00, 0x40, 0x3abb85767d2ade3, 0xc3584e164a06fc8f, 0x959c3fc21cba98e6, 0x1372c272f0101e91)
	github.com/ethereum/go-ethereum/eth/fetcher/block_fetcher.go:480 +0xdb
created by github.com/ethereum/go-ethereum/eth/fetcher.(*BlockFetcher).loop.func1
	github.com/ethereum/go-ethereum/eth/fetcher/block_fetcher.go:471 +0x12e

So it's waiting for the fetcher to deliver something. It indicates that something is amiss in the dispatcher, similar to the bug fixed here

@stickx
Copy link
Author

stickx commented Apr 6, 2022

Does the CPU usage look high too? Or is that normal?

@holiman
Copy link
Contributor

holiman commented Apr 6, 2022

Original error report was:

Then, regularly, something is causing my node to go crazy, spamming Synchronisation failed, retrying err="shutting down" 1-3000 times, then fall behind.

IMO this is fixed by #24652,so it should indeed be closed.

@stickx
Copy link
Author

stickx commented Apr 6, 2022

No worries, thanks!
If you think the CPU usage isn't a problem.
I'll keep an eye on things post-upgrade

@holiman
Copy link
Contributor

holiman commented Apr 6, 2022

Can't say much about that CPU usage. Looks a bit on the high side, unless it's a multi-core machine and e.g. 800% is the max

@mining-visualizer
Copy link

I'm seeing the exact same thing.

Here's my setup:
v1.10.16
Raspberry Pi 4, 8GB
8GB swap file on the SD card
Samsung SSD 1TB data disk

My observations are that the problem develops over time. It will run fine for about a week and then slowly it gets worse and worse. I set up metrics collection using prometheus/grafana to get a better idea of what is going on, which you can see below. It seems to be correlated with db compacting. During that time it seems as if geth basically freezes. It stops providing metrics, stops responding to RPC calls, drops peers, etc.

I'm going to try moving my swap file to the Samsung, since the SD cards generally have very poor performance.

image

image

image

image

@karalabe
Copy link
Member

I'm going to try moving my swap file to the Samsung, since the SD cards generally have very poor performance.

When LevelDB starts compacting - if enough load piles up - it essentially blocks all database reads, causing all subsystems to lock up eventually. I think in general you should avoid using a swap file at all, it usually just makes things significantly worse since you're using an already overloaded disk for memory accesses.

Oh, and just don't use an SD card for anything. They are insanely slow.

@mining-visualizer
Copy link

mining-visualizer commented Apr 14, 2022

I think in general you should avoid using a swap file at all, it usually just makes things significantly worse since you're using an already overloaded disk for memory accesses.

Thanks! I've been thinking about disabling the swap file, but was looking for some confirmation from someone with more expertise.

@mining-visualizer
Copy link

mining-visualizer commented Apr 15, 2022

@karalabe Just thought I would report back. I followed your suggestion to disable the swap file, but after about 24 hours geth crashed with a simple "Killed" message on the screen. Couldn't find anything more detailed in journalctl. So I put the swap file back on. I can file a separate issue if you want.

@kemorebiyyj
Copy link

ce45685157a6535c2882e023dda402f This error often occurs in my geth logs,after setting maxpeers = 900,I restarted the client and geth started normal synchronization.but a few days later,it still reported "err = shutting down". Please help me.

This Geth fullnode is used to synchronize bsc chain data,geth version 1.1.20

@kemorebiyyj
Copy link

msg="synchronisation failed,retrying" err= "shutting down"maybe because the other nodes are unstable?

@skliarovartem
Copy link

did you fix it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants