
High Read Disk I/O recently??? #21809

Closed
fabdarice opened this issue Nov 10, 2020 · 11 comments

Comments

@fabdarice

fabdarice commented Nov 10, 2020

System information

Geth version: 1.9.23-stable
OS & Version: Arch Linux

Expected behaviour

For the past month, I have been seeing an average of under 1 MB/s for both read and write disk I/O.

Actual behaviour

Recently (the last 4 days or so), I have been seeing an average of 20 MB/s for read disk I/O.
Running sudo iotop clearly shows that geth is responsible for this.
The weird thing is that I've tried restarting geth with different parameters (such as turning off --metrics --pprof), but the same thing happens. However, roughly 1 out of every 15-20 restarts, the geth node does normal read and write I/O (under 1 MB/s), and as long as I don't restart the process, it keeps working fine.
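For reference, one way to watch this per process (plain iotop; the flags below are just one reasonable invocation, nothing geth-specific):

# Show only processes currently doing I/O, aggregated per process
# rather than per thread, refreshing every 5 seconds
sudo iotop -o -P -d 5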

[screenshot: disk I/O metrics graph]

@fabdarice fabdarice changed the title High Disk I/O recently??? High Read Disk I/O recently??? Nov 10, 2020
@ethereum ethereum deleted a comment from watsoncoders Nov 10, 2020
@ligi
Member

ligi commented Nov 12, 2020

Is your RPC open to the outside? Can you provide some logs? I see you have --metrics on - can you share them?

@fabdarice
Author

> Is your RPC open to the outside? Can you provide some logs? I see you have --metrics on - can you share them?

Hey @ligi, thanks for looking into this.
Nah, I'm running geth with the flags --http --http.addr localhost.
Metrics: https://gist.github.com/fabdarice/7fb03e6730aaffa7a7a6f8d0632eb2d6
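A sketch of how a dump like that can be pulled from a node running with --metrics --pprof, assuming the default pprof/metrics port of 6060 (adjust if you have changed it):

# JSON dump of geth's internal metrics registry
curl -s http://127.0.0.1:6060/debug/metrics

# Prometheus-formatted variant, if that endpoint is available in your build
curl -s http://127.0.0.1:6060/debug/metrics/prometheus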

Also, one thing I've noticed: every time I restart geth, I get a massive spike of disk I/O for about 2 minutes (write ~130 MB/s and read ~70 MB/s).
For unknown reasons, it always causes my internet to drop.

@holiman
Contributor

holiman commented Nov 12, 2020

> I get a massive spike of disk I/O (write ~130 MB/s and read ~70 MB/s).

That's geth flushing a lot of in-memory data structures (cached trie nodes) to disk; that's to be expected.
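The size of that flush roughly tracks the node's in-memory cache allowance, which the --cache family of flags controls. A minimal sketch (the 1024 below is purely illustrative, not a recommendation):

# A smaller total cache allowance means fewer dirty trie nodes are held in
# memory, so the flush on shutdown/restart is smaller (at the cost of more
# frequent writes during normal operation)
geth --cache 1024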

> For unknown reasons, it always causes my internet to drop.

That's.. not to be expected... :/

@fabdarice
Author

Do you guys know if those numbers are expected?

@holiman
Contributor

holiman commented Nov 22, 2020 via email

@holiman
Contributor

holiman commented Nov 22, 2020

Could you please give this one a test run: #21887? It should be quick to run, and will tell us whether the leveldb database is in a somewhat 'sane' state.

@fabdarice
Author

fabdarice commented Nov 22, 2020

@holiman thanks for looking into it.

Here are some results:
$ ./build/bin/geth dbstats

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          5 |    1145.36859 |       0.00000 |       0.00000 |       0.00000
   1   |         50 |      99.40603 |       0.00000 |       0.00000 |       0.00000
   2   |        609 |     999.01700 |       0.00000 |       0.00000 |       0.00000
   3   |       7786 |    9998.56247 |       0.00000 |       0.00000 |       0.00000
   4   |      50948 |   99998.97362 |       0.00000 |       0.00000 |       0.00000
   5   |      20982 |   41855.70432 |       0.00000 |       0.00000 |       0.00000
-------+------------+---------------+---------------+---------------+---------------
 Total |      80380 |  154097.03203 |       0.00000 |       0.00000 |       0.00000

$ ./build/bin/geth --nousb dbstats

Compactions
 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          1 |       9.62615 |       0.00000 |       0.00000 |       0.00000
   1   |         56 |     106.88867 |       0.00000 |       0.00000 |       0.00000
   2   |        627 |    1075.44599 |       0.00000 |       0.00000 |       0.00000
   3   |       8336 |   10753.07116 |       0.00000 |       0.00000 |       0.00000
   4   |      51095 |  100269.49947 |       0.00000 |       0.00000 |       0.00000
   5   |      20982 |   41855.70432 |       0.00000 |       0.00000 |       0.00000
-------+------------+---------------+---------------+---------------+---------------
 Total |      81097 |  154070.23576 |       0.00000 |       0.00000 |       0.00000

EDIT: Unfortunately, by the time I ran those commands, my disk I/O was back to normal.
I'll run it again the next time it's back up to 20-30 MB/s.
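Side note: the same leveldb statistics can also be pulled from the running node via the debug API over IPC, which avoids starting a second geth process against the data directory. A sketch, assuming the default data directory and that the debug namespace is exposed over IPC:

# Attach to the running node and dump leveldb's internal compaction stats
geth attach ~/.ethereum/geth.ipc --exec 'debug.chaindbProperty("leveldb.stats")'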

@fabdarice
Author

fabdarice commented Nov 22, 2020

Note that it fluctuates, and I haven't been able to find a correlation.
The only thing I'm sure of is that the high spikes are caused by geth.

See metrics below.
[screenshot: disk I/O metrics graph showing fluctuating read spikes]

@fabdarice
Author

Hey @holiman, I just ran the command again, as my disk I/O is back to an average of 20-30 MB/s:

 Level |   Tables   |    Size(MB)   |    Time(sec)  |    Read(MB)   |   Write(MB)
-------+------------+---------------+---------------+---------------+---------------
   0   |          2 |     430.23841 |       0.00000 |       0.00000 |       0.00000
   1   |        477 |     926.86194 |       0.00000 |       0.00000 |       0.00000
   2   |        823 |    1293.13021 |       0.00000 |       0.00000 |       0.00000
   3   |       7873 |    9999.59098 |       0.00000 |       0.00000 |       0.00000
   4   |      50962 |   99999.91003 |       0.00000 |       0.00000 |       0.00000
   5   |      21537 |   42954.67393 |       0.00000 |       0.00000 |       0.00000
-------+------------+---------------+---------------+---------------+---------------
 Total |      81674 |  155604.40551 |       0.00000 |       0.00000 |       0.00000

@ca42

ca42 commented Nov 30, 2020

@holiman we are experiencing this issue as well across all of our full nodes, regardless of whether there are active client connections. We are regularly seeing sustained read throughput of around 100 MB/s:

[screenshot: sustained read throughput of around 100 MB/s]

This is happening on client versions 1.9.21 and 1.9.24, both compiled with go1.15.5 and running on Linux.

@holiman
Contributor

holiman commented Oct 25, 2021

We're always looking into how to reduce disk I/O, but since this bug is fairly old, I don't see any reason to keep it open any longer.

@holiman holiman closed this as completed Oct 25, 2021