Besu block import an order of magnitude slower than Geth (side by side comparison) #4549
Comments
Thanks @kayagoban, we're looking into optimizing block processing time; let me check those results on a similar hardware setup.
Some anecdotal evidence from the Rocketpool Discord: storage IOPS may matter. Users who report issues with Besu 22.7 are on SATA SSDs, roughly 40k read / 11k write IOPS on a 150G test file with a 75% read/write mix. NVMe is roughly 4x that, around 175k/60k. RAID6 is not well suited to DB workloads, so I'd expect somewhat lower IOPS there because of RAID6 write amplification. The RAID controller generation also matters: anything before the 9400/9500 generation does not pass through TRIM, so SSD performance suffers over time.

Hypothesis: Besu needs more IOPS than Geth, so on storage that is "in between" performance-wise - plenty for Geth but a little tight for Besu - this causes the block import time issues seen. The performance anecdotes are with TLC drives that have DRAM cache, Samsung 860/870 EVO. I am syncing Besu on NVMe and will check whether this bears out. If IOPS is a major contributing factor, that begs the question of what Besu can do to reduce its IOPS use, and whether it's worth tuning Besu for "middling" storage - SATA, VPS, AWS gp3, sub-optimal RAID layouts - or whether it should just be documented that Besu requires NVMe (no QLC, no DRAM-less drives; mirrors/RAID1 are fine, RAID5/6 are not).
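(As an aside: figures like the 40k/11k and 175k/60k IOPS quoted above are normally measured with a dedicated tool such as fio. The Java probe below is a hypothetical, much rougher stand-in - single-threaded, and it does not bypass the OS page cache, so its numbers will be optimistic - but it gives a ballpark feel for random 4 KiB read throughput on the volume that hosts the Besu database. The class name and file-path argument are illustrative, not part of Besu.)

```java
// Hypothetical RandomReadProbe: issue random 4 KiB reads against a large existing
// file for ~10 seconds and report reads per second. Run it against a multi-GiB file
// on the same volume as the Besu data directory.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;

public class RandomReadProbe {
  public static void main(String[] args) throws IOException {
    Path file = Path.of(args[0]);               // path to a large file on the volume under test
    final int blockSize = 4096;                 // 4 KiB, matching the quoted IOPS figures
    final long durationNanos = 10_000_000_000L; // run for ~10 seconds

    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      long blocks = channel.size() / blockSize; // file must be much larger than 4 KiB
      ByteBuffer buffer = ByteBuffer.allocateDirect(blockSize);
      long reads = 0;
      long start = System.nanoTime();
      while (System.nanoTime() - start < durationNanos) {
        long offset = ThreadLocalRandom.current().nextLong(blocks) * blockSize;
        buffer.clear();
        channel.read(buffer, offset);           // one random 4 KiB read
        reads++;
      }
      double seconds = (System.nanoTime() - start) / 1e9;
      // The page cache is not bypassed, so treat this as an upper bound, not true device IOPS.
      System.out.printf("~%.0f random 4 KiB reads/s%n", reads / seconds);
    }
  }
}
```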
@kayagoban I have similar results on one of our nodes
We're looking into this to improve block processing time. @yorickdowne thanks for sharing, your feedback is helpful as always.
@ahamlat I was talking about post-sync. Something causes some systems to have block processing times in excess of 1 s while others are doing well. It doesn't appear to be CPU or memory, which is why I suspect disk. More testing is needed.
I agree with you, CPU and memory are not the bottleneck post-sync, and we can use memory to reduce the number of I/O operations (RocksDB cache). We have already identified the RocksDB.get method as one of the most time-consuming methods (almost half of block processing time without the high-spec flag). This is mainly due to fetching world state storage and account nodes from the Merkle Patricia trie. These account and storage nodes should already be stored in the flat databases at the end of sync, which is one of the benefits of using Bonsai as a database layer. Our current implementation is not optimal because, at the end of the sync, there are some gaps in these two flat databases. We're working to optimize this so the flat databases are complete, and thus reduce read amplification.
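To make the flat-database point concrete, here is a minimal hypothetical sketch (the interfaces and class names below are illustrative and are not Besu's actual code): when the flat database has the entry, an account read costs one point lookup; when it hits a gap, the read falls back to a Merkle Patricia trie walk, turning one logical read into several RocksDB.get calls and thus amplifying IOPS.

```java
import java.util.Optional;

// Hypothetical sketch of why gaps in the flat database amplify reads.
public class FlatDbReadSketch {

  interface KeyValueStore {
    Optional<byte[]> get(byte[] key); // each call is roughly one RocksDB.get
  }

  static class WorldStateReader {
    private final KeyValueStore flatAccountDb; // flat DB: hash(address) -> account RLP
    private final KeyValueStore trieNodeDb;    // trie nodes: node hash -> node RLP

    WorldStateReader(KeyValueStore flatAccountDb, KeyValueStore trieNodeDb) {
      this.flatAccountDb = flatAccountDb;
      this.trieNodeDb = trieNodeDb;
    }

    Optional<byte[]> getAccount(byte[] accountKey, byte[] stateRootHash) {
      // Fast path: a complete flat database answers with a single point read.
      Optional<byte[]> flat = flatAccountDb.get(accountKey);
      if (flat.isPresent()) {
        return flat;
      }
      // Slow path: a gap forces a trie walk from the state root, costing one
      // trieNodeDb.get per trie level - this is the read amplification.
      return walkTrie(stateRootHash, accountKey);
    }

    private Optional<byte[]> walkTrie(byte[] nodeHash, byte[] path) {
      // Decode the node, follow the key nibbles, recurse one disk read per level.
      return Optional.empty(); // omitted for brevity
    }
  }
}
```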
If I understand correctly, using the Forest strategy would probably be faster on my machine than Bonsai - do you think that's correct?
Good question. Even with these gaps in the flat databases, Bonsai is still faster than the Forest implementation.
A clarification - this is about already-synced block import times. (I hear you calling it block processing, but the logs actually say "Imported #15,788,19", so I'm using your own terminology here.) I can see how it doesn't seem like an issue on faster storage, but if Geth is usable on these slower storage systems and Besu is not, with a 10x difference, simply saying "wontfix" doesn't bode well for client diversity.
I haven't heard "wontfix". Right now it's about figuring out where the slowdown is from an operator perspective, so the Besu team can figure out where it happens in code and design - design as in "DB layout", broadly, if that's where the issue lies. I expect this won't get solved overnight. I agree that running on "whatever" is core to Ethereum's mission, and running well on "middling" storage is highly desirable. That said, RAID6 is close to a worst-case scenario for DB storage. Is this a hardware RAID controller, and if so, which make and model?
I can confirm that we're working on this currently; it is our top priority. We're working on a fix for the account and storage flat databases, and we're investigating other optimizations (RocksDB bloom filters, parallelizing the calculateRootHash method, etc.).
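For context on the RocksDB knobs mentioned here, the standalone sketch below shows how a bloom filter and a block cache are configured through the RocksJava API (assuming a reasonably recent RocksDB version). The path and sizes are arbitrary example values and this is not Besu's actual configuration; bloom filters let point reads skip SST files that cannot contain the key, and the block cache serves hot blocks from RAM instead of costing disk IOPS.

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Hypothetical RocksDbTuningSketch: illustrates the bloom filter and block cache knobs only.
public class RocksDbTuningSketch {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();

    BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
        .setFilterPolicy(new BloomFilter(10, false))        // ~10 bits per key
        .setCacheIndexAndFilterBlocks(true)                 // keep index/filter blocks in the cache
        .setBlockCache(new LRUCache(1024L * 1024 * 1024));  // 1 GiB block cache (example value)

    try (Options options = new Options()
             .setCreateIfMissing(true)
             .setTableFormatConfig(tableConfig);
         RocksDB db = RocksDB.open(options, "/tmp/rocksdb-tuning-sketch")) {
      db.put("key".getBytes(), "value".getBytes());
      System.out.println(new String(db.get("key".getBytes())));
    }
  }
}
```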
It's a PERC H701P Mini hardware RAID controller with 8x 1TB SSD drives. I didn't know it was a worst case when I set it up as RAID6, although I thought there would be a performance hit. I'm colocated, so letting the drives fail over cleanly was a priority for me. Collecting that beast, reformatting the drives, and reinstalling ESXi is not on my to-do list, even if it would get me more IOPS.
The H710P (is that the one you meant?) does not pass through TRIM, so performance will suffer as the drives get older. I had a PERC H710P with RAID1, and performance on its SATA SSDs got so bad I couldn't even sync Geth any more. Writeup here: https://gist.github.com/yorickdowne/fd36009c19fdbee0337bffc0d5ad8284

I think this is good, if also anecdotal, evidence that IOPS is the bottleneck. Besu can do something about that, and as ahamlat says, the team is focusing on it right now. Your personal setup may never run particularly well until it's a RAID10 on mdadm (or whatever ESXi's software RAID equivalent is) with the H710P acting as just an HBA, but that's neither here nor there with regard to Besu's IOPS hunger relative to Geth. I think everyone agrees that the desired state is for Besu to run as well as Geth on any given storage platform.
I read your post about the 710P degrading SSD drives. It didn't exactly make me cry, but I'm reaching for some alcohol and I'll just be rocking back and forth in a corner if anybody wants something.
Going to try PR #4568 on my test setup here. It's a 24-core, 512 GiB RAM, NVMe bare-metal machine, but without the high-perf flag that trades RAM for speed, so I have a baseline. It definitely spikes to just under 2 s on that machine:
(log screenshot of block import times omitted)
Feels a little better, though not night and day, with that PR.
Will try same PR with high perf spec |
With high performance flag and 8 GiB heap
(log screenshot omitted)
Thanks @yorickdowne for testing PR #4568; it is hard to see the impact just by looking at the logs. What I usually do to evaluate the impact of new PRs is to:
(evaluation steps omitted)
Results with PR #4568 (chart omitted)
Results without the PR on the same machine (chart omitted)
Results with the current 22.10.0 release (chart omitted)
Huge improvement with the latest release. I need to test this again on a less beefy machine, but this looks great.
(log screenshot omitted)
On a machine with 6 cores instead of 24, it's still good. Getting that ~0.5 s down to under 500 ms would be great, but this is progress.
(log screenshot omitted)
Same setup but with
(log screenshot omitted)
Closing; performance issues are tracked in #4625.
Description
As a validator, I would like Besu to match Geth's performance
Acceptance Criteria
Block import times should be comparable to Geth's
Steps to Reproduce (Bug)
Expected behavior:
Besu import time is similar to Geth's
Actual behavior:
Besu block import is an order of magnitude slower than Geth
Frequency:
100%
Versions (Add all that apply)
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Debian-1deb11u1, mixed mode, sharing)
Additional Information
[Service]
Environment="BESU_OPTS=-Xmx8g"
ExecStart=/opt/app/besu-22.7.7/bin/besu \
  --data-path=/opt/besu-data \
  --network=MAINNET \
  --rpc-http-host=127.0.0.1 \
  --rpc-http-apis="ADMIN,ETH,NET,WEB3" \
  --rpc-http-enabled=true \
  --host-allowlist="*" \
  --engine-jwt-secret=/opt/jwtsecret \
  --engine-rpc-enabled=true \
  --engine-rpc-http-port=8552 \
  --data-storage-format=BONSAI \
  --sync-mode=X_CHECKPOINT \
  --p2p-host=xxx.xxx.xxx.xxx \
  --max-peers=50 \
  --p2p-port=xxxxx \
  --Xplugin-rocksdb-high-spec-enabled=true \
  --nat-method=UPNP
Besu block import times range from approximately 1 second to 4.5 seconds. This isn't quite good enough to validate with - too many incorrectly voted heads and missed attestations.
We are showing blocks #15,788,198 to #15,788,204
Now, we see Geth, which is running side by side with Besu, importing the exact same blocks.
As we see, Geth is importing blocks about 10 times faster than Besu. Hundreds of ms instead of one to several seconds.