Benchmarking end to end transaction throughput performance #396

Open
imlvts opened this issue Jan 24, 2022 · 6 comments
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@imlvts

imlvts commented Jan 24, 2022


It would be nice for Substrate to have a setup/demonstration/documentation for E2E benchmarks.
There are plenty of benchmarks in the codebase and a runtime benchmarking framework;
however, we were not able to find E2E benchmarks demonstrating the throughput of the network.

Furthermore, our benchmarks show peak throughput of around 800 transactions per second,
which is somewhat lower than the claimed 1000 TPS.
We would like to know where the discrepancy comes from, how to achieve higher throughput,
and how to analyze Substrate's performance.

Setup

Since we haven't been able to find the setup for E2E benchmarks, we've implemented the following setup:

  • 4x AWS t3.xlarge* instances, each running a substrate node in the same network
  • 1x custom client node that submits transactions over HTTP RPC, distributing them evenly across the nodes
  • 1x Prometheus server collecting stats from substrate nodes
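The client loop above can be sketched as follows. This is an illustration only, written in Python: the endpoint URLs and helper names are hypothetical, the extrinsics are assumed to be pre-signed, and only `author_submitExtrinsic` (Substrate's RPC method for submitting a signed extrinsic) comes from the actual API.

```python
import itertools

# Hypothetical RPC endpoints for the four benchmark nodes.
NODES = [
    "http://node-1:9933",
    "http://node-2:9933",
    "http://node-3:9933",
    "http://node-4:9933",
]

def rpc_payload(hex_extrinsic: str) -> dict:
    """JSON-RPC body for author_submitExtrinsic, the method used to
    submit a pre-signed extrinsic over the node's HTTP RPC port."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "author_submitExtrinsic",
        "params": [hex_extrinsic],
    }

def distribute(extrinsics):
    """Pair each pre-signed extrinsic with a node endpoint,
    round-robin, so the load is spread evenly across the nodes."""
    targets = itertools.cycle(NODES)
    return [(next(targets), rpc_payload(e)) for e in extrinsics]
```

In the real client each `(endpoint, payload)` pair would then be POSTed concurrently; only the distribution logic is shown here.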

* t3.xlarge were used because they supposedly meet the server requirements

Possible limits

Substrate has built-in limits that, we assume, ensure smooth operation of substrate nodes.
However, to find the network's actual throughput ceiling, we would like to disable those limits.

We have found the following limits:

  • Maximum block weight
  • Maximum block byte length
  • Transaction pool queue limit

They were increased with this change to substrate-node-template.
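For context, the weight and block-time limits translate into a theoretical throughput ceiling via simple arithmetic. Every constant below is an assumption chosen for illustration, not a measured or configured value; only the formula itself is the point.

```python
# Back-of-the-envelope throughput ceiling (all numbers are assumptions).
WEIGHT_PER_SECOND = 10**12                 # Substrate convention: 10^12 weight ~ 1s of compute
MAX_BLOCK_WEIGHT = 2 * WEIGHT_PER_SECOND   # assume a block may use 2s of compute
NORMAL_RATIO = 0.75                        # assumed share reserved for normal extrinsics
TRANSFER_WEIGHT = 200_000_000              # assumed weight of a single transfer
BLOCK_TIME_S = 6                           # default 6-second block time

tx_per_block = MAX_BLOCK_WEIGHT * NORMAL_RATIO / TRANSFER_WEIGHT
tps_ceiling = tx_per_block / BLOCK_TIME_S
```

With these assumed numbers the ceiling lands at 1250 TPS; halving the block time to 3 seconds (while keeping the per-block weight budget) doubles it, which is why the block time used in a benchmark matters so much.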

Observations

  • The peak transaction throughput measured was around 800 TPS.
  • Increasing the client TPS above that decreased the number of transactions per block, meaning more transactions were dropped as the load increased.
  • CPU utilization during this benchmark was ~50% on the substrate nodes and ~10% on the client node, so there was still compute to spare.
  • The client was capable of generating up to 3000 TPS when used on a 12-node network. However, most transactions were dropped. This demonstrates that the client was not the bottleneck.
  • We were not able to find where a substrate node spends its CPU time. This is complicated by the use of async and the lack of debug info by default. Any suggestions on how to collect performance info (e.g. flamegraphs) would be greatly appreciated.
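The scale of the dropping can be made explicit from the numbers above (roughly 3000 TPS offered versus roughly 800 TPS included in blocks); a trivial sketch:

```python
def drop_rate(offered_tps: float, included_tps: float) -> float:
    """Fraction of offered transactions that never reached a block."""
    return 1.0 - included_tps / offered_tps
```

With the observed figures, roughly 73% of offered transactions were dropped before inclusion.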

We would like to know if we're missing anything in this approach, whether these results are reasonable, and if you have any suggestions on how to evaluate end-to-end Substrate performance.

@shawntabrizi
Member

shawntabrizi commented Jan 24, 2022

@imlvts awesome write up!

The original benchmark test we did was about 1.5 years ago, and done by @NikVolf.

I'm also not 100% sure of the exact setup that was used, but I don't think you were far off.

One thing I don't see noted is that this benchmarking network used a 3-second block time, versus the 6-second block time you find on average across Substrate and Polkadot.

There was also some tuning done to the transaction pool, such as increasing the memory allocated to the tx pool, since we also saw that without it, tx throughput started going down once we got past a "sweet spot".

Also, did you run your throughput test using wasm or native execution?

The client was capable of generating up to 3000 TPS when used on 12-node network. However, most transactions were dropped. This demonstrates that the client was not the bottleneck.

What does this mean? Are you saying that a client is simply able to sign up to 3000 transactions per second? I'm not exactly sure how this is relevant.

We recently hired @ggwpez at Parity, with a singular focus on benchmarking and optimization within FRAME and runtime development. We were also about to go through the process of re-benchmarking all of Substrate, and doing so in a reproducible, well-documented way, so your message comes at the perfect time.

@shawntabrizi
Member

Also, @imlvts, what team do you work on, or what prompted you to look at Substrate in this way?

Would be very happy to include you more closely in our efforts here if you are open to that.

@imlvts
Author

imlvts commented Jan 24, 2022

@shawntabrizi thank you for your swift response.

Also, did you run your throughput test using wasm or native execution?

Not on the target machines. I'll be sure to do that and get back with the results.

What does this mean? Are you saying that a client is simply able to sign up to 3000 tranactions per second? Not exactly sure how this is relevant?

This means that the client was able to send 3000 extrinsics per second in total to multiple nodes over HTTP RPC. In retrospect, it may be more efficient to use the P2P protocol instead. This is relevant because the client had been a bottleneck at an earlier point.

@bkchr
Member

bkchr commented Jan 24, 2022

  • Increasing the client TPS above that decreased the number of transactions per block, meaning more transactions were dropped as the load increased.

If you have a way to reproduce this easily, please share your scripts. Then I can take a look and fix this.

@burdges

burdges commented Jan 24, 2022

Is there a sensible strategy for measuring how much the mempool consumes?

@bkchr
Member

bkchr commented Jan 24, 2022

Is there a sensible strategy for measuring how much the mempool consumes?

Consumes what?

@apopiak apopiak changed the title Bechmkarking end to end transaction throughput performance Benchmarking end to end transaction throughput performance Jan 25, 2022
@juangirini juangirini transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I10-unconfirmed Issue might be valid, but it's not yet known. and removed J2-unconfirmed labels Aug 25, 2023
jonathanudd pushed a commit to jonathanudd/polkadot-sdk that referenced this issue Apr 10, 2024