Benchmarking end to end transaction throughput performance #396

Open
imlvts opened this issue Jan 24, 2022 · 6 comments
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@imlvts

imlvts commented Jan 24, 2022


It would be nice for Substrate to have a setup/demonstration/documentation for E2E benchmarks.
There are plenty of benchmarks in the codebase and a runtime benchmarking framework;
however, we were not able to find E2E benchmarks demonstrating the throughput of the network.

Furthermore, our benchmarks show peak throughput of around 800 transactions per second,
which is somewhat lower than the claimed 1000 TPS.
We would like to know where the discrepancy comes from, how to achieve higher throughput,
and how to analyze Substrate's performance.

Setup

Since we haven't been able to find the setup for E2E benchmarks, we've implemented the following setup:

  • 4x AWS t3.xlarge* instances, each running a substrate node in the same network
  • 1x custom client node that submits transactions over HTTP RPC, distributing them evenly across the nodes
  • 1x Prometheus server collecting stats from substrate nodes
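The client loop above can be sketched as follows. This is an illustration only, written in Python: the endpoint URLs and helper names are hypothetical, the extrinsics are assumed to be pre-signed, and only `author_submitExtrinsic` (Substrate's RPC method for submitting a signed extrinsic) comes from the actual API.

```python
import itertools

# Hypothetical RPC endpoints for the four benchmark nodes.
NODES = [
    "http://node-1:9933",
    "http://node-2:9933",
    "http://node-3:9933",
    "http://node-4:9933",
]

def rpc_payload(hex_extrinsic: str) -> dict:
    """JSON-RPC body for author_submitExtrinsic, the method used to
    submit a pre-signed extrinsic over the node's HTTP RPC port."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "author_submitExtrinsic",
        "params": [hex_extrinsic],
    }

def distribute(extrinsics):
    """Pair each pre-signed extrinsic with a node endpoint,
    round-robin, so the load is spread evenly across the nodes."""
    targets = itertools.cycle(NODES)
    return [(next(targets), rpc_payload(e)) for e in extrinsics]
```

In the real client each `(endpoint, payload)` pair would then be POSTed concurrently; only the distribution logic is shown here.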

* t3.xlarge were used because they supposedly meet the server requirements

Possible limits

Substrate has built-in limits that, we assume, ensure smooth operation of substrate nodes.
However, to find the network's actual throughput ceiling, we would like to disable those limits.

We have found the following limits:

  • Maximum block weight
  • Maximum block byte length
  • Transaction pool queue limit

They were increased with this change to substrate-node-template.
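For context, the weight and block-time limits translate into a theoretical throughput ceiling via simple arithmetic. Every constant below is an assumption chosen for illustration, not a measured or configured value; only the formula itself is the point.

```python
# Back-of-the-envelope throughput ceiling (all numbers are assumptions).
WEIGHT_PER_SECOND = 10**12                 # Substrate convention: 10^12 weight ~ 1s of compute
MAX_BLOCK_WEIGHT = 2 * WEIGHT_PER_SECOND   # assume a block may use 2s of compute
NORMAL_RATIO = 0.75                        # assumed share reserved for normal extrinsics
TRANSFER_WEIGHT = 200_000_000              # assumed weight of a single transfer
BLOCK_TIME_S = 6                           # default 6-second block time

tx_per_block = MAX_BLOCK_WEIGHT * NORMAL_RATIO / TRANSFER_WEIGHT
tps_ceiling = tx_per_block / BLOCK_TIME_S
```

With these assumed numbers the ceiling lands at 1250 TPS; halving the block time to 3 seconds (while keeping the per-block weight budget) doubles it, which is why the block time used in a benchmark matters so much.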

Observations

  • The peak transaction throughput measured was around 800 TPS.
  • Increasing the client TPS above that decreased the number of transactions per block, meaning more transactions were dropped as the load increased.
  • CPU utilization during this benchmark was ~50% on the substrate nodes and ~10% on the client node, so there was still compute to spare.
  • The client was capable of generating up to 3000 TPS when used on a 12-node network. However, most transactions were dropped. This demonstrates that the client was not the bottleneck.
  • We were not able to find where a substrate node spends its CPU time. This is complicated by the use of async and the lack of debug info by default. Any suggestions on how to collect performance info (e.g. flamegraphs) would be greatly appreciated.
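The scale of the dropping can be made explicit from the numbers above (roughly 3000 TPS offered versus roughly 800 TPS included in blocks); a trivial sketch:

```python
def drop_rate(offered_tps: float, included_tps: float) -> float:
    """Fraction of offered transactions that never reached a block."""
    return 1.0 - included_tps / offered_tps
```

With the observed figures, roughly 73% of offered transactions were dropped before inclusion.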

We would like to know if we're missing anything in this approach, whether these results are reasonable, and if you have any suggestions on how to evaluate end-to-end Substrate performance.

@shawntabrizi
Member

shawntabrizi commented Jan 24, 2022

@imlvts awesome write up!

The original benchmark test we did was about 1.5 years ago, and done by @NikVolf.

I'm also not 100% sure of the exact setup that was used, but I don't think you were far off.

One thing I don't see noted is that this benchmarking network used a 3-second block time, versus the 6-second block time you find on average across Substrate and Polkadot.

There was also some tuning done to the transaction pool, such as increasing the memory allocated to the tx pool, since we also saw that without it, tx throughput started going down once we got past a "sweet spot".

Also, did you run your throughput test using wasm or native execution?

The client was capable of generating up to 3000 TPS when used on 12-node network. However, most transactions were dropped. This demonstrates that the client was not the bottleneck.

What does this mean? Are you saying that a client is simply able to sign up to 3000 transactions per second? I'm not exactly sure how this is relevant.

We recently hired @ggwpez at Parity, with a singular focus on benchmarking and optimization within FRAME and runtime development. We were also about to go through the process of re-benchmarking all of Substrate, and doing so in a reproducible, well-documented way, so your message comes at the perfect time.

@shawntabrizi
Member

Also, @imlvts, what team do you work on, or what prompted you to look at Substrate in this way?

Would be very happy to include you more closely in our efforts here if you are open to that.

@imlvts
Author

imlvts commented Jan 24, 2022

@shawntabrizi thank you for your swift response.

Also, did you run your throughput test using wasm or native execution?

Not on the target machines. I'll be sure to do that and get back with the results.

What does this mean? Are you saying that a client is simply able to sign up to 3000 tranactions per second? Not exactly sure how this is relevant?

This means that the client was able to send 3000 extrinsics per second in total to multiple nodes over HTTP RPC. In retrospect, it may be more efficient to use the P2P protocol instead. This is relevant because the client had been a bottleneck at an earlier point.

@bkchr
Member

bkchr commented Jan 24, 2022

  • Increasing the client TPS above that decreased the number of transactions per block, meaning more transactions were dropped as the load increased.

If you have a way to reproduce this easily, please share your scripts. Then I can take a look and fix this.

@burdges

burdges commented Jan 24, 2022

Is there a sensible strategy for measuring how much the mempool consumes?

@bkchr
Member

bkchr commented Jan 24, 2022

Is there a sensible strategy for measuring how much the mempool consumes?

Consumes what?

@apopiak apopiak changed the title Bechmkarking end to end transaction throughput performance Benchmarking end to end transaction throughput performance Jan 25, 2022
@juangirini juangirini transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I10-unconfirmed Issue might be valid, but it's not yet known. and removed J2-unconfirmed labels Aug 25, 2023
jonathanudd pushed a commit to jonathanudd/polkadot-sdk that referenced this issue Apr 10, 2024