Benchmarking feedback/notes #222

Open
MarcoPolo opened this issue Jul 5, 2023 · 4 comments

@MarcoPolo
Contributor

MarcoPolo commented Jul 5, 2023

Spent a bit of time looking at some parts of the benchmarking setup, and had a couple of notes and comments:

  • I think we're using iperf wrong. We are using the data from the sender, but we should be looking at the receiver. Notice how the bitrate is very different for sender vs. receiver in this example:
$ iperf -c 127.0.0.1 -u -b 10g
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 50191 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  1.16 GBytes  10.0 Gbits/sec  38132  
[  5]   1.00-2.00   sec  1.16 GBytes  10.0 Gbits/sec  38161  
[  5]   2.00-3.00   sec  1.16 GBytes  9.99 Gbits/sec  38106  
[  5]   3.00-4.00   sec  1.17 GBytes  10.0 Gbits/sec  38184  
[  5]   4.00-5.00   sec  1.16 GBytes  10.0 Gbits/sec  38151  
[  5]   5.00-6.00   sec  1.16 GBytes  10.0 Gbits/sec  38143  
[  5]   6.00-7.00   sec  1.16 GBytes  9.99 Gbits/sec  38114  
[  5]   7.00-8.00   sec  1.16 GBytes  10.0 Gbits/sec  38165  
[  5]   8.00-9.00   sec  1.16 GBytes  9.99 Gbits/sec  38140  
[  5]   9.00-10.00  sec  1.16 GBytes  10.0 Gbits/sec  38169  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  11.6 GBytes  10.0 Gbits/sec  0.000 ms  0/381465 (0%)  sender
[  5]   0.00-10.00  sec  8.43 GBytes  7.24 Gbits/sec  0.023 ms  105024/381411 (28%)  receiver

We need to use the bitrate on the receiver side. The sender can push as much data as you want, but for these measurements we care about the data that was actually received. Look at the difference here: https://github.com/libp2p/test-plans/actions/runs/5466146370/jobs/9950640038#step:12:29
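
As a rough illustration of reading the receiver line rather than the sender line, the sketch below parses the two summary lines from the output above. The function name `summary_bitrates` and the regular expression are assumptions made for illustration. They are not part of the benchmark code, and they only cover the UDP summary format shown here.

```python
import re

# Matches the UDP summary lines shown above, e.g.
# "[  5]   0.00-10.00  sec  8.43 GBytes  7.24 Gbits/sec  0.023 ms  105024/381411 (28%)  receiver"
SUMMARY = re.compile(
    r"(?P<rate>[\d.]+)\s+(?P<unit>[KMG]?)bits/sec\s+"
    r"[\d.]+\s+ms\s+"
    r"(?P<lost>\d+)/(?P<total>\d+)\s+\(\S+\)\s+"
    r"(?P<role>sender|receiver)\s*$"
)

# Decimal prefixes for the bitrate column.
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def summary_bitrates(output: str) -> dict:
    """Return {role: (bits_per_second, lost, total)} for each summary line."""
    result = {}
    for line in output.splitlines():
        m = SUMMARY.search(line)
        if m:
            bps = float(m["rate"]) * SCALE[m["unit"]]
            result[m["role"]] = (bps, int(m["lost"]), int(m["total"]))
    return result


sample = """\
[  5]   0.00-10.00  sec  11.6 GBytes  10.0 Gbits/sec  0.000 ms  0/381465 (0%)  sender
[  5]   0.00-10.00  sec  8.43 GBytes  7.24 Gbits/sec  0.023 ms  105024/381411 (28%)  receiver
"""

rates = summary_bitrates(sample)
# Report the receiver side only: this is the data that actually arrived.
receiver_bps, lost, total = rates["receiver"]
print(f"receiver: {receiver_bps / 1e9:.2f} Gbit/s, loss {lost / total:.1%}")
```

For the sample above this reports 7.24 Gbit/s on the receiver side with about 27.5% of datagrams lost, which the summary line rounds to 28%.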

  • The hypothetical max for this use case should be 50% of the instance bandwidth (3.4 Gbps), according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html. I think it's worth linking this doc somewhere.
  • The "local" vs "remote" backends are a bit confusing. These are both running on AWS hardware. Could we consolidate them or rename them? I would suggest alternate names, but I don't really understand them.
  • What's the AMI of the short-lived module? It doesn't seem to be set, and I can't find the default.
  • Should we make sure to set the MTU to 1500? (This might not be the default)
  • Do we need to bump the UDP send window as well? I'm not sure, but it might be fine since quic-go doesn't complain about it. Any insight here @marten-seemann?
  • Can we add comments around the AMI ids to describe them? It wasn't clear that these were the Amazon Linux AMIs
    • Maybe include this one-liner:
aws ec2 describe-images \
    --image-ids ami-06e46074ae430fba6 \
    --query "Images[*].Description[]" \
    --output text \
    --region us-east-1
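
On the MTU question above: on Linux the current value can be read from sysfs, so a setup script could at least verify it before a run. The sketch below assumes a Linux host. The interface name `eth0` and the helper name `read_mtu` are hypothetical and not taken from the benchmark setup.

```python
from pathlib import Path

EXPECTED_MTU = 1500  # the value suggested in the list above


def read_mtu(interface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the MTU of a network interface from sysfs (Linux only)."""
    return int(Path(sysfs_root, interface, "mtu").read_text().strip())


if __name__ == "__main__":
    iface = "eth0"  # hypothetical; the actual interface name may differ
    try:
        mtu = read_mtu(iface)
    except OSError:
        # Interface missing, or sysfs not available on this platform.
        print(f"could not read MTU for {iface}")
    else:
        status = "ok" if mtu == EXPECTED_MTU else "unexpected"
        print(f"{iface}: mtu={mtu} ({status})")
```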
@MarcoPolo
Contributor Author

cc @mxinden

@marten-seemann
Contributor

Regarding iperf: Thank you for digging into this, @MarcoPolo!

It would be really useful to have an iperf run over TCP for comparison, as I asked for in my review last month. UDP and TCP shouldn't differ by too much.

> Do we need to bump the UDP send window as well? I'm not sure, but it might be fine since quic-go doesn't complain about it. Any insight here @marten-seemann?

It certainly won't hurt. quic-go forces an increase in buffer size (thanks to your PR: quic-go/quic-go#3804), if run with sufficient permissions. I'm not sure if iperf does the same. The cost of running two sysctl commands during setup seems low, and we'll probably achieve a more reproducible setup. I'd recommend setting it to 10 MB each.
We might need to tweak TCP (flow control, congestion control?) window sizes as well, depending on the iperf / TCP result.
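
A minimal sketch of what such a setup step might look like, assuming the two settings meant here are the Linux `net.core.rmem_max` and `net.core.wmem_max` sysctls and that "10 MB" means 10 * 1024 * 1024 bytes. Both are assumptions, and the helper names `read_sysctl` and `sysctl_commands` are hypothetical. The code only builds the commands for values that are below the target; it doesn't run them.

```python
from pathlib import Path

TARGET_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
KEYS = ("net.core.rmem_max", "net.core.wmem_max")  # assumed settings


def read_sysctl(key: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl value from procfs (Linux only)."""
    return int(Path(root, *key.split(".")).read_text().strip())


def sysctl_commands(current: dict, target: int = TARGET_BYTES) -> list:
    """Return the `sysctl -w` commands needed to raise values below target."""
    return [
        f"sysctl -w {key}={target}"
        for key in KEYS
        if current.get(key, 0) < target
    ]


if __name__ == "__main__":
    try:
        current = {key: read_sysctl(key) for key in KEYS}
    except OSError:
        current = {}  # not on Linux, or procfs unavailable
    for cmd in sysctl_commands(current):
        print(cmd)
```

Printing the commands instead of executing them keeps the privileged step explicit, so the output can be reviewed before it goes into the instance setup.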

@mxinden
Member

mxinden commented Jul 11, 2023

Thank you @MarcoPolo!

I am sorry for the delay. I am currently focusing on https://github.com/protocol/bifrost-infra/issues/2622. I have not forgotten about this issue.

@mxinden
Member

mxinden commented Aug 31, 2023

Documenting progress thus far:

> We need to use the bitrate on the receiver side. The sender can push as much data as you want, but for these measurements we care about the data that was actually received. Look at the difference here: https://github.com/libp2p/test-plans/actions/runs/5466146370/jobs/9950640038#step:12:29

Good call-out. Thank you. Addressed in #241.

> The hypothetical max for this use case should be 50% of the instance bandwidth (3.4 Gbps), according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html. I think it's worth linking this doc somewhere.

I think the relevant limit in our case is the single-flow limit of 5 Gbit/s. That is also what we see in #276.

> The "local" vs "remote" backends are a bit confusing. These are both running on AWS hardware. Could we consolidate them or rename them? I would suggest alternate names, but I don't really understand them.

I don't have an opinion on these names. If someone can come up with a better name, please post here and I will change it. Until then, I treat this as a low priority.

> What's the AMI of the short-lived module? It doesn't seem to be set, and I can't find the default.

module "long_lived_server" {
  count  = var.long_lived_enabled ? 1 : 0
  source = "../../modules/long_lived"
  region = "us-west-2"
  ami    = "ami-0747e613a2a1ff483"
  providers = {
    aws = aws.us-west-2
  }
}

module "long_lived_client" {
  count  = var.long_lived_enabled ? 1 : 0
  source = "../../modules/long_lived"
  region = "us-east-1"
  ami    = "ami-06e46074ae430fba6"
  providers = {
    aws = aws.us-east-1
  }
}

> Do we need to bump the UDP send window as well? I'm not sure, but it might be fine since quic-go doesn't complain about it. Any insight here @marten-seemann?

👍 done in #254.

3 participants