Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a performance dashboard #156

Closed
5 of 6 tasks
p-shahi opened this issue Mar 21, 2023 · 16 comments
Closed
5 of 6 tasks

Create a performance dashboard #156

p-shahi opened this issue Mar 21, 2023 · 16 comments
Assignees

Comments

@p-shahi
Copy link
Member

p-shahi commented Mar 21, 2023

Overarching tracking issue: #63

Goal

Create a dashboard that can showcase libp2p performance benchmark results, more specifically benchmark-results.json.

Tasks

Preview Give feedback
@MarcoPolo
Copy link
Contributor

I have some more thoughts on what the runner looks like and what implmentations should implement here: https://github.com/libp2p/test-plans/blob/marco/perf/perf-dashboard/README.md

@BigLep
Copy link
Contributor

BigLep commented Jun 3, 2023

@mxinden : Some thoughts immediately coming to mind when taking a look at https://observablehq.com/@mxinden-workspace/libp2p-performance-dashboard

  1. I believe a best practice is that any time we discuss percentiles, we also share the "n". p90 isn't very meaningful of a sample size of 5. It is for a sample size of 100+.
  2. Will we expose more dimensionality or combinational (security, multiplexers). The reasons I think we want that are to show why things like QUIC are better and to also catch when there are regressions.
  3. How are you going to show performance over time? I was assuming you would just have multiple versions on each plot
  4. I think it could be useful to have a way to add filters for the results they want to focus on.
    • Maybe a textbox where could enter reges? For example, "iperf*,rust*" would match all results for iperf or rust. I'm thinking we have a bunch of "dimension sets" (I'm not sure what to call this, but basically what you have for "rust-libp2p/v0.52/quic") and then any "dimension set" that matches the regex would be included.
  • This set of filters that are applied should make it into the URL as well so the view is bookmarkable.

What are the next steps in your mind?

Has or will there be a maintainer review? I would like to make sure the team is aligned on where we're headed and will end up.

Thanks for putting this together.

@mxinden
Copy link
Member

mxinden commented Jun 5, 2023

I believe a best practice is that any time we discuss percentiles, we also share the "n". p90 isn't very meaningful of a sample size of 5. It is for a sample size of 100+.

👍 Good point. Thanks. I plan to increase the sample size of the upload and download tests. Will also make the sample size more explicit on the dashboard.

Will we expose more dimensionality or combinational (security, multiplexers). The reasons I think we want that are to show why things like QUIC are better and to also catch when there are regressions.

In regards to security protocols, yes, though only tracked as a follow-up, i.e. not in the first iteration. Tracked in "Outstanding work for future pull requests:" in #184.

In regards to muxers, we really only have one muxer per transport protocol. E.g. on TCP we only have Yamux at this point. I don't we should consider mplex a valid option at this point. See libp2p/specs#402.

How are you going to show performance over time? I was assuming you would just have multiple versions on each plot

Today there is a single version per implementation only. Once we have more than one version per implementation we can visualize the data. Support for multiple versions per implementation is already in #184. Adding support in the dashboard is not difficult.

A placeholder is already on the dashboard, see bottom:

Performance over time (across releases)

Given that all our implementations implement the perf protocol for one version only, this section is still blank.

I think it could be useful to have a way to add filters for the results they want to focus on.

Sounds good to me. You can already filter implementations in the visualizations, i.e. disable showing some. I suggest once we have more graphs, we can introduce more advanced filtering mechanisms. Though at this point, I don't want to add a filtering mechanism without the need for filtering.

Has or will there be a maintainer review? I would like to make sure the team is aligned on where we're headed and will end up.

Not yet. At this point I want to drive consensus and get the first iteration of #184 into master. Once we have that I will continue work on the dashboard and ask for more input.

Thanks @BigLep. Updated the task list in the issue description.

@mxinden
Copy link
Member

mxinden commented Jun 27, 2023

The libp2p Performance Dashboard is now ready for review.

https://observablehq.com/@libp2p-workspace/performance-dashboard

@libp2p/github-mgmt-stewards, @sukunrt and @thomaseizinger I welcome your feedback and suggestions.. If you have an ObservableHQ account feel free to comment on the notebook directly. If not, comments on this GitHub issue are just fine.

I am tracking outstanding tasks in the description of this GitHub issue and on #63.

Also in case you have an ObservableHQ account and would like to be part of the libp2p ObservableHQ workspace, ping me.

@thomaseizinger
Copy link
Contributor

@libp2p/github-mgmt-stewards, @sukunrt and @thomaseizinger I welcome your feedback and suggestions

It is really cool! From looking at it, I take that we've moved away from the idea of making this an interactive blog post and instead just represent the data? For a blog post, the text between the visualisations is a bit sparse.

I know you've been identifying and fixing some minor issues already as a result of this work and #63 tracks additional work around the tooling. Do we also have an issue that collects ideas on how we can improve performance based on these measurements?

@ianopolous
Copy link

Nice! Are there plans to include other languages here?

@mxinden
Copy link
Member

mxinden commented Jun 28, 2023

It is really cool! From looking at it, I take that we've moved away from the idea of making this an interactive blog post and instead just represent the data? For a blog post, the text between the visualisations is a bit sparse.

@thomaseizinger blog post is shelved for now, waiting for some optimizations to land.

Non-sparse draft can be found here: https://observablehq.com/@mxinden-workspace/libp2p-perf

I know you've been identifying and fixing some minor issues already as a result of this work and #63 tracks additional work around the tooling. Do we also have an issue that collects ideas on how we can improve performance based on these measurements?

Not so far. Some items thus far:

Nice! Are there plans to include other languages here?

@ianopolous ideally all of them, including jvm-libp2p. Do you have time to drive a jvm-libp2p integration? Vague guess is that this would take you 4h total. Happy to help.

See https://github.com/libp2p/test-plans/blob/master/perf/README.md#adding-a-new-implementation

@ianopolous
Copy link

@mxinden Is the current methodology to spin up an instance, make a measurement, and then kill it? If so we'll need to change this to have some warm up to get meaningful results for the jitted languages like JS and Java.

@mxinden
Copy link
Member

mxinden commented Jun 28, 2023

Is the current methodology to spin up an instance, make a measurement, and then kill it?

Correct. No warm-up phase currently. Each iteration is a new process.

If so we'll need to change this to have some warm up to get meaningful results for the jitted languages like JS and Java.

Good point. Though, based on intuition, I suggest we give it a shot anyways. I would assume that neither of of our tests today (latency, throughput) are CPU bound. Let's not prematurely fix this issue without seeing it happen.

@mxinden
Copy link
Member

mxinden commented Jul 3, 2023

Status Update


@libp2p/github-mgmt-stewards, @sukunrt I welcome your feedback and suggestions.. If you have an ObservableHQ account feel free to comment on the notebook directly. If not, comments on this GitHub issue are just fine.

I am tracking outstanding tasks in the description of this GitHub issue and on #63.

Also in case you have an ObservableHQ account and would like to be part of the libp2p ObservableHQ workspace, ping me.

Friendly ping @libp2p/github-mgmt-stewards, @sukunrt in case you want to review.

@BigLep
Copy link
Contributor

BigLep commented Jul 17, 2023

Thanks @mxinden . Things coming to mind when looking at https://observablehq.com/@libp2p-workspace/performance-dashboard fresh...

  1. go-libp2p 0.29 has been released, but I don't see it reflected. What is the process for updating this? Ideally this should be automated or part of release process.
    • Related it would be great to show the latest in master/main across versions so implementers can catch regressions before doing a release.
  2. Generally for any of the benchmarks, I want to know when do we start the timer and when does it end. I actually don't think we're explicit about this in https://github.com/libp2p/specs/blob/master/perf/perf.md and maybe we should get that updated there and just link there? https://github.com/libp2p/specs/blob/master/perf/perf.md#single-connection-throughput hits on this with "the total time it took from stream open to stream close" but that would imply we're including the "Tell the server how many bytes we want the server to send us as a single big-endian uint64 number" step which I assume is fine in practice, but is "extra work".
  3. How is someone going to discover this dashboard? I expect there will be multiple inroads. I want to make sure we're capitalizing on this work to give an overwhelming sense of "wow, libp2p maintainers take performance seriously..."
  4. I left some small cosmetic feedback in https://observablehq.com/@libp2p-workspace/performance-dashboard . I didn't quickly see how I could make edits without forking.

@MarcoPolo
Copy link
Contributor

  1. Generally for any of the benchmarks, I want to know when do we start the timer and when does it end. I actually don't think we're explicit about this in https://github.com/libp2p/specs/blob/master/perf/perf.md and maybe we should get that updated there and just link there? https://github.com/libp2p/specs/blob/master/perf/perf.md#single-connection-throughput hits on this with "the total time it took from stream open to stream close" but that would imply we're including the "Tell the server how many bytes we want the server to send us as a single big-endian uint64 number" step which I assume is fine in practice, but is "extra work".

This has to be the time from when we open the stream to when we read EOF from the server. The issue is we don't really know when the bytes have been uploaded and received. Even if they are given to the kernel they still haven't been sent. And even then they may be dropped and need to be retransmitted. The only true measurement we can make is when the test ends.

@BigLep
Copy link
Contributor

BigLep commented Jul 18, 2023

Good points - makes sense. This still affirms for me that we should be explicit for the benchmarks we're reporting on "when" the timer starts/stops and "where" the timer is (client or server).

@BigLep
Copy link
Contributor

BigLep commented Jul 18, 2023

Good points - makes sense. This still affirms for me that we should be explicit for the benchmarks we're reporting on "when" the timer starts/stops and "where" the timer is (client or server).

I guess this is fine as is. The dashboard links to https://github.com/libp2p/test-plans/tree/master/perf, which under the implementation section discusses "the measurement includes the time to (1) establish the connection, (2) upload the bytes and (3) download the bytes". I think it would be ideal to make it more front and center when/where the timing is happening, but the information is discoverable, so this can be skipped.

@mxinden
Copy link
Member

mxinden commented Jul 20, 2023

go-libp2p 0.29 has been released, but I don't see it reflected. What is the process for updating this? Ideally this should be automated or part of release process.

On my TODO list for v0.29. For future versions I will try to get it onto the go-libp2p release process documentation.

How is someone going to discover this dashboard? I expect there will be multiple inroads. I want to make sure we're capitalizing on this work to give an overwhelming sense of "wow, libp2p maintainers take performance seriously..."

Dashboard is linked here:

https://github.com/libp2p/test-plans/tree/master/perf#libp2p-performance-benchmarking

Another "inroad" will be the blog post.

I left some small cosmetic feedback in https://observablehq.com/@libp2p-workspace/performance-dashboard . I didn't quickly see how I could make edits without forking.

Saw them. Thanks @BigLep.

@mxinden
Copy link
Member

mxinden commented Jul 26, 2023

I addressed all dashboard-related outstanding items, thus closing here. I will take the last outstanding item, namely to " ideally mirrored in the URL for better sharing" to the main tracking issue.

@mxinden mxinden closed this as completed Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants