(Micro)Benchmarking reliability and consistency #753

Open

NthPortal opened this issue Dec 6, 2020 · 8 comments

Motivation

A frequent request on scala/scala PRs (particularly for collections changes) is that the changes be benchmarked; however, contributors running benchmarks on their personal computers face enough obstacles that many, perhaps most, results would generously be classified as "questionable".

Background

The following table lists some common causes of performance/timing variance, and whether a particular type of machine avoids them.

Cause of variance          Laptop   Overclocked Desktop   Normally-clocked Desktop
Boost clock speed change   no [1]   no                    yes
Thermal throttling         no       no                    yes [2]
Background tasks           no       no                    no

[1] While it is theoretically possible to turn off overclocking/boost-clocking on a laptop, the CPU may also clock down due to even brief changes in battery/power state.
[2] A normally-clocked desktop with good ventilation and cooling shouldn't thermally throttle, but neither of those is guaranteed in a person's home (sometimes cats sit on computers, for example).

The only type of machine that avoids any of these issues is a normally-clocked desktop, and not everyone has one of those (many of us only have laptops).

Additionally, all personal computers suffer from the problem that background tasks (if not foreground tasks) are almost certainly running on them at all times. Benchmarks can take a long time to run, and even if someone can manage not to use their computer for an hour or two while benchmarks run, they probably don't want to close their web browser, 3+ chat applications (all Electron, so basically also web browsers), and half a dozen other running programs and services. If they can't spare potentially multiple hours of their computer being tied up, it's even worse: foreground tasks then take arbitrary and inconsistent amounts of CPU time.

Ideal Setup

For benchmarking to be reliable, it should be done on a dedicated machine that runs nothing else, and on which cron/scheduled jobs never run while a benchmark is running.
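
(For reference, the benchmarks in question are JMH microbenchmarks, like those under test/benchmarks in scala/scala. The following is a minimal sketch of the kind of collections benchmark being discussed; the class, method, and JMH settings are illustrative rather than taken from the repository.)

```scala
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

// Hypothetical collections microbenchmark; real ones follow the same JMH pattern.
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@Fork(2)
class ListPrependBenchmark {
  @Param(Array("10", "1000", "100000"))
  var size: Int = _

  var xs: List[Int] = _

  // Build the input list once per trial so allocation isn't part of the measurement.
  @Setup(Level.Trial)
  def setup(): Unit = xs = List.tabulate(size)(identity)

  @Benchmark
  def prepend(bh: Blackhole): Unit = bh.consume(0 :: xs)
}
```

With multiple forks plus warmup and measurement iterations per parameter value, even a small suite like this occupies a machine for a long time, which is exactly what makes tying up a personal computer so painful.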


How do we reliably benchmark library changes?

lrytz (Member) commented Dec 6, 2020

Thanks for bringing this up! For more technical aspects, see also #338.

We have one machine that we use for compiler benchmarks. It's not that busy; maybe we can find a good way to make it available to contributors, e.g. allow them to take the machine offline in Jenkins and ssh to it.

lrytz (Member) commented Dec 8, 2020

@retronym says he'll look into this.

retronym (Member) commented

I spent some time trying to get our Jenkins instance to have a parameterized job that could run specified benchmarks on our benchmarking server. Jenkins seemed to actively resist this and wouldn't save my job configs, so I had to park the attempt. I'll try again...

retronym (Member) commented

@adriaanm That Jenkins ticket mentions disabling the notifications plugin as a workaround; after that, the save/apply actions on the job config UI worked again. I notice we're running 1.13 of the plugin, but previously we were running a custom build you'd created:

[screenshot: the installed version of the Jenkins notifications plugin]

Can you provide context for that custom version? Is this something that you're working on now or something you worked on previously?

Ichoran commented Dec 16, 2020

There are a variety of ways to solve this experimentally instead of with quiet hardware. For instance, you can halve the number of iterations and run the whole thing twice. If any of the head-to-head comparisons aren't stable, you disbelieve the whole lot and do it all again. Usually, in my experience, they are pretty stable even on a laptop as long as you're not doing a million other things at the same time. (Watching video + compiling + benchmarking is probably a bad idea. Editing code and benchmarking is probably fine.)

You do always have to run the benchmarks head-to-head at roughly the same time, and not expect them to be stable over days/months/whatever. If you're trying to search for performance regressions, then you do want the quiet-machine approach. But for regular PRs, I don't think it's necessary.
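
That check can even be scripted. Here's a rough sketch of the "halve the iterations, run everything twice, distrust unstable comparisons" idea, assuming each run was saved with JMH's CSV reporter (-rf csv -rff <file>); the argument order, key format, and 10% tolerance are arbitrary choices, not something prescribed here:

```scala
import scala.io.Source

object StabilityCheck {

  // Parse a JMH CSV report into (benchmark + params) -> score.
  def scores(path: String): Map[String, Double] = {
    val lines    = Source.fromFile(path).getLines().toList
    val header   = lines.head.split(",").map(_.replace("\"", "").trim)
    val nameIdx  = header.indexOf("Benchmark")
    val scoreIdx = header.indexOf("Score")
    val paramIdx = header.zipWithIndex.collect { case (h, i) if h.startsWith("Param") => i }
    lines.tail.map { line =>
      val cols = line.split(",").map(_.replace("\"", "").trim)
      val key  = (nameIdx +: paramIdx).map(cols(_)).mkString(":")
      key -> cols(scoreIdx).toDouble
    }.toMap
  }

  // args: baseline and PR results from run 1, then baseline and PR results from run 2
  def main(args: Array[String]): Unit = {
    val Array(base1, pr1, base2, pr2) = args.map(scores)
    val common = Seq(base1, pr1, base2, pr2).map(_.keySet).reduce(_ intersect _)
    val unstable = common.filter { name =>
      val ratio1 = pr1(name) / base1(name)
      val ratio2 = pr2(name) / base2(name)
      math.abs(ratio1 - ratio2) / ratio1 > 0.10 // the two runs disagree by > 10%
    }
    if (unstable.isEmpty) println("head-to-head comparisons look stable")
    else unstable.foreach(n => println(s"unstable, rerun everything: $n"))
  }
}
```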

Note that a bigger problem is different architectures. It's often the case that code that is faster on one architecture is slower on another. So you can have different people making different decisions about high-performance code, each based on accurate microbenchmarking on different hardware.

SethTisue (Member) commented

an older ticket with a bunch of benchmarking advice: #606
