Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Adding the new fastest mandelbrot implementation to benchmarks-game. #14287

Merged
merged 2 commits into from
Nov 7, 2017
Merged

Adding the new fastest mandelbrot implementation to benchmarks-game. #14287

merged 2 commits into from
Nov 7, 2017

Conversation

tannergooding
Copy link
Member

FYI. @JosephTremoulet, @AndyAyersMS, @ViktorHofer, @danmosemsft

New fastest implementation. At this point, I'm pretty certain the only reason it isn't even faster is because the Q6600 processor the official benches use likely doesn't support the optimization where movups is as fast as movaps (it looks like that optimization was added in the subsequent microarchitecture).

@tannergooding
Copy link
Member Author

Also FYI. @mellinoe

@tannergooding
Copy link
Member Author

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@JosephTremoulet
Copy link

Thanks for updating. You should remove mandelbrot-4 in this change, we generally don't want to keep more than two variants of each to avoid cluttering the reporting tracking these. You'll also, once this gets merged, want to port it to the release/1.1.0 and release/2.0.0 branches, like #14094 and #14095. Since you're just touching files in these directories, simple cherry-picks should work.

/cc @jorive

@benaadams
Copy link
Member

There is an issue with the benchmark games in that they also include Jit time; so vectorizing isn't as fast because the Jit startup for Vectors is longer :-/

I was hoping #14244 would elevate this

@tannergooding
Copy link
Member Author

Will fix up the build errors shortly.

@tannergooding
Copy link
Member Author

You should remove mandelbrot-4 in this change

@JosephTremoulet, any particular reason why mandelbrot-4 over mandelbrot-2?

@JosephTremoulet
Copy link

any particular reason why mandelbrot-4 over mandelbrot-2?

These benchmarks are just a subset of the benchmarks available in C# from
the Benchmarks Game site. The highest-scoring C# .NET Core variant of each
benchmark is included, and in the (common) case of benchmarks where the
best-scoring variant uses multiple threads, we've also selected variants
that do not rely on multiple threads, to ensure relative benchmark stability
across a variety of machines.

@tannergooding
Copy link
Member Author

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

Copy link

@JosephTremoulet JosephTremoulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It would be good to verify that an InnerIterationCount of 7 is still reasonable (our rule of thumb has been to try to make the duration reported by run-xunit-performance roughly 1000ns).

@jorive
Copy link
Member

jorive commented Oct 4, 2017

@JosephTremoulet I think you meant to say 1000ms, right?

@JosephTremoulet
Copy link

@JosephTremoulet I think you meant to say 1000ms, right?

Whoops! Yes, ms, good catch, thanks.

@tannergooding
Copy link
Member Author

@JosephTremoulet, what is the correct way to validate that? Still not quite sure how to properly navigate bench view.

@ViktorHofer
Copy link
Member

Still not quite sure how to properly navigate bench view.

+1

@tannergooding
Copy link
Member Author

That is, I can see the numbers for the jobs, etc. I just don't know how to properly compare them for improvement/etc, since Mandelbrot 4 and Mandelbrot 7 are "separate" scenarios.

I see 5619.01ms for Mandelbrot 4 and 6196.77ms for Mandelbrot 7 (duration), which seems backwards since 7 is measurably faster.

@JosephTremoulet
Copy link

What we've been doing to validate the iteration count is just run the benchmark via run-xunit-performance.cmd locally. The point is just to make sure it's neither too fast-running to produce good measurements and profiles nor so long-running that it wastes lab resources.

The different variants of each test had their iteration counts set independently, to make sure we're getting usable measurements for each. Yes, this means that comparing them to each other in benchview is meaningless, but the goal there is to have usable measurements and look independently at their improvements/regressions over time, not to rank them against each other (which, of course, is what happens over at BenchmarksGame).

@tannergooding
Copy link
Member Author

tannergooding commented Oct 4, 2017

run the benchmark via run-xunit-performance.cmd locally

@JosephTremoulet, what hardware are the actual jobs run against?

If you have a 8-core, 16-thread machine (or higher) this can seriously skew the results as compared to a 4-core, 8-thread machine.

@JosephTremoulet
Copy link

@JosephTremoulet, what hardware are the actual jobs run against?

Not sure... @jorive?

@jorive
Copy link
Member

jorive commented Oct 4, 2017

Haswell machines with: 4 cores, 8 threads, 3.6GHz

@tannergooding
Copy link
Member Author

7 seems to be fine for the iteration count.

@tannergooding
Copy link
Member Author

Am I good to merge?

@JosephTremoulet
Copy link

Am I good to merge?

Yes. And please port to release/1.1.0 and release/2.0.0 afterward.

@tannergooding tannergooding merged commit 377f063 into dotnet:master Nov 7, 2017
tannergooding added a commit that referenced this pull request Nov 13, 2017
@tannergooding tannergooding deleted the benchmarks-game branch January 17, 2018 01:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants