Adding the new fastest mandelbrot implementation to benchmarks-game. #14287
Conversation
Also FYI. @mellinoe
@dotnet-bot test Windows_NT x64 perf
Thanks for updating. You should remove mandelbrot-4 in this change; we generally don't want to keep more than two variants of each benchmark, to avoid cluttering the reporting that tracks these. Once this gets merged, you'll also want to port it to the release/1.1.0 and release/2.0.0 branches, like #14094 and #14095. Since you're just touching files in these directories, simple cherry-picks should work. /cc @jorive
There is an issue with the benchmark games in that they also include JIT time, so vectorizing isn't as fast as it should be because the JIT startup for Vectors is longer :-/ I was hoping #14244 would alleviate this.
Will fix up the build errors shortly.
@JosephTremoulet, any particular reason why
@dotnet-bot test Windows_NT x64 perf
LGTM. It would be good to verify that an InnerIterationCount of 7 is still reasonable (our rule of thumb has been to try to make the duration reported by run-xunit-performance roughly 1000ns).
@JosephTremoulet I think you meant to say 1000ms, right?
Whoops! Yes, ms, good catch, thanks.
@JosephTremoulet, what is the correct way to validate that? Still not quite sure how to properly navigate BenchView.
+1
That is, I can see the numbers for the jobs, etc. I just don't know how to properly compare them for improvement/etc, since Mandelbrot 4 and Mandelbrot 7 are "separate" scenarios. I see 5619.01ms for Mandelbrot 4 and 6196.77ms for Mandelbrot 7 (duration), which seems backwards since 7 is measurably faster.
What we've been doing to validate the iteration count is just run the benchmark via run-xunit-performance.cmd locally. The point is just to make sure it's neither too fast-running to produce good measurements and profiles nor so long-running that it wastes lab resources. The different variants of each test had their iteration counts set independently, to make sure we're getting usable measurements for each. Yes, this means that comparing them to each other in BenchView is meaningless, but the goal there is to have usable measurements and look independently at their improvements/regressions over time, not to rank them against each other (which, of course, is what happens over at BenchmarksGame).
@JosephTremoulet, what hardware are the actual jobs run against? If you have an 8-core, 16-thread machine (or higher), this can seriously skew the results compared to a 4-core, 8-thread machine.
Not sure... @jorive?
Haswell machines with: 4 cores, 8 threads, 3.6GHz
7 seems to be fine for the iteration count.
Am I good to merge?
Yes. And please port to release/1.1.0 and release/2.0.0 afterward.
Port #14287 to release/2.0.0
FYI. @JosephTremoulet, @AndyAyersMS, @ViktorHofer, @danmosemsft
New fastest implementation. At this point, I'm pretty certain the only reason it isn't even faster is that the Q6600 processor the official benches use likely doesn't support the optimization where movups is as fast as movaps (it looks like that optimization was added in the subsequent microarchitecture).