Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Update BenchmarksGame benchmarks to latest #13994

Merged
merged 8 commits into from
Sep 16, 2017

Conversation

JosephTremoulet
Copy link

Previously, for each benchmark, we had a copy of what was the fastest serialized implementation (since the mechanics of tracking these and spotting regressions/improvements through noise are easier than parallel ones). Update them all to reflect current standings, and to include the best overall for each as well as the best serial implementation for each. Test names are suffixed to match the indices used to distinguish them on the Benchmarks Game website.

@JosephTremoulet
Copy link
Author

@dotnet/jit-contrib

@JosephTremoulet
Copy link
Author

@dotnet/rap-team

@jorive
Copy link
Member

jorive commented Sep 14, 2017

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@JosephTremoulet
Copy link
Author

@dotnet-bot test Ubuntu x64 Checked Build and Test (failure was an I/O error)

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to record the version numbers of each variant from their source control system? Also maybe note the URL for it somewhere?

Also have you looked at per-iteration times to see if they're reasonable?

// See the LICENSE file in the project root for more information.

// Helper functionality to locate inputs and find outputs for
// k-nucleotide benchmark in CoreCLR test harness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "for reverse-complement"

namespace BenchmarksGame
{
class RevCompSequence { public List<byte[]> Pages; public int StartHeader, EndExclusive; public Thread ReverseThread; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting looks strange here.

// See the LICENSE file in the project root for more information.

// Helper functionality to locate inputs and find outputs for
// k-nucleotide benchmark in CoreCLR test harness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "For regex-redux"

}

static Frequency[] IUB = {
new Frequency ('a', 0.27)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting looks odd here

};

static Frequency[] HomoSapiens = {
new Frequency ('a', 0.3029549426680)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting looks odd here


namespace BenchmarksGame
{
class RevCompSequence { public List<byte[]> Pages; public int StartHeader, EndExclusive; public Thread ReverseThread; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class RevCompSequence { public List<byte[]> Pages; public int StartHeader, EndExclusive; public Thread ReverseThread; } [](start = 4, length = 119)

On vs or vscode, Ctrl+A, Ctrl+K+F

@JosephTremoulet
Copy link
Author

Updated with formatting/typo updates and links to BenchmarksGame CVS repo w/ revision numbers.
Also added some diagnostic output to help determine why the OS X run is failing to find the input files.

@JosephTremoulet
Copy link
Author

Also have you looked at per-iteration times to see if they're reasonable?

Yes, I set InnerIterationCount for each benchmark to make the reported duration ~1000ms on my dev box.

@ViktorHofer
Copy link
Member

I didn't know that we store the benchmarks from BenchmarkGame in coreclr. Is that to improve our internal benchmarks and/or to test possible improvements to later submit to http://benchmarksgame.alioth.debian.org/?

@JosephTremoulet
Copy link
Author

@ViktorHofer it's so that our internal benchmarking will track them, and so we can use that to track the effect of compiler/library/runtime changes on them. They're not intended to diverge from the sources on the official site, except where needed to run in our benchmarking system. They might make a convenient place to test out possible source improvements, but we wouldn't check in such a change; instead, after it was submitted to the official site and became the new best variant, we'd pull that new variant down and retire the old one from our suite.

@ViktorHofer
Copy link
Member

I see, that makes sense! I'm currently looking at those benchmarks and trying to find optimization paths, that's why I asked.

Is there a special reason why you number them e.g. the regex-redux with 1 and 5? why not just name them serial and parallel?

@JosephTremoulet
Copy link
Author

Is there a special reason why you number them e.g. the regex-redux with 1 and 5? why not just name them serial and parallel?

From the perspective of using these to track the performance impact of compiler/library/runtime changes, looking at historical trendlines it's most helpful if they represent fixed sources. If we went with the serial/parallel naming scheme, then when we update to a new fastest variant, there'd be a jump in the historical trendline to stumble over. So I went the route of using the numbers for the names and indicating serial/parallel in comments. Less importantly, this also meant not having to figure out what to name a benchmark when the fastest implementation is a serial one.

@ViktorHofer
Copy link
Member

For historical trendlines it definitely makes sense. I was just thinking about keeping the names to reduce noise when the benchmarks change. But the historical argument is definitely stronger 👍

For each benchmark, grab the current best C# .NET entry, and also grab
the current best serial implementation (since these are easier to work
with from the benchmarking perspective).
Also insert namespace BenchmarksGame.
 - Add result validation
 - Add [Benchmark] attributes and appropriate iteration counts
 - Minor edits here and there to target .NET Standard 1.4
 - Exception: pi-digits rewritten to use managed BitInteger instead of
   p/invoke out to GMP.
Name each variant after its index on the site, not its comparative
status.
This will make it easier to track changes in the future.
Auto-formatting was leaving some new array expressions oddly indented.
@JosephTremoulet
Copy link
Author

OS X passing with latest update, so I'll kick off those perf tests again:

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@JosephTremoulet
Copy link
Author

The Windows_NT arm Cross Checked Build and Test failure is happening because I've removed the old tests but the test script is still trying to run them. The Tests.lst file that lists the old tests has a big comment at the top saying that it's auto-generated...

  1. What do I need to to do re-generate the file?
  2. Why doesn't it get re-generated as part of running the CI?
  3. Failing that, why doesn't the comment telling me the file is auto-generated tell me how to re-generate it?

not sure to whom I should be directing these questions... @jashook?

@BruceForstall
Copy link
Member

@JosephTremoulet It's auto-generated by @jashook, manually. You can (temporarily) change the tests.lst file to cause the removed tests not to run by changing the Categories to:

Categories=EXPECTED_FAIL

The next update to the Tests.lst files will need to include the new
variants of these tests.
@JosephTremoulet
Copy link
Author

Thanks, @BruceForstall. Updated per your suggestion.

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@JosephTremoulet
Copy link
Author

@AndyAyersMS, good with updates?

@AndyAyersMS
Copy link
Member

Yep, thanks for the changes.

@JosephTremoulet JosephTremoulet merged commit cd6b92a into dotnet:master Sep 16, 2017
@JosephTremoulet JosephTremoulet deleted the BenchmarksGame branch September 16, 2017 12:39
JosephTremoulet added a commit to JosephTremoulet/coreclr that referenced this pull request Sep 19, 2017
Change dotnet#13994 moved some tests that were excluded from Helix runs, but
failed to update the exclusion list; fix that oversight and exclude the
tests in their new locations.
JosephTremoulet added a commit to JosephTremoulet/coreclr that referenced this pull request Sep 19, 2017
Change dotnet#13994 moved some tests that were excluded from Helix runs, but
failed to update the exclusion list; fix that oversight and exclude the
tests in their new locations.

Fixes #14034.
JosephTremoulet added a commit to JosephTremoulet/coreclr that referenced this pull request Sep 19, 2017
For each benchmark, grab the current best C# .NET entry, and also grab
the current best serial implementation (since these are easier to work
with from the benchmarking perspective).

(cherry picked from commit 4d9e8b5)

** Apply default VS formatting

Also insert namespace BenchmarksGame.

(cherry picked from commit d0099ff)

** Modify benchmarks to run in perf test harness

 - Add result validation
 - Add [Benchmark] attributes and appropriate iteration counts
 - Minor edits here and there to target .NET Standard 1.4
 - Exception: pi-digits rewritten to use managed BitInteger instead of
   p/invoke out to GMP.

(cherry picked from commit e055116)

** Remove old versions of BenchmarksGame benchmarks

(cherry picked from commit 9a8151f)

** Rename BenchmarksGame files

Name each variant after its index on the site, not its comparative
status.

(cherry picked from commit 087f903)

** Add references to source CVS

This will make it easier to track changes in the future.

(cherry picked from commit be6743d)

** Manual formatting adjustments

Auto-formatting was leaving some new array expressions oddly indented.

(cherry picked from commit 0dcaa77)

** Update BenchmarksGames README.txt

Reflecting recent updates to the snapshot of these tests.

(cherry picked from commit 3bb67e9)

** Fix expected values in fannkuch-redux-5

The validation logic was testing against `chksum`, which actually can
vary depending on the number of processors (as that is used to determine
the number of threads across which the work is partitioned, and the
checksum is sensitive to the bucketing).  Change it to test against
`maxflips` instead, which is stable.

Fixes #14040.

(cherry picked from commit 307188e)

** Update exclusions for moved tests

Change dotnet#13994 moved some tests that were excluded from Helix runs, but
failed to update the exclusion list; fix that oversight and exclude the
tests in their new locations.

Fixes #14034.

(cherry picked from commit 13df954)
JosephTremoulet added a commit to JosephTremoulet/coreclr that referenced this pull request Sep 20, 2017
For each benchmark, grab the current best C# .NET entry, and also grab
the current best serial implementation (since these are easier to work
with from the benchmarking perspective).

(cherry picked from commit 4d9e8b5)

** Apply default VS formatting

Also insert namespace BenchmarksGame.

(cherry picked from commit d0099ff)

** Modify benchmarks to run in perf test harness

 - Add result validation
 - Add [Benchmark] attributes and appropriate iteration counts
 - Minor edits here and there to target .NET Standard 1.4
 - Exception: pi-digits rewritten to use managed BitInteger instead of
   p/invoke out to GMP.

(cherry picked from commit e055116)

** Remove old versions of BenchmarksGame benchmarks

(cherry picked from commit 9a8151f)

** Rename BenchmarksGame files

Name each variant after its index on the site, not its comparative
status.

(cherry picked from commit 087f903)

** Add references to source CVS

This will make it easier to track changes in the future.

(cherry picked from commit be6743d)

** Manual formatting adjustments

Auto-formatting was leaving some new array expressions oddly indented.

(cherry picked from commit 0dcaa77)

** Update BenchmarksGames README.txt

Reflecting recent updates to the snapshot of these tests.

(cherry picked from commit 3bb67e9)

** Fix expected values in fannkuch-redux-5

The validation logic was testing against `chksum`, which actually can
vary depending on the number of processors (as that is used to determine
the number of threads across which the work is partitioned, and the
checksum is sensitive to the bucketing).  Change it to test against
`maxflips` instead, which is stable.

Fixes #14040.

(cherry picked from commit 307188e)

** Update exclusions for moved tests

Change dotnet#13994 moved some tests that were excluded from Helix runs, but
failed to update the exclusion list; fix that oversight and exclude the
tests in their new locations.

Fixes #14034.

(cherry picked from commit 13df954)

** Reset static state per iteration for k-nucleotide-9 (dotnet#14081)

Otherwise iterations keep getting slower and slower.

Also bump inner iteration count to 10 to restore the nominal one second
duration per iteration.

(cherry picked from commit fd1000c)
BruceForstall added a commit to BruceForstall/coreclr that referenced this pull request Dec 19, 2017
It was removed in dotnet#13994 when these benchmarks were updated, expecting
a tests.lst autogeneration would follow. I'm adding them manually now
to ensure we have test coverage for these.

Fixes #15503
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants