Run performance regression checks in TravisCI #38
Conversation
LGTM
I didn't merge yet as there are some failures. I haven't used Travis much before, but I suspect that performance might vary depending on the VM that the Travis backend schedules the work on. That might make it difficult to get reliable performance results.
I tried to mitigate this by having perfcheck collect benchmarks for the current revision and then for the reference revision within the same test run, so at least the benchmarks are collected and compared on the same VM at around the same time (roughly the approach sketched below). It looks like that's not quite enough, though, probably because the underlying VM host has highly variable load. Maybe there need to be more benchmark runs, with the minimums for each benchmark being compared, but even that might not be enough. I could also look at increasing the maximum allowed variance (currently 10%).
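For context, the single-pass collection amounts to something like this (a rough sketch only; the file names, flags, and benchcheck invocation are assumptions, not the actual perfcheck contents):

```sh
# Collect benchmarks for both revisions on the same VM in one test run.
# $CUR and $REF stand in for the current and reference revisions.
git checkout -q "$REF"
go test -run=NONE -bench=. ./... > ref.txt
git checkout -q "$CUR"
go test -run=NONE -bench=. ./... > cur.txt

# benchcheck (the hacked benchcmp) compares the two files and fails
# if any benchmark regresses beyond the allowed variance.
benchcheck ref.txt cur.txt
```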
Adding some more iterations to the small test runs (perhaps they need to be called "medium" at that point) might help reduce the effects of jitter. If necessary, I don't mind increasing the variance to, say, 30%. That should help us catch the most serious regressions. For more fine-grained performance testing we can use our own dedicated boxes.
I've been experimenting with Russ Cox's benchstat tool. It analyses the output of multiple benchmark runs for two revisions and reports whether the change is statistically significant, which is somewhat smarter than what benchcmp/benchcheck does. It looks promising on my machine. I'm now gathering data from a run on TravisCI with a modified perfcheck script which runs the benchmarks 5 times for the current and reference revisions (interleaved to minimise the effect of other load on the host) to see how it would work there. If it works out we can get rid of the benchcheck tool (which is a hacked version of benchcmp).
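For reference, the basic benchstat workflow looks roughly like this (a sketch; the install command, paths, and package patterns are assumptions):

```sh
go get golang.org/x/perf/cmd/benchstat   # install the tool

# Collect several runs per revision so benchstat has enough samples.
go test -run=NONE -bench=. -count=5 ./... > ref.txt   # on the reference revision
go test -run=NONE -bench=. -count=5 ./... > cur.txt   # on the current revision

# benchstat prints per-benchmark deltas with p-values; a "~" delta
# means the difference is not statistically significant.
benchstat ref.txt cur.txt
```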
Instead of being based on benchcmp, benchcheck now uses benchstat. benchstat analyses multiple benchmark runs and only reports problems when the difference is statistically significant. This helps to avoid false positives when looking for performance regressions. Side benefit: the new benchcheck involves much less code.
To avoid false positives, reference and current benchmarks are now run 5 times, interleaved to minimise the effects of host system load. The outputs from each run are concatenated, resulting in one file of benchmark output for the reference revision and one for the current revision. These are then passed to the new benchstat-based benchcheck tool. This approach is much more reliable than the previous one.
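The interleaved collection might look something like the sketch below (illustrative only; the revision names and the benchcheck invocation are assumptions, and the real perfcheck script will differ in detail):

```sh
CUR=$(git rev-parse HEAD)
REF=master   # assumption: the reference revision being compared against

# Alternate between revisions so both see roughly the same host load,
# appending each run's output to a per-revision file.
for i in 1 2 3 4 5; do
    git checkout -q "$REF"
    go test -run=NONE -bench=. ./... >> ref.txt
    git checkout -q "$CUR"
    go test -run=NONE -bench=. ./... >> cur.txt
done
git checkout -q "$CUR"

# The concatenated files give benchstat 5 samples per benchmark.
benchcheck ref.txt cur.txt
```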
I've just pushed the new approach here, and I'm anticipating that the perfcheck run should now pass on TravisCI. There ended up being quite a bit of churn, so another review from @oplehto would be helpful.
Looks good! Tested it successfully on my system. Very useful to learn about the existence of the benchcheck utility. To speed up testing, I think we can drop Go 1.8.x at this time. What do you think?
+1 for dropping 1.8
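For reference, dropping Go 1.8.x would be a one-line change in the Travis build matrix, assuming a .travis.yml along these lines (the exact version list is an assumption):

```yaml
language: go
go:
  # - 1.8.x   # dropped to speed up builds
  - 1.9.x
  - tip
```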
Closes #33.