
BLAS performance tests #3566

Closed
ViralBShah opened this issue Jun 28, 2013 · 11 comments
Labels
performance Must go faster

Comments

@ViralBShah
Member

We should have a set of BLAS performance tests that test various problems of various sizes with different numbers of threads. We can then compare performance of different BLAS libraries on different platforms.
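A harness along these lines could be a minimal timing loop that runs one operation at several problem sizes and records the best wall-clock time for each. The sketch below is purely illustrative (it times a naive pure-Python matrix-vector product as a stand-in for a real BLAS `gemv` call; the function names, sizes, and iteration count are all assumptions, not the actual test-suite code):

```python
# Hypothetical sketch of a BLAS benchmark harness: time one operation
# across several problem sizes. A real harness would call into a BLAS
# library and also vary the number of threads.
import time

def naive_gemv(A, x):
    """Plain-Python matrix-vector product; stand-in for a BLAS gemv call."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def bench(op, *args, iters=10):
    """Return the best-of-iters wall-clock time for one call to op(*args)."""
    best = float("inf")
    for _ in range(iters):
        t0 = time.perf_counter()
        op(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Time the operation at a few illustrative sizes.
results = {}
for n in (16, 64, 128):
    A = [[1.0] * n for _ in range(n)]
    x = [1.0] * n
    results[n] = bench(naive_gemv, A, x)
```

Taking the best of several runs, rather than the mean, reduces the influence of scheduler noise on the reported numbers.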

@lindahua
Contributor

This should also help decide a size threshold: when matrix sizes are below it, we should fall back to a hand-crafted, lightweight implementation.
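The proposed cutoff amounts to dispatching on problem size. A minimal sketch, assuming a hypothetical threshold constant and stand-in function names (none of these exist in Julia's codebase; the point is only the dispatch shape):

```python
# Illustrative size-threshold dispatch: below SMALL_THRESHOLD use a
# hand-crafted loop (no library-call overhead); above it, defer to BLAS.
SMALL_THRESHOLD = 32  # hypothetical; would be chosen from benchmark data

def small_gemv(A, x):
    # Hand-crafted lightweight path for tiny problems.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def blas_gemv(A, x):
    # Stand-in for the external BLAS library call.
    return small_gemv(A, x)

def gemv(A, x):
    # Dispatch on the number of rows.
    return small_gemv(A, x) if len(A) < SMALL_THRESHOLD else blas_gemv(A, x)
```

The benchmarks in this issue would supply the data for picking the threshold per operation and per platform, rather than hard-coding one value.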

@ViralBShah
Member Author

This would be useful to help make tuning decisions in openblas (OpenMathLib/OpenBLAS#103)

@blakejohnson
Contributor

The BLAS performance tests are already showing something interesting. @staticfloat, what is the sysblas on criid? It seems to be at least 50% faster than openblas on the level-1 tests. At level-2, openblas finally pulls ahead at the "large" and "huge" sizes of gemv. At level-3, the differences are pretty minimal.

@staticfloat
Member

sysblas is the flavor of julia built against the system-provided BLAS. On a Debian-based system, that means whatever is providing /usr/lib/libblas.so.3. In most cases, that will be the reference blas, or perhaps ATLAS if the user has requested it. (In extremely rare cases, the user could have installed my libopenblas-base package, in which case this could be openblas!)

For the purposes of these benchmarks, sysblas is reference blas on Linux, and is Accelerate on OSX.

@blakejohnson
Contributor

I see, so since 'criid' is an OSX machine, I'm looking at OpenBLAS vs Accelerate.

@staticfloat
Member

Yep. Exactly. We definitely want to compare the BLAS implementations on various systems and see how they affect us on all our benchmarks.

@blakejohnson
Contributor

I probably should have asked this before throwing a bunch of tests out there, but... What is the appropriate design for BLAS performance tests, in terms of how many times to loop an operation? I chose iteration counts such that each test would take >100ms on my machine, and scaled the counts inversely with the problem size to keep the runtime of each test within an order of magnitude of each other. It now occurs to me that this requires a little extra care in interpreting the results, because now the absolute time differences of the smaller problem sizes are 'levered up' by the large numbers of iterations.

Fortunately, one can still compare the relative difference between different BLAS implementations, because the percent differences should still accurately convey the performance gaps.
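The two points above (per-iteration times are "levered up" by the loop count, but relative gaps survive) can be made concrete with a small sketch. All of the numbers and helper names below are illustrative assumptions, not measurements from the actual test suite:

```python
# Sketch: normalize looped benchmark totals by iteration count, then
# compare implementations by relative (percent) difference, which is
# unaffected by the common iteration count.
def per_iteration(total_seconds, iters):
    """Divide out the loop count to recover the per-call time."""
    return total_seconds / iters

def percent_diff(t_a, t_b):
    """Relative slowdown of b versus a, in percent."""
    return 100.0 * (t_b - t_a) / t_a

# e.g. a "small" gemv looped 10_000 times under two BLAS builds:
t_blas_a = per_iteration(0.50, 10_000)  # 50 microseconds per call
t_blas_b = per_iteration(0.75, 10_000)  # 75 microseconds per call
gap = percent_diff(t_blas_a, t_blas_b)  # same 50% gap as the raw totals
```

Because both implementations run the same iteration count for a given test, the percent difference is identical whether computed on totals or per-call times; only absolute-time comparisons across test sizes need the normalization.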

@ViralBShah
Member Author

That was what I was going for to start with. This is also good enough to detect regressions over time.

@staticfloat
Member

Indeed. I've also seen BLAS tests where the size of the problem is divided out so that the units being reported are no longer seconds, but rather FLOPS (Floating-Point Operations Per Second). I'm not sure we have a problem here yet, though. I still have yet to track down all the segfaults and bugs in codespeed that are preventing everything from working perfectly. :)
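Dividing out the problem size as described turns seconds into a throughput rate. For a dense m×n by n×k matrix multiply, the conventional operation count is 2·m·n·k (one multiply and one add per accumulated term). A minimal sketch with illustrative numbers:

```python
# Sketch of reporting GFLOPS instead of seconds: divide the standard
# operation count for gemm (2*m*n*k) by the measured wall-clock time.
def gemm_gflops(m, n, k, seconds):
    flops = 2.0 * m * n * k  # one multiply + one add per accumulated term
    return flops / seconds / 1e9

# A 1000x1000x1000 gemm finishing in 0.2 s sustains 10 GFLOPS:
rate = gemm_gflops(1000, 1000, 1000, 0.2)
```

Reporting in FLOPS makes results comparable across problem sizes and lets them be read against a machine's theoretical peak, at the cost of obscuring the absolute runtimes that matter for regression tracking.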

@oscardssmith
Member

Can/should this be closed?

@ViralBShah
Member Author

Closing this as too general; it has been somewhat addressed by all the benchmarking work in recent times.
