Julia now incompatible with Accelerate/vecLib #1322
I agree that we should want to use Accelerate. However, the right thing may be to use only BLAS from Accelerate and download LAPACK. LAPACK is still under active development, and it would be nice to have all our users on the same version of LAPACK. |
cc: @dmbates |
It would not be a big deal to revert to using `dsytri`. I'm not sure why a comparison of results of `peakflops()` […] |
|
@staticfloat Perhaps I misunderstood the purpose. A call to `peakflops()` […]. If Accelerate is not substantially faster than OpenBLAS on OS X, is there a reason for using Accelerate? I am not opposed to Accelerate - I'm just asking. |
Honestly, I don't know if there is any speed difference between the two; as I have experience building Julia on OSX, I was going to try and arrange a linear algebra shootout, but then discovered that this incompatibility had crept in a few weeks ago. I think my last sentence above was written poorly; it should have been written: "I propose not littering Julia source with #ifdefs to retain compatibility with two versions of LAPACK, therefore I propose hanging Accelerate out to dry". However, I am going to hold off on saying that until we can prove that OpenBLAS is at least comparable to Accelerate in terms of performance. @dmbates, do you have a list of LAPACK functions that we should concern ourselves with, performance-wise? I'll do the shootout if you tell me how. |
There are several LAPACK-based decompositions and operations tested in test/lapack.jl. It would not be worthwhile timing these operations on 10 by 10 matrices as they don't take sufficient time to provide meaningful results. Bumping the sizes to 1000 by 1000 for square matrices and 10000 by 100 for rectangular ones would make the timings more meaningful. |
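A rough harness along those lines might look like the following (a minimal sketch, assuming 2012-era Julia where `qr`, `svd`, and `\` dispatch to the linked LAPACK; the set of operations and the sizes are illustrative choices, not prescribed by the thread):

```julia
# Minimal sketch of a LAPACK timing harness along the lines suggested
# above: 1000x1000 for square problems, 10000x100 for rectangular ones.
function lapack_shootout()
    A = randn(1000, 1000)       # square matrix
    B = randn(10000, 100)       # tall, skinny matrix
    b = randn(1000)
    for (name, f) in (("backslash solve",  () -> A \ b),
                      ("qr square",        () -> qr(A)),
                      ("qr rectangular",   () -> qr(B)),
                      ("svd square",       () -> svd(A)))
        f()                     # warm up before timing
        t = @elapsed f()
        println(name, ": ", t, " seconds")
    end
end

lapack_shootout()
```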
Alright. I'll take a look at this tonight, and see if I can put something together. |
On my machine at least (the "official" benchmark machine), Accelerate and MKL (via Matlab) used to both be significantly faster than OpenBLAS, but recent versions of OpenBLAS have closed and even reversed that gap — and will probably keep getting better. I think it's desirable to continue to support Accelerate if it's not too difficult, which it doesn't seem to be. |
@StefanKarpinski I know there has been discussion previously about how difficult it would be to make the entire performance suite run on different machines, but for the specific test of LAPACK/BLAS performance, how did you put that together? |
Not sure what you mean — which LAPACK/BLAS performance test? I was mainly just talking about DGEMM performance, as measured by the rand_mat_mul benchmark, although OpenBLAS performance has also improved on other primitives. |
Oh, I assumed that when you said "Accelerate and MKL (via Matlab) used to both be significantly faster than OpenBLAS" that you had some method of testing more than just matrix multiplication. |
It was kind of an anecdotal observation, definitely not broadly tested. test/lapack.jl is probably a decent performance suite as well, although the problems may not be big enough. |
OpenBLAS was slower than Accelerate because it did not have Sandy Bridge support until recently. In general, OpenBLAS will lag on newer architectures until it catches up, while MKL will have support for new architectures from day one. Much of LAPACK performance depends on DGEMM. If we were to set up a benchmark, I would try Accelerate (BLAS + LAPACK) vs. OpenBLAS (BLAS + LAPACK) vs. Accelerate BLAS + Netlib LAPACK. My guess is that the performance difference from using Accelerate or MKL LAPACK is not noticeable, except perhaps in a few cases. We can even check the Apple open-source repositories to see if they have changed anything in LAPACK. What would be interesting to see is performance on matrices with weird aspect ratios (tall and skinny, short and fat), where BLAS performance varies dramatically. Also try decompositions of weird shapes. |
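To make the aspect-ratio point concrete, a sketch like the following would exercise DGEMM at a few shapes against whichever BLAS the running Julia is linked to (the particular sizes are arbitrary illustrations, not numbers from this discussion):

```julia
# Sketch: time A*B (which goes through DGEMM) for square, tall-skinny,
# and short-fat operand shapes, and report rough GFLOP/s.
for (m, k, n) in ((2000, 2000, 2000),    # square
                  (10000, 100, 100),     # tall and skinny A
                  (100, 10000, 100))     # short and fat A
    A = randn(m, k)
    B = randn(k, n)
    A * B                                # warm-up multiply
    t = @elapsed A * B
    println(m, "x", k, " times ", k, "x", n, ": ", 2.0*m*k*n/t/1e9, " GFLOP/s")
end
```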
@dmbates: Would it be possible for you to point me in the right direction for substituting `dsytri` for `dsytri2`? |
This will change after @ViralBShah commits his reorganization of the linear algebra code. For the moment, the source file base/lapack.jl has

```julia
for (syconv, syev, sysv, sytrf, sytri, sytrs, elty) in
    ((:dsyconv_,:dsyev_,:dsysv_,:dsytrf_,:dsytri2_,:dsytrs_,:Float64),
     (:ssyconv_,:ssyev_,:ssysv_,:ssytrf_,:ssytri2_,:ssytrs_,:Float32),
     (:zheconv_,:zheev_,:zhesv_,:zhetrf_,:zhetri2_,:zhetrs_,:Complex128),
     (:checonv_,:cheev_,:chesv_,:chetrf_,:chetri2_,:chetrs_,:Complex64))
    @eval begin
```

in the (SY) section. Just delete the 2's in those symbols. @ViralBShah I'll let you decide if this should be changed in the master branch or left as is so as not to complicate merging your branch. |
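Concretely, the change described above would presumably leave that loop header looking like this (a sketch; the wrapper bodies are elided, and as noted further down the `lwork` handling in `sytri!` also has to change):

```julia
# Same loop header with the 2's deleted, so the generated wrappers bind
# to the plain xsytri/xhetri routines that Accelerate's LAPACK provides.
for (syconv, syev, sysv, sytrf, sytri, sytrs, elty) in
    ((:dsyconv_,:dsyev_,:dsysv_,:dsytrf_,:dsytri_,:dsytrs_,:Float64),
     (:ssyconv_,:ssyev_,:ssysv_,:ssytrf_,:ssytri_,:ssytrs_,:Float32),
     (:zheconv_,:zheev_,:zhesv_,:zhetrf_,:zhetri_,:zhetrs_,:Complex128),
     (:checonv_,:cheev_,:chesv_,:chetrf_,:chetri_,:chetrs_,:Complex64))
    @eval begin
        # ... wrapper definitions as before, with sytri! adjusted as
        # shown later in the thread ...
    end
end
```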
Thanks - I can make that change on my branch. Will this have a performance impact? -viral |
@ViralBShah I doubt there will be a noticeable performance impact. The only time that dsytri2 will be substantially faster than dsytri is when taking the inverse of a large symmetric non-positive-definite matrix, which is not an operation that is done frequently. Statisticians do use the inverse of a positive-definite symmetric matrix in calculating the variance-covariance of parameter estimates in linear models, etc., but that is done with dpotri. |
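For context, the statistical use case mentioned above amounts to something like the following (a schematic ordinary-least-squares example, not code from this thread; the `inv(X'X)` step on a positive-definite matrix is where a dpotri-style routine gets used):

```julia
# Schematic: variance-covariance of OLS coefficient estimates,
# vcov = s^2 * inv(X'X), where X'X is symmetric positive definite.
n, p = 200, 5
X = [ones(n) randn(n, p - 1)]     # design matrix with an intercept
y = randn(n)                      # fake response, just for illustration
beta = X \ y                      # least-squares coefficients
resid = y - X * beta
s2 = sum(resid .^ 2) / (n - p)    # residual variance estimate
vcov = s2 * inv(X' * X)           # p x p variance-covariance matrix
```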
I think that the argument lwork is only in xsytri2 and hence the fix is slightly more than deleting the 2's. |
@andreasnoackjensen You're correct. The function should be changed to

```julia
function sytri!(uplo::LapackChar, A::StridedMatrix{$elty}, ipiv::Vector{Int32})
    chkstride1(A)
    chksquare(A)
    n = size(A,1)
    work = Array($elty, n)
    info = Array(Int32, 1)
    ccall(dlsym(Base.liblapack, $(string(sytri))), Void,
          (Ptr{Uint8}, Ptr{Int32}, Ptr{$elty}, Ptr{Int32},
           Ptr{Int32}, Ptr{$elty}, Ptr{Int32}),
          &uplo, &n, A, &stride(A,2), ipiv, work, info)
    if info[1] != 0 throw(LapackException(info[1])) end
    A
end
```

In other words, remove the `lwork` argument. |
Thanks @dmbates. Exactly what I was looking for. To anyone else […] |
@staticfloat Right again. Thanks for catching that. |
When I specify `USE_SYSTEM_LAPACK=1 USE_SYSTEM_BLAS=1` on my make commandline, everything compiles fine, but when I run `make test` I get the now-familiar failure on the `lapack` step. This is because `dsytri2` does not exist in the version of LAPACK that is in Accelerate. (I'm running 10.8.2, so I doubt any other versions of OSX have this in there either.) For a quick-and-dirty performance comparison, I ran `mean([peakflops() for x=1:10])` on a Julia compile with OpenBLAS and on a Julia compile with Accelerate. There's no appreciable difference, in my opinion, so for the sake of focused effort, I propose not littering Julia source with #ifdef's and hanging Accelerate out to dry.
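For anyone reproducing this quick check, the comparison amounts to running something like the following once under each build (one linked against OpenBLAS, one against Accelerate) and comparing the printed numbers; reporting the standard deviation as well is an addition here, not part of the original report:

```julia
# Quick-and-dirty peak GFLOPS sample; run separately under a Julia
# built against OpenBLAS and one built against Accelerate/vecLib.
samples = [peakflops() for x = 1:10]
println("mean: ", mean(samples), "   std: ", std(samples))
```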