BLAS.scal! only supports StridedVectors #141
It looks like As for option 2: we can't get a stride from a It seems difficult to nicely mix |
It used to be that these functions were defined for As @timholy noted, most users shouldn't use the BLAS functions because most of the BLAS functionality is provided through other safe Julia functions, e.g. |
Restricting the argument of BLAS.scal!(length(X), s, X, 1) to BLAS.scal!(length(X), s, vec(X), 1) on line 14 of linalg/dense.jl. Another possibility is to change the argument to |
Is BLAS really going to be that much faster than Julia here? Would it be better just to write generic Julia code that does this scaling operation? In the dense and sparse cases, it ought to be quite fast. |
I find this for large vectors:
julia> A=randn(10^9);
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 1.196689975 seconds (80 bytes allocated)
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.885319205 seconds (80 bytes allocated)
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.915600821 seconds (80 bytes allocated)
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.920074259 seconds (80 bytes allocated)
julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.877416718 seconds (80 bytes allocated)
julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.855078435 seconds (80 bytes allocated)
and this for small vectors:
julia> A=randn(10);
julia> @time for i=1:1000000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.025317606 seconds (0 bytes allocated)
julia> @time for i=1:1000000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.025013646 seconds (0 bytes allocated)
julia> @time for i=1:1000000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.024841576 seconds (0 bytes allocated)
julia> @time for i=1:1000000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.025048104 seconds (0 bytes allocated)
julia> @time for i=1:1000000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.026964141 seconds (0 bytes allocated)
julia> @time for i=1:1000000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.026892926 seconds (0 bytes allocated)
Seems like there is indeed no point in using BLAS for this. |
How about the following solution:
Features:
Possible issue:
It would look like this:
## scal
for (fname, elty) in ((:dscal_,:Float64),
                      (:sscal_,:Float32),
                      (:zscal_,:Complex128),
                      (:cscal_,:Complex64))
    @eval begin
        # SUBROUTINE DSCAL(N,DA,DX,INCX)
        function scal!(n::Integer, DA::$elty, DX::Union(Ptr{$elty},Array{$elty}), incx::Integer)
            ccall(($(string(fname)),libblas), Void,
                  (Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty}, Ptr{BlasInt}),
                  &n, &DA, DX, &incx)
            DX
        end
        function scal!(DA::$elty, DX::Union(StridedVector{$elty},Array{$elty}))
            scal!(length(DX), DA, pointer(DX), stride(DX,1))
        end
        function scal(DA::$elty, DX::Union(StridedVector{$elty},Array{$elty}))
            scal!(DA,copy(DX))
        end
    end
end
|
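For readers on current Julia, the convenience method proposed above might look roughly like the sketch below. It is not the code that was eventually merged: the name `scal_strided!` is hypothetical, the method is restricted to `Float64` for brevity, and it assumes the present-day `LinearAlgebra.BLAS` module rather than the 0.3-era `Base.LinAlg` used in the thread.

```julia
using LinearAlgebra

# Hypothetical helper: derive n and incx from the vector itself, so the
# caller cannot pass a stride that disagrees with the data layout.
function scal_strided!(a::Float64, x::StridedVector{Float64})
    # GC.@preserve keeps x alive while BLAS works on the raw pointer.
    GC.@preserve x BLAS.scal!(length(x), a, pointer(x), stride(x, 1))
    return x
end

A = ones(3, 4)
row = view(A, 2, :)        # strided vector; stride(row, 1) == 3
scal_strided!(2.0, row)    # scales only the second row of A
```

Because the length and stride come from `x`, a strided matrix cannot be passed by accident; it would have to be turned into a vector explicitly first.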
@Jutho For medium-sized to large arrays there is a non-negligible speed difference for me:
julia> A=randn(10^9);
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.870733492 seconds (80 bytes allocated)
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.853314168 seconds (80 bytes allocated)
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.856629254 seconds (80 bytes allocated)
julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.50168307 seconds (80 bytes allocated)
julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.517095924 seconds (80 bytes allocated)
julia> A=randn(2^14);
julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.76773478 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.771294171 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.76905031 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.332148663 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.32993218 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.333357126 seconds (0 bytes allocated)
It's smaller for the large array if I |
Unfortunately JuliaLang/julia#8452 doesn't seem to have made a difference here. What does help is defining:
function generic_scale!(X::AbstractArray, s::Number)
    for i = 1:length(X)
        @inbounds X[i] = X[i]*s
    end
    X
end
Apparently we can generate more efficient code if we know that the input and output are the same array pointer, although there's still a gap:
julia> A=randn(10^9);
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.758208847 seconds (80 bytes allocated)
julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.757586562 seconds (80 bytes allocated)
julia> A=randn(2^14);
julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.420178572 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.421258876 seconds (0 bytes allocated)
For a fairer comparison, with
julia> A=randn(10^9);
julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.690413984 seconds (80 bytes allocated)
julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.701861179 seconds (80 bytes allocated)
julia> A=randn(2^14);
julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.325523797 seconds (0 bytes allocated)
julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.31559956 seconds (0 bytes allocated)
The difference is smaller, but there still is one. It might be interesting to try this with LLVM SVN. |
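To repeat this comparison on a current Julia, something like the sketch below could be used. It assumes the present-day `LinearAlgebra.BLAS.scal!` in place of the old `Base.LinAlg.scale!`, and `@elapsed` in place of the old `@time` output. The iteration count is arbitrary, and the timings depend on the machine and BLAS build, so no expected numbers are given.

```julia
using LinearAlgebra

# Generic in-place loop, equivalent to the definition quoted in the thread.
function generic_scale!(X::AbstractArray, s::Number)
    for i in eachindex(X)
        @inbounds X[i] = X[i] * s
    end
    return X
end

A = randn(2^14)
B = copy(A)

generic_scale!(A, 3)                 # pure Julia
BLAS.scal!(length(B), 3.0, B, 1)     # BLAS dscal on the same data

# Scale by 1.0 in the timing loops so the values stay finite.
t_generic = @elapsed for _ in 1:10_000; generic_scale!(A, 1.0); end
t_blas    = @elapsed for _ in 1:10_000; BLAS.scal!(length(B), 1.0, B, 1); end
println("generic: ", t_generic, " s, BLAS: ", t_blas, " s")
```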
We should add that definition. We're handicapping ourselves against BLAS here. |
The difference between the generic version and BLAS varies much between machines. On my MacBook
and on julia.mit.edu
|
@andreasnoack How would the solution I proposed above affect your use case for |
Your proposal is good. Please open a pull request. I guess that it is only necessary to define the "second" method for |
The second and third methods:
compute the length and stride from the inputs. Question: is it always true that |
Yes. |
@jiahao this issue can be closed, a fix was merged: JuliaLang/julia#9141 |
As reported in julia-dev, this call to LinAlg.BLAS.scal! works but produces unexpected behavior:
It looks like the _scal BLAS function doesn't really support strided matrices; it's only meant to be used with strided vectors. The issue here is that the function handles only one stride in one dimension and so blithely ignores all the stride information wrapped into StridedArray, and only uses the stride passed by the argument incx.
Would there be any reason to pass scal! a StridedVector and a value of incx not equal to its stride?
If not, there are two reasonable solutions:
1. Change DX::Union(Ptr{$elty},StridedArray{$elty}) to remove StridedArray entirely.
2. Change DX::Union(Ptr{$elty},StridedArray{$elty}) to DX::Union(Ptr{$elty},StridedVector{$elty}) and remove incx.
Otherwise yet a third possibility is Option 2, but make incx default to the stride of DX.
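The stride problem described above can be illustrated without calling BLAS at all. The sketch below is a pure-Julia model of the xSCAL access pattern, not the library routine; `scal_model!` and its `start` keyword are hypothetical names introduced only for this illustration.

```julia
# Pure-Julia model of what xSCAL does: scale n entries of a flat memory
# buffer, beginning at index `start` and stepping by incx.
function scal_model!(n::Integer, a::Number, mem::AbstractVector, incx::Integer;
                     start::Integer = 1)
    for k in 0:n-1
        mem[start + k * incx] *= a
    end
    return mem
end

A = ones(4, 4)
mem = vec(A)    # flat view of A's storage (shares memory with A)

# Intended: scale the 2x2 block A[1:2, 1:2], whose strides are (1, 4).
# With only n = 4 and incx = 1 available, the routine scales A[1:4, 1].
scal_model!(4, 2.0, mem, 1)
```

A single `incx` can describe a column or a row of `A`, but not a two-dimensional block, which is why restricting the argument to strided vectors resolves the surprise.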