BLAS.scal! only supports StridedVectors #141

jiahao · 2014-09-19T17:16:50Z

As reported in julia-dev, this call to LinAlg.BLAS.scal! works but produces unexpected behavior:

julia> A=ones(4,4);B = sub(A,1:2,3:4); BLAS.scal!(length(B),10.0,B,1); A #should have A[1:2,3:4] .== 10
4x4 Array{Float64,2}:
 1.0  1.0  10.0  1.0
 1.0  1.0  10.0  1.0
 1.0  1.0  10.0  1.0
 1.0  1.0  10.0  1.0

It looks like the _scal BLAS function doesn't really support strided matrices; it's only meant to be used with strided vectors. The issue here is that the function handles only one stride in one dimension and so blithely ignores all the stride information wrapped into StridedArray, and only uses the stride passed by the argument incx.

Would there be any reason to pass scal! a StridedVector and a value of incx not equal to its stride?

If not, there are two reasonable solutions:

Change DX::Union(Ptr{$elty},StridedArray{$elty}) to remove StridedArray entirely.
Change DX::Union(Ptr{$elty},StridedArray{$elty}) to DX::Union(Ptr{$elty},StridedVector{$elty}) and remove incx.

Otherwise yet a third possibility is Option 2, but make incx default to the stride of DX.

The text was updated successfully, but these errors were encountered:

nwh · 2014-09-19T17:49:51Z

It looks like dot! and axpy! will have the same issue because they accept StridedArrays as well.

As for option 2: we can't get a stride from a Ptr so incx is a required parameter.

It seems difficult to nicely mix Ptr with specialized array types in the same method.

andreasnoack · 2014-09-19T18:02:00Z

It used to be that there functions were defined for Union(Ptr,Array) and hence you could be sure that the memory layout was contiguous. I needed one of these functions for a non-contiguous array view an extended the definition to Union(Ptr,StridedArray). This didn't restrict the old use and it allowed my use case, but it opened up for the problem you described.

As @timholy noted, most users shouldn't use the BLAS functions because most of the BLAS functionality is provided through other safe Julia function, .e.g dot, scale!,*, \. However, I think it would be a good idea to separate out the pointer versions and do checks of the input arguments in the non-pointer versions to avoid as many accidents as possible.

Jutho · 2014-09-19T18:47:01Z

Restricting the argument of scal! to StridedVector would make it no longer accept a standard Array, which is in the end what happens when you call scale! on an Array{T,N}. However, there is a simple workaround to update

        BLAS.scal!(length(X), s, X, 1)

to

        BLAS.scal!(length(X), s, vec(X), 1)

on line 14 of linalg/dense.jl.

Another possibility is to change the argument to Union(StridedVector,Array) .

StefanKarpinski · 2014-09-19T21:33:31Z

Is BLAS really going to be that much faster than Julia here? Would it be better just to write generic Julia code that does this scaling operation? In the dense and sparse cases, it ought to be quite fast.

Jutho · 2014-09-19T21:53:06Z

I find this for large vectors

julia> A=randn(10^9);

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 1.196689975 seconds (80 bytes allocated)

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.885319205 seconds (80 bytes allocated)

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.915600821 seconds (80 bytes allocated)

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.920074259 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.877416718 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.855078435 seconds (80 bytes allocated)

and this for small vectors

julia> A=randn(10);
julia> @time for i=1:1000000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.025317606 seconds (0 bytes allocated)

julia> @time for i=1:1000000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.025013646 seconds (0 bytes allocated)

julia> @time for i=1:1000000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.024841576 seconds (0 bytes allocated)

julia> @time for i=1:1000000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.025048104 seconds (0 bytes allocated)

julia> @time for i=1:1000000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.026964141 seconds (0 bytes allocated)

julia> @time for i=1:1000000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.026892926 seconds (0 bytes allocated)

Seems like there is indeed no point in using BLAS for this.

nwh · 2014-09-23T17:10:52Z

How about the following solution:

change primary scal! method back to DX::Union(Ptr{$elty},Array{$elty})
add scal! method that takes DX::Union(StridedVector{$elty},Array{$elty}) and passes a Ptr to primary scal!
add scal method that performs a copy.

Features:

works nicely with dense arrays
secondary methods work nicely with StridedVectors

Possible issue:

will there be a case when stride(A,1) != 1 where A is an Array?

It would look like this:

## scal
for (fname, elty) in ((:dscal_,:Float64),
                      (:sscal_,:Float32),
                      (:zscal_,:Complex128),
                      (:cscal_,:Complex64))
    @eval begin
        # SUBROUTINE DSCAL(N,DA,DX,INCX)
        function scal!(n::Integer, DA::$elty, DX::Union(Ptr{$elty},Array{$elty}), incx::Integer)
            ccall(($(string(fname)),libblas), Void,
                  (Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty}, Ptr{BlasInt}),
                  &n, &DA, DX, &incx)
            DX
        end
        function scal!(DA::$elty, DX::Union(StridedVector{$elty},Array{$elty}))
            scal!(length(DX), DA, pointer(DX), stride(DX,1))
        end
        function scal(DA::$elty, DX::Union(StridedVector{$elty},Array{$elty}))
            scal!(DA,copy(DX))
        end
    end
end

simonster · 2014-09-23T18:17:57Z

@Jutho For medium-sized to large arrays there is a non-negligible speed difference for me:

julia> A=randn(10^9);

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.870733492 seconds (80 bytes allocated)

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.853314168 seconds (80 bytes allocated)

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.856629254 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.50168307 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.517095924 seconds (80 bytes allocated)

julia> A=randn(2^14);

julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.76773478 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.771294171 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.76905031 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.332148663 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.32993218 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.333357126 seconds (0 bytes allocated)

It's smaller for the large array if I disable_threaded_libs() but still there.

simonster · 2014-09-24T00:08:22Z

Unfortunately JuliaLang/julia#8452 doesn't seem to have made a difference here. What does help is defining:

function generic_scale!(X::AbstractArray, s::Number)
    for i = 1:length(X)
        @inbounds X[i] = X[i]*s
    end
    X
end

Apparently we can generate more efficient code if we know that the input and output are the same array pointer, although there's still a gap:

julia> A=randn(10^9);

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.758208847 seconds (80 bytes allocated)

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.757586562 seconds (80 bytes allocated)

julia> A=randn(2^14);

julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.420178572 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.generic_scale!(A,3);end
elapsed time: 0.421258876 seconds (0 bytes allocated)

For a fairer comparison, with disable_threaded_libs():

julia> A=randn(10^9);

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.690413984 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.701861179 seconds (80 bytes allocated)

julia> A=randn(2^14);

julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.325523797 seconds (0 bytes allocated)

julia> @time for i=1:100000;Base.LinAlg.scale!(A,3);end
elapsed time: 0.31559956 seconds (0 bytes allocated)

The difference is smaller, but there still is one. It might be interesting to try this with LLVM SVN.

JeffBezanson · 2014-09-24T00:20:27Z

We should add that definition. We're handicapping ourselves against BLAS here.

andreasnoack · 2014-09-24T01:21:21Z

The difference between the generic version and BLAS varies much between machines. On my MacBook

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 0.966372654 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.777787976 seconds (80 bytes allocated)

and on julia.mit.edu

julia> @time Base.LinAlg.generic_scale!(A,3);
elapsed time: 2.947719677 seconds (80 bytes allocated)

julia> @time Base.LinAlg.scale!(A,3);
elapsed time: 0.785620924 seconds (80 bytes allocated)

nwh · 2014-09-26T18:19:52Z

@andreasnoack How would the solution I proposed above affect your use case for StridedArray? I'd be happy to make the changes and submit a pull request. Cheers!

andreasnoack · 2014-09-26T18:30:45Z

Your proposal is good. Please open a pull request. I guess that it is only necessary to define the "second" method for StridedVector as the "first" method covers Array, right?

nwh · 2014-09-26T19:51:53Z

The second and third methods:

        function scal!(DA::$elty, DX::Union(StridedVector{$elty},Array{$elty}))
            scal!(length(DX), DA, pointer(DX), stride(DX,1))
        end
        function scal(DA::$elty, DX::Union(StridedVector{$elty},Array{$elty}))
            scal!(DA,copy(DX))
        end

compute the length and stride from the inputs. Array is present in case somebody wants to scale all elements of an entire (possibly multi-dimensional) contiguous array.

Question: is it always true that stride(A,1) == 1 when A is an Array?

andreasnoack · 2014-09-26T20:33:59Z

Yes.

nwh · 2014-11-25T06:18:20Z

@jiahao this issue can be closed, a fix was merged: JuliaLang/julia#9141

jiahao added bug Something isn't working linear algebra labels Sep 19, 2014

JeffBezanson mentioned this issue Sep 23, 2014

Enable 64-bit vectorization JuliaLang/julia#8452

Merged

nwh mentioned this issue Sep 27, 2014

WIP: cleanup blas.jl scal! JuliaLang/julia#8499

Closed

nwh mentioned this issue Nov 24, 2014

Blas scal cleanup JuliaLang/julia#9141

Merged

jebej mentioned this issue Mar 28, 2017

Indexed assignment not defined for Hermitian matrices [0.5] #416

Closed

KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BLAS.scal! only supports StridedVectors #141

BLAS.scal! only supports StridedVectors #141

jiahao commented Sep 19, 2014

nwh commented Sep 19, 2014

andreasnoack commented Sep 19, 2014

Jutho commented Sep 19, 2014

StefanKarpinski commented Sep 19, 2014

Jutho commented Sep 19, 2014

nwh commented Sep 23, 2014

simonster commented Sep 23, 2014

simonster commented Sep 24, 2014

JeffBezanson commented Sep 24, 2014

andreasnoack commented Sep 24, 2014

nwh commented Sep 26, 2014

andreasnoack commented Sep 26, 2014

nwh commented Sep 26, 2014

andreasnoack commented Sep 26, 2014

nwh commented Nov 25, 2014

BLAS.scal! only supports StridedVectors #141

BLAS.scal! only supports StridedVectors #141

Comments

jiahao commented Sep 19, 2014

nwh commented Sep 19, 2014

andreasnoack commented Sep 19, 2014

Jutho commented Sep 19, 2014

StefanKarpinski commented Sep 19, 2014

Jutho commented Sep 19, 2014

nwh commented Sep 23, 2014

simonster commented Sep 23, 2014

simonster commented Sep 24, 2014

JeffBezanson commented Sep 24, 2014

andreasnoack commented Sep 24, 2014

nwh commented Sep 26, 2014

andreasnoack commented Sep 26, 2014

nwh commented Sep 26, 2014

andreasnoack commented Sep 26, 2014

nwh commented Nov 25, 2014