Significant performance gap in assigning to a subsection of an array with and without broadcasting #40962
Well, the improved performance from replacing If we replace separated iteration with Some benchmark:
a = rand(40,40); b = rand(40,40);
@btime $a[:,:] = $b # 1.330 μs (0 allocations: 0 bytes)
@btime $a[1:end,1:end] = $b # 1.290 μs (0 allocations: 0 bytes)
@btime $a[:,:] .= $b; # 385.859 ns (0 allocations: 0 bytes)
@btime $a[1:end,1:end] .= $b # 3.056 μs (0 allocations: 0 bytes) falls into a slow branch
a′ = @view a[1:end,1:end]
@btime zip_copyto!($a′, $b); # 1.480 μs (0 allocations: 0 bytes) much faster, almost as fast as setindex!
@btime simd_copyto!($a′, $b); # 234.332 ns (0 allocations: 0 bytes) fastest, I believe that the general broadcast has a close speed
I tried to implement a faster
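The definitions of `zip_copyto!` and `simd_copyto!` are not shown in this comment. As a rough illustration only, here is a minimal sketch of what two such helpers might look like. The bodies below are assumptions (hypothetical reimplementations), not necessarily the functions that produced the timings above.

```julia
# Hypothetical helper: walk the destination and source indices together.
# Each array uses its own preferred index type (Cartesian for a "slow" view,
# linear for a plain Array), so neither side pays for index conversion.
function zip_copyto!(dest::AbstractArray, src::AbstractArray)
    axes(dest) == axes(src) || throw(DimensionMismatch("axes of dest and src must match"))
    for (I, J) in zip(eachindex(dest), eachindex(src))
        @inbounds dest[I] = src[J]
    end
    return dest
end

# Hypothetical helper: one shared Cartesian index space plus @simd, which
# allows the compiler to vectorize the innermost dimension.
function simd_copyto!(dest::AbstractArray, src::AbstractArray)
    axes(dest) == axes(src) || throw(DimensionMismatch("axes of dest and src must match"))
    @inbounds @simd for I in CartesianIndices(dest)
        dest[I] = src[I]
    end
    return dest
end
```

Both take the view as the destination, e.g. `simd_copyto!(view(a, 1:40, 1:40), b)`. Whether either one matches the timings quoted above depends on the Julia version and hardware.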
Is this the same issue?
u = rand(UInt, 10_000); lo = 1; hi = 10_000; u_min = 1729;
fast(u, lo, hi, u_min) = (@inbounds for i in lo:hi u[i] -= u_min end; u)
slow(u, lo, hi, u_min) = (@inbounds u[lo:hi] .-= u_min; u)
@belapsed fast($u, $lo, $hi, $u_min) # 1.1795e-6
@belapsed slow($u, $lo, $hi, $u_min) # 7.82e-6
No, your example is #43153
Interestingly, the performance is reversed for vectors, in which case it's faster to broadcast:
julia> A = zeros(1000); B = rand(1000);
julia> @btime $A[1:end] = @view $B[1:end];
734.116 ns (0 allocations: 0 bytes)
julia> @btime $A[1:end] .= @view $B[1:end];
122.402 ns (0 allocations: 0 bytes)
julia> @btime $A[:] = @view $B[:];
1.014 μs (0 allocations: 0 bytes)
julia> @btime $A[:] .= @view $B[:];
123.959 ns (0 allocations: 0 bytes)
julia> @btime $A[:] = @view $B[:];
103.745 ns (0 allocations: 0 bytes)
julia> @btime $A[:] .= @view $B[:];
113.488 ns (0 allocations: 0 bytes)
and for the matrices in the original post:
julia> @btime $a[:,:] .= $b;
753.387 ns (0 allocations: 0 bytes)
julia> @btime $a[:,:] = $b;
618.081 ns (0 allocations: 0 bytes)
Well,
function setindex!(A::Array, X::AbstractArray, I::AbstractVector{Int})
    @_propagate_inbounds_meta
    @boundscheck setindex_shape_check(X, length(I))
    require_one_based_indexing(X)
    X′ = unalias(A, X)
    I′ = unalias(A, I)
    count = 1
    for i in I′
        @inbounds x = X′[count]
        A[i] = x
        count += 1
    end
    return A
end
apparently the bounds check of
As noted by @N5N3 in #40962 (comment), the bounds-check on the elementwise `setindex!` prevents vectorization. Explicitly performing the bounds-check and marking the `setindex!` as `@inbounds` speeds up the operation.
```julia
julia> A = zeros(1000); B = rand(1000);

julia> @btime $A[1:end] = @view $B[1:end];
  689.940 ns (0 allocations: 0 bytes) # master
  97.629 ns (0 allocations: 0 bytes) # PR
```
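To make the idea concrete, here is a minimal sketch of "check once up front, then index `@inbounds`". The function name `setindex_hoisted!` is hypothetical and this is not the actual patch to Base; it only illustrates the pattern described in the commit message, using a plain length check in place of `setindex_shape_check`.

```julia
# Hypothetical standalone illustration; NOT the code from the PR.
function setindex_hoisted!(A::Array, X::AbstractArray, I::AbstractVector{Int})
    # Shape check (stand-in for Base's internal setindex_shape_check).
    length(X) == length(I) || throw(DimensionMismatch("length(X) must equal length(I)"))
    # Single up-front bounds check covering every index in I.
    checkbounds(A, I)
    Base.require_one_based_indexing(X)
    X′ = Base.unalias(A, X)
    I′ = Base.unalias(A, I)
    count = 1
    for i in I′
        # Safe: all of I was validated above, and count stays within 1:length(X).
        @inbounds A[i] = X′[count]
        count += 1
    end
    return A
end
```

With no per-element bounds check left inside the loop, the compiler is free to vectorize it. An out-of-range `I` still throws a `BoundsError`, just before any element is written rather than partway through.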
…3383)

With this, the following (and equivalent calls) work:
```julia
julia> copyto!(view(zeros(BigInt, 2), 1:2), Vector{BigInt}(undef,2))
2-element view(::Vector{BigInt}, 1:2) with eltype BigInt:
 #undef
 #undef

julia> copyto!(view(zeros(BigInt, 2), 1:2), view(Vector{BigInt}(undef,2), 1:2))
2-element view(::Vector{BigInt}, 1:2) with eltype BigInt:
 #undef
 #undef
```
Close #53098. With this, all the `_unsetindex!` branches in `copyto_unaliased!` work for `Array`-views, and this makes certain indexing operations vectorize and speed-up:
```julia
julia> using BenchmarkTools

julia> a = view(rand(100,100), 1:100, 1:100); b = view(similar(a), axes(a)...);

julia> @btime copyto!($b, $a);
  16.427 μs (0 allocations: 0 bytes) # master
  2.308 μs (0 allocations: 0 bytes) # PR
```
Improves (but doesn't resolve) #40962 and #53158
```julia
julia> a = rand(40,40); b = rand(40,40);

julia> @btime $a[1:end,1:end] .= $b;
  5.383 μs (0 allocations: 0 bytes) # v"1.12.0-DEV.16"
  3.194 μs (0 allocations: 0 bytes) # PR
```
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
…3383) Same commit message as above (cherry picked from commit 1a90409)
Presumably, the destination is a `Base.SlowSubArray` for the broadcasted assignment. Interestingly, the performance is reversed if the index ranges are replaced by colons (in which case the destination is a `FastContiguousSubArray`).
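One way to check this guess without relying on internal type aliases is to ask for the `IndexStyle` of the two destinations. The sketch below assumes that the broadcasted assignment writes into a view built the same way as `view(a, ...)` with the same indices, which is my understanding of how `a[...] .= b` is lowered.

```julia
a = rand(40, 40)

# Stand-ins for the destinations of the two broadcasted assignments.
v_colon = view(a, :, :)          # corresponds to a[:, :] .= b
v_range = view(a, 1:40, 1:40)    # corresponds to a[1:end, 1:end] .= b

# Colon indices give a view that supports fast linear indexing ...
style_colon = IndexStyle(v_colon)   # IndexLinear()

# ... while two unit ranges give a view that must be indexed per dimension.
style_range = IndexStyle(v_range)   # IndexCartesian()
```

The `IndexCartesian()` case is the one the issue calls a slow sub-array: even though `1:40` happens to span the full axis, the type of the view carries no guarantee of contiguity, so the linear fast path is not selected.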