Improve linear indexing performance for FastSubArrays #45371

Merged 6 commits on Dec 14, 2023

Contributor

@jishnub jishnub commented May 19, 2022

This PR forwards AbstractUnitRange indices for FastSubArrays to the parent, making use of the fact that the parent might have efficient vector indexing methods defined.
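A minimal sketch of the forwarding idea, written against public API only. `forwarded_getindex` is a hypothetical helper for illustration, not the method added in `base/subarray.jl`, and it assumes a 1D view whose parent index is a 1-based unit range:

```julia
# Hypothetical helper (an assumption for illustration, not the code in Base):
# translate the requested range into parent indices and let the parent's own
# vector indexing do the copy, instead of looping over the view elementwise.
function forwarded_getindex(v::SubArray{<:Any,1,<:AbstractVector,<:Tuple{AbstractUnitRange{Int}}},
                            r::AbstractUnitRange{Int})
    @boundscheck checkbounds(v, r)
    offset = first(only(parentindices(v))) - 1  # shift from view indices to parent indices
    return parent(v)[r .+ offset]               # a single range index into the parent
end

a = collect(1.0:10.0)
v = @view a[3:8]
w = forwarded_getindex(v, 2:4)  # same elements as v[2:4], read as a[4:6]
```

The shifted range `r .+ offset` is still a unit range, so a parent such as `Array` can use its specialized range-indexing method.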

Some benchmarks:

getindex benchmarking script

```julia
using BenchmarkTools  # provides @btime

function getindexbench()
    a1 = rand(1000)
    b1 = @view a1[:] # FastContiguousSubArray
    c1 = @view a1[eachindex(a1)] # FastContiguousSubArray

    @info "views of a 1D array"
    @info "getindex with AbstractUnitRange"
    ax1 = eachindex(a1)
    @btime $a1[$ax1]
    @btime $b1[$ax1]
    @btime $c1[$ax1]

    @info "getindex with Colon"
    @btime $a1[:]
    @btime $b1[:]
    @btime $c1[:]

    a2 = rand(1000, 1000)
    b2 = @view a2[:, :] # FastContiguousSubArray
    c2 = @view a2[eachindex(a2)] # FastContiguousSubArray

    @info "views of a 2D array"
    @info "getindex with AbstractUnitRange"
    ax2 = eachindex(a2)
    @btime $a2[$ax2]
    @btime $b2[$ax2]
    @btime $c2[$ax2]

    @info "getindex with Colon"
    @btime $a2[:]
    @btime $b2[:]
    @btime $c2[:]

    return nothing
end
```

On master:

```julia
julia> getindexbench()
[ Info: views of a 1D array
[ Info: getindex with AbstractUnitRange
  660.168 ns (1 allocation: 7.94 KiB)
  1.067 μs (1 allocation: 7.94 KiB)
  1.087 μs (1 allocation: 7.94 KiB)
[ Info: getindex with Colon
  660.567 ns (1 allocation: 7.94 KiB)
  1.353 μs (1 allocation: 7.94 KiB)
  1.083 μs (1 allocation: 7.94 KiB)
[ Info: views of a 2D array
[ Info: getindex with AbstractUnitRange
  476.558 μs (2 allocations: 7.63 MiB)
  1.171 ms (2 allocations: 7.63 MiB)
  1.134 ms (2 allocations: 7.63 MiB)
[ Info: getindex with Colon
  478.056 μs (2 allocations: 7.63 MiB)
  1.117 ms (2 allocations: 7.63 MiB)
  1.117 ms (2 allocations: 7.63 MiB)
```

This PR:

```julia
julia> getindexbench()
[ Info: views of a 1D array
[ Info: getindex with AbstractUnitRange
  679.193 ns (1 allocation: 7.94 KiB)
  695.424 ns (1 allocation: 7.94 KiB)
  691.766 ns (1 allocation: 7.94 KiB)
[ Info: getindex with Colon
  694.644 ns (1 allocation: 7.94 KiB)
  670.639 ns (1 allocation: 7.94 KiB)
  672.520 ns (1 allocation: 7.94 KiB)
[ Info: views of a 2D array
[ Info: getindex with AbstractUnitRange
  497.375 μs (2 allocations: 7.63 MiB)
  499.683 μs (2 allocations: 7.63 MiB)
  501.109 μs (2 allocations: 7.63 MiB)
[ Info: getindex with Colon
  500.335 μs (2 allocations: 7.63 MiB)
  500.966 μs (2 allocations: 7.63 MiB)
  504.430 μs (2 allocations: 7.63 MiB)
```
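
The view kinds named in the script comments can be checked directly. This is a small sketch that relies on `Base.FastSubArray` and `Base.FastContiguousSubArray`, which are unexported internal aliases and may change between Julia versions:

```julia
a = rand(10)

b = @view a[:]            # Colon index: contiguous, linearly indexed view
c = @view a[2:5]          # unit-range index: contiguous, linearly indexed view
d = @view a[begin:2:end]  # strided range: linearly indexed but not contiguous
e = @view a[[1, 3, 4]]    # vector of indices: no fast linear indexing

is_fast(v) = v isa Base.FastSubArray                   # internal alias
is_contiguous(v) = v isa Base.FastContiguousSubArray   # internal alias

kinds = [(is_fast(v), is_contiguous(v)) for v in (b, c, d, e)]
```

Only views for which `is_fast` holds are affected by the change described in this PR.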
setindex! benchmarking script

```julia
using BenchmarkTools  # provides @btime

function setindexbench()
    a1 = rand(1000)
    a12 = copy(a1)
    b1 = @view a1[:] # FastContiguousSubArray
    c1 = @view a1[eachindex(a1)] # FastContiguousSubArray
    d1 = @view a1[begin:1:end] # FastSubArray

    @info "views of a 1D array"
    @info "setindex with AbstractUnitRange"
    ax1 = eachindex(a1)
    @btime $a1[$ax1] = $a12
    @btime $b1[$ax1] = $a12
    @btime $c1[$ax1] = $a12
    @btime $d1[$ax1] = $a12

    @info "setindex with Colon"
    @btime $a1[:] = $a12
    @btime $b1[:] = $a12
    @btime $c1[:] = $a12
    @btime $d1[:] = $a12

    a2 = rand(1000, 1000)
    a22 = copy(a2)
    a2v = vec(a22)
    b2 = @view a2[:, :] # FastContiguousSubArray
    c2 = @view a2[eachindex(a2)] # FastContiguousSubArray
    d2 = @view a2[begin:1:end] # 1D FastSubArray

    @info "views of a 2D array"
    @info "setindex with AbstractUnitRange"
    ax2 = eachindex(a2)
    @btime $a2[$ax2] = $a2v
    @btime $b2[$ax2] = $a2v
    @btime $c2[$ax2] = $a2v
    @btime $d2[$ax2] = $a2v

    @info "setindex with Colon"
    @btime $a2[:] = $a2v
    @btime $b2[:] = $a2v
    @btime $c2[:] = $a2v
    @btime $d2[:] = $a2v

    return nothing
end
```

On master:

```julia
julia> setindexbench()
[ Info: views of a 1D array
[ Info: setindex with AbstractUnitRange
  67.272 ns (0 allocations: 0 bytes)
  3.367 μs (0 allocations: 0 bytes)
  3.368 μs (0 allocations: 0 bytes)
  3.136 μs (0 allocations: 0 bytes)
[ Info: setindex with Colon
  63.101 ns (0 allocations: 0 bytes)
  1.277 μs (0 allocations: 0 bytes)
  3.368 μs (0 allocations: 0 bytes)
  3.130 μs (0 allocations: 0 bytes)
[ Info: views of a 2D array
[ Info: setindex with AbstractUnitRange
  502.179 μs (0 allocations: 0 bytes)
  3.401 ms (0 allocations: 0 bytes)
  1.403 ms (0 allocations: 0 bytes)
  3.064 ms (0 allocations: 0 bytes)
[ Info: setindex with Colon
  515.148 μs (0 allocations: 0 bytes)
  3.396 ms (0 allocations: 0 bytes)
  3.395 ms (0 allocations: 0 bytes)
  1.292 ms (0 allocations: 0 bytes)
```

This PR:

```julia
julia> setindexbench()
[ Info: views of a 1D array
[ Info: setindex with AbstractUnitRange
  75.606 ns (0 allocations: 0 bytes)
  77.707 ns (0 allocations: 0 bytes)
  81.870 ns (0 allocations: 0 bytes)
  451.902 ns (0 allocations: 0 bytes)
[ Info: setindex with Colon
  74.344 ns (0 allocations: 0 bytes)
  80.618 ns (0 allocations: 0 bytes)
  81.760 ns (0 allocations: 0 bytes)
  481.208 ns (0 allocations: 0 bytes)
[ Info: views of a 2D array
[ Info: setindex with AbstractUnitRange
  502.375 μs (0 allocations: 0 bytes)
  487.665 μs (0 allocations: 0 bytes)
  488.242 μs (0 allocations: 0 bytes)
  960.377 μs (0 allocations: 0 bytes)
[ Info: setindex with Colon
  486.549 μs (0 allocations: 0 bytes)
  489.817 μs (0 allocations: 0 bytes)
  489.038 μs (0 allocations: 0 bytes)
  953.120 μs (0 allocations: 0 bytes)
```
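
The same forwarding idea carries over to `setindex!`, including strided views such as `d1` above. The sketch below is an illustration under stated assumptions, not the implementation in Base: `forwarded_setindex!` is a hypothetical name, and it assumes a 1D view over a 1-based range of parent indices.

```julia
# Hypothetical helper (an assumption for illustration): map the requested
# range through the view's parent indices, then assign into the parent once.
function forwarded_setindex!(v::SubArray{<:Any,1,<:AbstractVector,<:Tuple{AbstractRange{Int}}},
                             src, r::AbstractUnitRange{Int})
    @boundscheck checkbounds(v, r)
    pinds = only(parentindices(v))  # e.g. 1:2:9 for a stride-2 view
    parent(v)[pinds[r]] = src       # pinds[r] is itself a range of parent indices
    return v
end

a = zeros(10)
d = @view a[begin:2:end]                 # parent indices 1:2:9
forwarded_setindex!(d, [1.0, 2.0], 2:3)  # writes to a[3] and a[5]
```

For a strided view the forwarded index is a stepped range rather than a unit range, which may explain why `d1` and `d2` gain less in the timings above than the contiguous views.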

@johnnychen94 johnnychen94 added the "performance" (Must go faster) and "arrays" ([a, r, r, a, y, s]) labels May 19, 2022
Contributor Author

jishnub commented May 20, 2022

There are a lot of line changes in 4232be0, but that's mainly to get the OffsetArrays test helper in sync with the repo.

Member

johnnychen94 commented May 20, 2022

> There are a lot of line changes in 4232be0, but that's mainly to get the OffsetArrays test helper in sync with the repo.

I believe it's better to open a separate PR for this because people usually do squash-and-merge. We can merge the separated PR quickly since it's only a test dependency.

I left the version note here, so we need to update it as well:

`# OffsetArrays v1.3.0`

Contributor Author

jishnub commented May 20, 2022

Sounds good, I'll roll this back and add the OffsetArrays tests as a separate PR.

@jishnub jishnub force-pushed the subarrayvectorindexing branch from 8230203 to 2313165 Compare May 20, 2022 05:47
Member

N5N3 commented May 20, 2022

I suspect that LLVM fails to vectorize the copy part of the general getindex/setindex!, even in the 1D case.
Maybe we can optimize them if getindex's dest (or setindex!'s src) supports linear indexing.

Contributor Author

jishnub commented Oct 31, 2023

Gentle bump. This performance issue still exists on the current master (Version 1.11.0-DEV.775), and a change like this would help.

@ViralBShah
Member

Seems nice to get in.

@jishnub jishnub marked this pull request as draft November 22, 2023 07:20
@jishnub jishnub force-pushed the subarrayvectorindexing branch from 2313165 to fcde993 Compare November 23, 2023 13:25
@jishnub jishnub marked this pull request as ready for review November 24, 2023 06:50
@N5N3 N5N3 added the "merge me" label (PR is reviewed. Merge when all tests are passing) Dec 13, 2023
@N5N3 N5N3 merged commit 5195da2 into JuliaLang:master Dec 14, 2023
3 checks passed
@N5N3 N5N3 removed the "merge me" label (PR is reviewed. Merge when all tests are passing) Dec 14, 2023
@jishnub jishnub deleted the subarrayvectorindexing branch December 14, 2023 12:39
vtjnash pushed a commit that referenced this pull request Feb 12, 2024
This will be an offset array in general, so we need to restrict the
method to 1-based ranges. This restores the behavior on `v"1.10"`.

The method was added in #45371
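
A short illustration of why 1-based ranges matter here. In Julia the result of `A[I]` takes its axes from the index `I`, so an index range that carries offset axes yields an offset result and cannot simply be shifted and forwarded. The sketch uses `Base.IdentityUnitRange` and `Base.has_offset_axes`, both unexported internals, and does not reproduce the actual method restriction from the referenced commit:

```julia
r1 = 2:3                           # ordinary range: its own axes are 1-based
r2 = Base.IdentityUnitRange(2:3)   # range whose axes equal its values

one_based = !Base.has_offset_axes(r1)  # true, so forwarding is safe
offset    = Base.has_offset_axes(r2)   # true, so take the generic path
first_ax  = first(axes(r2, 1))         # 2, not 1
```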
vtjnash pushed a commit that referenced this pull request Feb 13, 2024

It's sad that the compiler can't do this automatically.
Some benchmarks with `setindex!`:
```julia
julia> a = zeros(Int, 100, 100);
julia> @btime $a[:,:] = $(1:10000);
  1.340 μs (0 allocations: 0 bytes) #master: 3.350 μs (0 allocations: 0 bytes)

julia> @btime $a[:,:] = $(view(LinearIndices(a), 1:100, 1:100));
  10.000 μs (0 allocations: 0 bytes) #master: 11.000 μs (0 allocations: 0 bytes)
```

BTW, the optimization for `FastSubArray` introduced in #45371 still works
after this change, as the parent array might have its own `copyto!`
optimization.