Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unneeded allocations in ldiv!(::QRPivoted, ...) #49282

Merged
merged 1 commit into from
May 4, 2023

Conversation

kamesy
Copy link
Contributor

@kamesy kamesy commented Apr 7, 2023

This PR aims to improve the performance of ldiv!(A::QRPivoted, ...) by

  1. eliminating redundant calls to LAPACK for matrices with a rank of size(A, 2)
  2. pre-allocating/reusing temporary arrays for inverse permutation of the solution

Additionally, this PR introduces one (breaking?) change in the case where the right hand side array is larger than than the solution array. The previous implementation zeroed out the solution array, whereas this PR follows the LAPACK.xgelsy implementation and the array is no longer zeroed out.

If we allow A to be modified in-place, we could further reduce allocations.

Some benchmarks:

julia> versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Xeon(R) W-2195 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 36 on 36 virtual cores
Random.seed!(1234)

for nrhs in (1, 100_000), n in (5, 100), m in (5, 100)
   @info "m = $m, n = $n, nrhs = $nrhs"
   local A = qr(randn(m, n), ColumnNorm())
   local B = randn(max(m, n), nrhs)

   @assert ldiv!(A, copy(B))[1:n,:]  proposed_ldiv!(A, copy(B))[1:n,:]

   @btime ldiv!(a, b) setup=(a=deepcopy($A); b = copy($B))
   @btime proposed_ldiv!(a, b) setup=(a=deepcopy($A); b = copy($B))
   println()
end

[ Info: m = 5, n = 5, nrhs = 1
 2.241 μs (36 allocations: 68.03 KiB)
 1.668 μs (27 allocations: 34.52 KiB)
 
[ Info: m = 100, n = 5, nrhs = 1
 2.784 μs (36 allocations: 68.03 KiB)
 2.069 μs (27 allocations: 34.52 KiB)
 
[ Info: m = 5, n = 100, nrhs = 1
 7.454 μs (37 allocations: 73.95 KiB)
 7.210 μs (35 allocations: 73.02 KiB)
 
[ Info: m = 100, n = 100, nrhs = 1
 130.408 μs (611 allocations: 186.70 KiB)
 113.923 μs (597 allocations: 71.77 KiB)
 
[ Info: m = 5, n = 5, nrhs = 100000
 2.511 ms (37 allocations: 52.71 MiB)
 1.805 ms (27 allocations: 24.45 MiB)
 
[ Info: m = 100, n = 5, nrhs = 100000
 25.738 ms (37 allocations: 52.71 MiB)
 17.931 ms (27 allocations: 24.45 MiB)
 
[ Info: m = 5, n = 100, nrhs = 100000
 78.122 ms (38 allocations: 125.19 MiB)
 34.130 ms (35 allocations: 48.90 MiB)
 
[ Info: m = 100, n = 100, nrhs = 100000
 282.322 ms (612 allocations: 125.30 MiB)
 146.832 ms (597 allocations: 24.48 MiB)
 
...
 
 [ Info: m = 100, n = 95, nrhs = 1
 126.734 μs (582 allocations: 177.14 KiB)
 111.139 μs (567 allocations: 69.72 KiB)
 
[ Info: m = 95, n = 100, nrhs = 1
 223.025 μs (583 allocations: 204.72 KiB)
 220.735 μs (576 allocations: 201.64 KiB)
 
[ Info: m = 100, n = 95, nrhs = 100000
 273.032 ms (583 allocations: 121.48 MiB)
 143.162 ms (567 allocations: 24.48 MiB)
 
[ Info: m = 95, n = 100, nrhs = 100000
 280.939 ms (584 allocations: 125.32 MiB)
 236.457 ms (576 allocations: 49.02 MiB)

@inkydragon inkydragon added the linear algebra Linear algebra label Apr 7, 2023
@dkarrasch dkarrasch added the performance Must go faster label Apr 15, 2023
@dkarrasch
Copy link
Member

Let's run pkgeval to see whether the change in behaviour pops up in the wild. At least, it looks like we are not testing for that behaviour.

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected.
A full report can be found here.

@dkarrasch dkarrasch requested a review from andreasnoack April 17, 2023 09:02
@dkarrasch
Copy link
Member

Pkgeval seems to be fine.

@KristofferC KristofferC merged commit 705bfd3 into JuliaLang:master May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants