Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from permute!! to permute! #272

Merged
merged 4 commits into from
Aug 24, 2023
Merged

Conversation

LilithHafner
Copy link
Contributor

@LilithHafner LilithHafner commented May 17, 2023

In Julia 1.9.0, permute! is typically faster that the internal method Base.permute!!, thanks to JuliaLang/julia#44941.

For struct arrays in particular, using the internal method Base.permute!! seems to give moderate performance improvements for small arrays in exchange for moderate regressions for huge arrays when compared to permute!

using Random, BenchmarkTools, StructArrays

small = StructArray(zip(rand(10), rand(10)));
x = rand(10_000);
y = [randstring() for _ in 1:10_000];
big = StructArray(zip(x, y, rand(ComplexF64, 10_000), x, x, y));
x = rand(1_000_000);
y = [randstring() for _ in 1:1_000_000];
z = rand(ComplexF64, 1_000_000);
huge = StructArray(zip(x, y, x, z, z, x, z, x, y, y, z, x, y));

for sa in (small, big, huge)
    @btime (Base.permute!!($sa, sortperm($sa)); $sa) setup=(shuffle!($sa)) evals=1;
    @btime (permute!($sa, sortperm($sa)); $sa) setup=(shuffle!($sa)) evals=1;
end
# 166.000 ns (2 allocations: 208 bytes)
# 208.000 ns (4 allocations: 496 bytes)
# 359.583 μs (5 allocations: 156.41 KiB)
# 340.708 μs (17 allocations: 703.56 KiB)
# 192.807 ms (5 allocations: 15.26 MiB)
# 107.488 ms (31 allocations: 144.96 MiB)

I would say permute! is better on balance, and even if it were even, using a public method with a simple implementation is typically better than using an internal method with a complex implementation for maintainability and compile time.

@LilithHafner
Copy link
Contributor Author

Bump

@piever
Copy link
Collaborator

piever commented Jun 7, 2023

Sorry for the delay in the review. Could you maybe benchmark against a precomputed sortperm(sa)? Simply to isolate the performance of the two options without getting extra noise from the sortperm computation.

That being said, if it is good enough for Base, it is probably good enough for StructArrays as well, and indeed it would be good to not depend on Base internals.

@piever
Copy link
Collaborator

piever commented Jun 7, 2023

I actually think that we should add a specialization for Base.permute! and just call Base.permute!(c, sortperm(c)) in the sort! method. In any case, we want to make sure that Base.permute!(sa::StructArray) has good performance in general, which may no longer be the case after JuliaLang/julia#44941.

I still am unsure as to what is the optimal implementation of Base.permute!(sa::StructArray, I).

  1. Simply call the fallback method on refarray(sa::StructArray).
  2. Call permute!(v, i) on each individual component and let the component package optimize for this. I think this is the cleanest method, though I suspect that after Stop using permute!! and invpermute!! JuliaLang/julia#44941 the performance of permute!(v::PooledArray, i) degraded, as they relied on a Base.permute!! overload. This should probably be verified and in case fixed.
  3. Call permute(refarray(v), i) on each individual component. This is conceptually less clean but should ensure that we do not sacrifice performance.

@LilithHafner
Copy link
Contributor Author

Could you maybe benchmark against a precomputed sortperm(sa)? Simply to isolate the performance of the two options without getting extra noise from the sortperm computation.

using Random, BenchmarkTools, StructArrays

small = StructArray(zip(rand(10), rand(10)));
x = rand(10_000);
y = [randstring() for _ in 1:10_000];
big = StructArray(zip(x, y, rand(ComplexF64, 10_000), x, x, y));
x = rand(1_000_000);
y = [randstring() for _ in 1:1_000_000];
z = rand(ComplexF64, 1_000_000);
huge = StructArray(zip(x, y, x, z, z, x, z, x, y, y, z, x, y));

for sa in (small, big, huge)
    @btime (Base.permute!!($sa, perm); $sa) setup=(shuffle!($sa); perm=sortperm($sa)) evals=1;
    @btime (permute!($sa, perm); $sa) setup=(shuffle!($sa); perm=sortperm($sa)) evals=1;
end
# 0.001 ns (0 allocations: 0 bytes) <- this one is strange
# 41.000 ns (2 allocations: 288 bytes)
# 80.958 μs (0 allocations: 0 bytes)
# 62.208 μs (12 allocations: 547.16 KiB)
# 135.079 ms (0 allocations: 0 bytes)
# 55.840 ms (26 allocations: 129.70 MiB)

I suspect that after JuliaLang/julia#44941 the performance of permute!(v::PooledArray, i) degraded, as they relied on a Base.permute!! overload. This should probably be verified and in case fixed.

I'm not familiar with PooledArrays internals, but if this is a fair benchmark, it looks like a 5.5x speedup.

julia> using PooledArrays, BenchmarkTools

julia> old_permute!(a, p::AbstractVector) = Base.permute!!(a, Base.copymutable(p));

julia> new_permute!(v, p::AbstractVector) = (v .= v[p]);

julia> x = PooledArray(rand(["a", "b", "cat", "d"], 1000); compress=true, signed=true);

julia> perm = randperm(length(x));

julia> @btime old_permute!($x, $perm);
  2.338 μs (1 allocation: 7.94 KiB)

julia> @btime new_permute!($x, $perm);
  594.381 ns (3 allocations: 1.12 KiB)

we want to make sure that Base.permute!(sa::StructArray) has good performance in general, which may no longer be the case after JuliaLang/julia#44941.

Using the definitions of small, big, huge, old_permute!, and new_permute! above, it seems that Base.permute!(sa::StructArray, I) got better more than it got worse after JuliaLang/julia#44941

julia> for sa in (small, big, huge)
           perm = randperm(length(sa))
           @btime old_permute!($sa, $perm)
           @btime new_permute!($sa, $perm)
       end
  54.962 ns (1 allocation: 144 bytes)
  77.234 ns (2 allocations: 288 bytes)
  91.250 μs (2 allocations: 78.17 KiB)
  61.291 μs (12 allocations: 547.16 KiB)
  135.477 ms (2 allocations: 7.63 MiB)
  55.504 ms (26 allocations: 129.70 MiB)

I still am unsure as to what is the optimal implementation of Base.permute!(sa::StructArray, I).

I have also grappled with that question. I even made a package for it: https://github.com/LilithHafner/CompiledPermutations.jl. However, I have yet to find anything consistently faster than Base's default fallback (i.e. defining no permute! methods in StructArrays.jl). Despite not finding one, I am still confident a faster approach exists.

@piever
Copy link
Collaborator

piever commented Jun 15, 2023

Ah, I see, so permute!(x::PooledVector, p) is fast already by default with the new permute!. I'd have imagined

function permute!(v::PooledArray, p)
    permute!(v.refs, p)
    return v
end

would have been much faster, but, by a quick benchmark, it looks like a very minor improvement. (It can be done separately over at PooledArrays anyways.)

Regarding the implementation of permute!(sa::StructArray, perm), I think the main issue is not so much the performance with plain arrays, but rather that we might fail to catch optimized implementations for component arrays.

For example (on julia 1.9):

julia> using WeakRefStrings, StructArrays, BenchmarkTools, Random

julia> v = WeakRefStrings.StringArray(rand(["a", "b"], 10_000));

julia> sa = StructArray((v,));

julia> perm = randperm(10_000);

julia> @btime permute!($sa, $perm);
  475.200 μs (20013 allocations: 627.33 KiB)

julia> @btime StructArrays.foreachfield(v -> permute!(v, $perm), $sa)
  277.200 μs (13 allocations: 158.58 KiB)

so I would be in favor to add the above as an optimized implementation of permute!, in which case sort! can just called the vanilla permute!, as it appears to be optimized in all scenarios.

@LilithHafner
Copy link
Contributor Author

Yes, I like making permute! fast and using the vanilla version it in sorting, that way folks who want to permute without sorting still get a fast permutation.

I'm not totally familiar with the use of refarray, but as is, we call it before passing a StructArray to sortperm, for the possibility of an easier to sort result, but do not call it before permute! because presumably component arrays that are faster to permute! by calling refarray first will handle that themselves.

src/sort.jl Outdated Show resolved Hide resolved
@piever
Copy link
Collaborator

piever commented Jun 23, 2023

I'm not totally familiar with the use of refarray, but as is, we call it before passing a StructArray to sortperm, for the possibility of an easier to sort result, but do not call it before permute! because presumably component arrays that are faster to permute! by calling refarray first will handle that themselves.

Yes, that seems ideal (ref JuliaData/PooledArrays.jl#84). Actually though there is no guarantee that refarray(c) and c sort in the same way (it is actually false for PooledArrays), so I imagine we should not call refarray at all. A lot of work went into sortperm to make sure that it is fast where the components are PooledArrays, so that can probably be left as is. I've left a small edit, other than that I think this is good to merge.

Co-authored-by: Pietro Vertechi <pietro.vertechi@protonmail.com>
@LilithHafner
Copy link
Contributor Author

Lovely, this seems good from my end, too!

@LilithHafner
Copy link
Contributor Author

Bump?

@LilithHafner
Copy link
Contributor Author

@piever, what's the way forward here? Are you looking for another reviewer?

@piever
Copy link
Collaborator

piever commented Aug 24, 2023

Sorry for the delay, no, this is good to go!

@piever piever merged commit 977fe4d into JuliaArrays:master Aug 24, 2023
@LilithHafner LilithHafner deleted the patch-1 branch August 24, 2023 14:32
@LilithHafner
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants