Switch from permute!! to permute! #272
Bump |
Sorry for the delay in the review. Could you maybe benchmark against a precomputed That being said, if it is good enough for Base, it is probably good enough for StructArrays as well, and indeed it would be good to not depend on Base internals. |
I actually think that we should add a specialization for I still am unsure as to what is the optimal implementation of
|
using Random, BenchmarkTools, StructArrays
small = StructArray(zip(rand(10), rand(10)));
x = rand(10_000);
y = [randstring() for _ in 1:10_000];
big = StructArray(zip(x, y, rand(ComplexF64, 10_000), x, x, y));
x = rand(1_000_000);
y = [randstring() for _ in 1:1_000_000];
z = rand(ComplexF64, 1_000_000);
huge = StructArray(zip(x, y, x, z, z, x, z, x, y, y, z, x, y));
for sa in (small, big, huge)
@btime (Base.permute!!($sa, perm); $sa) setup=(shuffle!($sa); perm=sortperm($sa)) evals=1;
@btime (permute!($sa, perm); $sa) setup=(shuffle!($sa); perm=sortperm($sa)) evals=1;
end
# 0.001 ns (0 allocations: 0 bytes) <- this one is strange
# 41.000 ns (2 allocations: 288 bytes)
# 80.958 μs (0 allocations: 0 bytes)
# 62.208 μs (12 allocations: 547.16 KiB)
# 135.079 ms (0 allocations: 0 bytes)
# 55.840 ms (26 allocations: 129.70 MiB)
I'm not familiar with PooledArrays internals, but if this is a fair benchmark, it looks like a 5.5x speedup.
julia> using PooledArrays, BenchmarkTools
julia> old_permute!(a, p::AbstractVector) = Base.permute!!(a, Base.copymutable(p));
julia> new_permute!(v, p::AbstractVector) = (v .= v[p]);
julia> x = PooledArray(rand(["a", "b", "cat", "d"], 1000); compress=true, signed=true);
julia> perm = randperm(length(x));
julia> @btime old_permute!($x, $perm);
2.338 μs (1 allocation: 7.94 KiB)
julia> @btime new_permute!($x, $perm);
594.381 ns (3 allocations: 1.12 KiB)
Using the definitions of
julia> for sa in (small, big, huge)
perm = randperm(length(sa))
@btime old_permute!($sa, $perm)
@btime new_permute!($sa, $perm)
end
54.962 ns (1 allocation: 144 bytes)
77.234 ns (2 allocations: 288 bytes)
91.250 μs (2 allocations: 78.17 KiB)
61.291 μs (12 allocations: 547.16 KiB)
135.477 ms (2 allocations: 7.63 MiB)
55.504 ms (26 allocations: 129.70 MiB)
I have also grappled with that question. I even made a package for it: https://github.com/LilithHafner/CompiledPermutations.jl. However, I have yet to find anything consistently faster than Base's default fallback (i.e. defining no |
Ah, I see, so
function permute!(v::PooledArray, p)
    permute!(v.refs, p)
    return v
end
would have been much faster, but, by a quick benchmark, it looks like a very minor improvement. (It can be done separately over at PooledArrays anyways.) Regarding the implementation of For example (on julia 1.9):
julia> using WeakRefStrings, StructArrays, BenchmarkTools, Random
julia> v = WeakRefStrings.StringArray(rand(["a", "b"], 10_000));
julia> sa = StructArray((v,));
julia> perm = randperm(10_000);
julia> @btime permute!($sa, $perm);
475.200 μs (20013 allocations: 627.33 KiB)
julia> @btime StructArrays.foreachfield(v -> permute!(v, $perm), $sa)
277.200 μs (13 allocations: 158.58 KiB)
so I would be in favor of adding the above as an optimized implementation of |
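The field-by-field approach benchmarked above could be wrapped up roughly as follows. This is only a sketch, not necessarily what the PR ends up merging: the name `fieldwise_permute!` is hypothetical, `StructArrays.foreachfield` is an internal (unexported) utility, and the code assumes that no two columns of the struct array share memory, since a shared column would otherwise be permuted twice.

```julia
using StructArrays, Random

# Hypothetical helper (not part of StructArrays' public API):
# permute every column in place with the public `permute!`.
# Assumption: the columns do not alias each other.
function fieldwise_permute!(sa::StructArray, p::AbstractVector{<:Integer})
    StructArrays.foreachfield(v -> permute!(v, p), sa)
    return sa
end

sa = StructArray(a = rand(5), b = collect("vwxyz"))
p = randperm(5)
expected = sa[p]            # out-of-place reference result

fieldwise_permute!(sa, p)
@assert sa == expected      # in-place result matches sa[p]
```

Calling `permute!` once per column lets each column type use its own specialized method where one exists, which is the effect the `StringArray` benchmark above illustrates.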
Yes, I like making I'm not totally familiar with the use of |
Yes, that seems ideal (ref JuliaData/PooledArrays.jl#84). Actually though there is no guarantee that |
Co-authored-by: Pietro Vertechi <pietro.vertechi@protonmail.com>
Lovely, this seems good from my end, too! |
Bump? |
@piever, what's the way forward here? Are you looking for another reviewer? |
Sorry for the delay, no, this is good to go! |
Thanks! |
In Julia 1.9.0, permute! is typically faster than the internal method Base.permute!!, thanks to JuliaLang/julia#44941.
For struct arrays in particular, using the internal method Base.permute!! seems to give moderate performance improvements for small arrays in exchange for moderate regressions for huge arrays when compared to permute!.
I would say permute! is better on balance, and even if the two were evenly matched, using a public method with a simple implementation is typically better than using an internal method with a complex implementation, for both maintainability and compile time.