sortperm has poor performance #939
Comments
We are still about 1.7x slower than matlab for plain sorting, and about 3x slower for sortperm
|
This is highly dependent on the data size and we seem to be doing asymptotically worse than Matlab on
This indicates to me that they may be simply using a better stable sorting algorithm for |
Some numbers for
|
Also an observation @StefanKarpinski |
|
right so any timing against matlab must have the sorted yourself operation applied |
Yeah, that's how I was doing it. |
for reference, MATLAB uses quicksort: http://www.mathworks.com/support/solutions/en/data/1-15K1B/ |
It would be good to check if we're still slower than Matlab here. I suspect we may have caught up. |
Best of 5 timings
Matlab R2012b, Julia 0.2-rc4 on OSX 10.9 |
Bummer. So mostly not. We're still fast, but we're not quite as fast, and not any better than 10 months ago. |
“Within a factor of 2 of MATLAB” is certainly much less impressive than “within a factor of 2 of C” |
IIRC Matlab's quicksort beats C's quicksort. |
Matlab's floating-point sort is the best around. I wouldn't be too surprised if they have an assembly implementation. |
Would be worth comparing to modern STL, of course. |
I also found that Matlab’s Relaunching Matlab in single-threaded mode (with the
|
Oh, well, that's very interesting. Looks like we're doing quite well for a single-threaded implementation. |
(It’s a little odd that on a 4 core machine, Matlab launches only two threads.) |
I don't have Matlab to compare to, but at these sizes, Julia's RadixSort is faster than Julia's QuickSort (the default). Median of 5 timings, using
      log10n  QuickSort_min  RadixSort_min  min_ratio
[1,]       4    0.000720482    0.000698203   0.969078
[2,]       5     0.00875544     0.00684599   0.781912
[3,]       6       0.105312      0.0932153   0.885135
[4,]       7        1.23621        1.04952   0.848981
[5,]       8        14.1665         11.004    0.77676
(QuickSort is faster for smaller n) |
These actually look close to @jihao's timings, so Julia's radix sort is probably faster than Matlab in single-threaded mode, but slower generally. |
The radix sort performance is quite impressive. Maybe we should switch to that as the default number sort? |
FWIW, the R package data.table uses the radix sort by default. And it's known particularly for its speed. |
Makes sense to switch to radix as the default number sort. @kmsquire Should we compare quicksort and radixsort on the full benchmark you had put together once? |
@karbarcca I don’t speak
@ViralBShah is this the same benchmark that is in |
Yes, the same one, but we should run all the tests instead of just a few for codespeed. |
I believe it is greatly improved since we opened this in 2012 (independent of the radix sort). I think it would be useful to open specific new issues with more current benchmarks. |
Can confirm that #44230 does not address |
half closed due to #44230. now we just need fast sort-perm. |
What is the target for performance of fast sortperm? Which system is the right one to compare against now? The performance is greatly improved from when this issue was filed. |
I think a good benchmark would be Matlab, and R. I would think those would be the languages with the best optimizations here. |
|
I think R would be a decent target, but I also think the target should just be getting |
Absent implementation where |
In R 4.1.2:
> x = runif(100000)
> system.time(sort(x))
user system elapsed
0.008 0.000 0.009
> system.time(order(x))
user system elapsed
0.007 0.000 0.008
> x = runif(10000000)
> system.time(sort(x))
user system elapsed
0.757 0.126 0.889
> system.time(order(x))
user system elapsed
0.440 0.060 0.506
Julia master:
julia> x = rand(100_000);
julia> @btime sort(x);
4.915 ms (6 allocations: 1.53 MiB)
# Doesn't use Radix sort yet
julia> @btime sortperm(x);
19.435 ms (3 allocations: 781.31 KiB)
julia> x = rand(10_000_000);
julia> @btime sort(x);
289.151 ms (6 allocations: 152.60 MiB)
# Doesn't use Radix sort yet
julia> @btime sortperm(x);
1.994 s (3 allocations: 76.29 MiB)
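For anyone who wants to reproduce this kind of comparison without extra packages, here is a minimal sketch that uses only `@elapsed` from Base. The helper name `best_of` is made up for this illustration and is not part of Julia or BenchmarkTools; `@btime` as used above remains the more careful tool.

```julia
# Hypothetical helper (not a Base or BenchmarkTools function):
# smallest wall-clock time, in seconds, over `n` runs of `f(x)`.
function best_of(f, x; n::Integer = 5)
    f(x)  # warm-up call so that compilation time is not measured
    return minimum([(@elapsed f(x)) for _ in 1:n])
end

x = rand(100_000)
t_sort = best_of(sort, x)
t_perm = best_of(sortperm, x)
println("sort: ", t_sort, " s  sortperm: ", t_perm, " s  ratio: ", t_perm / t_sort)
```

The printed ratio depends on the machine and the Julia version, so no particular value should be expected.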
FWIW, in my experiments at JuliaCollections/SortingAlgorithms.jl#33, I noticed that it is much faster for large vectors to sort (a copy of) data at the same time as indices, to avoid jumping across the data vector. |
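A rough sketch of that idea, assuming the simplest possible form: pair each value with its index, sort the pairs, and read the indices back out. The name `sortperm_pairs` is hypothetical, and this is not necessarily the approach taken in the linked pull request. Because tuples compare lexicographically, ties in the data are broken by index, so the result matches the stable default of `sortperm`.

```julia
# Hypothetical helper: compute a sorting permutation by sorting
# (value, index) pairs, so the sort reads values sequentially
# instead of jumping across `v` through an index vector.
function sortperm_pairs(v::AbstractVector)
    pairs = collect(zip(v, eachindex(v)))  # Vector of (value, index) tuples
    sort!(pairs)                           # lexicographic order: value, then index
    return map(last, pairs)                # keep only the indices
end

v = [3.0, 1.0, 2.0, 1.0]
p = sortperm_pairs(v)
```

The trade-off is extra memory for the copied values; whether it is a net win at a given size would have to be measured.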
Wait, am I misreading those R timings? It looks like
That's interesting and suggests that we should provide an API that does both, i.e. |
Reopening until we've determined if there's actually performance gains to be had or not. |
It's also wild that Julia is 3-4x faster at sorting than R is, but less awesome that we're 4x slower at sortperm for a larger vector. Also, it seems like you're sorting normally distributed random floats in R and uniformly distributed floats in Julia; probably doesn't matter (it shouldn't, right!?). |
Indeed.
I'm working on it :) |
Ah, good catch. I've edited my comment to fix this. R's |
If R's |
That’s actually the thing: there’s no need to do all this hopping back and forth between the indices and the data. At the very least, zipping together a range and then collecting the output speeds up
Actually, I rarely have a list that’s slow to sort because it’s too long, but I’ve had situations where I have to call
You’re also reading everything completely correctly, R’s
I think |
It would matter if they’re not identical algorithms. The maximum possible sorting speed for random data depends on the distribution of the data and its entropy, which is why histogram sorts exist. |
|
Perhaps |
No, it should definitely be some variation on |
I personally would prefer |
We could use a keyword argument
Then for long* vectors we could define
sortperm(v; kw...) = sort(eachindex(v); keys=copymutable(v), kw...)
sortperm!(ix, v; kw...) = sort!(maybeinitialize!(ix, v; kw...); keys=copymutable(v), kw...)
*It's very hard to beat insertion sort on short vectors. Here's a version made specifically for `sortperm!`
function sortperm!(p::AbstractVector{<:Integer}, v::AbstractVector, lo::Integer, hi::Integer, ::InsertionSortAlg, o::Ordering)
    @inbounds begin
        p[lo] = lo
        for i = lo+1:hi
            j = i
            x = v[i]
            while j > lo && lt(o, x, v[p[j-1]])
                p[j] = p[j-1]
                j -= 1
            end
            p[j] = i
        end
        return p
    end
end
**Perhaps I'm having a bit too much fun with markdown :)
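To try the insertion-sort idea outside of Base, here is a self-contained sketch of the same loop. The name `insertion_sortperm!` is hypothetical, the `InsertionSortAlg` dispatch argument is dropped, and an empty-range guard is added, so this is an illustration rather than the proposed method itself.

```julia
using Base.Order: Ordering, Forward, lt

# Hypothetical standalone variant of the insertion-sort `sortperm!` above:
# fills p[lo:hi] with the indices lo:hi ordered by the values of `v`.
function insertion_sortperm!(p::AbstractVector{<:Integer}, v::AbstractVector,
                             lo::Integer, hi::Integer, o::Ordering)
    hi < lo && return p  # empty range: nothing to do (guard added for this sketch)
    @inbounds begin
        p[lo] = lo
        for i = lo+1:hi
            j = i
            x = v[i]
            # shift indices of larger values one slot to the right
            while j > lo && lt(o, x, v[p[j-1]])
                p[j] = p[j-1]
                j -= 1
            end
            p[j] = i
        end
    end
    return p
end

v = [0.3, 0.1, 0.2, 0.1]
p = insertion_sortperm!(similar(v, Int), v, 1, length(v), Forward)
```

Since an index only moves past values that are strictly greater, equal values keep their original order and the result agrees with the stable `sortperm(v)`.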
|
sortperm has poor performance compared to matlab.
[edit - updated the issue 1/25/2012 -- ViralBShah]