Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sortperm has poor performance #939

Open
ViralBShah opened this issue Jun 17, 2012 · 117 comments
Open

sortperm has poor performance #939

ViralBShah opened this issue Jun 17, 2012 · 117 comments
Labels
performance Must go faster sorting Put things in order

Comments

@ViralBShah
Copy link
Member

ViralBShah commented Jun 17, 2012

sortperm has poor performance compared to matlab.

julia> y = collect(1000000:-1:1);

julia> @time sortperm(y);
elapsed time: 0.0885950756072998 seconds
>> tic; [ign, p] = sort(y); toc
Elapsed time is 0.030718 seconds.

[edit - updated the issue 1/25/2012 -- ViralBShah]

JeffBezanson added a commit that referenced this issue Nov 16, 2012
this is an optimization and also makes it easier to get callback pointers.
closes #938. sparse on Range 3x faster
helps #1211 (ziggurat), about 25% faster
helps #1169 (game of go), about 25% faster
helps #939 (sortperm), about 25% faster
helps #1163 (graph centrality) a bit, about 10% faster
@ViralBShah
Copy link
Member Author

We are still about 1.7x slower than matlab for plain sorting, and about 3x slower for sortperm

>> y= rand(1000000,1);
>> tic; sort(y); toc
Elapsed time is 0.073307 seconds.

julia> y = rand(1000000);
julia> @time sort(y);
elapsed time: 0.12204194068908691 seconds

@ViralBShah ViralBShah reopened this Jan 26, 2013
@ghost ghost assigned StefanKarpinski Jan 26, 2013
@StefanKarpinski
Copy link
Member

This is highly dependent on the data size and we seem to be doing asymptotically worse than Matlab on sortperm:

       n    Matlab    Julia  ratio
  100000  0.010252 0.021838  2.130
 1000000  0.108941 0.313423  2.877  
10000000  1.326452 4.901360  3.695

This indicates to me that they may be simply using a better stable sorting algorithm for sortperm. We're using a merge sort, which is very good for general purpose usage, but maybe possible to beat in the very specific case of sorting a dense array of integer indices by using them to index into another auxiliary array.

@StefanKarpinski
Copy link
Member

Some numbers for sort and sort!:

        n    Matlab     Julia    Julia!  ratio ratio!
    10000  0.000768  0.000898  0.000874  1.169  1.138
   100000  0.009830  0.012044  0.010627  1.225  1.081
  1000000  0.079106  0.145935  0.128114  1.845  1.620
 10000000  0.827866  1.569710  1.508500  1.896  1.822
100000000 11.607730 18.483200 17.849900  1.592  1.537

@alanedelman
Copy link
Contributor

Also an observation @StefanKarpinski
it seems to me sortperm doesn't return the sorted vector , only the permutation
therefore a true timing comparison vs matlab should compare a julia sort and a sortperm (or did I miss the fact
that sortperm gives me the sort as well?)

@ViralBShah
Copy link
Member Author

sortperm now gives only the permutation, and you have to apply it yourself to get the sorted vector.

@alanedelman
Copy link
Contributor

right so any timing against matlab must have the sorted yourself operation applied

@StefanKarpinski
Copy link
Member

Yeah, that's how I was doing it.

@amroamroamro
Copy link

for reference, MATLAB uses quicksort: http://www.mathworks.com/support/solutions/en/data/1-15K1B/

@StefanKarpinski
Copy link
Member

It would be good to check if we're still slower than Matlab here. I suspect we may have caught up.

@jiahao
Copy link
Member

jiahao commented Nov 14, 2013

Best of 5 timings

log10n Matlab   Julia!  Julia   ratio!  ratio 
4      0.0007   0.0008  0.0008  1.1512  1.1985
5      0.0053   0.0101  0.0102  1.9105  1.9225
6      0.0748   0.1117  0.1153  1.4935  1.5410
7      0.8640   1.3982  1.3864  1.6183  1.6046
8     11.3690   15.5387 16.3090 1.3668  1.4345

Matlab R2012b, Julia 0.2-rc4 on OSX 10.9

@StefanKarpinski
Copy link
Member

Bummer. So mostly not. We're still fast, but we're not quite as fast, and not any better than 10 months ago.

@jiahao
Copy link
Member

jiahao commented Nov 14, 2013

“Within a factor of 2 of MATLAB” is certainly much less impressive than “with a factor of 2 of C”

@timholy
Copy link
Member

timholy commented Nov 14, 2013

IIRC Matlab's quicksort beats C's quicksort.

@StefanKarpinski
Copy link
Member

Matlab's floating-point sort is the best around. I wouldn't be too surprised if they have an assembly implementation.

@timholy
Copy link
Member

timholy commented Nov 14, 2013

http://www.mathworks.com/company/newsletters/articles/an-adventure-of-sortsbehind-the-scenes-of-a-matlab-upgrade.html

Would be worth comparing to modern STL, of course.

@jiahao
Copy link
Member

jiahao commented Nov 14, 2013

I also found that Matlab’s sort is inherently multithreaded. The numbers I posted above were run with maxNumCompThreads=2.

Relaunching Matlab in single-threaded mode (with the -singleCompThread command line option) produces quite different results:

log10n Matlab   Julia!  Julia   ratio!  ratio 
4   0.0008  0.0008  0.0008  1.0042  1.0454
5   0.0085  0.0101  0.0102  1.1852  1.1926
6   0.1032  0.1117  0.1153  1.0825  1.1169
7   1.3162  1.3982  1.3864  1.0623  1.0533
8   15.6586 15.5387 16.3090 0.9923  1.0415

@StefanKarpinski
Copy link
Member

Oh, well, that's very interesting. Looks like we're doing quite well for a single-threaded implementation.

@jiahao
Copy link
Member

jiahao commented Nov 14, 2013

(It’s a little odd that on a 4 core machine, Matlab launches only two threads.)

@kmsquire
Copy link
Member

I don't have Matlab to compare to, but at these sizes, Julia's RadixSort is faster than Julia's QuickSort (the default). Median of 5 timings, using sort!:

        log10n QuickSort_min RadixSort_min min_ratio!
[1,]         4   0.000720482   0.000698203   0.969078
[2,]         5    0.00875544    0.00684599   0.781912
[3,]         6      0.105312     0.0932153   0.885135
[4,]         7       1.23621       1.04952   0.848981
[5,]         8       14.1665        11.004    0.77676

(QuickSort is faster for smaller n)

@kmsquire
Copy link
Member

These actually look close to @jihao's timings, so Julia's radix sort is probably faster than Matlab in single-threaded mode, but slower generally.

@StefanKarpinski
Copy link
Member

The radix sort performance is quite impressive. Maybe we should switch to that as the default number sort?

@quinnj
Copy link
Member

quinnj commented Nov 14, 2013

FWIW, the R package data.table uses the radix sort by default. And it's known particularly for its speed.

@ViralBShah
Copy link
Member Author

Makes sense to switch to radix as the default number sort. @kmsquire Should we compare quicksort and radixsort on the full benchmark you had put together once?

@jiahao
Copy link
Member

jiahao commented Nov 14, 2013

@karbarcca I don’t speak R. If you could whip together a benchmark I can run it on our machine.

@ViralBShah is this the same benchmark that is in test/perf/sort?

@ViralBShah
Copy link
Member Author

Yes, the same one, but we should run all the tests instead of just a few for codespeed.

@ViralBShah
Copy link
Member Author

ViralBShah commented Mar 12, 2022

I believe it is greatly improved since we opened this in 2012 (independent of the radix sort). I think it would be useful to open specific new issues with more current benchmarks.

@LilithHafner
Copy link
Member

Can confirm that #44230 does not address sortperm.

@ViralBShah ViralBShah reopened this Mar 12, 2022
@oscardssmith
Copy link
Member

half closed due to #44230. now we just need fast sort-perm.

@ViralBShah ViralBShah changed the title sort/sortperm has poor performance sortperm has poor performance Apr 3, 2022
@ViralBShah
Copy link
Member Author

What is the target for performance of fast sortperm? Which system is the right one to compare against now? The performance is greatly improved from when this issue was filed.

@oscardssmith
Copy link
Member

I think a good benchmark would be Matlab, and R. I would think those would be the languages with the best optimizations here.

@ViralBShah
Copy link
Member Author

sortperm is about 2x slower than sort, which itself is now much faster. I think it is fine to close this issue, and open a new issue when benchmarking against other systems (e.g. argsort in python).

@ParadaCarleton
Copy link
Contributor

What is the target for performance of fast sortperm? Which system is the right one to compare against now? The performance is greatly improved from when this issue was filed.

I think R would be a decent target, but I also think the target should just be getting sortperm about as fast as sort. Currently it takes about 50% longer, but there's really no reason for this.

@StefanKarpinski
Copy link
Member

Absent implementation where sortperm is as fast as sort I don't really see why it's necessarily the case that it can be as fast. Yes, both involved allocating a new output vector, but for sort you are comparing and moving data in the same array whereas with sortperm you are hopping back and forth between the indices and the data which seems inherently slower to me. On the other hand, you know the exact distribution of the indices, but I'm not sure that's actually helpful since you still can't do something like counting sort.

@nalimilan
Copy link
Member

nalimilan commented Apr 6, 2022

In R sort is as fast or faster than order (i.e. sortperm) in a random Float64 example... but they are both much slower than sort on Julia master (R uses Radix sort too).

R 4.1.2:

> x = runif(100000)

> system.time(sort(x))
       user      system     elapsed 
      0.008       0.000       0.009 

> system.time(order(x))
       user      system     elapsed 
      0.007       0.000       0.008 

> x = runif(10000000)

> system.time(sort(x))
       user      system     elapsed 
      0.757       0.126       0.889 

> system.time(order(x))
       user      system     elapsed 
      0.440       0.060       0.506 

Julia master:

julia> x = rand(100_000);

julia> @btime sort(x);
  4.915 ms (6 allocations: 1.53 MiB)

# Doesn't use Radix sort yet
julia> @btime sortperm(x);
  19.435 ms (3 allocations: 781.31 KiB)

julia> x = rand(10_000_000);

julia> @btime sort(x);
  289.151 ms (6 allocations: 152.60 MiB)

# Doesn't use Radix sort yet
julia> @btime sortperm(x);
  1.994 s (3 allocations: 76.29 MiB)

FWIW, in my experiments at JuliaCollections/SortingAlgorithms.jl#33, I noticed that it is much faster for large vectors to sort (a copy of) data at the same time as indices, to avoid jumping across the data vector.

@StefanKarpinski
Copy link
Member

Wait, am I misreading those R timings? It looks like order is almost 2x faster than sort.

FWIW, in my experiments at JuliaCollections/SortingAlgorithms.jl#33, I noticed that it is much faster for large vectors to sort (a copy of) data at the same time as indices, to avoid jumping across the data vector.

That's interesting and suggests that we should provide and API that does both, i.e. sortperm!(v) that sorts v and returns the indices. That unfortunately disagrees a bit with sortperm!(idx, v) which sorts the pre-allocated index vector in-place and leaves the value vector unmodified. We could introduce sortperm!!(idx, v) which uses pre-allocated idx and sorts v while working; then it might make more sense for the method that sorts v and returns a newly allocated index to be written as sortperm!(v) which also suggests sortperm!!(v, idx) as perhaps a better argument ordering for the version that sorts both.

@StefanKarpinski
Copy link
Member

Reopening until we've determined if there's actually performance gains to be had or not.

@StefanKarpinski
Copy link
Member

It's also wild that Julia is 3-4x faster at sorting than R is, but less awesome that we're 4x slower at sortperm for a larger vector. Also, it seems like you're sorting normally distributed random floats in R and uniformly distributed floats in Julia; probably doesn't matter (it shouldn't, right!?).

@LilithHafner
Copy link
Member

It's also wild that Julia is 3-4x faster at sorting than R is

Indeed.

but less awesome that we're 4x slower at sortperm for a larger vector.

I'm working on it :)

@nalimilan
Copy link
Member

Also, it seems like you're sorting normally distributed random floats in R and uniformly distributed floats in Julia; probably doesn't matter (it shouldn't, right!?).

Ah, good catch. I've edited my comment to fix this. R's sort is indeed faster with uniformly distributed floats, but still slower than Julia, and the difference between sort and order persists. It's indeed intriguing that the latter is faster than the former with large inputs.

@StefanKarpinski
Copy link
Member

If R's order is still faster than our sortperm once @LilithHafner's done overhauling it, then we can look deeper into how they're doing it, but in the meantime it could just be that their sort is slower than it should be and order is fast enough.

@ParadaCarleton
Copy link
Contributor

Absent implementation where sortperm is as fast as sort I don't really see why it's necessarily the case that it can be as fast. Yes, both involved allocating a new output vector, but for sort you are comparing and moving data in the same array whereas with sortperm you are hopping back and forth between the indices and the data which seems inherently slower to me. On the other hand, you know the exact distribution of the indices, but I'm not sure that's actually helpful since you still can't do something like counting sort.

That’s actually the thing — there’s no need to do all this hopping back and forth between the indices and the data. At the very least, zipping together a range and then collecting the output speeds up sortperm a lot. @xiaodaigh has an implementation that does this. Benchmarks showed that the reason sortperm is this slow is because it had a lot of cache misses. It’s possible that there’s something even faster than that, but if so I don’t know what it is.

Actually, I rarely have a list that’s slow to sort because it’s too long, but I’ve had situations where I have to call sortperm on lots of small lists, one after another, so this is the more important possible improvement IMO.

You’re also reading everything completely correctly, R’s order is twice as fast as their sort.

I think order! would be a good name for a function that sorts the array and returns the same result as sortperm.

@ParadaCarleton
Copy link
Contributor

ParadaCarleton commented Apr 6, 2022

It's also wild that Julia is 3-4x faster at sorting than R is, but less awesome that we're 4x slower at sortperm for a larger vector. Also, it seems like you're sorting normally distributed random floats in R and uniformly distributed floats in Julia; probably doesn't matter (it shouldn't, right!?).

It would matter if they’re not identical algorithms. The maximum possible sorting speed for random data depends on the distribution of the data and its entropy, which is why histogram sorts exist.

@ViralBShah
Copy link
Member Author

I think order! would be a good name for a function that sorts the array and returns the same result as sortperm.

order is such an overloaded word in numerical computing. That's why I like sorting related functions to all have sort prefixes.

@ParadaCarleton
Copy link
Contributor

I think order! would be a good name for a function that sorts the array and returns the same result as sortperm.

order is such an overloaded word in numerical computing. That's why I like sorting related functions to all have sort prefixes.

Perhaps ordered! or ordering! then?

@StefanKarpinski
Copy link
Member

No, it should definitely be some variation on sortperm like sortperm! or sortperm!!

@ParadaCarleton
Copy link
Contributor

No, it should definitely be some variation on sortperm like sortperm! or sortperm!!

I personally would prefer sortperm!, but as you mentioned, that's already taken. sortperm!! might work, but double-exclamation marks are already pretty widely-used to mean "Mutate if possible, create a new object if not."

@LilithHafner
Copy link
Member

We could use a keyword argument keys1:

sort!(v, keys = keys) would sort v and keys at the same time, putting keys in order2.
sort(v, keys = keys) would make a copy of v but not of keys3.

Then for long* vectors we could define

sortperm(v; kw...) = sort(eachindex(v); keys=copymutable(v), kw...)
sortperm!(ix, v; kw...) = sort!(maybeinitialize!(ix, v; kw...); keys=copymutable(v), kw...)
*It's very hard to beat insertion sort on short vectors. Here's a version made specifically for `sortperm!`
function sortperm!(p::AbstractVector{<:Integer}, v::AbstractVector, lo::Integer, hi::Integer, ::InsertionSortAlg, o::Ordering)
    @inbounds begin
        p[lo] = lo    
        for i = lo+1:hi
            j = i
            x = v[i]
            while j > lo && lt(o, x, v[p[j-1]])
                p[j] = p[j-1]
                j -= 1
            end
            p[j] = i
        end
        return p
    end
end

**Perhaps I'm having a bit too much fun with markdown :)

Footnotes

  1. Some alternative names include with, using, by, according_to_and_mutate, or even _keys and leave it undocumented. I'm not particularly attached to any of these options.

  2. Why keys instead of payload and sort by v? We need to return the first argument (v) for type stability and to follow convention, and if someone is sorting according to array a and also wants to put b in order, it is very likely that they are about to do something else with b, so it makes more sense to call b v and a keys.

  3. Why mutate keys by default, even for sort? It is clear how to prevent mutation sort(v, keys=(copy(my_precious_keys))) but not so clear how to force mutation if it is not the default. Also, most of the time one wants to sort according to a vector it is nice to have that vector sorted (e.g. sorting a DataFrame by a column).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster sorting Put things in order
Projects
None yet
Development

No branches or pull requests