
Various changes to svd, eig, norm and rank #1312

Merged 1 commit into JuliaLang:master on Sep 29, 2012
Conversation

andreasnoack
Member

I have changed the functions svd, eig, norm and rank to handle zero-dimensional matrices similarly to Matlab. Changed svd to use gesdd and added a thin svd method. Added eigvals to compute eigenvalues without vectors.
Cc: @dmbates

…ed and svd calls gesdd. Thin svd method has been added. Change eig to allow for vals only method for efficiency. Added eigvals function for convenience. Updated exported names for the changes in svd and eig. Change norm and rank to allow for zero dim matrices.
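
A minimal sketch of the intended usage after these changes (call shapes and defaults are illustrative, not a statement of the final interface):

julia> A = randn(5, 3);

julia> U, S, V = svd(A);          # SVD now backed by gesdd; a thin variant is also added

julia> s = svdvals(A);            # singular values only

julia> d = eigvals(randn(3, 3));  # new: eigenvalues without the vectors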
@ViralBShah
Member

I would prefer to retain sdd and let svd call gesvd; I'm not sure gesdd by default is the right thing just yet.

@ViralBShah
Member

The rest of the changes are fine and should go in.

@StefanKarpinski
Member

Can someone summarize the gesvd versus gesdd issue? I gather that our svd may be substantially slower than Matlab's because of this? Which LAPACK routine does Matlab's svd use? Should we not just use that?

@dmbates
Member

dmbates commented Sep 27, 2012

The algorithm used in gesdd was developed more recently than the one used in gesvd. In some early tests, http://www.netlib.org/lapack/lug/node71.html, it proved to be considerably faster than gesvd. I just tried to reproduce those tests and the two methods were similar in timing. For timing the calculation of the singular values without vectors I used:

julia> srand(1234321)

julia> tst = randn(1000,1000)
1000x1000 Float64 Array:  # output elided

julia> norm(svdvals(tst) - sdd(tst, 'N')[2])  # check that they give essentially the same answer
0.0

julia> [@elapsed svdvals(tst) for i in 1:10]
10-element Float64 Array:
 0.756011
 0.750858
 0.771107
 0.760154
 0.76208 
 0.746233
 0.750427
 0.765365
 0.754384
 0.762076

julia> [@elapsed sdd(tst,'N')[2] for i in 1:10]
10-element Float64 Array:
 0.779769
 0.770373
 0.792355
 0.793683
 0.78    
 0.784369
 0.805282
 0.779289
 0.765798
 0.781891

You can see that the minor difference in execution time favors gesvd, not gesdd. If you get both the singular values and singular vectors the time is dominated by creating the singular vectors, which is the same for both methods.

There have been reports (http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1868) on the LAPACK and Octave lists of cases where gesvd converged but gesdd didn't. That conversation mentions a potential bug in gesdd that apparently has been fixed.

So, having been the one who started the thread on using gesdd, I'm not sure now that the difference is important.

I would appreciate it if others could repeat the timings to see what happens on other machines. There is a slight possibility that openblas does something to make gesvd run faster than it would with reference BLAS. I checked the openblas sources and it doesn't look like they are doing anything crafty for either gesvd or gesdd.

@StefanKarpinski
Member

My machine is doing some background stuff at the moment, so not ideal test conditions, but here's what I get:

julia> [@elapsed svdvals(tst) for i in 1:10]
10-element Float64 Array:
 0.885181
 0.862688
 0.957457
 1.07413 
 0.982968
 0.931149
 0.891106
 0.90661 
 0.864116
 0.84166 

julia> [@elapsed sdd(tst,'N')[2] for i in 1:10]
10-element Float64 Array:
 0.896451
 0.9138  
 1.05668 
 0.869889
 0.9657  
 0.898028
 0.862489
 0.921579
 0.878551
 0.928177

I like the use of @elapsed inside a comprehension. Very nice.
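
The same pattern factors into a tiny helper if you use it a lot (a sketch; timings is not anything in Base):

# Run the thunk f n times and collect wall-clock times.
timings(f, n) = [@elapsed f() for i in 1:n]

julia> timings(() -> svdvals(tst), 10)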

@dmbates
Member

dmbates commented Sep 27, 2012

@StefanKarpinski I'm a statistician, so I always want to assess the variability in the results.

@StefanKarpinski
Member

I immediately thought about doing 100 of these and applying a t-test :-)
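
For completeness, the statistic in question is only a few lines (a hand-rolled Welch two-sample t statistic, sketch only; mean and var are in Base, the timing-vector names are hypothetical):

# Welch two-sample t statistic for two timing vectors.
function tstat(x, y)
    nx, ny = length(x), length(y)
    (mean(x) - mean(y)) / sqrt(var(x)/nx + var(y)/ny)
end

julia> tstat(t_svdvals, t_sdd)   # e.g. the two timing vectors above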

@ViralBShah
Member

I am merging this in, but will commit a change to retain sdd for now.

ViralBShah added a commit that referenced this pull request Sep 29, 2012
Various changes to svd, eig, norm and rank
@ViralBShah merged commit c996958 into JuliaLang:master Sep 29, 2012
@andreasnoack
Member Author

Sorry for not following up on the pull request, and thank you for merging. From reading your discussions on the topic from May (I think) I concluded that gesdd was preferred, but that the code change was never made. I do not have strong opinions about which one to choose. On my Mac gesdd is faster, though insignificantly so for n=50. But I think that there should eventually be only one svd/svdvals command in Base. Having two is needless and potentially confusing.

@ViralBShah
Member

No worries - I had some spare time and got around to this. The LAPACK team clearly decided that the new algorithm is not a drop-in replacement for dgesvd, and hence created a new routine. I feel that this change in behaviour may cause some surprises for people (probably a small set).

The right thing to do would be to fold it into svd when we have named parameters, so that svd can call dgesdd with the correct options.


@StefanKarpinski
Member

Which routine does Matlab use? Do we know? Using the same one probably makes sense.

@ViralBShah
Member

I believe Matlab uses gesvd, but can't be sure. Generally, gesdd is new enough that most existing environments are likely using gesvd in order to not spring surprises on their users. That is not a reason for us to do the same, but I feel that we need a better understanding of it and some experience with it before making gesdd the default.


@ViralBShah
Member

It is worth reiterating the discussion here:

https://groups.google.com/forum/#!topic/julia-dev/mmgO65i6-fA

Basically, the recommendation by Jack Poulson is that gesdd can take much more memory, but if memory is available, gesdd is significantly faster for large matrices.


@andreasnoack
Member Author

My Sunday has started out with reading LAPACK and Matlab documentation. Matlab changed their documentation between 10a and 10b. Before 10b it was explicitly stated that they used gesvd, but from 10b on, nothing is written about the algorithm.

In dgesdd.f you can read the following: "If singular vectors are desired, it uses a divide-and-conquer algorithm." Hence I wondered if the big difference between gesvd and gesdd was in the computation of the vectors, so I did some timing:

julia> srand(117)

julia> tst = randn(1000,1000);

julia> textsvd = [@elapsed svd(tst) for i = 1:10]
10-element Float64 Array:
12.2455
12.0631
12.114
12.0491
12.0839
12.0604
11.9695
12.0548
12.0664
12.0931

julia> textsdd = [@elapsed sdd(tst) for i = 1:10]
10-element Float64 Array:
2.34486
2.25365
2.22366
2.23132
2.24577
2.1741
2.18659
2.18702
2.15019
2.17515

I guess a t-test is not needed to determine the faster one. I exported the matrix and did the computation in Matlab 2012b and it took under two seconds.

@ViralBShah
Member

Yes, I confirm this. svdvals and sddvals take about the same time. However, sdd is about 5-6 times faster than svd when the singular vectors are computed. Matlab 2012a also takes the same time as Julia with sdd, so clearly they switched recently.

@dmbates What is your take on this discussion? It seems that we could use gesdd as the default, and have a parameter argument to fall back to gesvd if required. The performance of gesdd comes with greater memory usage, so we do need gesvd around for those who may need it.

@ViralBShah
Member

I also looked at the original paper that Jack Poulson posted to the list:

http://www.netlib.org/lapack/lawnspdf/lawn88.pdf

The divide and conquer algorithm is not only faster, but also more accurate, at the cost of more memory.

@StefanKarpinski
Member

We should probably default to the faster and more accurate one and have the memory-efficient one as an alternative.

@timholy
Member

timholy commented Sep 30, 2012

Agreed. 1GB would store a 10k-by-10k Float64 matrix; on a beefy and fast machine (32-core Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz), the svd of such a matrix takes 215 seconds in Matlab (which uses multiple threads to compute the svd). So for problems that finish in a short amount of time, the memory requirements of the SVD are not limiting by today's standards (where tens of GB are getting common).
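
For reference, the arithmetic behind that figure:

julia> 10000 * 10000 * 8 / 2^30   # bytes in a 10k-by-10k Float64 matrix, as GiB
0.7450580596923828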

@ViralBShah
Member

Yes, I agree with this assessment too. What would be a good way to fall back to the older gesvd with our current interface?


@StefanKarpinski
Member

We really need proper keyword arguments :-\

@timholy
Member

timholy commented Sep 30, 2012

What about a separate function called gesvd?

@StefanKarpinski
Member

For now that's what we should do, but the need for keyword args is getting kind of urgent...

@timholy
Member

timholy commented Sep 30, 2012

I'm finally using the options framework in real code (optimization routines tend to use a lot of parameters). It works well, and in some ways is even more powerful than keyword args (passing parameters down to nested functions is rather easier with options than with keyword arguments), but it has negatives too (run-time overhead, somewhat less pretty syntax). A sketch of the pass-through point follows.
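
The sketch uses a plain Dict as a stand-in for the actual Options API (the function names here are made up):

# One bag of parameters threads through nested calls without each layer
# redeclaring every argument.
function inner_solve(x, opts)
    tol     = get(opts, :tol, 1e-8)      # read with a default
    maxiter = get(opts, :maxiter, 100)
    # ... iterate using tol and maxiter ...
    (tol, maxiter)
end

outer_solve(x, opts) = inner_solve(2x, opts)   # just forwards the bag

julia> opts = Dict{Symbol,Any}(); opts[:tol] = 1e-6;

julia> outer_solve(1.0, opts)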

@johnmyleswhite
Member

While I've also started using the Options framework more often and come to like it, I do still think we need keyword arguments.

One crazy request is that we consider also having keyword constructors for composite types.


@StefanKarpinski
Member

> One crazy request is that we consider also having keyword constructors for composite types.

Somehow that got swallowed in the markdown formatting. I don't think that's necessarily a crazy request at all.
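
Presumably something like the following is meant (hypothetical: keyword arguments do not exist yet, and the type and its fields are invented for illustration):

type SvdOpts            # composite type
    thin::Bool
    alg::Symbol
end

# A keyword constructor layered on the default positional one:
SvdOpts(; thin = true, alg = :gesdd) = SvdOpts(thin, alg)

SvdOpts(alg = :gesvd)   # reads almost like naming the fields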

@timholy
Member

timholy commented Oct 1, 2012

Oh, agreed that keyword arguments would be convenient in many circumstances, assuming they can be implemented without unfortunate side effects.

@ViralBShah
Member

gesvd is not a good name, because it is too close to gsvd and it sounds like the generalized singular value decomposition. Perhaps the best thing is to just add an argument to svd for now, and make it a keyword argument when we have the feature.
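
A sketch of that shape (the flag values and the wrapper functions svd_gesdd/svd_gesvd are invented, not an existing interface):

# Until keyword arguments exist, a positional flag; later, a keyword.
function svd(A::Matrix{Float64}, alg::Symbol)
    alg == :dc && return svd_gesdd(A)   # divide and conquer (gesdd)
    alg == :qr && return svd_gesvd(A)   # classic algorithm (gesvd)
    error("unknown SVD algorithm: $alg")
end
svd(A::Matrix{Float64}) = svd(A, :dc)   # gesdd as the default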


@dmbates
Member

dmbates commented Oct 1, 2012

Thanks for checking the timing of svd versus sdd including the calculation of the singular vectors, @andreasnoackjensen. I understood that there would not be a difference but apparently, like Rick in "Casablanca", I was misinformed. I think the simplest approach, given these results and the apparent shift of Matlab to using dgesdd (which is also what R uses in its svd function), is to remove sdd and use only svd based on gesdd, as in the original suggestion by @andreasnoackjensen.

@ViralBShah
Member

Ok, I will change svd to use gesdd in the vs/linalg branch. Please don't change it on master, since that will make my merge difficult. We will worry about the fallback in the future, if someone needs it and we have keyword arguments.

@ViralBShah
Member

Thanks @andreasnoackjensen for pursuing this argument all the way!
