Various changes to svd, eig, norm and rank #1312
Conversation
…ed and svd calls gesdd. A thin svd method has been added. Changed eig to allow a values-only method for efficiency. Added an eigvals function for convenience. Updated the exported names for the changes in svd and eig. Changed norm and rank to allow zero-dimensional matrices.
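As background for readers, the thin SVD and values-only eigendecomposition described above can be illustrated with a NumPy sketch. This is a Python analogue using NumPy's names (`full_matrices`, `eigvals`), not the Julia code in this PR:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

# Full SVD: U is square (6x6); most of it is unnecessary for reconstruction.
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)

# Thin (economy) SVD: U is only 6x3, yet still reconstructs A exactly.
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U_thin @ np.diag(s_thin) @ Vt_thin, A)

# Eigenvalues only, skipping the more expensive eigenvector computation.
B = rng.standard_normal((4, 4))
w = np.linalg.eigvals(B)
print(U_full.shape, U_thin.shape, w.shape)  # (6, 6) (6, 3) (4,)
```

The thin form is usually what you want for least-squares and low-rank work, since the extra columns of the full `U` span only the nullspace of `A'`.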
I would prefer to retain

The rest of the changes are fine and should go in.

Can someone summarize the
The algorithm used in

```julia
julia> srand(1234321)

julia> tst = randn(1000,1000)
1000x1000 Float64 Array: # output skipped here
 -1.10433    1.61244   -1.12984    1.72105    0.710395   0.36255    0.781754   1.59886   -0.189295  …
 ...
 -0.289285  -1.69648    1.53921    0.92066   -0.897352  -1.1228    -1.4976    -0.0483147

julia> norm(svdvals(tst) - sdd(tst, 'N')[2]) # check that they give essentially the same answer
0.0

julia> [@elapsed svdvals(tst) for i in 1:10]
10-element Float64 Array:
 0.756011
 0.750858
 0.771107
 0.760154
 0.76208
 0.746233
 0.750427
 0.765365
 0.754384
 0.762076

julia> [@elapsed sdd(tst,'N')[2] for i in 1:10]
10-element Float64 Array:
 0.779769
 0.770373
 0.792355
 0.793683
 0.78
 0.784369
 0.805282
 0.779289
 0.765798
 0.781891
```

You can see that the minor difference in execution time favors

There have been reports, http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1868, on the Lapack and Octave lists of cases where

So, having been the one who started the thread on using

I would appreciate it if others could repeat the timings to see what happens on other machines. There is a slight possibility that openblas does something to make
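For anyone repeating this agreement check today, SciPy exposes both LAPACK routines behind one interface, so the `norm(svdvals(tst) - sdd(tst,'N')[2])` comparison above can be sketched as follows. This is a Python analogue, not the Julia code in this thread; `lapack_driver` is SciPy's name for the gesvd/gesdd choice:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1234321)
A = rng.standard_normal((200, 200))

# Singular values from the classic QR-iteration routine (dgesvd)...
s_gesvd = linalg.svd(A, compute_uv=False, lapack_driver='gesvd')
# ...and from the divide-and-conquer routine (dgesdd).
s_gesdd = linalg.svd(A, compute_uv=False, lapack_driver='gesdd')

# The two routines should agree to roundoff on a well-conditioned matrix.
print(np.allclose(s_gesvd, s_gesdd))  # True
```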
My machine is doing some background stuff at the moment, so not ideal test conditions, but here's what I get:

```julia
julia> [@elapsed svdvals(tst) for i in 1:10]
10-element Float64 Array:
 0.885181
 0.862688
 0.957457
 1.07413
 0.982968
 0.931149
 0.891106
 0.90661
 0.864116
 0.84166

julia> [@elapsed sdd(tst,'N')[2] for i in 1:10]
10-element Float64 Array:
 0.896451
 0.9138
 1.05668
 0.869889
 0.9657
 0.898028
 0.862489
 0.921579
 0.878551
 0.928177
```

I like the use of
@StefanKarpinski I'm a statistician, so I always want to assess the variability in the results.
I immediately thought about doing 100 of these and applying a t-test :-)
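The timing-plus-t-test idea can be sketched concretely. A minimal Python/SciPy version (assuming the `lapack_driver` names from SciPy, not this PR's `svdvals`/`sdd`; Welch's test is a reasonable choice since the two timing samples need not share a variance):

```python
import timeit

import numpy as np
from scipy import linalg, stats

rng = np.random.default_rng(117)
A = rng.standard_normal((300, 300))

def sample_times(driver, reps=10):
    """Wall-clock times for singular values only, one call per measurement."""
    f = lambda: linalg.svd(A, compute_uv=False, lapack_driver=driver)
    return [timeit.timeit(f, number=1) for _ in range(reps)]

t_gesvd = sample_times('gesvd')
t_gesdd = sample_times('gesdd')

# Welch's t-test: is the mean timing plausibly different between drivers?
t_stat, p_value = stats.ttest_ind(t_gesvd, t_gesdd, equal_var=False)
print(f"gesvd {np.mean(t_gesvd):.4f}s  gesdd {np.mean(t_gesdd):.4f}s  p={p_value:.3f}")
```

With only 10 samples and OS noise, a non-tiny p-value is expected when the drivers are close, which matches the "minor difference" seen above.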
I am merging this in, but will commit a change to retain `sdd` for now.
Various changes to svd, eig, norm and rank
Sorry for not following up on the pull request, and thank you for merging. From reading your discussions on the topic from May (I think), I concluded that gesdd was preferred, but that the code change was never made. I do not have strong opinions about which one to choose. On my Mac gesdd is faster, though insignificantly so for n=50. But I think there should eventually be only one svd/svdvals command in Base. Two are needless and potentially confusing.
No worries - I had some spare time and got around to this. The LAPACK team clearly decided that the new algorithm is not a drop-in replacement for dgesvd, and hence created a new routine. I feel that this change in behaviour may cause some surprises for people (probably a small set). The right thing to do would be to fold it into svd, once we have named parameters, so that svd could call dgesdd with the correct options. -viral
Which routine does Matlab use? Do we know? Using the same one probably makes sense.
I believe Matlab uses gesvd, but I can't be sure. Generally, gesdd is new enough that most existing environments are likely using gesvd in order not to spring surprises on their users. That is not a reason for us to do the same, but I feel that we need a better understanding and some experience with it before making gesdd the default. -viral
It is worth reiterating the discussion here: https://groups.google.com/forum/#!topic/julia-dev/mmgO65i6-fA Basically, the recommendation from Jack Poulson is that gesdd can take much more memory, but if memory is available, gesdd is significantly faster for large matrices. -viral
My Sunday started out with reading Lapack and Matlab documentation. Matlab changed their documentation between 10a and 10b. Before 10b it was explicitly stated that they used gesvd, but from 10b nothing is written about the algorithm. In dgesdd.f you can read the following: "If singular vectors are desired, it uses a divide-and-conquer algorithm." Hence I wondered if the big difference between gesvd and gesdd was in the computation of the vectors, and I did some timing:

```julia
julia> srand(117)

julia> tst = randn(1000,1000);

julia> textsvd = [@elapsed svd(tst) for i = 1:10]

julia> textsdd = [@elapsed sdd(tst) for i = 1:10]
```

I guess a t-test is not needed to determine the faster one. I exported the matrix and did the computation in Matlab 2012b, and it took under two seconds.
Yes, I confirm this. @dmbates What is your take on this discussion? It seems that we could use
I also looked at the original paper that Jack Poulson posted to the list: http://www.netlib.org/lapack/lawnspdf/lawn88.pdf The divide-and-conquer algorithm is not only faster, but also more accurate, at the cost of more memory.
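One cheap empirical check on accuracy is the relative backward error of the reconstruction under each driver. A SciPy sketch (on a random well-conditioned matrix both routines are excellent, so this demonstrates the methodology rather than a difference; lawn88 concerns harder cases):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((150, 100))

errs = {}
for driver in ('gesvd', 'gesdd'):
    U, s, Vt = linalg.svd(A, full_matrices=False, lapack_driver=driver)
    # Relative backward error ||A - U S V'|| / ||A||; for a backward-stable
    # SVD this is a small multiple of machine epsilon (~2.2e-16 for Float64).
    errs[driver] = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)

print({d: e < 1e-12 for d, e in errs.items()})  # both True
```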
We should probably default to the faster and more accurate one and have the memory-efficient one as an alternative.
Agreed. 1 GB would store a 10k-by-10k Float64 matrix; on a beefy and fast machine (32-core Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz), that takes 215 seconds in Matlab (which uses multiple threads to compute svd). So for problems that finish in a short amount of time, the memory requirements of SVD are not limiting by today's standards (where tens of GB are getting common).
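The back-of-the-envelope figure checks out. A dense 10k-by-10k Float64 matrix is 0.8 GB before counting the factors and the extra workspace that gesdd needs on top of the input:

```python
n = 10_000
BYTES_PER_FLOAT64 = 8

# Storage for the input matrix alone.
matrix_bytes = n * n * BYTES_PER_FLOAT64
print(matrix_bytes / 1e9)  # 0.8 (GB)

# A full SVD also holds U (n x n), the n singular values, and V' (n x n),
# so the working set is at least ~3x the input, plus LAPACK workspace.
full_svd_bytes = 3 * matrix_bytes + n * BYTES_PER_FLOAT64
print(full_svd_bytes / 1e9)  # ~2.4 (GB)
```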
Yes, I agree with this assessment too. What would be a good way to fall back to the older gesvd with our current interface? -viral
We really need proper keyword arguments :-\
What about a separate function called
For now that's what we should do, but the need for keyword args is getting kind of urgent...
I'm finally using the options framework in real code (optimization routines tend to use a lot of parameters). It works well, and in some ways is even more powerful than keyword args (passing parameters down to nested functions is rather easier with options than with keyword arguments), but it has negatives too (like run-time overhead and slightly less pretty syntax).
While I've also started using the Options framework more often and come to like it, I do still think we need keyword arguments. One crazy request is that we consider also having keyword constructors for composite types. -- John
Somehow that got swallowed in the markdown formatting. I don't think that's necessarily a crazy request at all.
Oh, agreed that keyword arguments would be convenient in many circumstances, assuming they can be implemented without unfortunate side effects.
gesvd is not a good name, because it is too close to gsvd, and it also sounds like the generalized singular value decomposition. Perhaps the best thing is to just add an argument to svd for now, and make it a keyword argument when we have the feature. -viral
Thanks for checking the timing of svd versus sdd, including the calculation of the singular vectors, @andreasnoackjensen. I understood that there would not be a difference, but apparently, like Rick in "Casablanca", I was misinformed. I think the simplest approach, given these results and the apparent shift of Matlab to using
Ok, I will change
Thanks @andreasnoackjensen for pursuing this argument all the way!
I have changed the functions svd, eig, norm and rank to handle zero-dimensional matrices similarly to Matlab. Changed svd to use gesdd and added a thin svd method. Added eigvals to compute eigenvalues without vectors.
Cc: @dmbates
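The zero-dimension behaviour described in the final commit message can be illustrated with NumPy, which handles the same edge case (a Python analogue for illustration, not this PR's Julia code):

```python
import numpy as np

empty = np.zeros((0, 0))

# The Frobenius norm of an empty matrix is an empty sum of squares: 0.0.
print(np.linalg.norm(empty))  # 0.0
```

Making `norm` and `rank` total on empty inputs avoids special-casing in generic code that shrinks a matrix down to nothing.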