Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add extra exceptional shift to solve rare convergence issues #477

Conversation

thijssteel
Copy link
Collaborator

closes #475

This PR adds an extra exceptional shift choice to the complex implementations of the QZ algorithm.
As is tradition, it has been chosen for no particularly good reason.
Also adds some overflow protection to the original exceptional shift for good measure.

@pablosanjose
Copy link

Thanks! Note that in #475 the problem never arose when using single-precision complex numbers, only in the double precision case. Can you comment on how robust you expect the fix to be? Does precision suffer? Should we run more extensive tests to see if the extra shift strategy is generally sound?

@codecov
Copy link

codecov bot commented Jan 26, 2021

Codecov Report

Merging #477 (6bb0869) into master (2b9f0f5) will decrease coverage by 0.00%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #477      +/-   ##
==========================================
- Coverage   83.23%   83.23%   -0.01%     
==========================================
  Files        1820     1820              
  Lines      170847   170871      +24     
==========================================
+ Hits       142199   142216      +17     
- Misses      28648    28655       +7     
Impacted Files Coverage Δ
SRC/chgeqz.f 92.80% <94.73%> (-0.05%) ⬇️
SRC/zhgeqz.f 90.75% <94.73%> (-0.68%) ⬇️
SRC/zggev3.f 92.50% <0.00%> (-1.88%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b9f0f5...6bb0869. Read the comment docs.

@thijssteel
Copy link
Collaborator Author

Precision should not suffer at all. This is only the shift strategy which does not introduce any roundoff.
If anything, faster convergence will increase the precision because less iterations introduce less roundoff.

Even if the problem never arose in single precision, i still think we should keep the implementations as close as possible.

I expect this solution to be fairly robust, but more extensive tests never hurt. You mentioned that you run into #475 a lot, so it would be great to test this PR against some more pencils to be sure.

@thijssteel
Copy link
Collaborator Author

Ok, so now the coverage is complaining because the convergence never fails and the exception handling is never run ...

I vote we just ignore it. I can't find any failing pencils.

@pablosanjose
Copy link

I'd like to find a way to try this PR within my Julia code, to try finding more failures, but I'm not sure it is possible until this trickles down to OpenBLAS and then to Julia... Thing is, the package that produces these matrices is Julia-only.

(I guess one thing I could do is try to collect a bunch of failing examples with Julia and use your script in #475 to see if they all converge using this PR.)

@thijssteel
Copy link
Collaborator Author

It's possible to compile julia yourself and link with a preinstalled BLAS/LAPACK library instead of OpenBLAS. See this variable

but this is not an officially supported feature => YMMV.

@langou
Copy link
Contributor

langou commented Jan 26, 2021

Hi all,

Thjis: terrific work, oh my. This is cool.

Here are two webpages with some history about this:
http://www.netlib.org/lapack/Errata/old-vrac/lapack_known_issues.html
http://www.netlib.org/lapack/bug_list.html#_strong_span_class_green_bug0102_span_strong_convergence_problem_with_some_particular_pencils_in_xggev
The webpages are quite outdated.

Julien.

@langou
Copy link
Contributor

langou commented Jan 26, 2021

Ok, so now the coverage is complaining because the convergence never fails and the exception handling is never run ... I vote we just ignore it. I can't find any failing pencils. => I find this hilarious. I agree with ignoring the fact that the coverage is complaining because the exception handling is never run. Indeed. Hilarious though.

@pablosanjose
Copy link

pablosanjose commented Jan 27, 2021

The USE_SYSTEM_LAPACK didn't work for me after a few tries. I'm now wondering whether the best way for me to test this would be to compile LAPACK into a dynamic library, which I can then call from Julia using ccall (perhaps?). I'm a total noob with this kind of thing, could somebody orient me on how to build lapack into a .dylib?
EDIT: I'm on macos, by the way

@thijssteel
Copy link
Collaborator Author

@langou thanks for the interesting history. It makes me doubt my exceptional shift strategy though. I'll have some more fun trying to think up better alternatives.

@pablosanjose i'm afraid that i am but a lowly mathematician. The complexities of dynamic linking are beyond me and it seems LAPACK doesn't support it out of the box. I think your best bet is to clone OpenBLAS and apply my patch in the lapack-netlib folder. Then you can compile to a .dylib. You might have to disable multithreading though or you'll also need pthread.

@pablosanjose
Copy link

Thanks @thijssteel, that's just what I'm trying to do now!

@pablosanjose
Copy link

pablosanjose commented Jan 30, 2021

@MigMuc noted in OpenMathLib/OpenBLAS#3091 that in zhgeqz.f, there may be a typo: LSAME in line 329 should read LLSAME. Please check!

@martin-frbg
Copy link
Collaborator

btw. gfortran also warns about unused variabes "RTDISC" and "T1" and thinks ILZ,ILQ and ILSCHR may be uninitialised in chgeqz.f, if (and only if) you want to use the opportunity for a small cleanup.

@weslleyspereira
Copy link
Collaborator

weslleyspereira commented Feb 3, 2021

@thijssteel, thanks for this bug fix! :)

Although the exceptional shift is chosen for no particularly good reason, could it be better to use

               ESHIFT = ESHIFT + ABI12

instead of

               ESHIFT = ESHIFT + ( ASCALE*H( ILAST,
     $            ILAST ) )/( BSCALE*T( ILAST, ILAST ) )

in SRC/zhgeqz.f ?

@thijssteel
Copy link
Collaborator Author

Not a bad idea, do note however that while this PR does add exceptional shifts, the main problem was the calculation of the shifts. For example, none of the tests ever even seem to use the second exceptional shift if you look at the test coverage. I think it's fine as is.

@langou
Copy link
Contributor

langou commented Feb 9, 2021

Let us merge this in. Thanks @thijssteel! You're the best!

@langou langou merged commit e56b31e into Reference-LAPACK:master Feb 9, 2021
@julielangou julielangou added this to the LAPACK 3.9.1 milestone Mar 25, 2021
christoph-conrads pushed a commit to christoph-conrads/lapack that referenced this pull request May 23, 2021
…value-convergence-failure

add extra exceptional shift to solve rare convergence issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convergence errors in ggev and gges with complex double element types
6 participants