Regression in dgeqrf_ QR factorization since 0.3.21 - Output contains NaN values #5006
Comments
The implementation as such did not change; 0.3.21 only introduced an optional C translation of the LAPACK code that can be used when no Fortran compiler is available. If you see it "only" in CI, that suggests your setup lost its Fortran compiler in about the same timeframe, and you would not be able to compile 0.3.20 there now.
Also not reproducible on NeoverseN1 with either the NeoverseN1 or the ARMV8 target (and NOFORTRAN=1, of course). If anything, this could implicate one of the newer SVE kernels (N1 does not have SVE), but 0.3.21 did not have any additions or changes there.
Ah, apologies, I missed that crucial piece of information. This issue appeared on a Neoverse V1 machine. I will check N1 on my side, but I don't think there is an issue there, as it came to light through a PyTorch unit test failure on V1. I agree with you, this suggests an SVE-related issue.
Reproduced now on my phone :) I have a hunch that it could be the DNRM2 kernel (which isn't even SVE at the moment, just a different "big server" implementation). Will check in a moment.
DNRM2 it is indeed. I had already retired this particular implementation of NRM2 on the Apple M "Vortex" targets earlier. Will create a PR later today.
This was discovered through a PyTorch unit test failure on aarch64 (link will be attached).
Code to reproduce the problem is attached: reproducer.zip
See the attached code for more details.
Under libopenblas/openblas 0.3.20 the tau output contains no NaN values; in 0.3.21 and later it does. This appears to coincide with the switch of the implementation from Fortran to C.