-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash running the AA test suite #1002
Comments
AA is pure Julia, so it should not segfault. I'd say it's pretty likely a Julia bug, but I don't have any insight into it, other than it looks to be segfaulting in the actual test code itself. One possible cause that could be our fault would be some type piracy. I think the compiler itself uses some Base arithmetic. So if we overloaded that by accident, it could break the compiler. I don't know of any recent changes that could cause this. I see nothing wrong with the test itself, and it works on other Julia versions. The most likely file for type piracy would be this one: https://github.com/Nemocas/AbstractAlgebra.jl/blob/master/src/julia/Integer.jl But the only PR in the last few days just exported isprobable_prime, which is not likely to be harmful. This is the most likely commit to have triggered the issue you are seeing (if it is our fault), though not the one I am seeing: Perhaps we could try reverting that commit and seeing whether anything changes. I'm honestly not sure what those two lists are used for, or more specifically why there are now two such lists. Probably they are used by Hecke and the like and may be unrelated to the problem you are seeing. |
Tests pass locally for me on Ubuntu 20.04 on WSL at commit 5cc3995 with Julia-1.6.1. I will now update to the latest master and see if I can reproduce the issue locally. |
Tests pass locally for me on Ubuntu 20.04 with WSL and latest master. So I am not able to replicate the issue with Julia-1.6.1. |
I don't think the issue is type piracy on our behalf. I've tried all the likely offenders:
In my opinion this is a Julia bug to do with platform specific LLVM issues. |
Ok, I can reproduce the crash with Julia-1.6.2 on Ubuntu 20.04 with WSL @master. |
I have now reduced the crash to the following MWE: using AbstractAlgebra
R, x = RationalFunctionField(QQ, "x")
x * denominator(x) |
Also, it works for me in 1.6.1, and in 1.7.0-beta3 |
Wow, that was quick! I reported the issue to the Julia people: I think technically I am supposed to run a debug version of Julia before doing so, but historically it has taken days to report issues, by which time bugs end up baked in, so I'm reporting early in the hopes of shortcutting the process somewhat. E.g. someone over there might immediately be able to spot the issue already from the backtrace. I will get to trying to reduce the example to a minimal Julia issue, without the reliance on using AbstractAlgebra. |
Actually, before I do that, I am going to go back four days in commits on our side and verify the issue was still there before Julia-1.6.2 (very likely it was, but can't hurt to check, to rule out that we did this in the meantime). I'm actually suspicious when I see denominator in the MWE, as that is a function we have our own definition of internally, so type piracy is still a possibility, in some bizarre way. |
I already checked that it is present in the last release, v0.21.0 |
Ah right, that was ages ago. Yeah, so I'll get to trying to reduce the MWE. |
I haven't managed to reduce the issue any further just yet, but can pinpoint the issue to a specific line of code, i.e. line 526 of That line of code looks correct. Various permutations of that line such as replacing it with an "if" statement and doing a println instead of error or using !== instead of != still results in the same issue. |
So far all of the following seems necessary to exhibit the bug:
One can then trigger it with
Of course one also needs the promotion stuff in Rings.jl and fraction fields over polynomial rings over Rational{BigInt}, so I'm still a long way from a minimal example. |
I have some local setup to reduce it automatically to a minimal example. It may take a while, but should be less work for you. Shall I give it a try? |
@thofma cool, how does that work? |
It uses creduce. |
@thofma Please do. I cannot get it to occur outside of AbstractAlgebra currently. |
Yeah I give up on trying to reproduce it outside AbstractAlgebra. The following doesn't crash, which means that it actually needs quite a bit of Poly and/or Frac to reproduce.
|
@thofma No pressure, just wondering if you know roughly how long this usually takes? Do you have access to the machine in question? |
I have started it, but it could take some time. |
Ok no problem. |
@thofma Did the creduce setup reach a conclusion yet? It seems to take a long time. Do you now how long it typically takes? |
No, not finished yet. Might take another week or so. |
Ok, then I might try to reduce it further by hand. |
Well I tried to start from AbstractAlgebra and remove everything that should be irrelevant to the example so that we can start from a much more minimal example. But now it doesn't crash. That is quite remarkable. This means the minimum working example is going to be quite large. https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash |
There's a much smaller branch that DOES crash here: https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash2 Interestingly if I don't import promote_rule into Generic the problem disappears. But we supposedly have our own promote_rule, so I don't know how this can be. It could certainly break the compiler if we were committing type piracy. |
I've cleaned it up a lot by hand. It's quite small now, so perhaps creduce can get through this much quicker. What do you think @thofma ? |
Yeah, probably a good idea. I will restart it. |
creduce was doing weird things, but it seems you got it running yourself? |
I did. It's a bit hard to set it up, and I certainly got it wrong the first time. |
Did it also give you garbage at the end with missing new lines? |
Not complete garbage, but yeah it had removed some newlines. The julia compiler doesn't care about those. I just added them back in. |
Yeah, by garbage I mean that |
No, I didn't have that problem. |
This was fixed in Julia 1.6.3 |
I cannot complete running the AA test suite on three machines with Julia 1.6.2. This is with the current master branch (952dde0 as I write this).
Perhaps this is a bug in the Julia codegen?
First machine is my MacBook Pro:
and I get this:
Second machine is an Ubuntu 20.04.2 server
and I get this:
Third machine is nenekiki in Kaiserslautern
and this gives about the same error as the Ubuntu machine.
The text was updated successfully, but these errors were encountered: