Crash running the AA test suite #1002

Closed
fingolfin opened this issue Aug 18, 2021 · 35 comments

Comments

@fingolfin
Member

I cannot complete a run of the AA (AbstractAlgebra.jl) test suite on any of three machines with Julia 1.6.2. This is with the current master branch (952dde0 as I write this).

Perhaps this is a bug in the Julia codegen?

First machine is my MacBook Pro:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

and I get this:

...
Test Summary:                           | Pass  Total
Generic.RationalFunctionField.unary_ops |    1      1
Test Summary:                            | Pass  Total
Generic.RationalFunctionField.binary_ops |  400    400

signal (11): Segmentation fault: 11
in expression starting at /Users/mhorn/Projekte/OSCAR/AbstractAlgebra.jl/test/generic/RationalFunctionField-test.jl:97
_ZN4llvm11Instruction15eraseFromParentEv at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libLLVM.dylib (unknown line)
_ZN12_GLOBAL__N_18AllocOpt13runOnFunctionERN4llvm8FunctionE at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libLLVM.dylib (unknown line)
...
_start at ./client.jl:485
jfptr__start_37496.clone_1 at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
jl_apply_generic at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
true_main at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
repl_entrypoint at /Users/mhorn/Applications/Julia-1.6.2.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
Allocations: 988796209 (Pool: 988559243; Big: 236966); GC: 1027
ERROR: Package AbstractAlgebra errored during testing (received signal: 11)
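
The mangled frames in the backtrace are C++ symbols in the Itanium ABI encoding, where a nested name is written as `_ZN` followed by length-prefixed components. The following is a minimal sketch for reading them, not a full demangler: `demangle_nested` is a hypothetical helper that handles only plain nested names and ignores argument types, templates and qualifiers.

```julia
# Decode the nested-name part of an Itanium-mangled C++ symbol.
# Only handles the simple form `_ZN<len><name><len><name>...E`;
# returns `nothing` for anything else (including plain C symbols).
function demangle_nested(sym::AbstractString)
    startswith(sym, "_ZN") || return nothing
    parts = String[]
    i = 4                                   # first character after "_ZN"
    while i <= lastindex(sym) && isdigit(sym[i])
        j = i
        while j <= lastindex(sym) && isdigit(sym[j])
            j += 1                          # consume the decimal length prefix
        end
        n = parse(Int, sym[i:j-1])
        j + n - 1 <= lastindex(sym) || return nothing
        push!(parts, sym[j:j+n-1])          # the component itself
        i = j + n
    end
    return isempty(parts) ? nothing : join(parts, "::")
end

println(demangle_nested("_ZN4llvm11Instruction15eraseFromParentEv"))
# prints: llvm::Instruction::eraseFromParent
println(demangle_nested("_ZN12_GLOBAL__N_18AllocOpt13runOnFunctionERN4llvm8FunctionE"))
# prints: _GLOBAL__N_1::AllocOpt::runOnFunction
```

Read this way, the top of the trace is `llvm::Instruction::eraseFromParent` called from `AllocOpt::runOnFunction` (`_GLOBAL__N_1` denotes an anonymous namespace), i.e. the crash happens inside an optimization pass during compilation.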

Second machine is an Ubuntu 20.04.2 server:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, sandybridge)

and I get this:

...
Test Summary:                           | Pass  Total
Generic.RationalFunctionField.unary_ops |    1      1
Test Summary:                            | Pass  Total
Generic.RationalFunctionField.binary_ops |  400    400

signal (11): Segmentation fault
in expression starting at /home/mhorn/Projekte/OSCAR/AbstractAlgebra.jl/test/generic/RationalFunctionField-test.jl:97
_ZN4llvm11Instruction15eraseFromParentEv at /home/mhorn/Projekte/Julia/julia-1.6.2/bin/../lib/julia/libLLVM-11jl.so (unknown line)
finalize at /buildworker/worker/package_linux64/build/src/llvm-alloc-opt.cpp:380 [inlined]
runOnFunction at /buildworker/worker/package_linux64/build/src/llvm-alloc-opt.cpp:1518
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/mhorn/Projekte/Julia/julia-1.6.2/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/mhorn/Projekte/Julia/julia-1.6.2/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/mhorn/Projekte/Julia/julia-1.6.2/bin/../lib/julia/libLLVM-11jl.so (unknown line)
...
_start at ./client.jl:485
jfptr__start_34281.clone_1 at /home/mhorn/Projekte/Julia/julia-1.6.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:560
repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:702
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:51
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/mhorn/Projekte/Julia/julia-1.6.2/bin/julia (unknown line)
Allocations: 926700153 (Pool: 926473117; Big: 227036); GC: 955
ERROR: Package AbstractAlgebra errored during testing (received signal: 11)

Third machine is nenekiki in Kaiserslautern:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, haswell)

and this gives about the same error as the Ubuntu machine.

@wbhart
Contributor

wbhart commented Aug 18, 2021

AA is pure Julia, so it should not segfault. I'd say it's pretty likely a Julia bug, but I don't have any insight into it, other than it looks to be segfaulting in the actual test code itself.

One possible cause that could be our fault would be some type piracy. I think the compiler itself uses some Base arithmetic. So if we overloaded that by accident, it could break the compiler. I don't know of any recent changes that could cause this.

I see nothing wrong with the test itself, and it works on other Julia versions.

The most likely file for type piracy would be this one:

https://github.com/Nemocas/AbstractAlgebra.jl/blob/master/src/julia/Integer.jl

But the only PR in the last few days just exported isprobable_prime, which is not likely to be harmful.

This is the most likely commit to have triggered the issue you are seeing (if it is our fault), though not the one I am seeing:

1dff2c2

Perhaps we could try reverting that commit and seeing whether anything changes. I'm honestly not sure what those two lists are used for, or more specifically why there are now two such lists. Probably they are used by Hecke and the like and may be unrelated to the problem you are seeing.

@wbhart
Contributor

wbhart commented Aug 18, 2021

Tests pass locally for me on Ubuntu 20.04 on WSL at commit 5cc3995 with Julia-1.6.1.

I will now update to the latest master and see if I can reproduce the issue locally.

@wbhart
Contributor

wbhart commented Aug 18, 2021

Tests pass locally for me on Ubuntu 20.04 with WSL and latest master. So I am not able to replicate the issue with Julia-1.6.1.

@wbhart
Contributor

wbhart commented Aug 18, 2021

I don't think the issue is type piracy on our behalf. I've tried all the likely offenders:

julia> using AbstractAlgebra

julia> @which div(1, 2)
div(x::T, y::T) where T<:Union{Int16, Int32, Int64, Int8} in Base at int.jl:261

julia> @which divrem(1, 2)
divrem(x, y) in Base at div.jl:120

julia> @which inv(1)
inv(x::Integer) in Base at int.jl:90

julia> @which sqrt(1)
sqrt(x::Real) in Base.Math at math.jl:608

julia> @which log(1)
log(x::Real) in Base.Math at math.jl:404

julia> @which numerator(1)
numerator(x::Integer) in Base at rational.jl:233

julia> @which denominator(1)
denominator(x::Integer) in Base at rational.jl:250

julia> @which exp(1)
exp(x::Real) in Base.Math at special/exp.jl:201

julia> a = BigInt(1)
1

julia> b = BigInt(2)
2

julia> @which inv(a)
inv(x::Integer) in Base at int.jl:90

julia> @which log(a)
log(x::Real) in Base.Math at math.jl:404

julia> @which exp(a)
exp(x::Real) in Base.Math at special/exp.jl:201

julia> @which sqrt(a)
sqrt(x::BigInt) in Base.MPFR at mpfr.jl:574

julia> @which div(a, b)
div(x::BigInt, y::BigInt) in Base.GMP at gmp.jl:490

julia> @which divrem(a, b)
divrem(x::BigInt, y::BigInt) in Base.GMP at gmp.jl:566

julia> @which numerator(1//2)
numerator(x::Rational) in Base at rational.jl:234

julia> @which denominator(1//2)
denominator(x::Rational) in Base at rational.jl:251

julia> @which denominator(BigInt(1)//2)
denominator(x::Rational) in Base at rational.jl:251

julia> @which numerator(BigInt(1)//2)
numerator(x::Rational) in Base at rational.jl:234

In my opinion this is a Julia bug to do with platform specific LLVM issues.
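
The manual `@which` checks above can also be scripted. This is a rough sketch under stated assumptions: `lives_in_base` and the `checks` list are hypothetical names introduced here, and the script only reports which module owns the selected method. Run it after `using AbstractAlgebra` to look for overrides; without the package loaded every line should report `ok`.

```julia
# Walk up the module tree and report whether `m` is Base or nested inside Base.
function lives_in_base(m::Module)
    while true
        m === Base && return true
        p = parentmodule(m)
        p === m && return false     # reached a root module other than Base
        m = p
    end
end

# (function, argument types) pairs mirroring some of the calls checked above.
checks = [
    (div, (Int, Int)),
    (divrem, (BigInt, BigInt)),
    (inv, (Int,)),
    (sqrt, (BigInt,)),
    (numerator, (Rational{BigInt},)),
    (denominator, (Rational{BigInt},)),
]

for (f, argtypes) in checks
    m = which(f, argtypes)          # the method that dispatch would select
    status = lives_in_base(m.module) ? "ok" : "defined outside Base"
    println(rpad(string(f, argtypes), 45), m.module, "  ", status)
end
```

A method reported as defined outside Base is not necessarily piracy (the owning package may also own one of the argument types), so any hit still needs a manual look.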

@wbhart
Contributor

wbhart commented Aug 18, 2021

Ok, I can reproduce the crash with Julia-1.6.2 on Ubuntu 20.04 with WSL @master.

@fingolfin
Member Author

I have now reduced the crash to the following MWE:

using AbstractAlgebra
R, x = RationalFunctionField(QQ, "x")
x * denominator(x)

@fingolfin
Member Author

Also, it works for me in 1.6.1, and in 1.7.0-beta3

@wbhart
Contributor

wbhart commented Aug 18, 2021

Wow, that was quick!

I reported the issue to the Julia people:

JuliaLang/julia#41916

I think technically I am supposed to run a debug version of Julia before doing so, but historically it has taken days to report issues, by which time bugs end up baked in, so I'm reporting early in the hopes of shortcutting the process somewhat. E.g. someone over there might immediately be able to spot the issue already from the backtrace.

I will get to trying to reduce the example to a minimal Julia issue, without the reliance on using AbstractAlgebra.

@wbhart
Contributor

wbhart commented Aug 18, 2021

Actually, before I do that, I am going to go back four days in commits on our side and verify the issue was still there before Julia-1.6.2 (very likely it was, but can't hurt to check, to rule out that we did this in the meantime).

I'm actually suspicious when I see denominator in the MWE, as that is a function we have our own definition of internally, so type piracy is still a possibility, in some bizarre way.

@fingolfin
Member Author

I already checked that it is present in the last release, v0.21.0

@wbhart
Contributor

wbhart commented Aug 18, 2021

Ah right, that was ages ago. Yeah, so I'll get to trying to reduce the MWE.

@wbhart
Contributor

wbhart commented Aug 18, 2021

I haven't managed to reduce the issue any further just yet, but can pinpoint the issue to a specific line of code, i.e. line 526 of
https://github.com/Nemocas/AbstractAlgebra.jl/blob/master/src/generic/RationalFunctionField.jl

That line of code looks correct.

Various permutations of that line, such as replacing it with an "if" statement, doing a println instead of error, or using !== instead of !=, still result in the same crash.
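
For context on the `!=` versus `!==` swap: for a mutable type with no custom `==` method, Julia's generic `==` falls back to `===`, so the two spellings are expected to give the same answer. The sketch below uses a hypothetical `ToyParent` type as a stand-in; whether the real parent types in AbstractAlgebra define their own `==` is not something this example establishes.

```julia
# Hypothetical stand-in for a parent object; no `==` method is defined for it.
mutable struct ToyParent
    name::Symbol
end

K1 = ToyParent(:K)
K2 = ToyParent(:K)    # same contents, but a distinct object

println(K1 != K2)     # true: generic == falls back to ===
println(K1 !== K2)    # true: different objects
println(K1 != K1)     # false
println(K1 !== K1)    # false
```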

@wbhart
Contributor

wbhart commented Aug 18, 2021

So far all of the following seems necessary to exhibit the bug:

struct RationalFunctionField2{T <: FieldElement} <: AbstractAlgebra.Field
   S::Symbol
   fraction_field::FracField{<:PolyElem{T}}
end

mutable struct Rat2{T <: FieldElement} <: AbstractAlgebra.FieldElem
   d::Frac{<:PolyElem{T}}
   parent::RationalFunctionField2{T}

   Rat2{T}(f::Frac{<:PolyElem{T}}) where T <: FieldElement = new{T}(f)
end

function promote_rule(::Type{Rat2{T}}, ::Type{U}) where {T <: FieldElement, U <: RingElem}
   promote_rule(Frac{dense_poly_type(T)}, U) === Frac{dense_poly_type(T)} ? Rat2{T} : Union{}
end

function fraction_field(a::RationalFunctionField2{T}) where T <: FieldElement
   return a.fraction_field::FracField{dense_poly_type(T)}
end

function *(a::Rat2{T}, b::Rat2{T}) where T <: FieldElement
   return data(a) * data(b)
end

data(x::Rat2{T}) where T <: FieldElement = x.d::Frac{dense_poly_type(T)}

parent(a::Rat2) = a.parent

function (a::RationalFunctionField2{T})(b::Frac{<:PolyElem{T}}) where T <: FieldElement
   K = fraction_field(a)
   parent(b) != K && error("Unable to coerce rational function")
   Rat2{T}(b)
end

function (a::RationalFunctionField2)(b::RingElem)
   return a(fraction_field(a)(b))
end

One can then trigger it with

julia> using AbstractAlgebra

julia> R, x = PolynomialRing(QQ, "x")
(Univariate Polynomial Ring in x over Rationals, x)

julia> a = x//1
x

julia> b = R(1)
1

julia> y = Generic.Rat2{Rational{BigInt}}(a)
AbstractAlgebra.Generic.Rat2{Rational{BigInt}}(x, #undef)

julia> y*b

Of course one also needs the promotion stuff in Rings.jl and fraction fields over polynomial rings over Rational{BigInt}, so I'm still a long way from a minimal example.
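
The `#undef` shown for `y` above comes from the inner constructor calling `new{T}(f)` with only one argument, which leaves the `parent` field unassigned. A small self-contained sketch of that behaviour, using a hypothetical `ToyElem` type rather than the real `Rat2`:

```julia
# Hypothetical type with an inner constructor that fills only the first field.
mutable struct ToyElem
    d::Vector{Int}
    parent::Vector{Int}
    ToyElem(d::Vector{Int}) = new(d)    # `parent` is left undefined
end

t = ToyElem([1, 2, 3])
println(isdefined(t, :d))         # true
println(isdefined(t, :parent))    # false

# Reading the unassigned field raises an ordinary Julia exception.
try
    t.parent
catch err
    println(typeof(err))          # UndefRefError
end
```

Reading an unassigned field is a normal, catchable error, which is a different failure mode from the segmentation fault reported in this thread.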

@thofma
Member

thofma commented Aug 18, 2021

I have some local setup to reduce it automatically to a minimal example. It may take a while, but should be less work for you. Shall I give it a try?

@fingolfin
Member Author

@thofma cool, how does that work?

@thofma
Member

thofma commented Aug 18, 2021

It uses creduce.
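
creduce is driven by an "interestingness test": a script that exits with status 0 when the candidate file still shows the behaviour of interest. The setup used in this thread is not shown, so the following is only a sketch of what the core check could look like in Julia. `killed_by_signal` is a hypothetical helper, and it assumes the crashing process terminates with signal 11, as the "received signal: 11" line in the test output suggests.

```julia
# Run `cmd`, discard its output, and report whether it was killed by `signal`.
function killed_by_signal(cmd::Cmd; signal::Integer = 11)
    proc = run(pipeline(ignorestatus(cmd), stdout = devnull, stderr = devnull))
    return proc.termsignal == signal
end

# In an interestingness script one would end with something like
# (the file name `main.jl` is an assumption for this sketch):
#
#     julia = Base.julia_cmd()
#     exit(killed_by_signal(`$julia --startup-file=no main.jl`) ? 0 : 1)
```

Checking for the specific signal rather than any non-zero exit matters here: a reduction step that merely introduces a syntax error would also make the file fail, and the reducer would then happily minimize towards that.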

@wbhart
Contributor

wbhart commented Aug 18, 2021

@thofma Please do. I cannot get it to occur outside of AbstractAlgebra currently.

@wbhart
Contributor

wbhart commented Aug 18, 2021

Yeah I give up on trying to reproduce it outside AbstractAlgebra.

The following doesn't crash, which means that it actually needs quite a bit of Poly and/or Frac to reproduce.

module Crash

abstract type RingElem end

mutable struct FracField{T}
end

mutable struct Poly{T} <: RingElem
   p::Vector{T}
end

mutable struct Frac{T} <: RingElem
   a::T
   b::T
end

dense_poly_type(::Type{T}) where T = Poly{T}

struct RationalFunctionField{T}
   S::Symbol
   fraction_field::FracField{Poly{T}}
end

mutable struct Rat{T} <: RingElem
   d::Frac{<:Poly{T}}
   parent::RationalFunctionField{T}

   Rat{T}(f::Frac{<:Poly{T}}) where T = new{T}(f)
end

promote_rule(::Type{T}, ::Type{T}) where T = T

function promote_rule_sym(::Type{T}, ::Type{S}) where {T, S}
   U = promote_rule(T, S)
   if U !== Union{}
      return U
   else
      UU = promote_rule(S, T)
      return UU
   end
end

@inline function try_promote(x::S, y::T) where {S <: RingElem, T <: RingElem}
   U = promote_rule_sym(S, T)
   if S === U
      return true, x, parent(x)(y)
   elseif T === U
      return true, parent(y)(x), y
   else
      return false, x, y
   end
end

function Base.promote(x::S, y::T) where {S <: RingElem, T <: RingElem}
  fl, u, v = try_promote(x, y)
  if fl
    return u, v
  else
    error("Cannot promote to common type")
  end
end

import Base.*

*(x::RingElem, y::RingElem) = *(promote(x, y)...)

function promote_rule(::Type{Rat{T}}, ::Type{U}) where {T, U}
  Rat{T}
end

function fraction_field(a::RationalFunctionField{T}) where T
   return a.fraction_field::FracField{Poly{T}}
end

function *(a::Rat{T}, b::Rat{T}) where T
   return data(a) * data(b)
end

data(x::Rat{T}) where T = x.d::Frac{dense_poly_type(T)}

parent(a::Rat) = a.parent

function (a::RationalFunctionField{T})(b::Frac{<:Poly{T}}) where T
   K = fraction_field(a)
   parent(b) != K && error("Unable to coerce rational function")
   Rat{T}(b)
end

function (a::RationalFunctionField)(b::RingElem)
   return a(fraction_field(a)(b))
end

end

a = Crash.Frac{Crash.Poly{Rational{BigInt}}}(Crash.Poly{Rational{BigInt}}([BigInt(1)//2]), Crash.Poly{Rational{BigInt}}([BigInt(1)//2]))

b = Crash.Poly{Rational{BigInt}}([BigInt(1)//2])

y = Crash.Rat{Rational{BigInt}}(a)

y*b

@wbhart
Contributor

wbhart commented Aug 19, 2021

@thofma No pressure, just wondering if you know roughly how long this usually takes? Do you have access to the machine in question?

@thofma
Member

thofma commented Aug 19, 2021

I have started it, but it could take some time.

@wbhart
Contributor

wbhart commented Aug 19, 2021

Ok no problem.

@wbhart
Contributor

wbhart commented Aug 25, 2021

@thofma Did the creduce setup reach a conclusion yet? It seems to take a long time. Do you know how long it typically takes?

@thofma
Member

thofma commented Aug 25, 2021

No, not finished yet. Might take another week or so.

@wbhart
Contributor

wbhart commented Aug 25, 2021

Ok, then I might try to reduce it further by hand.

@wbhart
Contributor

wbhart commented Aug 25, 2021

Well I tried to start from AbstractAlgebra and remove everything that should be irrelevant to the example so that we can start from a much more minimal example. But now it doesn't crash. That is quite remarkable. This means the minimum working example is going to be quite large.

https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash

@wbhart
Contributor

wbhart commented Aug 25, 2021

There's a much smaller branch that DOES crash here:

https://github.com/wbhart/AbstractAlgebra.jl/tree/rat_crash2

Interestingly if I don't import promote_rule into Generic the problem disappears. But we supposedly have our own promote_rule, so I don't know how this can be. It could certainly break the compiler if we were committing type piracy.
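
Whether a module imports `promote_rule` decides which function its method definitions attach to. The sketch below illustrates the general Julia behaviour with two hypothetical modules, `Own` and `Extends`; it says nothing about how AbstractAlgebra actually organizes its definitions.

```julia
module Own
# No import: this creates a new function Own.promote_rule, leaving Base untouched.
promote_rule(::Type{Int}, ::Type{String}) = Any
end

module Extends
import Base: promote_rule
struct Toy end
# With the import, this adds a method to Base.promote_rule itself.
# It is not piracy here because Toy belongs to this module.
promote_rule(::Type{Toy}, ::Type{Int}) = Toy
end

println(Own.promote_rule === Base.promote_rule)       # false
println(Extends.promote_rule === Base.promote_rule)   # true
println(Base.promote_rule(Extends.Toy, Int))          # the Toy type from Extends
println(Base.promote_rule(Int, String))               # Union{}: Own's method is invisible to Base
```

So with the import in place, every `promote_rule` method defined in the module becomes visible to Base's own promotion machinery, while without it the definitions stay private to the module.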

@wbhart
Contributor

wbhart commented Aug 25, 2021

I've cleaned it up a lot by hand. It's quite small now, so perhaps creduce can get through this much quicker. What do you think @thofma ?

@thofma
Member

thofma commented Aug 25, 2021

Yeah, probably a good idea. I will restart it.

@thofma
Member

thofma commented Aug 27, 2021

creduce was doing weird things, but it seems you got it running yourself?

@wbhart
Contributor

wbhart commented Aug 27, 2021

I did. It's a bit hard to set it up, and I certainly got it wrong the first time.

@thofma
Member

thofma commented Aug 27, 2021

Did it also give you garbage at the end with missing new lines?

@wbhart
Contributor

wbhart commented Aug 27, 2021

Not complete garbage, but yeah it had removed some newlines. The Julia compiler doesn't care about those. I just added them back in.

@thofma
Member

thofma commented Aug 27, 2021

Yeah, by garbage I mean that julia main.jl did not actually run! But it seems it was not my fault.

@wbhart
Contributor

wbhart commented Aug 27, 2021

No, I didn't have that problem.

@fingolfin
Member Author

This was fixed in Julia 1.6.3
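
For anyone who still has to run on the affected release, a version guard is one way to surface the problem early. This is only a sketch based on the versions reported in this thread (1.6.2 crashes; 1.6.1, 1.7.0-beta3 and 1.6.3 do not); `affected_by_crash` is a hypothetical helper, and no other versions were tested here.

```julia
# Versions reported in this thread: only 1.6.2 showed the segfault.
affected_by_crash(v::VersionNumber) = v == v"1.6.2"

if affected_by_crash(VERSION)
    @warn "Julia 1.6.2 was reported to segfault on this code; 1.6.3 contains the fix"
end
```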
