-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regression bug in Base.tuple_type_head (1.6.0-RC1 vs 1.5.3) #39988
Comments
Tracking down the errorCrash happens calling method set2fields(bs), and also on several other statements (commented out in benchmark.js with an annotation that it crashes). I extracted the statements which are executed in set2fields(bs) and run them directly - no crash. Is included in the current benchmark.jl version. I used the VScode debugger and found an annoying exception. Steps to reproduce:
I do not understand that error message, To me, it looks like a syntactically correct call with a type as parameter. |
Some words on my usage of tuple_type_headI was looking for a way to persuade julia compiler to do (better) constant propagation. I want my method _fielddescr to "compile away", it is a pure function (as long as the functions it calls are also pure and not redefined), and has only type parameters. My expectation was the compiler will replace its body by a constant expression. Pure is not Base.@pure - I tried it, the discussion on it lead to a PR currently in work. See here My attempt to use tuple_type_head is based on an idea of tim.holy recently published here:
First attempt using julia 1.5.3 did not succeed, but with julia 1.6.0RC1, it worked as expected (until the crash). The alternative @generated turned out to be invalid, due to a "new method not available in old world" problem. Code is still there (function @generated function __fielddescr) and even faster, but fails if new types are introduced and referenced in __fielddescr. To me, it was really surprising that the julia compiler does better with a recursive formulation than an iteration-based one - being an old school programmer who learned ages ago "for performance, try to replace recursion by iteration". Congratulations to the julia core team for its brilliant work! tuple_type_head is marked deprecated now. Any idea what I could try to replace tuple_type_head and tuple_type_tail in my context? |
Thanks for the detailed report. I can reproduce this, but it's not immediately obvious what's going on. Could you try to make a self-contained minimal reproducer? That often helps accelerate the process while I continue to try to dig into it. |
minimal reproductionI tried to isolate the crash. please clone https://github.com/rryi/BitStructs.jl, commit 82d8a0a The most simplified version to reproduce the crash is in crashminimal.jl results of attempts to further reduce crash producerI tried to simplify even more, but all steps resulted in code which did not crash any more. I think these steps are of interest in your analysis, because they give hints on the critical construct in the code. I documented them in crash2.jl Removing one of the two expressions in set2fields4 also removes the crash. See set2fields5 and set2fields5. My next try was so substitute the syntactical sugar formulation
The code producing it is a call of set2fields8:
My analysis on crash2.jl errorThe expression My expectation is that typeof(bs) is reported in stack trace, I double checked with the println statement inside set2fields8. To check that, I put A vague guess: type inference problem in combination with syntactical transformation from |
|
You are right, it was a typo. I will continue with drilldown. |
I have corrected the typo in set2fields8, and I could replace the 2nd assignment by manually inlined code. plz execute ´julia.exe test\crash.jl´ to reproduce the crash. Have a look at test\nocrash.jl as a documentation of experiments to narrow down error preconditions.
is the best I could do to isolate the bug. Removing the 1st line in it removes the crash. Replacing the 2nd line with manually inlined code of the call of BitStructs.set also removes the crash. This is coded in file test\nocrash.jl and verified by a run of it.. Crash log:
Hope that helps. |
Thank you, this should be very helpful. |
I had another debugger session. After executing nocrash.jl, I entered in VScode REPL ´julia> @Enter set2fields8(bs)´. I stepped in set2fields8, tried to step over the first line in the method, and got tuple_type_head is called with a tuple type, but it is not a subtype of NTuple{N,DataType}, it is a subtype of NTuple{N,Symbol}. Is there a change in Release 1.6.0 which requires a stricter parameter definition for tuple_type_head, which is not reflected in the function signature? tuple_type_head is not exported and not documented in the API doc, so changing its behavior is formally a nonbreaking change in a minor release. Function definition of tuple_type_head has this signature, in 1.6.0 as well as in 1.5.3:
The function name suggests more narrow:
This is what I had in mind, and is fulfilled in all my calls. And in 1.5.3, it is required for calling fieldtype:
In 1.6.0, implementation is relaxed a lot:
The error message indicates it is required something like that:
This signature is violated by my call, and I suppose there are more such cases. At least one is documented here, it was my inspiration and template. My impression from the formulation there was using the tuple_type_* functions is a common practice in julia itself and in external packages. Here the debugger view just before the error is shown: Next, very important question: why is the TypeError exception shown only in the debugger and not propagated upwards in the chain of stack frames of active calls? I have no catch anywhere in my code. An exception thrown from any code I call, should stop the program and show up in REPL/console. There is nothing. Is it an indication that runtime stack is already corrupt? That would explain the crash. Last question: why does my program work (as far as I can see) correctly in so many calls of tuple_type_head with the signature ´tuple_type_head(t::T) where T <: NTuple{N,Symbol} where N´? |
Symbols as type parameters of Tuple are a bit of a strange case --- the parameters of I'm not sure where the TypeError is coming from; possibly the debugger itself. Will look into it. |
I think it is a missing case for Tuple/NamedTuple from this recent PR:
|
Maybe this is the root of the problem. Looks like I did abuse tuple type mechanics. Tuple{:a,:b,:c} is meant "pure meta", nothing to be instantiated. It-s just a vehicle to lift a tuple of symbols onto type parameter level, so that it is used for dispatch. It-s not my idea, I just adopted it without much thinking about it.
fieldtype respective tuple_type_head extract a parameter of a tuple type. They don't care about the type of the parameter, as long as it is isbits. Some more examples:
I assume it is an undocumented feature. It just works ... well, most of the time, if the crash is related to it. |
Hmm ... very good point. Using the debugger is my standard approach to inspect what happens nearby a unexpected error, to get close to the exact location of a strange interruption of normal program flow. The crash falls into that category - julia stack trace ends with my method, which looks pretty harmless at first glance. Within the debugger, I found that TypeError, It suggested to me it might be the cause of problems. Using the debugger might be completely inadequate in this case. I was stepping into code which I expect and intend that it is never executed at runtime in compiled code. Have a look at this:
The whole construction is an attempt to persuade the compiler to completely compile it away, doing recursive constant propagation. It-s a constant expression. The call inside is designed as a pure function. In a former version I even tried to tell it the compiler the hard way, using Base.@pure. That was put in question if it is correct use, see discussion here which lead to PR 39954.. In julia 1.6.0, compiler agrees and does constant propagation. Proof in REPL after running nocrash.jl:
My impression from benchmarking BitStructs is, that constant propagation works in simple call scenarios, but not in more complex ones. This was under investigation, but literally interrupted by the crash. Crash caused by too much stress for the compiler with constant propagation, producing corrupted code? |
I tried after running nocrash.jl
listing (120 lines) ends with this:
Cited from wikipedia: ´UD2 Generates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose.´ A quick scan shows: all methods set2field* marked as crashing generate code ending with ud2, all marked as not crashing do not have ud2 in their @code_native output. A quick web check verifies ud2 is generated by LLVM in special situations. Maybe this not so old post is interesting. cited from it:
@code_llvm set2fields8(bs) ends with
I assume 'call void @llvm.trap()' is compiled to UD2 in machine code, Does julia compiler do such things as "last resort"? |
Yes, it is optimized assuming that the return value is Union{}, and thus is not prepared to handle a Symbol. See the previous PR fix and issue report |
fix unsoundness in fieldtype of Tuple with non-Type params (fixes #39988)
I am working on package BitStructs and encounter a strange error running my benchmarks with julia 1.6.0-RC1.
Environment:
System Ryzen 1700, 16 GB RAM, Windows 10 Pro, version 20H2
Julia 64 bit 1.6.0-rc1 (2021-02-06)
Package https://github.com/rryi/BitStructs.jl, commit 50ea2e46b77d098ba61ab65d15e31aa289848e24
Steps to reproduce:
My log for that, showing the crash under 1.6.0RC1
here the run with julia 1.5.3, no crash:
The text was updated successfully, but these errors were encountered: