Some inferrability and precompile improvements #180
Conversation
```diff
@@ -303,7 +303,7 @@ end
 # adds x ^ (p::Real)
 function add_pow!(
-    ls::LoopSet, var::Symbol, x, p::Real, elementbytes::Int, position::Int
+    ls::LoopSet, var::Symbol, @nospecialize(x), p::Real, elementbytes::Int, position::Int
```
Will adding this `@nospecialize` have an effect on runtime performance?
No. `add_pow!` expands literal powers into a product of squares while the macro expands. That is, it replaces `x^5` with `x*((x^2)^2)`.
I think not, but for questions like these I'd defer to the experts.
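Since the expansion happens entirely while the macro runs, `@nospecialize` on `x` cannot affect the generated loop code. A minimal sketch of that kind of expansion (a hypothetical `expand_pow` helper for illustration, not LoopVectorization's actual `add_pow!`):

```julia
# Hypothetical sketch: expand a literal integer power (p >= 0) into a product
# of squares while building the expression, so no `^` call survives to runtime.
function expand_pow(x::Symbol, p::Int)
    p == 0 && return :(one($x))
    p == 1 && return x
    half = expand_pow(x, p >> 1)     # exponentiation by squaring
    sq = :($half * $half)
    return isodd(p) ? :($x * $sq) : sq
end

expand_pow(:x, 5)  # :(x * ((x * x) * (x * x))), i.e. x * (x^2)^2
```

Because the result is an ordinary expression tree spliced into the caller's code, specialization of the helper itself is irrelevant to runtime speed.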
```diff
@@ -143,6 +143,7 @@ use their `parent`. Triangular loops aren't yet supported.
 """
 macro avx(q)
     q = macroexpand(__module__, q)
+    isa(q, Expr) || return q
```
Can you explain this line?
I guess this is a hint to inference that `q` will be an `Expr` from that point on.
The result type of `macroexpand` is not inferrable. This just allows the compiler to know that below this line, `q::Expr`.
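The same guard pattern is easy to see in a plain function. This is a hedged sketch with hypothetical names, not the actual `@avx` code:

```julia
# Sketch of the guard pattern: `macroexpand`'s result infers as `Any`, but the
# early return lets the compiler assume `q::Expr` for the rest of the body.
function expand_or_passthrough(m::Module, q)
    q = macroexpand(m, q)        # inferred as Any
    isa(q, Expr) || return q     # literals, symbols, etc. pass through unchanged
    # below this line, inference knows q::Expr, so q.head / q.args infer cleanly
    return Expr(q.head, q.args...)
end
```

Non-`Expr` inputs (e.g. `42` or `:x`) are returned as-is; everything else is rebuilt from its now-well-typed fields.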
Will adding this `@nospecialize` have an effect on runtime performance?
I definitely don't have the answers here, so I'd ask @chriselrod to chime in.
```diff
@@ -335,7 +335,7 @@ function LoopSet(mod::Symbol)
         Tuple{Int,Symbol}[],
         Tuple{Int,Int}[],
         Tuple{Int,Float64}[],
-        Int[],Int[],
+        Tuple{Int,NumberType}[],Tuple{Int,Symbol}[],
```
I've seen this kind of thing in my own code (e.g., JuliaDebug/LoweredCodeUtils.jl#58), and I'm amazed that `convert(Vector{Tuple{Int,NumberType}}, ::Vector{Int})` doesn't error.
Good catch! Being empty saves Julia the need to convert an `Int` into a `Tuple{Int,Symbol}`, while "converting" the container (sans the contents) is trivial. Would be nice if this were easier to catch.
```julia
julia> convert(Vector{Tuple{Int,Symbol}}, Int[])
Tuple{Int64, Symbol}[]

julia> convert(Vector{Tuple{Int,Symbol}}, Int[3, 4])
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type Tuple{Int64, Symbol}
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple at essentials.jl:317
  convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple} at essentials.jl:318
  convert(::Type{T}, ::CartesianIndex) where T<:Tuple at multidimensional.jl:137
  ...
Stacktrace:
```
Fantastic as always! Ready to merge?
```diff
@@ -2,13 +2,13 @@ function maybeaddref!(ls::LoopSet, op)
     ref = op.ref
     id = findfirst(r -> r == ref, ls.refs_aliasing_syms)
     # try to CSE
-    if isnothing(id)
+    if id === nothing
```
I should perhaps replace all `isnothing(x)` with `x === nothing`.
Yeah I think that's a good idea.
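For reference, the two spellings agree semantically; on the Julia versions current at the time, `x === nothing` was sometimes friendlier to inference and inlining than `isnothing(x)`. A minimal check (not from the PR):

```julia
# `findfirst` returns `Union{Int, Nothing}`; both nothing-checks agree on it.
v = [10, 20, 30]
id = findfirst(==(20), v)               # returns 2
@assert (id === nothing) == isnothing(id)        # both false here

missing_id = findfirst(==(99), v)       # returns nothing
@assert (missing_id === nothing) && isnothing(missing_id)
```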
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
- Coverage   94.67%   93.15%   -1.52%
==========================================
  Files          32       32
  Lines        4541     4615      +74
==========================================
  Hits         4299     4299
- Misses        242      316      +74
```

Continue to review full report at Codecov.
For your interest, this was the workflow to find the inference issues:

```julia
using SnoopCompile, Cthulhu

tinf = @snoopi_deep include("some_workload.jl")
itrigs = inference_triggers(tinf)

itrig = itrigs[1]
ascend(itrig)  # inspect and fix if needed (the first item in the chain is the
               # callee that was runtime-dispatched or needed re-inferring;
               # later ones are its callers)
itrig = itrigs[2]
ascend(itrig)  # inspect and fix if needed...
```

The main gotcha is that the callers of a freshly-inferred MethodInstance are derived from a backtrace grabbed at the entrance to inference, and you only have a full MethodInstance for non-inlined methods. This won't be very rewarding on LoopVectorization now, because it was already quite good and this PR eliminates most of the remaining "real" triggers; you'd just get a lot of noise from JuliaLang/julia#38983. But in a package that has not yet received the kind of polish this one has, the results can be very informative.
If someone hands you a needle and a piece of hay, it's also obvious which is which. Plus, it's not always obvious to those of us less familiar with Julia's internals!
Thanks, I'm going to have to try that. I intend to merge this (and VectorizationBase) once the final test passes, rerun benchmarks, and then issue a new release.
Hah, I'd completely forgotten I'd added those precompiles! 😆
This is the companion to JuliaSIMD/VectorizationBase.jl#29. This PR focuses on inference improvements, as I've found for some packages that this is one of the most effective tools for increasing their precompilability (xref JuliaLang/julia#30488 (comment)). That said, you were already in quite good shape here, so these are minor tweaks with relatively little impact. The new precompile statements make the most difference, and those are something that the old `@snoopi` couldn't have easily discovered, since they represent an intermediate stage in the inference tree. (In retrospect they seem pretty obvious, but when profiling it never ceases to amaze me how often things only make sense in retrospect and are completely different from what I expected going in.)