Some inferrability and precompile improvements #180

Merged
merged 3 commits into JuliaSIMD:master from teh/inference Dec 24, 2020

Conversation

Contributor

@timholy timholy commented Dec 24, 2020

This is the companion to JuliaSIMD/VectorizationBase.jl#29. This PR focuses on inference improvements, as I've found for some packages that this is one of the most effective tools for increasing their precompilability (xref JuliaLang/julia#30488 (comment)). That said, you were already in quite good shape here, so these are minor tweaks with relatively little impact. The new precompile statements make the most difference, and those are something that the old @snoopi couldn't have easily discovered, since they represent an intermediate stage in the inference tree. (In retrospect they seem pretty obvious, but when profiling it never ceases to amaze me how often things only make sense in retrospect and are completely different from what I expected going in.)
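
(For readers less familiar with the mechanism: precompile statements are ordinary function calls, collected in this package's src/precompile.jl. The sketch below shows only the general shape of such a file, with a hypothetical signature rather than the statements actually added in this PR.)

function _precompile_()
    # bail out unless we are actually generating a precompiled image
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    # hypothetical signature purely for illustration; the real entries
    # name LoopVectorization methods and are listed in this PR's diff
    precompile(Tuple{typeof(sum), Vector{Float64}})
    return nothing
end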

@@ -303,7 +303,7 @@ end

 # adds x ^ (p::Real)
 function add_pow!(
-    ls::LoopSet, var::Symbol, x, p::Real, elementbytes::Int, position::Int
+    ls::LoopSet, var::Symbol, @nospecialize(x), p::Real, elementbytes::Int, position::Int
Member

Will adding this @nospecialize have an effect on runtime performance?

Member

@chriselrod chriselrod Dec 24, 2020

No. add_pow! expands literal powers into a product of squares at macro-expansion time.
That is, it replaces x^5 with x*((x^2)^2).
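
(A minimal sketch of that rewrite, purely for illustration; this is not the actual add_pow! implementation, which works on the LoopSet rather than on raw expressions:)

function expand_pow(x, p::Int)
    p == 0 && return :(one($x))
    p == 1 && return x
    half = expand_pow(x, p >> 1)      # expression for x^(p ÷ 2)
    sq   = :($half * $half)           # square it
    return isodd(p) ? :($x * $sq) : sq
end

expand_pow(:x, 5)                     # returns :(x * ((x * x) * (x * x)))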

Contributor Author

I think not, but for questions like these I'd defer to the experts.

@@ -143,6 +143,7 @@ use their `parent`. Triangular loops aren't yet supported.
 """
 macro avx(q)
     q = macroexpand(__module__, q)
+    isa(q, Expr) || return q
Member

Can you explain this line?

Member

I guess this is a hint to inference that q will be an Expr from that point on.

Contributor Author

The result type of macroexpand is not inferrable. This just allows the compiler to know that below this line, q::Expr.
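
(A small illustration of the pattern with a hypothetical function, not code from this package; the guard is what turns an Any into an Expr for the remainder of the body:)

function count_args(m::Module, q)
    q = macroexpand(m, q)        # return type is Any as far as inference is concerned
    isa(q, Expr) || return 0     # early return; below this line the compiler knows q::Expr
    return length(q.args)        # field access is now on a concretely known type
end

count_args(@__MODULE__, :(@assert 1 + 1 == 2))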

Contributor Author

@timholy timholy left a comment

I definitely don't have the answers here, so I'd ask @chriselrod to chime in.

@@ -335,7 +335,7 @@ function LoopSet(mod::Symbol)
         Tuple{Int,Symbol}[],
         Tuple{Int,Int}[],
         Tuple{Int,Float64}[],
-        Int[],Int[],
+        Tuple{Int,NumberType}[],Tuple{Int,Symbol}[],
Contributor Author

I've seen this kind of thing in my own code (e.g., JuliaDebug/LoweredCodeUtils.jl#58), and I'm amazed that convert(Vector{Tuple{Int,NumberType}}, ::Vector{Int}) doesn't error.

Member

Good catch!

Being empty saves Julia the need to convert an Int into a Tuple{Int,Symbol}, while "converting" the container (sans the contents) is trivial. Would be nice if this were easier to catch.

julia> convert(Vector{Tuple{Int,Symbol}}, Int[])
Tuple{Int64, Symbol}[]

julia> convert(Vector{Tuple{Int,Symbol}}, Int[3, 4])
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type Tuple{Int64, Symbol}
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple at essentials.jl:317
  convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple} at essentials.jl:318
  convert(::Type{T}, ::CartesianIndex) where T<:Tuple at multidimensional.jl:137
  ...
Stacktrace:

Member

@chriselrod chriselrod left a comment

Fantastic as always! Ready to merge?

@@ -2,13 +2,13 @@ function maybeaddref!(ls::LoopSet, op)
     ref = op.ref
     id = findfirst(r -> r == ref, ls.refs_aliasing_syms)
     # try to CSE
-    if isnothing(id)
+    if id === nothing
Member

I should perhaps replace all isnothing(x) with x === nothing.

Member

Yeah I think that's a good idea.
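
(A minimal sketch, not code from the package, assuming the usual Union{Nothing, Int} return from findfirst; presumably the motivation is that, on Julia versions current at the time, branch narrowing applied to an explicit === test but did not always propagate through the isnothing call:)

syms = [:a, :b, :c]
id = findfirst(==(:b), syms)   # inferred as Union{Nothing, Int64}
if id === nothing              # the explicit === lets the compiler split the Union
    push!(syms, :b)            # in this branch id is known to be nothing
else
    syms[id] = :d              # in this branch id is known to be an Int
end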


codecov-io commented Dec 24, 2020

Codecov Report

Merging #180 (a302f6c) into master (5cbab62) will decrease coverage by 1.51%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
- Coverage   94.67%   93.15%   -1.52%     
==========================================
  Files          32       32              
  Lines        4541     4615      +74     
==========================================
  Hits         4299     4299              
- Misses        242      316      +74     
Impacted Files Coverage Δ
src/add_compute.jl 98.56% <100.00%> (+0.01%) ⬆️
src/add_loads.jl 100.00% <100.00%> (ø)
src/constructors.jl 97.24% <100.00%> (+0.02%) ⬆️
src/graphs.jl 88.12% <100.00%> (-0.18%) ⬇️
src/precompile.jl 98.66% <100.00%> (+0.03%) ⬆️
src/reconstruct_loopset.jl 96.83% <100.00%> (ø)
src/vectorizationbase_compat/contract_pass.jl 81.81% <100.00%> (ø)
src/lower_store.jl 83.89% <0.00%> (-9.12%) ⬇️
src/lower_load.jl 91.80% <0.00%> (-6.56%) ⬇️
src/loopstartstopmanager.jl 92.18% <0.00%> (-6.50%) ⬇️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5cbab62...57b715f.

Contributor Author

timholy commented Dec 24, 2020

For your interest, this was the workflow to find the inference issues:

using SnoopCompile, Cthulhu
tinf = @snoopi_deep include("some_workload.jl")
itrigs = inference_triggers(tinf)
itrig = itrigs[1]
ascend(itrig) # inspect and fix if needed (the first item in the chain is the callee that was runtime-dispatched or needed re-inferring, later ones are its callers)
itrig = itrigs[2]
ascend(itrig) # inspect and fix if needed...

The main gotcha is that the callers of a freshly-inferred MethodInstance are derived from a backtrace grabbed at the entrance to inference, and you only have a full MethodInstance for non-inlined methods. (Any => you see in each row represents inlining.)

This won't be very rewarding on LoopVectorization now because it was already quite good and this eliminates most of the remaining "real" triggers; you just get a lot of noise from JuliaLang/julia#38983. But in a package that has not yet received the kind of polish this one has, the results can be very informative.

@chriselrod
Member

> In retrospect they seem pretty obvious, but when profiling it never ceases to amaze me how often things only make sense in retrospect and are completely different from what I expected going in.

If someone hands you a needle and a piece of hay, it's also obvious which is which.
But if someone presents a haystack and asks you to find the needles, any tool to sort through it would be a lot of help.

Plus, it's not always obvious to those of us less familiar with Julia's internals!
E.g., when the compiler benefits from extra type hints.

> For your interest, this was the workflow to find the inference issues:

Thanks, I'm going to have to try that.
Much of that past polish is of course to your credit, but ascend looks great and I know a few repos I expect to benefit a lot from this.

I intend to merge this (and VectorizationBase) once the final test passes, rerun benchmarks, and then issue a new release.
(I don't expect these changes to impact runtime performance, but I made a couple other changes since the last release that should.)

@chriselrod chriselrod merged commit dd3afad into JuliaSIMD:master Dec 24, 2020
@timholy timholy deleted the teh/inference branch December 24, 2020 17:35
Contributor Author

timholy commented Dec 24, 2020

Hah, I'd completely forgotten I'd added those precompiles! 😆
