Some inferrability and precompile improvements #180

Merged
merged 3 commits into JuliaSIMD:master from teh/inference Dec 24, 2020

Conversation

Contributor

@timholy timholy commented Dec 24, 2020

This is the companion to JuliaSIMD/VectorizationBase.jl#29. This PR focuses on inference improvements, as I've found for some packages that this is one of the most effective tools for increasing their precompilability (xref JuliaLang/julia#30488 (comment)). That said, you were already in quite good shape here, so these are minor tweaks with relatively little impact. The new precompile statements make the most difference, and those are something that the old @snoopi couldn't have easily discovered, since they represent an intermediate stage in the inference tree. (In retrospect they seem pretty obvious, but when profiling it never ceases to amaze me how often things only make sense in retrospect and are completely different from what I expected going in.)
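
(For readers less familiar with the mechanism: precompile statements are ordinary function calls, collected in this package's src/precompile.jl. The sketch below shows only the general shape of such a file, with a hypothetical signature rather than the statements actually added in this PR.)

function _precompile_()
    # bail out unless we are actually generating a precompiled image
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    # hypothetical signature purely for illustration; the real entries
    # name LoopVectorization methods and are listed in this PR's diff
    precompile(Tuple{typeof(sum), Vector{Float64}})
    return nothing
end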

@@ -303,7 +303,7 @@ end

 # adds x ^ (p::Real)
 function add_pow!(
-    ls::LoopSet, var::Symbol, x, p::Real, elementbytes::Int, position::Int
+    ls::LoopSet, var::Symbol, @nospecialize(x), p::Real, elementbytes::Int, position::Int
Member

Will adding this @nospecialize have an effect on runtime performance?

Member

@chriselrod chriselrod Dec 24, 2020

No. add_pow! expands literal powers into a product of squares at macro-expansion time.
That is, it replaces x^5 with x*((x^2)^2).
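
(A minimal sketch of that rewrite, purely for illustration; this is not the actual add_pow! implementation, which works on the LoopSet rather than on raw expressions:)

function expand_pow(x, p::Int)
    p == 0 && return :(one($x))
    p == 1 && return x
    half = expand_pow(x, p >> 1)      # expression for x^(p ÷ 2)
    sq   = :($half * $half)           # square it
    return isodd(p) ? :($x * $sq) : sq
end

expand_pow(:x, 5)                     # returns :(x * ((x * x) * (x * x)))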

Contributor Author

I think not, but for questions like these I'd defer to the experts.

@@ -143,6 +143,7 @@ use their `parent`. Triangular loops aren't yet supported.
 """
 macro avx(q)
     q = macroexpand(__module__, q)
+    isa(q, Expr) || return q
Member

Can you explain this line?

Member

I guess this is a hint to inference that q will be an Expr from that point on.

Contributor Author

The result type of macroexpand is not inferrable. This just allows the compiler to know that below this line, q::Expr.
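
(A small illustration of the pattern with a hypothetical function, not code from this package; the guard is what turns an Any into an Expr for the remainder of the body:)

function count_args(m::Module, q)
    q = macroexpand(m, q)        # return type is Any as far as inference is concerned
    isa(q, Expr) || return 0     # early return; below this line the compiler knows q::Expr
    return length(q.args)        # field access is now on a concretely known type
end

count_args(@__MODULE__, :(@assert 1 + 1 == 2))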

Contributor Author

@timholy timholy left a comment

I definitely don't have the answers here, so I'd ask @chriselrod to chime in.

@@ -335,7 +335,7 @@ function LoopSet(mod::Symbol)
         Tuple{Int,Symbol}[],
         Tuple{Int,Int}[],
         Tuple{Int,Float64}[],
-        Int[],Int[],
+        Tuple{Int,NumberType}[],Tuple{Int,Symbol}[],
Contributor Author

I've seen this kind of thing in my own code (e.g., JuliaDebug/LoweredCodeUtils.jl#58), and I'm amazed that convert(Vector{Tuple{Int,NumberType}}, ::Vector{Int}) doesn't error.

Member

Good catch!

Being empty saves Julia the need to convert an Int into a Tuple{Int,Symbol}, while "converting" the container (sans the contents) is trivial. Would be nice if this were easier to catch.

julia> convert(Vector{Tuple{Int,Symbol}}, Int[])
Tuple{Int64, Symbol}[]

julia> convert(Vector{Tuple{Int,Symbol}}, Int[3, 4])
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type Tuple{Int64, Symbol}
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple at essentials.jl:317
  convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple} at essentials.jl:318
  convert(::Type{T}, ::CartesianIndex) where T<:Tuple at multidimensional.jl:137
  ...
Stacktrace:

Member

@chriselrod chriselrod left a comment

Fantastic as always! Ready to merge?

@@ -2,13 +2,13 @@ function maybeaddref!(ls::LoopSet, op)
     ref = op.ref
     id = findfirst(r -> r == ref, ls.refs_aliasing_syms)
     # try to CSE
-    if isnothing(id)
+    if id === nothing
Member

I should perhaps replace all isnothing(x) with x === nothing.

Member

Yeah I think that's a good idea.
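
(A minimal sketch, not code from the package, assuming the usual Union{Nothing, Int} return from findfirst; presumably the motivation is that, on Julia versions current at the time, branch narrowing applied to an explicit === test but did not always propagate through the isnothing call:)

syms = [:a, :b, :c]
id = findfirst(==(:b), syms)   # inferred as Union{Nothing, Int64}
if id === nothing              # the explicit === lets the compiler split the Union
    push!(syms, :b)            # in this branch id is known to be nothing
else
    syms[id] = :d              # in this branch id is known to be an Int
end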


codecov-io commented Dec 24, 2020

Codecov Report

Merging #180 (a302f6c) into master (5cbab62) will decrease coverage by 1.51%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
- Coverage   94.67%   93.15%   -1.52%     
==========================================
  Files          32       32              
  Lines        4541     4615      +74     
==========================================
  Hits         4299     4299              
- Misses        242      316      +74     
Impacted Files Coverage Δ
src/add_compute.jl 98.56% <100.00%> (+0.01%) ⬆️
src/add_loads.jl 100.00% <100.00%> (ø)
src/constructors.jl 97.24% <100.00%> (+0.02%) ⬆️
src/graphs.jl 88.12% <100.00%> (-0.18%) ⬇️
src/precompile.jl 98.66% <100.00%> (+0.03%) ⬆️
src/reconstruct_loopset.jl 96.83% <100.00%> (ø)
src/vectorizationbase_compat/contract_pass.jl 81.81% <100.00%> (ø)
src/lower_store.jl 83.89% <0.00%> (-9.12%) ⬇️
src/lower_load.jl 91.80% <0.00%> (-6.56%) ⬇️
src/loopstartstopmanager.jl 92.18% <0.00%> (-6.50%) ⬇️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5cbab62...57b715f.

Contributor Author

timholy commented Dec 24, 2020

For your interest, this was the workflow to find the inference issues:

using SnoopCompile, Cthulhu
tinf = @snoopi_deep include("some_workload.jl")
itrigs = inference_triggers(tinf)
itrig = itrigs[1]
ascend(itrig) # inspect and fix if needed (the first item in the chain is the callee that was runtime-dispatched or needed re-inferring, later ones are its callers)
itrig = itrigs[2]
ascend(itrig) # inspect and fix if needed...

The main gotcha is that the callers of a freshly-inferred MethodInstance are derived from a backtrace grabbed at the entrance to inference, and you only have a full MethodInstance for non-inlined methods. (Any => you see in each row represents inlining.)

This won't be very rewarding on LoopVectorization now because it was already quite good and this eliminates most of the remaining "real" triggers; you just get a lot of noise from JuliaLang/julia#38983. But in a package that has not yet received the kind of polish this one has, the results can be very informative.

@chriselrod
Member

> In retrospect they seem pretty obvious, but when profiling it never ceases to amaze me how often things only make sense in retrospect and are completely different from what I expected going in.

If someone hands you a needle and a piece of hay, it's also obvious which is which.
But if someone presents a haystack and asks you to find the needles, any tool to sort through it would be a lot of help.

Plus, it's not always obvious to those of us less familiar with Julia's internals!
E.g., when the compiler benefits from extra type hints.

> For your interest, this was the workflow to find the inference issues:

Thanks, I'm going to have to try that.
Much of that past polish is of course to your credit, but ascend looks great and I know a few repos I expect to benefit a lot from this.

I intend to merge this (and VectorizationBase) once the final test passes, rerun benchmarks, and then issue a new release.
(I don't expect these changes to impact runtime performance, but I made a couple other changes since the last release that should.)

@chriselrod chriselrod merged commit dd3afad into JuliaSIMD:master Dec 24, 2020
@timholy timholy deleted the teh/inference branch December 24, 2020 17:35
Contributor Author

timholy commented Dec 24, 2020

Hah, I'd completely forgotten I'd added those precompiles! 😆
