Very slow broadcast #560
On Julia master
|
I'm sorry, yes, I forgot to mention/copy that we also identified that this seems to be a Julia v1.0.2 issue. |
I wonder if this is because of #539. Perhaps the use of |
Is v0.10.1 already tagged? #539 seems to be only in the latest version. I did my tests with v0.10.0. |
No, 91743c9 is in 0.10.0 as well. |
The benchmark doesn't run that code though, so that's not it. |
No, it's slow on v0.9 and on v0.8.3... 🤔 |
I don't think I can contribute concretely to the resolution of the problem, but let me share the results of some experiments. Hopefully somebody finds them helpful.
The following parts of
Here is a first workaround, which hits the normal Vector case.
The following parts/versions of
A second workaround is to define the function elementwise (without broadcasting), and then apply it via broadcasting:
This hits the Vector case even harder. |
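To make the second workaround concrete, here is a minimal sketch. The function from the original report is not reproduced in this thread, so `kernel` is a hypothetical stand-in modelled on the `max`/`abs`/`*` pattern visible in the type dump further down; the values are made up as well.

```julia
using StaticArrays  # external package, assumed to be installed

# Hypothetical scalar kernel; NOT the function from the original report.
kernel(a, b, r) = max(abs(a), abs(b)) * r

u  = SVector(1.0, -2.0, 3.0)
u0 = SVector(-4.0, 0.5, 1.0)
r  = 0.1

# Nested dot calls: the compiler has to see through a tree of Broadcasted objects.
fused = @. max(abs(u), abs(u0)) * r

# Workaround: fuse by hand into a scalar function, then broadcast once.
manual = kernel.(u, u0, r)

@assert fused == manual
@assert manual isa SVector{3,Float64}
```

Both forms give the same numbers; the only difference is how much work the broadcast machinery has to do at compile time to arrive at a simple elementwise loop.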
Julia 1.0.2: │ ─ %-1 = invoke Base.Broadcast.make_makeargs(::Function,::Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(max),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}},Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}}}},Float64}}})::Type{getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(max),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}},Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}}}},Float64}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},_1}} where _1
Body::getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(max),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}},Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}}}},Float64}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},_1} where _1
334 1 ─ %1 = (Base.getfield)(t, 1, true)::Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(max),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}},Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}}}},Float64}}
336 │ %2 = (Base.getfield)(%1, :args)::Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(max),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}},Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}}}},Float64}
│ %3 = (Base.getfield)(%2, 1, true)::Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(max),Tuple{Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}},Base.Broadcast.Broadcasted{StaticArrays.StaticArrayStyle{1},Nothing,typeof(abs),Tuple{SArray{Tuple{3},Float64,1,3}}}}}
│ %4 = Base.Broadcast.:(##5#6)::Type{getfield(Base.Broadcast, Symbol("##5#6"))} ││╻ make_makeargs
│ %5 = (Core.typeof)(makeargs)::Type{#s55} where #s55<:Function │││
... I think it is fixed by JuliaLang/julia@67b06af |
I don't think so. That patch is included in Julia v1.0.2, in which the problem occurs. Note also that the runtime on Julia master is far from optimal. My second workaround takes 5 ns, as opposed to the 90 ns reported by @KristofferC. Anyway, the 90 ns are slower than the usual Vector case. In some sense, the broadcast needed here is of a "trivial" kind. All broadcasted operations should be fused (not sure that's the correct terminology), and the result should be the same as my second workaround, which first fuses manually, and then calls broadcasting. |
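For readers unfamiliar with the term: "fusion" means that nested dot calls are lowered to one lazy `Broadcasted` tree that is evaluated in a single pass. A rough sketch with plain `Vector`s (no StaticArrays needed), using the same `max`/`abs`/`*` shape as the type dump above; the input values are arbitrary and only for illustration:

```julia
const BC = Base.Broadcast

x = [1.0, -2.0, 3.0]
y = [-4.0, 0.5, 1.0]

# Build by hand the lazy tree that max.(abs.(x), abs.(y)) .* 0.1 corresponds to.
lazy = BC.broadcasted(*,
           BC.broadcasted(max, BC.broadcasted(abs, x), BC.broadcasted(abs, y)),
           0.1)

# Nothing has been computed yet; materialize runs the single fused loop.
fused = BC.materialize(lazy)

# Manual fusion: one scalar function applied elementwise.
manual = map((a, b) -> max(abs(a), abs(b)) * 0.1, x, y)

@assert lazy isa BC.Broadcasted
@assert fused == manual
@assert fused == max.(abs.(x), abs.(y)) .* 0.1
```

The nested `Broadcasted{...}` types in the `make_makeargs` output are exactly this kind of tree, just with `StaticArrayStyle` instead of the default array style.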
I think this issue can be considered as (magically) sort of resolved. For the original problem on
I now get
That's, however, still much more than my "second workaround" from above, which yields the same numerical result:
Edit: Note that this is also much better than the previous master-runtime, see @KristofferC's #560 (comment). |
0.027 ns is less than a clock cycle, so everything has been constant-folded (i.e. the computation happened at compile time because the compiler recognized that pure functions were being called on constant inputs). A better microbenchmark would be

julia> @btime f.(u, u0, u1, r) setup = begin
           ũ, u₀, u₁, ρ = rand(3), rand(3), rand(3), rand(3)
           u, u0, u1, r = convert.(SVector{3}, (ũ, u₀, u₁, ρ))
       end
  5.413 ns (0 allocations: 0 bytes)

Still a difference though. |
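An alternative to the `setup` block, sketched here under the assumption that BenchmarkTools and StaticArrays are installed, is to interpolate each input through a `Ref` and dereference it inside the benchmarked expression, so that the compiler cannot treat the values as constants. `kernel` is again a hypothetical stand-in, not the `f` from this thread, and no timings are shown because they depend on the machine and Julia version.

```julia
using BenchmarkTools, StaticArrays  # external packages, assumed to be installed

# Hypothetical scalar kernel, for illustration only.
kernel(a, b, r) = max(abs(a), abs(b)) * r

u  = @SVector rand(3)
u0 = @SVector rand(3)
r  = 0.1

# `$(Ref(x))[]` interpolates a Ref and dereferences it at run time,
# which keeps the value opaque to constant propagation.
res = @btime kernel.($(Ref(u))[], $(Ref(u0))[], $(Ref(r))[])

@assert res == kernel.(u, u0, r)
```

If a benchmark reports well under 1 ns, it is worth rerunning it in this form (or with `setup`) before drawing conclusions.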
Yes, I was surprised about that computation time, but didn't know what to do about it. Anyway, there seems to be room for improvement, but 11 ns is still much better than the 2 µs with which we started. I'll leave the issue open as a reminder to revisit it once in a while, since we haven't really understood the original issue and how it got resolved, AFAICT. |
Seems this might still be an issue, cf. SciML/DifferentialEquations.jl#436 Needs detailed investigation to figure out if we can do something about it or whether we need a fix in Base. |
adds a summary of issues discussed in JuliaArrays#682 and JuliaArrays#560 to the documentation
Small update: I've proposed a solution here: JuliaLang/julia#41090. |
At SciML/OrdinaryDiffEq.jl#571 an instance of very slow broadcast was identified: