
Cache axis args in a dictionary #3775

Merged Aug 31, 2021 (20 commits)
Conversation

BioTurboNick (Member)

These axis arguments can be generated lots of times for large plots. We can save time by caching them in a dictionary. For my large map test plot:

```
Before: 1.075 s (3393429 allocations: 183.49 MiB)
After:  959.024 ms (3393294 allocations: 181.64 MiB)  # 10% improvement in speed, small decrease in allocations

TTFP Before: 7.543192 seconds (26.79 M allocations: 1.544 GiB, 3.89% gc time, 0.06% compilation time)
TTFP After:  6.886222 seconds (23.58 M allocations: 1.355 GiB, 3.63% gc time, 0.07% compilation time)  # 8% improvement in speed, 12% fewer allocations, 12% less memory allocated
```

The cost of this is, I think, just 3-4 KiB. Is there a function to read out the in-memory size of a dictionary? `sizeof` on the dictionary itself just returns 64.

Resolves #3763
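To make the idea concrete, here is a minimal, self-contained sketch of the dictionary-caching approach; the dictionary and helper names are illustrative, not the PR's actual identifiers. It also touches on the size question: `Base.summarysize` reports the recursive in-memory size of a collection in bytes, whereas `sizeof` only returns the size of the top-level `Dict` struct (hence the 64).

```julia
# Illustrative sketch of caching concatenated axis symbols in a Dict.
# Names (_axis_symbol_cache, get_attr_symbol) are hypothetical.
const _axis_symbol_cache = Dict{Tuple{Symbol,Symbol},Symbol}()

# Symbol(letter, keyword) builds a new string on every call; get! makes
# each distinct (letter, keyword) pair pay that cost only once.
get_attr_symbol(letter::Symbol, keyword::Symbol) =
    get!(() -> Symbol(letter, keyword), _axis_symbol_cache, (letter, keyword))

get_attr_symbol(:x, :lims)   # constructs and caches :xlims
get_attr_symbol(:x, :lims)   # cache hit, no new Symbol/string construction

# Recursive memory footprint of the cache, in bytes (unlike sizeof,
# this includes the keys, values, and internal tables):
Base.summarysize(_axis_symbol_cache)
```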

codecov bot commented Aug 27, 2021

Codecov Report

Merging #3775 (e992da4) into master (22651e8) will decrease coverage by 0.49%.
The diff coverage is 66.66%.


```
@@            Coverage Diff             @@
##           master    #3775      +/-   ##
==========================================
- Coverage   63.34%   62.85%   -0.50%
==========================================
  Files          28       28
  Lines        7377     7104     -273
==========================================
- Hits         4673     4465     -208
+ Misses       2704     2639      -65
```
| Impacted Files | Coverage Δ |
| --- | --- |
| src/backends.jl | 64.60% <0.00%> (+1.12%) ⬆️ |
| src/backends/deprecated/pgfplots.jl | 0.00% <0.00%> (ø) |
| src/backends/gaston.jl | 0.00% <ø> (ø) |
| src/backends/pgfplotsx.jl | 60.64% <ø> (-1.00%) ⬇️ |
| src/args.jl | 73.17% <57.69%> (-0.85%) ⬇️ |
| src/utils.jl | 54.84% <66.66%> (-0.05%) ⬇️ |
| src/axes.jl | 85.61% <73.07%> (+0.55%) ⬆️ |
| src/backends/gr.jl | 89.02% <100.00%> (-0.29%) ⬇️ |
| src/components.jl | 68.22% <100.00%> (-0.53%) ⬇️ |
| src/pipeline.jl | 90.04% <100.00%> (-0.52%) ⬇️ |

... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 22651e8...e992da4.

BioTurboNick (Member, Author)

Generalized a bit; new timings:

```
955.188 ms (2945589 allocations: 167.97 MiB)
TTFP: 6.968500 seconds (23.69 M allocations: 1.373 GiB, 3.82% gc time, 0.08% compilation time)
```

Not much time saved over the initial change, but a further 15% fewer allocations and 8% less memory allocated. Marginal increase in TTFP, but still improved over the current master.

t-bltg (Member) commented Aug 28, 2021

My mental model of Symbol concatenation refuses to accept that it can be slower than a Dict lookup, but I guess the numbers are here 😉. And reducing allocations is always a good thing.

Could you share an MWE for the TTFP, so that I can also check the numbers on my machine?

BioTurboNick (Member, Author) commented Aug 28, 2021

I hear you haha. Each time a symbol is created, it has to make a new string.

This should do it.

U.S. Census county shapefiles: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip

```julia
using Shapefile
using Plots

main() = begin
  shapepath = joinpath("usgeodata", "cb_2018_us_county_5m.shp")
  table = Shapefile.Table(shapepath)
  geoms = Shapefile.shapes(table)
  stateids = parse.(Int, table.STATEFP)
  states = (stateids .< 57)  # state FIPS codes < 57; drops territories
  stateids = stateids[states]
  geoms = geoms[states]
  ALASKA = 2
  HAWAII = 15
  lower48geoms = geoms[stateids .∉ Ref([ALASKA, HAWAII])]

  plot(lower48geoms)
  return
end

@time main()
```

t-bltg (Member) commented Aug 28, 2021

Thanks, but `ERROR: UndefVarError: stateids not defined`?

BioTurboNick (Member, Author)

Oops, that'll teach me to run on a fresh session. Updated it.

t-bltg (Member) commented Aug 28, 2021

My numbers:

Without PR:

```
$ julia
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  6.162806 seconds (11.35 M allocations: 661.625 MiB, 0.19% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  1.506822 seconds (4.78 M allocations: 279.002 MiB)  # repeated
```

With PR (and no `const`):

```
$ julia
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  5.992996 seconds (9.49 M allocations: 546.062 MiB, 0.19% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  1.248008 seconds (2.94 M allocations: 164.003 MiB)  # repeated
```

LGTM.

@t-bltg t-bltg added the performance speedups and slowdowns label Aug 28, 2021
t-bltg (Member) commented Aug 28, 2021

Re-thinking about this, another possible implementation (not reinventing the wheel; what we do here is memoization) could be:

```diff
--- a/Plots.jl/src/utils.jl	2021-08-26 20:05:17.172180754 +0200
+++ b/Plots.jl/src/utils.jl	2021-08-28 21:26:22.900145140 +0200
@@ -1,4 +1,5 @@
 # ---------------------------------------------------------------
+using Memoize
 
 treats_y_as_x(seriestype) =
     seriestype in (:vline, :vspan, :histogram, :barhist, :stephist, :scatterhist)
@@ -1214,3 +1215,6 @@
     end
     return X, Y, Z
 end
+
+@memoize get_axis_attr(letter::Symbol, keyword::String) = get_axis_attr(letter, Symbol(keyword))
+@memoize get_axis_attr(letter::Symbol, keyword::Symbol) = Symbol(letter, keyword)
```

Running the MWE from #3775 (comment) with this patch and GC disabled:

```
$ julia
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  5.890243 seconds (10.00 M allocations: 562.515 MiB, 0.19% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  1.126509 seconds (3.42 M allocations: 178.810 MiB)  # repeated
```

Compared to #3775 (comment), this is the fastest, with slightly more allocs. It could also allow us to cache more things later.

With the original PR (and `const`):

```
$ julia
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  5.790343 seconds (9.04 M allocations: 532.402 MiB, 0.20% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  1.088492 seconds (2.49 M allocations: 150.345 MiB)  # repeated
```

Fastest, and the winner on allocs.

BioTurboNick (Member, Author) commented Aug 28, 2021

Oh, that change to a `const` may have made TTFP worse, but did further improve repeated plotting.

```
TTFP: 7.577282 seconds (25.98 M allocations: 1.518 GiB, 4.08% gc time, 0.06% compilation time)
Repeated: 878.953 ms (2498031 allocations: 154.32 MiB)
```

BioTurboNick (Member, Author)

The Memoize.jl version with my own example:

```
TTFP: 6.755638 seconds (23.88 M allocations: 1.360 GiB, 3.66% gc time, 0.08% compilation time)
907.864 ms (3598686 allocations: 187.91 MiB)
```

t-bltg (Member) commented Aug 28, 2021

> Oh, that change to a const may have made TTFP worse, but did further improve repeated plotting.

On my machine, timings improve with `const`. Maybe a GC effect?

t-bltg (Member) commented Aug 29, 2021

There are some missing replacements remaining, e.g.

```julia
nax = sp[Symbol(near_letter, :axis)]
oaxis = sp[Symbol(oletter, :axis)]
```

Those should be replaced as well; they can be found with:

```
$ cd Plots.jl
$ grep -ir 'Symbol(.*, :axis)'
```
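For illustration, a call site like those above would change as sketched below. The helper name here is hypothetical; the PR's actual helper may differ:

```julia
# Hypothetical cached lookup mirroring the PR's approach: each distinct
# (letter, suffix) pair constructs its Symbol once, then reuses it.
const _sym_cache = Dict{Tuple{Symbol,Symbol},Symbol}()
cached_sym(a::Symbol, b::Symbol) = get!(() -> Symbol(a, b), _sym_cache, (a, b))

near_letter, oletter = :x, :y
# Before: nax = sp[Symbol(near_letter, :axis)]
# After:  nax = sp[cached_sym(near_letter, :axis)]
nax_key   = cached_sym(near_letter, :axis)  # :xaxis, built once then cached
oaxis_key = cached_sym(oletter, :axis)      # :yaxis
```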

BioTurboNick (Member, Author)

Are you testing this on Julia master, or 1.6.2?

t-bltg (Member) commented Aug 29, 2021

> Are you testing this on Julia master, or 1.6.2?

1.6.2, but not the downloaded binaries; a local build.

```
julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4b* (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, sandybridge)
```

BioTurboNick (Member, Author) commented Aug 30, 2021

Alright, I've reproduced your numbers. Are there enough other things that need caching to justify the extra overhead?

Profiling this example, I don't see many other opportunities. Would it be better to stay with the current approach and just leave a comment to that effect? It wouldn't be a hard change to make if it became necessary to memoize more generally.

t-bltg (Member) commented Aug 30, 2021

I'm OK with merging this as is.
Can you confirm @BeastyBlacksmith ?

BeastyBlacksmith (Member) left a review comment:

I think we can just generate the whole cache ahead of time to simplify access.

BioTurboNick and others added 3 commits August 30, 2021 12:20
BioTurboNick (Member, Author)

I found a good place to populate the dictionary up front. There's a loop at the top level of args.jl, iterating over letters and arguments, that creates underscore aliases; I just expanded it. That was probably where most of the entries were being initialized anyway. Taking a similar approach, I added the PyPlot-specific ones to pyplot.jl, so they're only loaded when that backend is activated.

Before (dynamic):

```
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  4.060057 seconds (10.50 M allocations: 558.878 MiB, 0.19% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  0.563965 seconds (3.18 M allocations: 167.640 MiB)
```

After:

```
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  3.952987 seconds (10.50 M allocations: 559.009 MiB, 0.19% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  0.546460 seconds (3.18 M allocations: 167.640 MiB)
```

This is with PyPlot:

Before (dynamic):

```
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  9.440951 seconds (35.43 M allocations: 1.878 GiB, 0.08% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  0.519489 seconds (2.81 M allocations: 156.496 MiB)
```

After:

```
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  9.663443 seconds (34.90 M allocations: 1.853 GiB, 0.08% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  0.515910 seconds (2.81 M allocations: 156.496 MiB)
```
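The eager-population idea described above can be sketched as follows. The attribute list is abbreviated and illustrative; the real loop in args.jl covers the full set of axis attributes:

```julia
# Eagerly populate the symbol cache, piggybacking on the same kind of
# letter-by-attribute loop args.jl already uses for underscore aliases.
# Names and the attribute list are illustrative, not the PR's actual code.
const _axis_symbol_cache = Dict{Tuple{Symbol,Symbol},Symbol}()

for letter in (:x, :y, :z), attr in (:lims, :ticks, :guide, :scale, :flip)
    _axis_symbol_cache[(letter, attr)] = Symbol(letter, attr)
end

# Hot paths then do a plain Dict lookup, with no Symbol/string construction:
_axis_symbol_cache[(:y, :lims)]  # :ylims
```

Filling the cache at load time trades a few kilobytes and a tiny one-time cost for lookup paths that never allocate, which is why the "After" numbers above barely differ from the dynamic version on a first run but stay flat on repeated plotting.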

t-bltg (Member) commented Aug 31, 2021

Thanks for reworking, I think this looks way neater.

For reference, the numbers are now:

```
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  5.778806 seconds (9.06 M allocations: 533.092 MiB, 0.20% compilation time)
julia> GC.enable(false); @time main(); GC.enable(true); GC.gc()
  1.092024 seconds (2.49 M allocations: 150.345 MiB)
```

@t-bltg t-bltg removed the discussion label Aug 31, 2021
@t-bltg t-bltg merged commit 70b635d into JuliaPlots:master Aug 31, 2021
t-bltg added a commit that referenced this pull request Sep 1, 2021
Labels: performance (speedups and slowdowns)

Successfully merging this pull request may close: [FR] Cut down performance hit of dynamically generating prefix-symbols each time