memory impact of the functional approach #12
You will find me in full agreement here. The original version of #9 had mutating updates, but it seemed #9 was not going to be considered at all with mutation, and I want explicit state in the picture as we add optimizer-related features to Flux (like scheduling). I just think the implicit state interface is too confusing to extend and too error-prone. We can have all the advantages outlined in the comment you linked without giving up mutation, and that's what #9 was meant to do.
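For readers following along, the two interfaces being contrasted look roughly like this. This is a minimal sketch with made-up rule names and signatures, not the actual Flux or Optimisers.jl code:

```julia
# Implicit state: the rule hides per-parameter state in an IdDict keyed
# by the parameter array itself (roughly how Flux's optimisers work today).
struct ImplicitMomentum
    eta::Float64
    rho::Float64
    velocity::IdDict{Any,Any}
end
ImplicitMomentum(eta, rho) = ImplicitMomentum(eta, rho, IdDict())

function apply!(o::ImplicitMomentum, x, dx)
    v = get!(() -> zero(x), o.velocity, x)
    @. v = o.rho * v + o.eta * dx
    return v                        # the caller then does x .-= v
end

# Explicit state: the state is created once, passed in, and returned, so
# composing rules or scheduling hyperparameters is just data flow.
struct ExplicitMomentum
    eta::Float64
    rho::Float64
end

init(o::ExplicitMomentum, x) = zero(x)

function apply(o::ExplicitMomentum, state, x, dx)
    v = @. o.rho * state + o.eta * dx   # fresh buffer; the old state is untouched
    return v, v                         # (new state, transformed gradient)
end
```

The point above is that the explicit-state interface by itself does not force immutability: the rule could just as well write into `state` in place and still return it.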
Personally I think a path that starts where Flux's optimizers currently are and slowly adds explicit state, then immutability, is a much cleaner development path. It is less likely to break user code or cause performance regressions. Instead, we're starting with something that appears at face value to be a performance non-starter and trying to work in the other direction.
I have been considering it as well, and for the most part things seem to be stable. With the IdDict approach, all those references stay alive, whereas with the explicit-state approach they do not. I'd also started with some checks in place in this branch: https://github.com/FluxML/Optimisers.jl/tree/dg/mut
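To make the liveness point concrete, here is an illustrative Base Julia snippet (not package code): state keyed by the parameter array in an IdDict keeps that array reachable even after the user drops the model, whereas an explicitly returned state tree can be collected together with it.

```julia
state = IdDict{Any,Any}()          # implicit per-parameter state
x = randn(Float32, 1000, 1000)     # a "parameter"
state[x] = zero(x)                 # its optimizer state
x = nothing                        # x is still reachable as a key of `state`,
                                   # so neither it nor its buffer can be GC'd
```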
I just realized we sometimes unconditionally mutate state as well, e.g. https://github.com/FluxML/Optimisers.jl/blob/master/src/rules.jl#L47. So in addition to #13, we should probably change the non-mutating versions to be truly non-mutating.
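Something like the following (hypothetical code, not the linked `rules.jl` line) shows the pitfall and the fix:

```julia
o = (eta = 0.01, rho = 0.9)        # stand-in hyperparameters

# A nominally "non-mutating" rule that still overwrites the state it was handed.
function apply_leaky(o, state, x, dx)
    @. state = o.rho * state + (1 - o.rho) * dx   # mutates the caller's state!
    return state, o.eta .* state
end

# Truly non-mutating: allocate the new state instead of writing into the old one.
function apply_pure(o, state, x, dx)
    v = @. o.rho * state + (1 - o.rho) * dx       # fresh array, old state untouched
    return v, o.eta .* v
end

s = zeros(4); apply_leaky(o, s, ones(4), ones(4)); s   # s has been overwritten
s = zeros(4); apply_pure(o, s, ones(4), ones(4)); s    # s is still all zeros
```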
Closed by #31
I'm a bit concerned about the approach here on master and in #9, where we create a bunch of intermediate values (e.g. optimizers do not mutate gradients but transform them), and one also ends up with a new set of weights.
Compared to the current approach in Flux, where in-place mutation is used as much as possible, one might end up with multiple copies of a ResNet before GC kicks in. Has this been discussed and benchmarked already? I'd like some reassurance here, because the advantages outlined in #9 (comment) don't seem juicy enough to counterbalance this problem.
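For concreteness, the allocation pattern being compared looks roughly like this, with hypothetical helper functions operating on a flat parameter vector (not the package API):

```julia
# Functional style: every step allocates a new state buffer and a new set
# of weights, so several model-sized arrays can be live at once until the
# GC reclaims the old ones.
function step_functional(x, dx, v; eta = 0.01, rho = 0.9)
    v = @. rho * v + eta * dx      # new state buffer
    x = x .- v                     # new weights; the old x becomes garbage
    return x, v
end

# In-place style (what Flux's optimisers currently do): buffers are
# overwritten, so memory stays at one copy of the weights plus state.
function step_inplace!(x, dx, v; eta = 0.01, rho = 0.9)
    @. v = rho * v + eta * dx
    @. x -= v
    return x, v
end
```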