
memory impact of the functional approach #12

Closed
CarloLucibello opened this issue Feb 7, 2021 · 7 comments

Comments

@CarloLucibello
Member

I'm a bit concerned about the approach here on master and in #9, where we create a bunch of intermediate values (e.g. optimizers transform gradients rather than mutating them) and one also ends up with a new set of weights:

w, st = Optimisers.update(opt, w, gs..., st)

Compared to the current approach in Flux, where in-place mutation is used as much as possible, one might end up with multiple copies of a ResNet before GC kicks in. Has this been discussed and benchmarked already? I'd like some reassurance here, because the advantages outlined in #9 (comment) don't seem compelling enough to counterbalance this problem.
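To make the memory concern concrete, here is a minimal sketch contrasting the two styles for a momentum rule. It is illustrative only, not Optimisers.jl's actual code; `Momentum`, `apply`, and `apply!` are hypothetical names.

```julia
struct Momentum
    eta::Float64
    rho::Float64
end

# Functional style: each step returns a new velocity and a new update,
# so every call allocates arrays the size of the parameters.
function apply(o::Momentum, state, dx)
    v = o.rho .* state .+ dx        # fresh array for the velocity
    return v, o.eta .* v            # fresh array for the update
end

# Mutating style: velocity and gradient buffers are overwritten in place,
# so the parameter-sized allocations happen only once, at setup.
function apply!(o::Momentum, state, dx)
    rho, eta = o.rho, o.eta
    @. state = rho * state + dx
    @. dx = eta * state
    return dx
end

w  = rand(1000, 1000)
dw = rand(1000, 1000)
st = zero(w)

st, Δ = apply(Momentum(0.01, 0.9), st, dw)   # two new 1000×1000 arrays ...
w = w .- Δ                                   # ... plus a third for the new weights

apply!(Momentum(0.01, 0.9), st, dw)          # reuses the existing buffers
w .-= dw
```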

@darsnack
Member

darsnack commented Feb 7, 2021

You will find me in full agreement here. The original version of #9 had mutating apply! and update! for this reason. It seems the memory overhead is not that large, but I can't mentally reconcile those numbers with the discussion here.

But it seemed #9 was not going to be considered at all with mutation, and I want explicit state in the picture as we add optimizer-related features to Flux (like scheduling). I just think the implicit state interface is too confusing and error-prone to extend. We can have all the advantages outlined in the comment you linked without giving up mutation, and that's what #9 was meant to do.
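A rough sketch of the kind of interface being described: state is threaded through the call explicitly, but the buffers are updated in place. `Descent`, `init`, and `update!` here are illustrative, not the actual code from #9.

```julia
struct Descent
    eta::Float64
end

init(o::Descent, x) = nothing          # plain gradient descent carries no state

# Explicit-state but mutating: the caller still threads `st` through the call,
# yet `x` is updated in place rather than replaced by a new array.
function update!(o::Descent, st, x, dx)
    x .-= o.eta .* dx
    return st, x
end

w  = rand(10)
o  = Descent(0.1)
st = init(o, w)
st, w = update!(o, st, w, rand(10))
```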

@darsnack
Member

darsnack commented Feb 7, 2021

Personally, I think a path that starts where Flux's optimizers currently are and gradually adds explicit state, then immutability, is a much cleaner development path. It is less likely to break user code or cause performance regressions. Instead, we're starting with something that at face value looks like a performance non-starter and trying to work in the other direction.

@CarloLucibello
Member Author

Can we revert #9 to the original design using apply! and update!? It seems we should take a more careful, step-by-step approach, as you said. Or we can keep #9 as it is, but take some time to compare the optimizers in Flux with the ones here.

@darsnack
Member

darsnack commented Feb 7, 2021

Yeah, given the reluctance to merge #9 with in-place updates, I opted for the latter path. I'd rather be able to merge and compare, then file a PR for in-place updates (it's a simple find-and-replace for this repo), as opposed to the endless argument cycle that #9 was stuck in.

@DhairyaLGandhi
Member

I have been considering this as well, and for the most part things seem to be stable. With the IdDict approach, all those references stay alive, whereas now that is much less the case. I'd also started putting some checks in place in this branch: https://github.com/FluxML/Optimisers.jl/tree/dg/mut
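For context, a sketch of the IdDict style being contrasted, assuming it mirrors Flux's old implicit-state optimisers (not code from this repository): because the parameter arrays themselves are the dictionary keys, they and their momentum buffers stay strongly referenced for as long as the optimiser object is alive.

```julia
mutable struct OldMomentum
    eta::Float64
    rho::Float64
    velocity::IdDict{Any,Any}
end
OldMomentum(eta = 0.01, rho = 0.9) = OldMomentum(eta, rho, IdDict())

function apply!(o::OldMomentum, x, dx)
    # `x` becomes a key of the IdDict, so the parameter array (and its velocity
    # buffer) stay reachable for as long as `o` itself is alive.
    v = get!(() -> zero(x), o.velocity, x)
    @. v = o.rho * v - o.eta * dx
    @. dx = -v
    return dx
end

o = OldMomentum()
w = rand(5)
apply!(o, w, rand(5))
# `w` is now a key in o.velocity; it cannot be collected until `o` is.
```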

@ToucheSir
Member

I just realized we sometimes unconditionally mutate state as well, e.g. https://github.com/FluxML/Optimisers.jl/blob/master/src/rules.jl#L47. So in addition to #13, we should probably change the non-mutating versions to be truly non-mutating.
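To illustrate the pitfall with a hypothetical rule (not necessarily what the linked line does, and not the repository's implementation): an update written with in-place broadcasting quietly overwrites the caller's state even though the function is presented as non-mutating.

```julia
struct RMSProp
    eta::Float64
    rho::Float64
    eps::Float64
end

# Non-mutating in name only: `@.` with `=` writes straight into the caller's `acc`.
function apply_leaky(o::RMSProp, acc, dx)
    @. acc = o.rho * acc + (1 - o.rho) * dx^2
    return acc, o.eta .* dx ./ (sqrt.(acc) .+ o.eps)
end

# Truly non-mutating: a fresh accumulator is built and the input is left untouched.
function apply_pure(o::RMSProp, acc, dx)
    acc = @. o.rho * acc + (1 - o.rho) * dx^2
    return acc, o.eta .* dx ./ (sqrt.(acc) .+ o.eps)
end

acc = zeros(3)
apply_leaky(RMSProp(0.001, 0.9, 1e-8), acc, ones(3))
# `acc` has now been overwritten, even though the caller never asked for mutation.
```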

@mcabbott
Member

Closed by #31
