FEAT / Optim: Add GaLore optimizer #29588
We need an import check here, no? 😄
yeah indeed, many things can be optimized, for now it's a really rough draft, will focus on polishing everything next!
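One way to handle the import check mentioned above is a small guard that fails early with an installation hint when the optional dependency is missing. This is only a sketch; the helper name and the `galore_torch` / `galore-torch` package names are assumptions based on the GaLore project, and the actual check in the PR may look different:

```python
import importlib.util


def require_package(module_name: str, pip_name: str) -> None:
    """Raise a helpful ImportError if an optional dependency is not installed.

    `module_name` is the importable name, `pip_name` the name used with pip.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"`{module_name}` is required for this feature but is not installed. "
            f"Install it with `pip install {pip_name}`."
        )


# Hypothetical usage for GaLore (names are assumptions):
# require_package("galore_torch", "galore-torch")
```

Checking `find_spec` instead of doing a bare `import` keeps the check cheap and avoids importing the package until it is actually needed.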
Great, ping me when you're ready for a full review :)
I wonder if we should raise an error if the target module name matches but the layer type is not `Linear`. Let's assume a user matches N linear layers and accidentally one other type like `Embedding`: currently the embedding layer would be silently ignored, and the user doesn't get any error message or warning.
This sounds like a good idea!
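A minimal sketch of the check discussed above, assuming the target-module option is a list of name suffixes (the real `optim_target_modules` matching logic in the Trainer may differ): matched modules that are not `nn.Linear` trigger a warning instead of being silently skipped.

```python
import warnings

import torch.nn as nn


def collect_galore_params(model: nn.Module, target_suffixes: list[str]) -> list:
    """Collect weights of matched Linear modules; warn on non-Linear matches.

    `target_suffixes` mimics a hypothetical `optim_target_modules` list of
    module-name suffixes. Non-Linear matches are skipped with a warning so the
    user learns their pattern also hit e.g. an Embedding layer.
    """
    params = []
    for name, module in model.named_modules():
        if not any(name.endswith(suffix) for suffix in target_suffixes):
            continue
        if not isinstance(module, nn.Linear):
            warnings.warn(
                f"Matched module `{name}` is a {module.__class__.__name__}, "
                "not nn.Linear; GaLore will ignore it."
            )
            continue
        params.append(module.weight)
    return params
```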
Yes! I think we should emit a warning, and also support passing simple regexes such as `.*.attn.*` - lmk wdyt!
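The regex idea above could be sketched with `re.fullmatch` so that patterns like `.*\.attn\..*` select whole module names. This is only an illustration of the convention being discussed, not the actual Trainer behavior:

```python
import re


def matches_target_module(name: str, patterns: list[str]) -> bool:
    """Return True if a module name fully matches any target regex pattern."""
    return any(re.fullmatch(pattern, name) for pattern in patterns)
```

Using `fullmatch` rather than `search` keeps the semantics unambiguous: a pattern without wildcards only matches the exact module name.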
@younesbelkada But it is not allowed for pure layered_adamw optimizers, right?
Hmmm, in that case I think users should just pass `adamw` instead of `galore_adamw_layerwise`.
ahh I think I got your point - we could indeed extend the optimizers and enable layer-wise optimization! This can be done in the scope of a follow-up PR!
cc @muellerzr @amyeroberts @BenjaminBossan @pacman100
The idea here would be to leverage `optim_target_modules` and think of a more general API to enable layer-wise optimization within Trainer, using the approach presented here with DummyOptimizer / DummyLRScheduler and post-gradient hooks. I propose to do that in a separate PR, but I can also do it here if you think it makes more sense to introduce both GaLore and per-layer optimization for all optimizers in the same PR.

I think it's wiser to do it in a separate PR, as layer-wise optimization might not be supported out of the box for many scenarios such as DeepSpeed / DDP, etc.
@younesbelkada Yeah, that sounds good to me. I guess I can comment out this check locally and wait for your PR.
I guess it makes more sense to a) have a generic API for layer-wise optimization using the approach presented here, b) add GaLore, and c) use the generic API to instantiate layer-wise GaLore. Otherwise, after you add a generic API (and it seems like you already have most of the code for that; I don't see anything here that's GaLore-specific), you would have to go back and do another PR just to refactor the layer-wise GaLore?
@kiddyboots216 I think it's fine to add Galore first. We're just merging into the dev branch atm, so there aren't guarantees about API stability until we have a release. Adding it in a more general sense is more involved and will require more tests / hitting possible blockers. In this order, we can release the feature without being blocked by the development of the more general API.