Fix multiple lr scheduler (warmup scheduler) & Add adaptive_patience to the lr scheduler #3035
Conversation
src/otx/recipe/classification/multi_class_cls/otx_mobilenet_v3_large.yaml
The warmup settings seem to differ between models. Is this by design? Could you please double-check?
src/otx/core/model/module/base.py
Outdated
@@ -150,6 +157,7 @@ def ensure_list(item: Any) -> list:  # noqa: ANN401
        optimizer(params=self.parameters()) if callable(optimizer) else optimizer
        for optimizer in ensure_list(self.hparams.optimizer)
    ]
    self.init_lr = optimizers[0].param_groups[0]["lr"]
    self.init_lr = optimizers[0].param_groups[0]["lr"]

Suggested change:

    # Capture initial_lr
    for optimizer in optimizers:
        for param_group in optimizer.param_groups:
            param_group.setdefault('initial_lr', param_group["lr"])
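For context on the suggestion: PyTorch LR schedulers keep an initial_lr key in each param group (a scheduler constructed with last_epoch=-1 writes it itself, and resuming with last_epoch != -1 expects it to be present). A minimal standalone sketch of the setdefault pattern, using a plain SGD optimizer rather than the actual OTX module (all names here are illustrative assumptions):

    import torch

    # Hypothetical standalone example (not OTX code): record the starting LR per
    # param group so later warmup scaling can always refer back to it.
    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for param_group in optimizer.param_groups:
        # setdefault keeps any 'initial_lr' a scheduler already wrote and adds it otherwise.
        param_group.setdefault("initial_lr", param_group["lr"])

    print(optimizer.param_groups[0]["initial_lr"])  # 0.01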
Done, fa86c3b
src/otx/core/model/module/base.py
Outdated
    def _scale_lr(start_point: int, end_point: int, init_lr: float) -> float:
        return min(1.0, float(start_point + 1) / end_point) * init_lr

    optimizer.step(closure=closure)

    if self.warmup_by_epoch and self.trainer.current_epoch < self.warmup_steps:
        for pg in optimizer.param_groups:
            pg["lr"] = _scale_lr(self.trainer.current_epoch, self.warmup_steps, self.init_lr)

    if not self.warmup_by_epoch and (self.trainer.global_step < self.warmup_steps):
        for pg in optimizer.param_groups:
            pg["lr"] = _scale_lr(self.trainer.global_step, self.warmup_steps, self.init_lr)
    def _scale_lr(start_point: int, end_point: int, init_lr: float) -> float:
        return min(1.0, float(start_point + 1) / end_point) * init_lr

    optimizer.step(closure=closure)

    if self.warmup_by_epoch and self.trainer.current_epoch < self.warmup_steps:
        for pg in optimizer.param_groups:
            pg["lr"] = _scale_lr(self.trainer.current_epoch, self.warmup_steps, self.init_lr)

    if not self.warmup_by_epoch and (self.trainer.global_step < self.warmup_steps):
        for pg in optimizer.param_groups:
            pg["lr"] = _scale_lr(self.trainer.global_step, self.warmup_steps, self.init_lr)

Suggested change:

    def _scale_lr(start_point: int, end_point: int, param_group) -> float:
        return min(1.0, float(start_point + 1) / end_point) * param_group["initial_lr"]

    if self.trainer.current_epoch < self.warmup_steps:
        lr_step = self.trainer.current_epoch if self.warmup_by_epoch else self.trainer.global_step
        for pg in optimizer.param_groups:
            pg["lr"] = _scale_lr(lr_step, self.warmup_steps, pg)

    optimizer.step(closure=closure)
Should the call to _scale_lr() come before optimizer.step(closure=closure) so that the warmup LR takes effect with top priority?

To validate this behavior, please add an integration test for the following scenario: with a cosine LR scheduler, warmup_steps=10, 5 iterations per epoch, and training for 10 epochs, validate that the LR curve is scheduled correctly.
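A sketch of what such a test could check, written against plain PyTorch rather than the OTX test harness (the test name, the scenario constants, and the use of SGD with CosineAnnealingLR are assumptions for illustration):

    import math

    import torch


    def test_warmup_then_cosine_lr_curve():
        """warmup_steps=10, 5 iterations per epoch, 10 epochs -> 50 steps total."""
        warmup_steps, iters_per_epoch, epochs = 10, 5, 10
        total_steps = iters_per_epoch * epochs
        init_lr = 0.1

        model = torch.nn.Linear(2, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=init_lr)
        cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps)

        lrs = []
        for step in range(total_steps):
            if step < warmup_steps:
                # Linear warmup, mirroring _scale_lr() above.
                for pg in optimizer.param_groups:
                    pg["lr"] = min(1.0, (step + 1) / warmup_steps) * init_lr
            optimizer.step()
            if step >= warmup_steps:
                cosine.step()
            lrs.append(optimizer.param_groups[0]["lr"])

        # LR should ramp up monotonically during warmup...
        assert all(a <= b for a, b in zip(lrs[:warmup_steps - 1], lrs[1:warmup_steps]))
        # ...peak at init_lr at the end of warmup...
        assert math.isclose(lrs[warmup_steps - 1], init_lr)
        # ...and decay monotonically afterwards under cosine annealing.
        assert all(a >= b for a, b in zip(lrs[warmup_steps:], lrs[warmup_steps + 1:]))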
I just followed the settings from OTX 1.x, and I don't think we should use the same value for all models, since their characteristics can vary.
Yes, if it's the same as the setting in 1.x, you can ignore it. :)
Closing this PR; #3056 will handle the warmup scheduler issue instead.
Summary
This PR introduces an optimizer_step() override to enable warmup scheduling, so warmup logic will no longer be handled by LinearWarmupScheduler.

TODOs:
- Add warmup_steps and warmup_by_epochs at the base OTXLitModule.

NOTE:
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.