
Fix multiple lr scheduler (warmup scheduler) & Add adaptive_patience to the lr scheduler #3035

Closed
wants to merge 24 commits

Conversation


@sungmanc (Contributor) commented on Mar 5, 2024

Summary

This PR introduces:

  • Resolves the multiple-LR-scheduler issue by overriding optimizer_step() to perform the warmup scheduling, so the warmup logic is no longer driven by LinearWarmupScheduler (see the sketch right after this list).
  • Adds adaptive_patience to the LRScheduler.
  • Adds unit tests for the warmup logic.
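
A minimal sketch of this override, for illustration only; the attribute names warmup_steps, warmup_by_epoch, and init_lr follow the diff discussed in the review below, and the hook signature is the Lightning 2.x one — this is not the exact OTX code.

```python
# Illustrative sketch of warmup handled inside optimizer_step(); not the OTX implementation.
from lightning.pytorch import LightningModule


class WarmupModuleSketch(LightningModule):
    def __init__(self, warmup_steps: int = 10, warmup_by_epoch: bool = False, init_lr: float = 0.01):
        super().__init__()
        self.warmup_steps = warmup_steps
        self.warmup_by_epoch = warmup_by_epoch
        self.init_lr = init_lr

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure=None):
        # Run the regular optimizer step first.
        optimizer.step(closure=optimizer_closure)

        # While still inside the warmup window, linearly rescale every param group's LR.
        step = self.trainer.current_epoch if self.warmup_by_epoch else self.trainer.global_step
        if step < self.warmup_steps:
            scale = min(1.0, float(step + 1) / self.warmup_steps)
            for pg in optimizer.param_groups:
                pg["lr"] = scale * self.init_lr
```

Whether the LR rescale should happen before or after optimizer.step() is debated later in this review.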

TODOs:

  • Resolve the hard-coded index for the optimizer and scheduler (i.e. optimizers[0]).
  • For future work, we might use a callback to enable the warmup scheduler; the feasibility needs to be checked. If that approach works, we could remove warmup_steps and warmup_by_epochs from the base OTXLitModule.

NOTE:

  • I kept lr_scheduler_configs as a list type since I got an error when I changed it to a dictionary. I think this can be handled in the next phase since it is not critical for this PR.

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added e2e tests for validation.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@github-actions bot added the TEST (Any changes in tests) and OTX 2.0 labels on Mar 5, 2024
@sungmanc marked this pull request as ready for review on March 7, 2024 02:23
@harimkang (Contributor) left a comment:

The warmup settings seem to be different on models, is this by design? Could you please double check?

@@ -150,6 +157,7 @@ def ensure_list(item: Any) -> list:  # noqa: ANN401
            optimizer(params=self.parameters()) if callable(optimizer) else optimizer
            for optimizer in ensure_list(self.hparams.optimizer)
        ]
        self.init_lr = optimizers[0].param_groups[0]["lr"]
Contributor

Suggested change
-        self.init_lr = optimizers[0].param_groups[0]["lr"]
+        # Capture initial_lr
+        for optimizer in optimizers:
+            for param_group in optimizer.param_groups:
+                param_group.setdefault('initial_lr', param_group["lr"])

Contributor Author

Done, fa86c3b

Comment on lines 183 to 194
        def _scale_lr(start_point: int, end_point: int, init_lr: float) -> float:
            return min(1.0, float(start_point + 1) / end_point) * init_lr

        optimizer.step(closure=closure)

        if self.warmup_by_epoch and self.trainer.current_epoch < self.warmup_steps:
            for pg in optimizer.param_groups:
                pg["lr"] = _scale_lr(self.trainer.current_epoch, self.warmup_steps, self.init_lr)

        if not self.warmup_by_epoch and (self.trainer.global_step < self.warmup_steps):
            for pg in optimizer.param_groups:
                pg["lr"] = _scale_lr(self.trainer.global_step, self.warmup_steps, self.init_lr)
Contributor

Suggested change
-        def _scale_lr(start_point: int, end_point: int, init_lr: float) -> float:
-            return min(1.0, float(start_point + 1) / end_point) * init_lr
-
-        optimizer.step(closure=closure)
-
-        if self.warmup_by_epoch and self.trainer.current_epoch < self.warmup_steps:
-            for pg in optimizer.param_groups:
-                pg["lr"] = _scale_lr(self.trainer.current_epoch, self.warmup_steps, self.init_lr)
-
-        if not self.warmup_by_epoch and (self.trainer.global_step < self.warmup_steps):
-            for pg in optimizer.param_groups:
-                pg["lr"] = _scale_lr(self.trainer.global_step, self.warmup_steps, self.init_lr)
+        def _scale_lr(start_point: int, end_point: int, param_group) -> float:
+            return min(1.0, float(start_point + 1) / end_point) * param_group["initial_lr"]
+
+        if self.trainer.current_epoch < self.warmup_steps:
+            lr_step = self.trainer.current_epoch if self.warmup_by_epoch else self.trainer.global_step
+            for pg in optimizer.param_groups:
+                pg["lr"] = _scale_lr(lr_step, self.warmup_steps, pg)
+
+        optimizer.step(closure=closure)

@vinnamkim (Contributor) commented Mar 7, 2024:

Shouldn't the _scale_lr() call come before optimizer.step(closure=closure), so that the warmup LR takes effect with top priority?

To validate this behavior, please add integration tests for the following scenario:

With a cosine LR scheduler, warmup_steps=10, 5 iterations per epoch, and training for 10 epochs, validate that the LR curve is scheduled correctly.
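
A hedged sketch of such a test, simulating the schedule arithmetic with a plain torch optimizer rather than the real OTX training loop; the test name and structure are illustrative assumptions.

```python
# Illustrative only: simulates warmup + cosine decay over 50 steps and checks the LR curve shape.
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR


def test_warmup_then_cosine_lr_curve():
    init_lr, warmup_steps, iters_per_epoch, epochs = 0.1, 10, 5, 10
    total_steps = iters_per_epoch * epochs

    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=init_lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps)

    lrs = []
    for step in range(total_steps):
        optimizer.step()
        if step < warmup_steps:
            # Linear warmup overrides the scheduler, mirroring the optimizer_step() override.
            for pg in optimizer.param_groups:
                pg["lr"] = min(1.0, (step + 1) / warmup_steps) * init_lr
        else:
            scheduler.step()
        lrs.append(optimizer.param_groups[0]["lr"])

    # LR ramps up monotonically to init_lr during warmup, then decays monotonically.
    assert all(a <= b for a, b in zip(lrs[:warmup_steps], lrs[1:warmup_steps]))
    assert lrs[warmup_steps - 1] == init_lr
    assert all(a >= b for a, b in zip(lrs[warmup_steps:], lrs[warmup_steps + 1:]))
```

The real integration test would drive the OTX engine and read the logged learning rate instead, but the asserted curve shape would be the same.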

Contributor

In addition, while writing this comment, I noticed that the current implementation can fall into the following "what we don't want" case.
[image: LR curves illustrating "what we want" vs. "what we don't want"]
To achieve "what we want", would it be better to implement ReduceLROnPlateauWithWarmup and have it called for both step and epoch modes?
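
For illustration, a rough sketch of what such a ReduceLROnPlateauWithWarmup could look like; the constructor arguments and the per-step() warmup counting are assumptions, not a design agreed in this thread.

```python
# Illustrative sketch only; not the OTX implementation.
from torch.optim.lr_scheduler import ReduceLROnPlateau


class ReduceLROnPlateauWithWarmup(ReduceLROnPlateau):
    """Linear warmup for the first `warmup_steps` calls to step(), plateau-based reduction afterwards."""

    def __init__(self, optimizer, warmup_steps: int, **kwargs):
        super().__init__(optimizer, **kwargs)
        self.warmup_steps = warmup_steps
        self._warmup_count = 0
        # Remember the LR each param group should reach at the end of warmup.
        self._target_lrs = [pg["lr"] for pg in optimizer.param_groups]

    def step(self, metrics):
        if self._warmup_count < self.warmup_steps:
            self._warmup_count += 1
            scale = self._warmup_count / self.warmup_steps
            for pg, target in zip(self.optimizer.param_groups, self._target_lrs):
                pg["lr"] = scale * target
            return
        # After warmup, defer to the usual ReduceLROnPlateau behaviour.
        super().step(metrics)
```

Depending on whether it is stepped per iteration or per epoch, warmup_steps would be counted in that same unit.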

Contributor Author

Done, fa86c3b
Added a test: 65a7f91

@sungmanc (Contributor Author) commented Mar 7, 2024:

The warmup settings seem to be different on models, is this by design? Could you please double check?

I just followed the settings from OTX 1.x, and I don't think we should use the same value for all models since their characteristics can vary.

@harimkang (Contributor) replied:

The warmup settings seem to be different on models, is this by design? Could you please double check?

I just followed the settings from OTX 1.x, and I don't think we should use the same value for all models since their characteristics can vary.

Yes, if it's the same as the setting in 1.x, you can ignore it. :)

@sungmanc (Contributor Author) commented Mar 7, 2024:

I also manually checked that the warmup works well.
[image: logged learning-rate curve showing the warmup phase]

@sungmanc (Contributor Author) commented Mar 7, 2024:

Closing this PR; #3056 will handle the warmup scheduler issue instead.
