🐛 [Fix] learning rate schedule and momentum value #123

Closed

Conversation

Adamusen
Contributor

Set the initial optimizer.max_lr value to zero for each parameter group. Removed the "baked-in" 0.8 optimizer momentum value; it is now initialized from the value provided in the train config file instead.

Fixes #122

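In isolation, the intent looks roughly like this (a toy sketch using a plain torch.optim.SGD; the real change lives in the repo's create_optimizer, and the config values here are only illustrative):

import torch.nn as nn
from torch.optim import SGD

# Stand-ins for the YOLO model and the train config (illustrative values only).
model = nn.Linear(4, 2)
cfg_momentum = 0.937  # momentum read from the train config, not a hard-coded 0.8
cfg_lr = 0.01

optimizer = SGD(model.parameters(), lr=cfg_lr, momentum=cfg_momentum)

# Start the warm-up from zero: max_lr is zero for every param group, so the
# first scheduled epoch interpolates the learning rate up from 0 rather than
# from the configured value.
optimizer.max_lr = [0.0 for _ in optimizer.param_groups]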
@henrytsui000
Collaborator

Hold on, I am also working on a coarse momentum schedule, but it is not pushed yet; you may adapt it from this code:

from typing import Union

from torch.optim import Optimizer

def lerp(start: float, end: float, step: Union[int, float], total: int = 1):
    return start + (end - start) * step / total

def create_optimizer(model: YOLO, optim_cfg: OptimizerConfig) -> Optimizer:
    ...

    def next_epoch(self, batch_num, epoch_idx):
        # Record the per-group lr targets for this epoch; lr and momentum are
        # then interpolated batch-by-batch in next_batch().
        self.min_lr = self.max_lr
        self.max_lr = [param["lr"] for param in self.param_groups]
        # TODO: load the momentum values from the config instead of fixed numbers
        #       0.937: start momentum
        #       0.8  : normal momentum
        #       3    : number of warm-up epochs
        self.min_mom = lerp(0.937, 0.8, min(epoch_idx, 3), 3)
        self.max_mom = lerp(0.937, 0.8, min(epoch_idx + 1, 3), 3)
        self.batch_num = batch_num
        self.batch_idx = 0

    def next_batch(self):
        self.batch_idx += 1
        lr_dict = dict()
        for lr_idx, param_group in enumerate(self.param_groups):
            min_lr, max_lr = self.min_lr[lr_idx], self.max_lr[lr_idx]
            param_group["lr"] = lerp(min_lr, max_lr, self.batch_idx, self.batch_num)
            param_group["momentum"] = lerp(self.min_mom, self.max_mom, self.batch_idx, self.batch_num)
            lr_dict[f"LR/{lr_idx}"] = param_group["lr"]
        return lr_dict
    ...
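For a quick sanity check, the warm-up this produces can be previewed on its own (a standalone sketch; warmup_epochs and batches_per_epoch are illustrative values, not the ones from the config):

from typing import Union

def lerp(start: float, end: float, step: Union[int, float], total: int = 1):
    return start + (end - start) * step / total

warmup_epochs = 3      # assumed warm-up length, matching the comment above
batches_per_epoch = 4  # illustrative value
start_mom, normal_mom = 0.937, 0.8

for epoch_idx in range(5):
    # Clamp at the warm-up boundary so momentum stays at normal_mom afterwards.
    min_mom = lerp(start_mom, normal_mom, min(epoch_idx, warmup_epochs), warmup_epochs)
    max_mom = lerp(start_mom, normal_mom, min(epoch_idx + 1, warmup_epochs), warmup_epochs)
    for batch_idx in range(1, batches_per_epoch + 1):
        momentum = lerp(min_mom, max_mom, batch_idx, batches_per_epoch)
        print(f"epoch {epoch_idx} batch {batch_idx}: momentum={momentum:.4f}")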

@Adamusen
Contributor Author

Alright :)

One additional note regarding the learning rate scheduling: with the current implementation, the Lightning module cannot restore the learning rate when resuming an interrupted training by passing the checkpoint path to trainer.fit(model, ckpt_path=ckpt_path) in lazy.py (everything else is loaded properly).
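For context, this is the resume path in question, together with one possible (hypothetical, not in the current code) way to carry the schedule state through a checkpoint via the standard Lightning hooks:

import lightning as L  # or pytorch_lightning, depending on the environment

class TrainModel(L.LightningModule):  # hypothetical stand-in for the repo's module
    ...

    # One possible way to make the custom lr/momentum schedule survive a resume:
    # persist its extra attributes in the checkpoint with the standard hooks.
    # self.optimizer is a stand-in for however the module keeps its optimizer handle.
    def on_save_checkpoint(self, checkpoint):
        checkpoint["lr_schedule_state"] = {
            "min_lr": getattr(self.optimizer, "min_lr", None),
            "max_lr": getattr(self.optimizer, "max_lr", None),
        }

    def on_load_checkpoint(self, checkpoint):
        # Stash the restored state; it would be re-applied after the optimizer
        # is rebuilt in configure_optimizers().
        self._restored_lr_state = checkpoint.get("lr_schedule_state", None)

# Resuming an interrupted run, as done in lazy.py:
# trainer.fit(model, ckpt_path=ckpt_path)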

@Adamusen
Contributor Author

Closing this pull request, as you are working on this part of the code yourself anyway :)

@Adamusen closed this on Nov 12, 2024