
Issues within the learning rate schedule and optimizer initialization #122

Closed
Adamusen opened this issue Nov 11, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Adamusen
Contributor

Describe the bug

There are three parameter groups defined in yolo/utils/model_utils.py/create_optimizer: one for biases, one for batch-norm weights, and one for the convolutional weights. In the stable implementation, the learning rate for all three groups is the same during training; they differ only in their weight_decay (only the conv weights have it). In the current implementation, however, the bias learning rate starts high during warm-up and then quickly converges towards zero right after the warm-up epochs, essentially freezing the bias values after a couple of epochs! This issue most likely stems from Line 79 in yolo/utils/model_utils.py/create_optimizer:
optimizer.max_lr = [0.1, 0, 0], where max_lr is initialized differently for the bias parameter group.
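
For context, here is a minimal sketch of how a per-group warm-up driven by such a max_lr list is typically wired up with PyTorch's LambdaLR (an illustration of the mechanism, not the repository's actual code; names are assumptions): each group ramps linearly from max_lr[i] to the base lr from train.yaml and then stays there.

    from torch.optim.lr_scheduler import LambdaLR

    def build_warmup_scheduler(optimizer, base_lr, max_lr, warmup_steps):
        # Group i warms up linearly from max_lr[i] to base_lr, then stays at
        # base_lr (multiplier 1.0). LambdaLR accepts one lambda per param group.
        def make_lambda(start_lr):
            def fn(step):
                if step >= warmup_steps:
                    return 1.0
                t = step / warmup_steps
                # interpolate start_lr -> base_lr, expressed as a multiplier of base_lr
                return (start_lr + t * (base_lr - start_lr)) / base_lr
            return fn
        return LambdaLR(optimizer, lr_lambda=[make_lambda(s) for s in max_lr])

With max_lr = [0.1, 0, 0] and a base lr of 0.01, the bias group would ramp down from 0.1 while the other two groups ramp up from 0, all three meeting at 0.01 after warm-up.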

An additional problem is that the momentum values for the optimizer are hardcoded to 0.8 in the following code snippet (Lines 55-57), instead of using the value found in train.yaml, which is 0.937:

        {"params": bias_params, "momentum": 0.8, "weight_decay": 0},
        {"params": conv_params, "momentum": 0.8},
        {"params": norm_params, "momentum": 0.8, "weight_decay": 0},

To Reproduce

Steps to reproduce the behavior:

  1. Train a network with tensorboard enabled.
  2. Observe the learning rate of each parameter group (e.g., with a logging helper like the sketch below).
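
A minimal logging sketch (assumed names, not the repository's training loop) that writes each parameter group's current learning rate to TensorBoard so the three curves can be compared:

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="runs/lr_debug")  # hypothetical log directory

    def log_learning_rates(optimizer, global_step):
        # Group order follows create_optimizer's snippet above: bias, conv, norm.
        names = ["bias", "conv", "norm"]
        for name, group in zip(names, optimizer.param_groups):
            writer.add_scalar(f"LR/{name}", group["lr"], global_step)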

Expected behavior

The bias learning rate should not converge towards zero early in training.

Screenshots

Learning rate schedule for the three parameter groups in the current implementation:
[Screenshot: learning_rates]

Learning rate schedule for the same parameter groups in the "stable" implementation:
[Screenshot: lr-stable]

Proposed solution

  • Set Line 79 to optimizer.max_lr = [0, 0, 0] so that the learning rates are identical across groups.
  • Remove the fixed momentum values from Lines 55-57, so that the optimizer is initialized with the momentum value provided in train.yaml (see the sketch below).
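
A rough sketch of what the combined change could look like (assuming create_optimizer receives an optim_cfg dict carrying the lr / momentum / weight_decay values from train.yaml; the names and signature are illustrative, not the repository's actual code):

    from torch.optim import SGD

    def create_optimizer_sketch(bias_params, conv_params, norm_params, optim_cfg):
        # Per-group momentum overrides removed, so every group inherits the
        # optimizer-level momentum taken from train.yaml (e.g. 0.937).
        param_groups = [
            {"params": bias_params, "weight_decay": 0},
            {"params": conv_params},  # only the conv weights keep weight_decay
            {"params": norm_params, "weight_decay": 0},
        ]
        optimizer = SGD(
            param_groups,
            lr=optim_cfg["lr"],                  # e.g. 0.01
            momentum=optim_cfg["momentum"],      # e.g. 0.937
            weight_decay=optim_cfg["weight_decay"],
        )
        optimizer.max_lr = [0, 0, 0]  # first bullet: identical warm-up behaviour for all groups
        return optimizer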
@Adamusen added the bug ("Something isn't working") label on Nov 11, 2024
@Adamusen changed the title from "Isses within the learning rate schedule and optimizer initialization" to "Issues within the learning rate schedule and optimizer initialization" on Nov 11, 2024
@henrytsui000
Collaborator

Hi

Generally, the warm-up learning rate of the bias term is 0.1 and the others are 0; they all align to 0.01 after the warm-up epochs.
This is my learning rate curve in wandb; it seems to work correctly now. Can you tell me your configuration?

$ python yolo/lazy.py task=train  # the basic configuration
[Screenshot: wandb learning rate curves]

best regards,
HenryTsui

@Adamusen
Contributor Author

Adamusen commented Nov 11, 2024

Hey,

Yes, for the training in the first screenshot I changed the learning rate in train.yaml to "lr: 0.001" from 0.01 and "end_factor: 0.1" from 0.01. Everything else related to the learning rate schedule should be the same.

Edit: Tomorrow I will check in TensorBoard the exact value to which my bias learning rate converged (from the screenshot I can't tell if it's 0.001 or less at this point).

Edit 2: It looks like in the screenshot I had even set the learning rate to 0.0005. I'm sorry for the confusion; I was trying out different values to see if I could get better convergence. I will double-check all of this tomorrow.

@Adamusen
Contributor Author

Hi @henrytsui000 ,

I am sorry, my mistake. The bias learning rate indeed converged to the same value as the remaining two groups; it just looked close to zero compared to the 0.1 start:
[Screenshot: learning rate curves with the bias group converging to the shared value]
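
A minimal plotting sketch (illustrative only, assumed data structure): on a linear axis, a value of 0.01 next to a 0.1 warm-up start looks like zero, while a log scale makes the convergence to the shared value easy to see.

    import matplotlib.pyplot as plt

    def plot_lr_curves(steps, lr_history):
        # lr_history: e.g. {"bias": [...], "conv": [...], "norm": [...]}
        for name, values in lr_history.items():
            plt.plot(steps, values, label=name)
        plt.yscale("log")  # log scale keeps small post-warm-up values visible
        plt.xlabel("step")
        plt.ylabel("learning rate")
        plt.legend()
        plt.show()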

I am closing this issue, as it was a mistake on my part.
