Issues within the learning rate schedule and optimizer initialization #122
Comments
Hey, yes, I modified the learning rate in train.yaml to "lr: 0.001" from 0.01 and "end_factor: 0.1" from 0.01 for the training in the first screenshot. Everything else related to the learning rate schedule should be the same. Edit: I will check in TensorBoard tomorrow the exact value to which my bias learning rate converged (from the screenshot I can't tell whether it is 0.001 or less at this point). Edit 2: It looks like in the screenshot I even set the learning rate to 0.0005. I'm sorry for the confusion; I was trying out different values to see if I get better convergence. I will double-check all of this tomorrow.
Hi @henrytsui000, I am sorry, it is my bad. The bias learning rate indeed converged to the same value as the remaining two groups; it just looked near-zero compared to the 0.1 start. I am closing this issue, as it was a mistake.
Describe the bug
There are three parameter groups defined in yolo/utils/model_utils.py/create_optimizer: one for biases, one for batch-norm weights, and one for the convolutional weights. In the stable implementation, the learning rate for each of these groups is the same during training; they differ only in their weight_decay (only the conv weights have it). In the current implementation, however, the bias learning rate starts high during warmup and then quickly converges towards zero right after the warmup epochs, essentially freezing the bias values after a couple of epochs! This issue most likely stems from Line 79 in yolo/utils/model_utils.py/create_optimizer:
optimizer.max_lr = [0.1, 0, 0]
where max_lr is initialized differently for the bias parameter group. An additional problem is that the momentum values for the optimizer are hardcoded to 0.8 instead of using the value from train.yaml, which is 0.937 (Lines 55-57 of the same file).
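For illustration, here is a minimal sketch of how the three parameter groups could take the momentum (and weight decay) from the training config rather than a hardcoded constant. The function and attribute names (build_param_groups, optim_cfg.args.momentum, optim_cfg.args.weight_decay) are hypothetical placeholders, not the repository's actual API:

```python
# Illustrative sketch only - not the repository's code. It assumes an optim_cfg
# object carrying the values parsed from train.yaml.
def build_param_groups(model, optim_cfg):
    bias_params, norm_params, conv_params = [], [], []
    for name, param in model.named_parameters():
        if name.endswith(".bias"):
            bias_params.append(param)
        elif param.ndim == 1:          # batch-norm and other 1-D weights
            norm_params.append(param)
        else:                          # convolutional (and linear) weights
            conv_params.append(param)

    momentum = optim_cfg.args.momentum  # e.g. 0.937 from train.yaml, instead of a hardcoded 0.8
    return [
        {"params": bias_params, "momentum": momentum, "weight_decay": 0.0},
        {"params": norm_params, "momentum": momentum, "weight_decay": 0.0},
        {"params": conv_params, "momentum": momentum, "weight_decay": optim_cfg.args.weight_decay},
    ]
```

Such groups could then be passed to torch.optim.SGD together with the lr value from train.yaml.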
To Reproduce
Steps to reproduce the behavior:
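As a sketch of how the behavior shows up (assuming the optimizer returned by create_optimizer and a standard training loop; num_epochs and the epoch body are placeholders), the per-group learning rates can be logged to TensorBoard and compared across the three groups:

```python
# Sketch: log each parameter group's learning rate per epoch so the bias group
# can be compared against the batch-norm and conv groups.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/lr_schedule_check")
for epoch in range(num_epochs):
    # ... run one training epoch and step the scheduler here ...
    for idx, group in enumerate(optimizer.param_groups):
        writer.add_scalar(f"lr/group_{idx}", group["lr"], epoch)
writer.close()
```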
Expected behavior
The bias learning rate should not converge towards zero early; it should follow the same schedule as the other two parameter groups, as in the stable implementation.
Screenshots
Learning rate schedule for the three parameter groups in the current implementation:
Learning rate schedule for the same parameter groups in the "stable" implementation:
Proposed solution
optimizer.max_lr = [0, 0, 0]
so that the learning rates of the three parameter groups are identical.
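With all entries of max_lr initialized identically, the three groups should again differ only in weight_decay. A quick sanity check (a sketch; optimizer, scheduler, and num_epochs stand in for the objects the training code actually builds):

```python
# Sketch: confirm that all parameter groups share one learning-rate schedule.
for epoch in range(num_epochs):
    scheduler.step()
    lrs = [group["lr"] for group in optimizer.param_groups]
    assert len(set(lrs)) == 1, f"per-group learning rates diverged at epoch {epoch}: {lrs}"
```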