refactor(optimizer): change "filter_bias_and_bn" to "weight_decay_filter" #752
Thank you for your contribution to the MindCV repo.
Before submitting this PR, please make sure:
Motivation
Fix the weight decay bug that occurs when `filter_bias_and_bn` is set:

- `filter_bias_and_bn` is `True`: the function `init_group_params` does not set a weight decay value for `no_decay_params`. In this case, MindSpore uses the `weight_decay` of the optimizer (usually not 0.0).
- `filter_bias_and_bn` is `False`: MindSpore automatically filters BatchNorm params from weight decay.

So the name of the argument does not match what it actually does, and we can never filter the params of bias and norm layers out of weight decay, as the name `filter_bias_and_bn` suggests.

Because of this, we refactor it to `weight_decay_filter`:

- `"disable"`: No parameters to filter.
- `"auto"`: We do not apply weight decay filtering to any parameters. However, MindSpore currently filters the parameters of Norm layers from weight decay automatically.
- `"norm_and_bias"`: Filter the parameters of Norm layers and Bias from weight decay.

How do I migrate from an old configuration?
| `filter_bias_and_bn` (old) | `weight_decay_filter` (new) |
| --- | --- |
| `True` | `"disable"` |
| `False` | `"auto"` |
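
For configs that are generated or checked in code, the migration above can be written as a tiny helper. This is only an illustrative sketch: `migrate_filter_flag` is a hypothetical name and is not part of MindCV. It maps the old boolean onto the new string value.

```python
def migrate_filter_flag(filter_bias_and_bn: bool) -> str:
    """Map the old boolean argument to the new `weight_decay_filter` value.

    Hypothetical helper for illustration; it only encodes the migration
    described above (True -> "disable", False -> "auto").
    """
    return "disable" if filter_bias_and_bn else "auto"


old_config = {"filter_bias_and_bn": True}
new_config = {"weight_decay_filter": migrate_filter_flag(old_config["filter_bias_and_bn"])}
```

Note that neither old value maps to `"norm_and_bias"`, since the old argument never produced that behavior.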
BTW, we also support getting the `no_weight_decay` list from the model, as well as `layer_decay`.
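
To make the three `weight_decay_filter` options concrete, here is a minimal, framework-free sketch of how parameters might be split into groups. It is not the MindCV implementation. The function name `build_param_groups`, the use of plain name strings instead of real parameters, and the suffix list used to detect norm and bias parameters are all assumptions made for illustration. The key point it shows is that the no-decay group carries an explicit `weight_decay` of 0.0, so the optimizer-level default cannot leak into it.

```python
from typing import Dict, List, Sequence, Union

# Assumption: norm-layer and bias parameters can be recognised by these suffixes.
NORM_AND_BIAS_SUFFIXES = (".gamma", ".beta", ".bias")

Group = Dict[str, Union[List[str], float]]


def build_param_groups(
    names: Sequence[str],
    weight_decay: float,
    weight_decay_filter: str = "auto",
    no_weight_decay: Sequence[str] = (),
) -> Union[List[str], List[Group]]:
    """Split parameter names into decay / no-decay groups (illustrative only)."""
    if weight_decay_filter not in ("disable", "auto", "norm_and_bias"):
        raise ValueError(f"unknown weight_decay_filter: {weight_decay_filter!r}")

    # Assumption: in "auto" mode nothing is grouped here, so any filtering
    # is left entirely to the framework's own default behaviour.
    if weight_decay_filter == "auto" and not no_weight_decay:
        return list(names)

    decay: List[str] = []
    no_decay: List[str] = []
    for name in names:
        filtered = (
            weight_decay_filter == "norm_and_bias"
            and name.endswith(NORM_AND_BIAS_SUFFIXES)
        )
        if filtered or name in no_weight_decay:
            no_decay.append(name)
        else:
            decay.append(name)

    return [
        {"params": decay, "weight_decay": weight_decay},
        # Explicit 0.0 so the optimizer default is never applied to this group.
        {"params": no_decay, "weight_decay": 0.0},
    ]


names = ["conv.weight", "bn.gamma", "bn.beta", "fc.bias"]
groups = build_param_groups(names, 1e-4, "norm_and_bias")
```

With `"disable"`, the same call puts every name into the first group and leaves the second group empty.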
Test Plan
(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)
Related Issues and PRs
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)