
refactor(optimizer): change "filter_bias_and_bn" to "weight_decay_filter" #752

Merged
merged 3 commits from weight_decay_filter into mindspore-lab:main on Jan 17, 2024

Conversation

sageyou (Collaborator) commented on Dec 9, 2023


Motivation

Fix the bug with weight decay when setting filter_bias_and_bn:

  1. When filter_bias_and_bn is True: the function init_group_params does not set a weight-decay value for no_decay_params, so MindSpore falls back to the optimizer-level weight_decay (usually not 0.0).
  2. When filter_bias_and_bn is False: MindSpore automatically filters the BatchNorm params out of weight decay.

So the argument's name does not match what it actually does: contrary to what the name filter_bias_and_bn suggests, the bias and norm-layer params can never be excluded from weight decay.
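For illustration, here is a simplified sketch of the grouping logic behind case 1. The function body is a hypothetical reconstruction based on the description above, not the exact MindCV code.

```python
def init_group_params(params, weight_decay):
    """Hypothetical sketch of the old grouping logic (case 1 above)."""
    decay_params, no_decay_params = [], []
    for param in params:
        # bias / BatchNorm parameters were routed to the no-decay group
        if param.name.endswith((".gamma", ".beta", ".bias")):
            no_decay_params.append(param)
        else:
            decay_params.append(param)
    return [
        {"params": decay_params, "weight_decay": weight_decay},
        # BUG: no explicit "weight_decay": 0.0 key here, so MindSpore falls
        # back to the optimizer-level weight_decay, which is usually non-zero.
        {"params": no_decay_params},
        {"order_params": params},
    ]
```

The fix is to set "weight_decay": 0.0 explicitly on the no-decay group.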

Due to this, we refactor it to weight_decay_filter (see the usage sketch after this list):

  • "disable": No parameters are filtered from weight decay.
  • "auto": We do not apply any weight-decay filtering ourselves; however, MindSpore currently filters the parameters of Norm layers from weight decay automatically.
  • "norm_and_bias": Filter the parameters of Norm layers and biases from weight decay.

How do I migrate from an old configuration?

| filter_bias_and_bn | weight_decay_filter |
| ------------------ | ------------------- |
| True               | "disable"           |
| False              | "auto"              |

Keep in mind that filter_bias_and_bn doesn't do what its name suggests.
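In code, the migration is a one-keyword change. The call sites below are hypothetical and only illustrate the renamed argument.

```python
from mindcv.models import create_model
from mindcv.optim import create_optimizer

params = create_model("resnet18").trainable_params()

# Old: create_optimizer(params, opt="momentum", lr=0.1,
#                       weight_decay=1e-4, filter_bias_and_bn=True)
# New, per the table above:
optimizer = create_optimizer(params, opt="momentum", lr=0.1,
                             weight_decay=1e-4, weight_decay_filter="disable")
```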

BTW, we also support getting a no_weight_decay list from the model, as well as layer_decay; a model-side sketch follows.
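A sketch of what the model-side hook could look like; the method name no_weight_decay and the parameter names are assumptions borrowed from common ViT implementations, not necessarily the merged MindCV convention.

```python
import mindspore.nn as nn

class TinyTransformer(nn.Cell):
    """Toy model exposing a no-weight-decay list to the optimizer factory."""

    def __init__(self):
        super().__init__()
        self.head = nn.Dense(192, 10)

    def construct(self, x):
        return self.head(x)

    def no_weight_decay(self):
        # Parameter names to exclude from weight decay
        # (e.g. position embeddings, class tokens).
        return {"pos_embed", "cls_token"}
```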

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

@sageyou sageyou force-pushed the weight_decay_filter branch 3 times, most recently from 0a0553c to dd93647, on December 15, 2023 02:20
@sageyou sageyou force-pushed the weight_decay_filter branch from dd93647 to c28e3e6 on December 27, 2023 02:22
@sageyou sageyou force-pushed the weight_decay_filter branch 2 times, most recently from 4132fd8 to b468028, on January 15, 2024 08:55
@sageyou sageyou force-pushed the weight_decay_filter branch from b468028 to 4b82c0b on January 15, 2024 09:53
@geniuspatrick geniuspatrick changed the title: Replace "filter_bias_and_bn" by "weight_decay_filter" to fix and be comptatibilty with the previous bug → refacetor: replace "filter_bias_and_bn" by "weight_decay_filter" to fix and be comptatibilty with the previous bug (Jan 17, 2024)
@geniuspatrick geniuspatrick changed the title: refacetor: replace "filter_bias_and_bn" by "weight_decay_filter" to fix and be comptatibilty with the previous bug → refactor(optimizer): change "filter_bias_and_bn" to "weight_decay_filter" (Jan 17, 2024)
Review thread on the new docstring:

    weight_decay_filter: filters to filter parameters from weight_decay.
    - "disable": No parameters to filter.
    - "auto": We do not apply weight decay filtering to any parameters. However, MindSpore currently
      automatically filters the parameters of Norm layer from weight decay.
Collaborator commented:

Be specific: for "auto", which kinds of norm layers does MindSpore filter from weight decay? BatchNorm only, or LayerNorm as well?

sageyou (Collaborator, Author) replied:

All the norm layers: both BatchNorm and LayerNorm.

@geniuspatrick geniuspatrick merged commit 062fe1a into mindspore-lab:main Jan 17, 2024
5 checks passed