Conversation
LGTM. Pinging @piiswrong in case there are additional comments.
Not sure if we want to add this option, especially under this name.
@piiswrong Caffe and PaddlePaddle also use this name.
@szha @tornadomeet If training with use_global_stats=True, it seems all the moving_mean values are 0 and the moving_var values are 1 in the trained model. Is that right? Then batch norm effectively becomes a scale-and-shift op. In what situation should use_global_stats=True be used?
@7oud I have the same question. I think use_global_stats=True should be used when you finetune a pretrained model such as ResNet or VGG.
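For illustration, a minimal Gluon sketch of the finetuning setup described above (the network and layer sizes are made up for the example, not taken from this thread): a BatchNorm created with use_global_stats=True normalizes with the stored moving_mean/moving_var even inside a training pass.

```python
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1),
        nn.BatchNorm(use_global_stats=True),  # frozen statistics, as for finetuning
        nn.Activation('relu'))
net.initialize()

x = mx.nd.random.uniform(shape=(8, 3, 32, 32))
with mx.autograd.record():  # even in a recorded (training) pass...
    y = net(x)              # ...this BN uses the global moving statistics
```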
@thbupt @tornadomeet I found in some small dataset tasks such as segmentation (training from scratch), the inference result is worse than training when using BatchNorm without use_global_stats. Did you have similar situations? |
@7oud If you train from scratch, use_global_stats should be set to false during training and true during testing, which is the default behavior in MXNet.
@7oud Do you mean that in your small task, setting use_global_stats=True during training gives better results? If that is true, it means BN is doing no work in your task, so just remove BN from your network.
@thbupt Actually I did what you said, but the same data batch gives different outputs with forward(is_train=False) and forward(is_train=True), and the inference results are worse. So I tried training with use_global_stats=True, and then both modes give the same results.
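A Gluon analogue of this forward(is_train=...) comparison (the tiny network here is hypothetical): with a plain BatchNorm, the same batch is normalized with the batch statistics in training mode and with the moving averages in prediction mode, so some gap between the two outputs is expected.

```python
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Conv2D(16, kernel_size=3, padding=1), nn.BatchNorm())
net.initialize()

x = mx.nd.random.uniform(shape=(8, 3, 32, 32))
with mx.autograd.train_mode():
    out_train = net(x)      # normalized with this batch's mean/var
out_test = net(x)           # default predict mode: moving averages
print((out_train - out_test).abs().mean())  # a nonzero gap is expected
```

A large gap at the end of training usually means the moving averages never converged toward the typical batch statistics, which is common with very small batches.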
@tornadomeet Is there a simple way to set use_global_stats=True for all layers when finetuning? I know one way is to set use_global_stats=True for each BN layer separately when adding nn.BatchNorm.
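One possible way to do this in bulk, as a sketch rather than an official API: Block.apply walks the network recursively, and BatchNorm keeps its operator arguments in the private _kwargs dict, so the flag can be flipped in place. Since this touches a private attribute, it may break across MXNet versions.

```python
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Conv2D(16, kernel_size=3), nn.BatchNorm(),
        nn.Conv2D(32, kernel_size=3), nn.BatchNorm())

def freeze_bn_stats(block):
    if isinstance(block, nn.BatchNorm):
        block._kwargs['use_global_stats'] = True  # private attribute, version-dependent

net.apply(freeze_bn_stats)  # visits every child block recursively
```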
@tornadomeet it seems that, but I cannot give the conclusion, bcz the dataset is too small to giving truth |
@7oud What is your batch size? BN seems to prefer large batch sizes.
@7oud The correct way to use BN when training from scratch is to set use_global_stats=False during training; just make sure inference runs with is_train=False.
@thbupt The batch size in training is 8, and in inference it is usually 1.
@7oud I think 8 is too small for BN; you can try a larger batch size like 16 or 32.
In Gluon, do we need to set the training/prediction flag manually for BatchNorm? In #3871, it said that the flag determines whether BN uses the mini-batch statistics or the global moving statistics.
@jonbakerfish that flag is automatically set by module or autograd.record. It can be queried via autograd.is_training and overridden with autograd.train_mode/predict_mode when using autograd. |
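A short sketch of those flags in action: record() implies training mode by default, and train_mode()/predict_mode() override the current mode.

```python
from mxnet import autograd

print(autograd.is_training())          # False outside any scope

with autograd.record():                # records and sets train_mode=True by default
    print(autograd.is_training())      # True
    with autograd.predict_mode():      # override within the recorded scope
        print(autograd.is_training())  # False

with autograd.train_mode():            # training-mode forward without recording
    print(autograd.is_training())      # True
```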
Description
#9419
Checklist
Essentials
Passed code style checking (make lint)