Default values of \beta (args.reg_weight) and \gamma (args.prior_weight) are wrong #6
Comments
Hi! Thank you for your attention to our work! Yes, you are right: the default parameters in args are not what we described in the paper. That is because training with the "Sparse Object Prior" term is quite slow (I have to use a for-loop to iterate over the batch input to calculate s_i log(s_i)), so training BiDet with this term from scratch would be prohibitive (roughly 3-4 days to one week). Besides, we found that training with only the original loc and cls losses at the beginning, and adding the "Sparse Object Prior" and "I(X; F)" regularization terms later, gives equally good performance. Specifically, we first train with \beta=0 and \gamma=0 using learning_rate=1e-3 for several epochs, and then set args.reg_weight=0.1 and args.prior_weight=0.2 at the first lr decay step (that is what the code here does).

Also, I have to point out that args.reg_weight is not the same as \beta in the IB formula. As you can see, args.reg_weight is multiplied with I(X; F) in the code, while \beta is multiplied with I(F; C, L) in our IB formula. So here args.reg_weight = \frac{1}{\beta}, which is why I set args.reg_weight=0.1 instead of 10 in the code (0.1 = \frac{1}{10}). Hope my answer helps!
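A minimal sketch of that two-stage schedule, assuming illustrative names; `loss_weights_for_iteration`, the decay steps, and the placeholder loss values below are not from the repository, only an example of the schedule described above:

```python
# Hypothetical sketch of the schedule: keep both extra terms at 0 during
# warm-up, then switch them on at the first learning-rate decay step.
def loss_weights_for_iteration(iteration, lr_decay_steps=(80000, 100000, 120000)):
    """Return (reg_weight, prior_weight) for the current iteration.

    reg_weight = 0.1 corresponds to 1/beta with beta = 10 from the paper,
    since the code multiplies this weight with I(X; F) rather than I(F; C, L);
    prior_weight = 0.2 corresponds to gamma.
    """
    if iteration < lr_decay_steps[0]:
        return 0.0, 0.0          # warm-up: plain loc + cls loss only
    return 0.1, 0.2              # add the I(X; F) and sparse-object-prior terms


# Example usage with placeholder loss values.
loc_loss, cls_loss, info_loss, prior_loss = 1.2, 0.8, 0.05, 0.3
reg_w, prior_w = loss_weights_for_iteration(90000)
total_loss = loc_loss + cls_loss + reg_w * info_loss + prior_w * prior_loss
```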
Thank you for your quick reply. I do have to mention that there is still a bug in the uploaded code: while args.reg_weight and args.prior_weight are set to non-zero values some iterations later, the predicates used in calculating the respective losses are the global variables REGULARIZATION_LOSS_WEIGHT and PRIOR_LOSS_WEIGHT, which are set to 0 and never changed. Although this is quite a straightforward fix, it makes me think that the uploaded code wasn't actually tested to see whether it reproduces the results.

Also, after fixing the predicates that use the global variables, I get a division-by-zero error on L283 in train_bidet_ssd.py. It seems that loss_count is set to zero and never incremented in the for-loop that calculates the prior loss. It would be really helpful if someone could look into this.
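A hedged sketch of the two fixes implied here; the global and argument names follow the thread, but the surrounding function structure is illustrative rather than the repository's actual code:

```python
# Module-level weights as described in the thread: initialized to 0 and,
# in the released code, never updated.
REGULARIZATION_LOSS_WEIGHT = 0.0
PRIOR_LOSS_WEIGHT = 0.0


def update_loss_weights(args, iteration, first_decay_step):
    """Fix 1 (sketch): when the schedule enables the extra terms, propagate
    the args values to the globals that the loss computation actually reads."""
    global REGULARIZATION_LOSS_WEIGHT, PRIOR_LOSS_WEIGHT
    if iteration >= first_decay_step:
        REGULARIZATION_LOSS_WEIGHT = args.reg_weight
        PRIOR_LOSS_WEIGHT = args.prior_weight


def average_prior_loss(prior_loss_sum, loss_count):
    """Fix 2 (sketch): guard the mean so loss_count == 0 cannot divide by zero."""
    if loss_count == 0:
        return 0.0
    return prior_loss_sum / loss_count
```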
Thank you for your careful inspection of our code. For the first question, yes, I have to admit that it's truly a bug. When I tested the code before uploading it to GitHub, I only ran it from a pre-trained weight and set the loss weights directly, so I did not notice that the globals are never updated from args.

For the second problem, loss_count is actually incremented in the for-loop here. I guess the division-by-zero problem may be because the model hasn't been trained enough for the prior loss calculation: there are many conditions under which the code will skip the prior loss for an image, so loss_count can stay at 0 early in training.

Again, thank you for your time on our work and code. I really apologize for the bugs, and I will definitely test the code much more carefully next time.
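A minimal sketch of that situation, assuming PyTorch and illustrative names (`sparse_object_prior_loss`, `conf_thresh`); it is not the repository's exact implementation, only an example of how every image in a batch can be skipped so that loss_count stays at 0:

```python
import torch


def sparse_object_prior_loss(confidences, conf_thresh=0.3):
    """confidences: a list of 1-D tensors, one per image, of detection scores s_i."""
    prior_loss = confidences[0].new_zeros(())
    loss_count = 0
    for conf in confidences:                  # per-image loop, as described above
        keep = conf[conf > conf_thresh]       # an under-trained model may keep nothing
        if keep.numel() == 0:
            continue                          # image skipped -> loss_count not incremented
        prior_loss = prior_loss + (keep * torch.log(keep + 1e-6)).sum()  # s_i log(s_i) term
        loss_count += 1
    if loss_count == 0:                       # guard against the reported division by zero
        return prior_loss                     # zero contribution for this batch
    return prior_loss / loss_count
```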
In the paper, it is stated that \beta and \gamma were set to 10 and 0.2. However, the given code uses 0 as the default value for both of them, which essentially means the code will be run without the proposed additional loss terms. I would appreciate it if the authors could clarify how \beta and \gamma were actually used during training.
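For illustration, this is roughly how the mismatch looks if the defaults are declared via argparse; the argument names follow the thread, but the actual declarations in train_bidet_ssd.py may differ:

```python
import argparse

parser = argparse.ArgumentParser()
# Defaults of 0 mean both extra loss terms are disabled unless explicitly overridden.
parser.add_argument('--reg_weight', type=float, default=0.0,
                    help='weight on I(X; F); per the reply above, 0.1 = 1/beta with beta = 10')
parser.add_argument('--prior_weight', type=float, default=0.0,
                    help='weight on the sparse object prior; the paper uses gamma = 0.2')
args = parser.parse_args([])
print(args.reg_weight, args.prior_weight)  # 0.0 0.0
```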