-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Feature: add DropBlock layer #5416
Conversation
💊 CI failures summary and remediationsAs of commit 2af4162 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaohu2015 Awesome work as always. Thanks a lot!
The implementation looks very nice overall. I've added a couple of comments concerning nits and boilerplate code needed for ops in TorchVision. My only real question is on the way you do the sampling on bernoulli, see below.
Finally, would you be interested in implementing the 3d version on the same PR? We want to start offering better video support and this contribution will help us offer better models. If you prefer, we can also do it on a follow up.
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaohu2015 The changes look good to me. I've pushed to your branch to fix one remaining linter issue. Let's see if the CI passes now.
@xiaohu2015 All tests pass. 🚀 🚀 On the positive side, your implementation seems correct to me, fully aligns with the paper. Though some of the implementation details differ from the references you provided, after doing a deep dive I believe that your approach is equivalent. Some limited testing also shows that the implementation works as intended. On the negative side, the behaviour of the method is not fully tested (see points at #5416 (comment)). Moreover, we haven't trained a model to show-case that using the layer leads to the expected increase. What is your bandwidth at this point? Do you have the capacity to do the above 2 checks prior merging? Do you think it's something you would do on a follow up PR after merge? It's a bit hard to provide support on this one due to limited bandwidth on our side because of the upcoming release and other work on the pipeline. Another approach could be to ping a couple of other regular contributors to see if they can help us test it. Let me know your preference/thoughts. Thanks! |
@datumbox OK. I will do the two checks before merging. I plan to use resnet50 to test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @xiaohu2015.
I've added one last optional test improvement for your consideration below. The testing strategy looks good to me. Let me know when the training checks is completed to merge.
Thanks a lot as always for the awesome contribution!
I just run 90-epoch Resnet50, maybe I can get result next day. |
@xiaohu2015 Awesome. The paper uses 90 epochs for the baseline and 270 epochs for those with Dropblock (see page 4, section 4.1 on paper). If you don't have enough compute to run this, ping me on slack and we can work something out. FYI the last update broke the linter. |
Here is my branch for testing the ResNet50 with DropBlock: #5483 I've hardcoded changes on the LR scheduler and the ResNet. We are not going to use scheduled DropBlock:
I'll let you know about the results once I got them. |
@xiaohu2015 I got good news. The addition of your DropBlock seems to improve the accuracy. Here is the baseline ResNet50 accuracy for a 90-epoch training without DropBlock: Here is the one if we add DropBlock on Groups 3&4 wo/ scheduling (implementation details here): Both the baseline and DropBlock values are different from the ones reported on the paper (baseline 76.51 and DropBlock ~77.5), but I think that's the result of other minor differences that we didn't replicate (hyper-params, small architectural improvements etc). I think we have sufficient evidence that your implementation work as expected and we can merge the PR. Thanks so much for the awesome contribution and work! |
Summary: * Create dropblock.py * add dropblock2d * fix pylint * refactor dropblock * add dropblock * Rename dropblock.py to drop_block.py * fix pylint * add dropblock * add dropblock3d * add drop_block3d * add dropblock * Update drop_block.py * Update torchvision/ops/drop_block.py * Update torchvision/ops/drop_block.py * Update torchvision/ops/drop_block.py * Update torchvision/ops/drop_block.py * Update drop_block.py * Update drop_block.py * import torch.fx * fix lint * fix lint * Update drop_block.py * improve dropblock * add dropblock * refactor dropblock * fix doc * remove the limitation of block_size * Update torchvision/ops/drop_block.py * fix lint * fix lint * add dropblock * Fix linter * add dropblock random check * reduce test time * Update test_ops.py * speed the dropblock test * fix lint Reviewed By: datumbox Differential Revision: D34579502 fbshipit-source-id: 635df25cfc1dfc4433f38e673e2ee8fd2af96409 Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com> Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com> Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com> Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com> Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com> Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
As discussed in #5414, I am working to add DropBlock layer to Torchvision.
we add DropBlock2d and DropBlock3d module.
Reference