Adding multi-layer perceptron in ops #6053
Annotations in docstring make me sad :'(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know how you feel about this. :( I think the callables all over TorchVision are annotated like that to provide info on what they are supposed to return.
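For context, a minimal sketch of the docstring style being discussed; `mlp_stub` and its parameters are hypothetical illustrations of the pattern, not the exact signature in this PR:

```python
from typing import Callable, List, Optional

import torch


def mlp_stub(
    in_channels: int,
    hidden_channels: List[int],
    norm_layer: Optional[Callable[..., torch.nn.Module]] = None,
) -> None:
    """Illustrative stub only.

    Args:
        in_channels (int): Number of channels of the input.
        hidden_channels (List[int]): List of the hidden channel dimensions.
        norm_layer (Callable[..., torch.nn.Module], optional): Norm layer stacked
            on top of the linear layer. Repeating the ``Callable[...]`` annotation
            in the docstring is what tells readers the callable must build a Module.
    """
```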
It is not very clear to me why there's a Dropout layer after the last layer. I saw that it was present in the previous MLPBlock class, but no other implementation of MLP with dropout (that I could find) has a dropout layer on the output, including the one in the multimodal package.
Maybe this was something specific to the use case of MLPBlock? If so, it should not be in this class.
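For illustration, a minimal sketch of the two orderings being contrasted; the layer sizes here are made up:

```python
import torch

# Common pattern: dropout between layers, none on the output.
common = torch.nn.Sequential(
    torch.nn.Linear(768, 3072),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(3072, 768),
)

# Pattern under discussion: an extra Dropout after the last Linear,
# as in the previous MLPBlock class.
with_trailing_dropout = torch.nn.Sequential(
    torch.nn.Linear(768, 3072),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(3072, 768),
    torch.nn.Dropout(0.1),
)
```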
You are right, there are various implementations of MLP: some don't have dropout at all, some have it in the middle but not at the end, and some have it everywhere. If you check the references, you will see that all of these patterns exist. Our implementation is like that because it replaces the MLP layers used in existing models such as ViT and Swin. We also try to support more complex variants with more than two linear layers. Your observation is correct, though: if one wanted to avoid having dropout at the end, the current implementation wouldn't allow it. Since that variant is also valid, perhaps it's worth making this update in a BC-preserving way, with a new boolean that controls whether Dropout appears at the end or not. WDYT?
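A rough sketch of what such a boolean could look like; `build_mlp` and the `dropout_last` name are hypothetical, not the actual torchvision code:

```python
from typing import List

import torch


def build_mlp(
    in_channels: int,
    hidden_channels: List[int],
    dropout: float = 0.0,
    dropout_last: bool = True,  # hypothetical flag; True matches current behavior
) -> torch.nn.Sequential:
    """Sketch of an MLP builder supporting more than two linear layers."""
    layers: List[torch.nn.Module] = []
    in_dim = in_channels
    for hidden_dim in hidden_channels[:-1]:
        layers.append(torch.nn.Linear(in_dim, hidden_dim))
        layers.append(torch.nn.ReLU())
        layers.append(torch.nn.Dropout(dropout))
        in_dim = hidden_dim
    layers.append(torch.nn.Linear(in_dim, hidden_channels[-1]))
    if dropout_last:
        # Current behavior: Dropout also follows the final Linear.
        layers.append(torch.nn.Dropout(dropout))
    return torch.nn.Sequential(*layers)
```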
I think your suggestion sounds very nice. What would the default value be for the boolean? I guess that setting it to True (with dropout) would cause no breaking changes. At the same time, I would say that not having a dropout in the last layer is the more common (default) configuration? Also, I'd be interested in working on this, whichever option is chosen.
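Continuing from the hypothetical `build_mlp` sketch above, the two options might look like this; defaulting the flag to `True` keeps today's behavior:

```python
# Matches current models (ViT, Swin): dropout after every linear layer.
mlp_bc = build_mlp(768, [3072, 768], dropout=0.1)  # dropout_last=True by default

# Proposed variant without dropout on the output.
mlp_no_last = build_mlp(768, [3072, 768], dropout=0.1, dropout_last=False)
```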
Yes, you are right, we will need to maintain BC. Note that True is the "default" setup in TorchVision at the moment, as literally all existing models require dropout everywhere.
Sounds great, let me recommend the following. Could you start an issue summarizing what you said here and providing a few references of MLP being used with a middle dropout but without the final one? Providing a few examples from real-world vision architectures will help build a stronger case. Once we clarify the details on the issue, we can discuss a potential PR. 😃
Ok! I am a bit short on time at the moment, but will have more time in the upcoming weeks. Nevertheless, I'm interested in this and will be working on it!