net.cpp now allows zero-sized batches #2053
Conversation
@jeffdonahue I see that somebody is tagging PRs as "ready for review". Can you tag this one as well?
I also have a use case for this PR, so I vote for making Caffe aware of this. However, force-reshaping all the top blobs doesn't quite seem right to me; it seems like this would be better handled when Reshape gets called on each layer, right before the forward pass. That way, each layer can decide for itself whether it wants all of its top blobs to have zero batch size. Forward/backward would then be skipped only if all bottom and top blobs have zero batch size. Another potential improvement to this PR would be to make the solvers aware of backwardIsAllowed() and skip updating the associated parameters; right now, it seems like momentum and weight decay would still be applied.
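A minimal sketch of that per-layer idea, written against Caffe's `Layer<Dtype>::Reshape` interface (the `ZeroAwareLayer` class and its `num_output_` field are hypothetical; only the `Blob`/`Layer` signatures are Caffe's):

```cpp
#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"

namespace caffe {

// Hypothetical layer that decides for itself what a zero-sized batch
// means: the batch axis is propagated as-is (possibly 0), while the
// layer still sets the non-batch axes it owns -- so the top shape need
// not mirror the bottom shape.
template <typename Dtype>
void ZeroAwareLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  vector<int> top_shape(2);
  top_shape[0] = bottom[0]->shape(0);  // batch axis: may legitimately be 0
  top_shape[1] = num_output_;          // layer-specific output size
  top[0]->Reshape(top_shape);
}

}  // namespace caffe
```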
Force-pushed from 47169e3 to 40f4f6c
Rebased on master.
@jeffdonahue This is the last one of the triplet for filter_layer. Can you take a pass at this one?
Ping.
Agreed with @cdoersch -- this is a bit too aggressive in its assumptions about what each layer might want to do in the event of size-0 batches (for example, the output shouldn't necessarily have the same shape as the input, as assumed here, and often doesn't), and should probably operate at the level of individual layers rather than the net. Furthermore, the net itself doesn't and shouldn't have any global notion of batch size, and the assumption that "batch size" is the 0th axis of each blob in the net isn't valid (at least, not anymore). On a more mundane note, code in …
A more general solution would be to skip blobs with 0 entries, since any blob with 0 along its first axis has 0 total entries. One open question would be how to resize all the blobs after the 0-sized blob, if at all. Another thought: …
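A sketch of what that skip could look like inside `Net<Dtype>::ForwardFromTo` in net.cpp, assuming the rule discussed above -- run a layer only if some attached blob is non-empty (the `AllBlobsEmpty` helper is hypothetical; `Blob::count()` and the `bottom_vecs_`/`top_vecs_`/`layers_` members are Caffe's):

```cpp
#include <vector>

#include "caffe/blob.hpp"

namespace caffe {

// Hypothetical helper: true iff every blob in the vector has zero
// entries. count() == 0 catches a 0 along *any* axis, so no global
// "batch size is axis 0" assumption is needed.
template <typename Dtype>
bool AllBlobsEmpty(const vector<Blob<Dtype>*>& blobs) {
  for (int i = 0; i < blobs.size(); ++i) {
    if (blobs[i]->count() > 0) { return false; }
  }
  return true;
}

// Inside Net<Dtype>::ForwardFromTo's layer loop, the existing call
//   loss += layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
// would then be guarded roughly like this:
//   if (!(AllBlobsEmpty(bottom_vecs_[i]) && AllBlobsEmpty(top_vecs_[i]))) {
//     loss += layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
//   }

}  // namespace caffe
```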
Could anyone tell me how empty top/bottom blobs are handled right now? Is forward/backward prevented for 0-size batches? All I know is that when I feed empty bottoms (as produced by the FilterLayer) into a loss layer (SoftmaxWithLoss), Caffe crashes with a cuDNN bad-param error.
@hyojinie unlikely to resume this |
This is a new PR based on the old #1484, which was no longer mergeable.
Old description: