Treat bottoms and params more uniformly? #1474
Comments
Off-topic: well, it's not exactly the same function. The inner product layer treats the bottom blob as an N x (C*W*H) matrix and the top as an N x num_output matrix. If we took two bottoms as matrices and multiplied them (with one of them transposed), we would get an N x N matrix. What I want is a vector of N elements: for each pair of bottom vectors in the batch, their dot product. That's the main diagonal, not the whole matrix.
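To make the difference concrete, here is a small NumPy sketch (not Caffe code). The shapes `N` and `D` and the array names `a` and `b` are made up for illustration, with `D` standing in for the flattened C*W*H size. It compares the full N x N product with the per-pair dot product described above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 6                      # hypothetical batch size and flattened feature size
a = rng.standard_normal((N, D))  # first bottom, one row per batch item
b = rng.standard_normal((N, D))  # second bottom, same shape

# Full product: every item of a against every item of b, shape (N, N).
full = a @ b.T

# Per-pair dot product: item n of a against item n of b only, shape (N,).
pairwise = np.einsum("nd,nd->n", a, b)

# The per-pair result is the main diagonal of the full product.
assert full.shape == (N, N)
assert pairwise.shape == (N,)
assert np.allclose(pairwise, np.diag(full))
```

Computing the diagonal directly costs O(N*D) instead of the O(N*N*D) needed for the full matrix, which is one reason a dedicated operation is preferable to multiplying and discarding the off-diagonal entries.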
(@mcheshkov, yeah, you're right.)
@longjon I think it is a good idea.
+1
#1471 reminded me of something that's been weighing on my mind lately.
Caffe is designed with a hard distinction between "data blobs", the intermediate results of computation, and "parameter blobs", the variables on which gradient descent is performed.
This distinction is artificially imposed on the inputs to layers, which are just functions (with derivatives). For example, inner product layer performs a matrix multiply, `C = AB`, where `B` must be a bottom, `C` must be a top, and `A` must be a parameter blob. Meanwhile, @mcheshkov wants to compute the same function, except with `A` and `B` both as bottoms.
A layer should really just be a function with its first derivative, leaving you free to compose layers into functions (i.e., nets) which have free variables (parameters) and bound variables (intermediate data). The distinction between bottoms and parameters is really the business of `Net` and `Solver` rather than `Layer`, as it's only after composition that one learns which variables are free.
You can find some symptoms of this imposition in the code, e.g., we have both `propagate_down` and `param_propagate_down`. It'll also come up when generating nets from Python.
Obviously changing this would be rather major. I don't want to propose any specific course of action at this time, but just to make a note and let other minds compute.