
Treat bottoms and params more uniformly? #1474

Open · longjon opened this issue Nov 25, 2014 · 4 comments

Comments

longjon (Contributor) commented Nov 25, 2014

#1471 reminded me of something that's been weighing on my mind lately.

Caffe is designed with a hard distinction between "data blobs", the intermediate results of computation, and "parameter blobs", the variables on which gradient descent is performed.

This distinction is artificially imposed on the inputs to layers, which are just functions (with derivatives). For example, the inner product layer performs a matrix multiply, C = AB, where B must be a bottom, C must be a top, and A must be a parameter blob. Meanwhile, @mcheshkov wants to compute the same function, except with A and B both as bottoms.
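
(A minimal numpy sketch of the point, abstracting away Caffe's actual blob shapes: the two computations below are identical, and only the role assigned to A differs.)

```python
import numpy as np

def inner_product(B, A):
    # Today's layer: A is a parameter blob baked into the layer,
    # B is the sole bottom; the top is C = AB.
    return A.dot(B)

def matmul(A, B):
    # A uniform treatment: the same function, with A arriving as a
    # second bottom instead of a parameter blob.
    return A.dot(B)

# Same math either way; whether A is free (learned) or bound
# (computed upstream) is a property of the net, not of the function.
A, B = np.random.randn(5, 3), np.random.randn(3, 4)
assert np.allclose(inner_product(B, A), matmul(A, B))
```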

A layer should really just be a function with its first derivative, leaving you free to compose layers into functions (i.e., nets) which have free variables (parameters) and bound variables (intermediate data). The distinction between bottoms and parameters is really the business of Net and Solver rather than Layer, as it's only after composition that one learns which variables are free.

You can find some symptoms of this imposition in the code, e.g., we have both propagate_down and param_propagate_down. It'll also come up when generating nets from Python.
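
(To make the symptom concrete, here is a rough Python pseudocode sketch, not Caffe's actual C++ API; the helper names are hypothetical. Today the question "do we need this gradient?" is answered twice, through two parallel mechanisms; a uniform layer would answer it once.)

```python
class LayerToday:
    # Two parallel gating mechanisms for the same question.
    def backward(self, top_diff, propagate_down):
        for i in range(len(self.bottoms)):
            if propagate_down[i]:              # gates bottom gradients
                self.backward_bottom(i, top_diff)
        for j in range(len(self.blobs)):
            if self.param_propagate_down[j]:   # gates param gradients
                self.backward_param(j, top_diff)

class LayerUniform:
    # One list of inputs; Net/Solver decide afterwards which of them
    # are free variables (parameters) to be updated by the solver.
    def backward(self, top_diff, propagate_down):
        for i in range(len(self.inputs)):
            if propagate_down[i]:              # gates all input gradients
                self.backward_input(i, top_diff)
```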

Obviously changing this would be rather major. I don't want to propose any specific course of action at this time, but just to make a note and let other minds compute.

mcheshkov commented

Off-topic: well, not exactly the same function. The inner product layer treats the bottom blob as an N x (CxWxH) matrix and the top as an N x num_output matrix. If we took two bottoms as matrices and multiplied them, we would get an N x N matrix. What I want is a vector of N elements: for each pair of bottom vectors in the batch, their dot product. That's the main diagonal, but not the whole matrix.
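
(A quick numpy illustration of the difference, assuming the batch vectors are rows:)

```python
import numpy as np

N, K = 4, 3
A = np.random.randn(N, K)  # batch of N vectors
B = np.random.randn(N, K)  # batch of N vectors

full = A.dot(B.T)               # N x N: multiplying the two bottoms
pairwise = (A * B).sum(axis=1)  # N: per-example dot products

# The desired output is just the main diagonal of the full product.
assert np.allclose(pairwise, np.diag(full))
```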

longjon (Contributor, Author) commented Nov 25, 2014

(@mcheshkov, yeah, you're right.)

sguada (Contributor) commented Nov 27, 2014

@longjon I think it is a good idea

Yangqing (Member) commented Jan 7, 2015

+1
