
Treat bottoms and params more uniformly? #1474

Open · longjon opened this issue Nov 25, 2014 · 4 comments

Comments

longjon (Contributor) commented Nov 25, 2014

#1471 reminded me of something that's been weighing on my mind lately.

Caffe is designed with a hard distinction between "data blobs", the intermediate results of computation, and "parameter blobs", the variables on which gradient descent is performed.

This distinction is artificially imposed on the inputs to layers, which are just functions (with derivatives). For example, the inner product layer performs a matrix multiply, C = AB, where B must be a bottom, C must be a top, and A must be a parameter blob. Meanwhile, @mcheshkov wants to compute the same function, except with A and B both as bottoms.
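
(A minimal numpy sketch of the point, abstracting away Caffe's actual blob shapes: the two computations below are identical, and only the role assigned to A differs.)

```python
import numpy as np

def inner_product(B, A):
    # Today's layer: A is a parameter blob baked into the layer,
    # B is the sole bottom; the top is C = AB.
    return A.dot(B)

def matmul(A, B):
    # A uniform treatment: the same function, with A arriving as a
    # second bottom instead of a parameter blob.
    return A.dot(B)

# Same math either way; whether A is free (learned) or bound
# (computed upstream) is a property of the net, not of the function.
A, B = np.random.randn(5, 3), np.random.randn(3, 4)
assert np.allclose(inner_product(B, A), matmul(A, B))
```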

A layer should really just be a function with its first derivative, leaving you free to compose layers into functions (i.e., nets) which have free variables (parameters) and bound variables (intermediate data). The distinction between bottoms and parameters is really the business of Net and Solver rather than Layer, as it's only after composition that one learns which variables are free.

You can find some symptoms of this imposition in the code, e.g., we have both propagate_down and param_propagate_down. It'll also come up when generating nets from Python.
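
(To make the symptom concrete, here is a rough Python pseudocode sketch, not Caffe's actual C++ API; the helper names are hypothetical. Today the question "do we need this gradient?" is answered twice, through two parallel mechanisms; a uniform layer would answer it once.)

```python
class LayerToday:
    # Two parallel gating mechanisms for the same question.
    def backward(self, top_diff, propagate_down):
        for i in range(len(self.bottoms)):
            if propagate_down[i]:              # gates bottom gradients
                self.backward_bottom(i, top_diff)
        for j in range(len(self.blobs)):
            if self.param_propagate_down[j]:   # gates param gradients
                self.backward_param(j, top_diff)

class LayerUniform:
    # One list of inputs; Net/Solver decide afterwards which of them
    # are free variables (parameters) to be updated by the solver.
    def backward(self, top_diff, propagate_down):
        for i in range(len(self.inputs)):
            if propagate_down[i]:              # gates all input gradients
                self.backward_input(i, top_diff)
```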

Obviously changing this would be rather major. I don't want to propose any specific course of action at this time, but just to make a note and let other minds compute.

mcheshkov commented

Off-topic: well, not exactly the same function. The inner product layer treats the bottom blob as an N x (CxWxH) matrix and the top as an N x num_output matrix. If we took two bottoms as matrices and multiplied them, we would get an N x N matrix. What I want is a vector of N elements: for each pair of bottom vectors in the batch, their dot product. That's the main diagonal, but not the whole matrix.
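
(A quick numpy illustration of the difference, assuming the batch vectors are rows:)

```python
import numpy as np

N, K = 4, 3
A = np.random.randn(N, K)  # batch of N vectors
B = np.random.randn(N, K)  # batch of N vectors

full = A.dot(B.T)               # N x N: multiplying the two bottoms
pairwise = (A * B).sum(axis=1)  # N: per-example dot products

# The desired output is just the main diagonal of the full product.
assert np.allclose(pairwise, np.diag(full))
```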

longjon (Contributor, Author) commented Nov 25, 2014

(@mcheshkov, yeah, you're right.)

sguada (Contributor) commented Nov 27, 2014

@longjon I think it is a good idea

Yangqing (Member) commented Jan 7, 2015

+1
