
Modular model definitions #1290

Closed · wants to merge 2 commits

Conversation

@cypof (Member) commented Oct 15, 2014:

A way to import a protobuf model into another one, by specifying a layer of type IMPORT. We went for a file system analogy, to allow referencing layers and blobs from different parts of the model:

  • Imports are renamed to “name of the import layer/name”, e.g. “conv_pool1/relu”.
  • Imported layers can reference layers and blobs from the importing network as a parent folder, e.g. “../data”, or using an absolute path: “/data”.
  • By default names are relative references, so existing network definitions are fine. They will resolve all objects in the root folder.

Imports can be configured using ${variables}, which are applied during load with a simple string replace.

As an example, we modified mnist/lenet_train_test.prototxt by exporting the conv/pool part as a module that is imported twice.

In lenet_train_test.prototxt:


...
layers {
  name: "cp1"
  type: IMPORT
  import_param {
    net: "examples/mnist/lenet_conv_pool.prototxt"
    var { name: "bottom" value: "/data" }
    var { name: "num_output" value: "20" }
  }
}
layers {
  name: "cp2"
  type: IMPORT
  import_param {
    net: "examples/mnist/lenet_conv_pool.prototxt"
    var { name: "bottom" value: "../cp1/pool" }
    var { name: "num_output" value: "50" }
  }
}
…

lenet_conv_pool.prototxt:


layers {
  name: "conv"
  type: CONVOLUTION
  bottom: "${bottom}"
  top: "conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: ${num_output}
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool"
  type: POOLING
  bottom: "conv"
  top: "pool"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
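
The ${variable} replacement described above amounts to plain string substitution over the imported file's text before parsing. Below is a minimal sketch of such a substitution in C++; the helper name and signature are illustrative, not the PR's actual code:

#include <map>
#include <string>

// Replace every occurrence of "${name}" in `text` with its value.
// Hypothetical helper sketching the simple string replace described in
// the PR description; error handling and escaping are omitted.
std::string ReplaceVariables(std::string text,
                             const std::map<std::string, std::string>& vars) {
  for (std::map<std::string, std::string>::const_iterator it = vars.begin();
       it != vars.end(); ++it) {
    const std::string key = "${" + it->first + "}";
    std::string::size_type pos = 0;
    while ((pos = text.find(key, pos)) != std::string::npos) {
      text.replace(pos, key.size(), it->second);
      pos += it->second.size();  // continue after the replacement
    }
  }
  return text;
}

With vars = { "bottom": "/data", "num_output": "20" }, the line bottom: "${bottom}" in the module above becomes bottom: "/data".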

CHECK(parse) << "Failed to parse NetParameter file: " << import.net();
CHECK(layer.has_name() && layer.name().length() > 0)
    << "Import layer must have a name";
LoadImports(net, target, ResolveImportName(layer.name(), pwd));
Contributor:

Doesn't look like this recursion checks for self-inclusion. That is, if foo.prototxt imports bar.prototxt, and bar.prototxt imports foo.prototxt, the recursion doesn't end until the stack overflows.
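
A minimal sketch of such a guard, using a hypothetical LoadWithCycleCheck in place of the recursion in LoadImports (the imports map stands in for "which files each prototxt imports" after parsing):

#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical cycle guard: track the files on the current import chain
// and refuse to re-enter one of them.
void LoadWithCycleCheck(const std::string& file,
    const std::map<std::string, std::vector<std::string> >& imports,
    std::set<std::string>* chain) {
  if (!chain->insert(file).second) {
    throw std::runtime_error("Circular import detected: " + file);
  }
  std::map<std::string, std::vector<std::string> >::const_iterator it =
      imports.find(file);
  if (it != imports.end()) {
    for (size_t i = 0; i < it->second.size(); ++i) {
      LoadWithCycleCheck(it->second[i], imports, chain);
    }
  }
  chain->erase(file);  // fully expanded; diamond-shaped imports stay legal
}

With a check like this, foo.prototxt importing bar.prototxt importing foo.prototxt fails with an error instead of overflowing the stack.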

@jyegerlehner (Contributor):

I'm a bit confused by the variable naming scheme. How come in the lenet example, cp1 can refer to its bottom blob as just /data, where the blob name doesn't have to be qualified with the containing layer name, whereas cp2 has to specify the "path" to the blob with the layer name, as ../cp1/pool? It seems like cp1 should also have to specify the name of the containing layer, so instead of /data, it would be /mnist/data.

I don't quite get why ../cp1/pool has the ../ at the beginning. cp1 is already at the top level, not being nested below anything. So with cp1 at the top, ../cp1 wouldn't seem to be a valid path.

@cypof (Member, Author) commented Oct 21, 2014:

Using different paths for cp1 and cp2 was only to show the two ways a module can reference a parent object. Both data and cp1 are at the root, so using an absolute path (/) or the relative path (../) from a layer inside cp2 is equivalent.

I think adding modules from the same file would be great. No problem with renaming to module. Maybe have two fields instead of 'net', like 'file' and 'reference', for external and internal definitions?



@@ -252,6 +252,7 @@ message LayerParameter {
   HINGE_LOSS = 28;
   IM2COL = 11;
   IMAGE_DATA = 12;
+  IMPORT = 39;


The use case of this layer is very similar to the HTML template frameworks commonly used in web development, where variables are usually replaced by data fetched from a database. Calling it a TEMPLATE layer sounds more natural.

@futurely:

Think about how definitions are reused in C++. Header files are "include"d and namespaces are "use"d. So instead of "file" and "reference", "include" and "use" are more familiar choices.

@shelhamer (Member):

@cypof well done! I like the file system analogy for naming and the general variable substitution since it encompasses any variation to bottoms / tops, configuration fields, and weight sharing. While all-in-one nets de-duped related definitions, there is still plenty of duplication within model definitions like the convolution + pooling pairs you've highlighted.

To keep in line with unified model definitions, one could ideally define a module as (1) a group of layers in the same prototxt or (2) a separate prototxt file for import, as suggested by @jyegerlehner. (1) could be done by introducing a ModuleParameter in caffe.proto that lists layers just like NetParameter.

Naming the layer MODULE or TEMPLATE should make the purpose clearest. As you brought up, the (1) same-net module and (2) file import cases could be distinguished by the field names: use and include as suggested by @futurely, or module and file, all sound reasonable.

Thanks for bundling an example and test. To clarify the example usage, you could include comments to explain the absolute / relative naming inline in the definition.

@jeffdonahue @longjon and I are all in deadline mode for CVPR but certainly interested in richer and modular net definitions.

@@ -269,7 +269,7 @@ endif

 # Debugging
 ifeq ($(DEBUG), 1)
-	COMMON_FLAGS += -DDEBUG -g -O0
+	COMMON_FLAGS += -DDEBUG -g -O0 -DBOOST_NOINLINE='__attribute__ ((noinline))'
Member:

-DBOOST_NOINLINE='__attribute__ ((noinline))': what's the story here?

@jyegerlehner (Contributor):

@cypof

> Using different paths for cp1 and cp2 was only to show the two ways a module can reference a parent object.

I understood the example is showing both relative and absolute ways of naming. My question was about something different, but it doesn't seem to be confusing anyone else.

I pulled your dev fork and ran your modularized lenet example, and all seems well.

Not sure if the intent is to add the ModuleParameter scheme to this PR, as discussed above and in #1169. If not, I can try adding that after this PR is merged. Or, if the maintainers decide adding that is a prerequisite to merging, I could submit a PR to your fork.

@sguada (Contributor) commented Nov 28, 2014:

@cypof let's use a .module extension for all prototxt files that are intended to be imported.
@cypof could you clean up the unnecessary commits and make sure the branch can be merged?

@@ -65,6 +65,18 @@ void WriteProtoToBinaryFile(const Message& proto, const char* filename) {
   CHECK(proto.SerializeToOstream(&output));
 }

+string ReadFile(const string& filename) {
+  std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary);
Contributor:

You can use std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary | std::ios::ate); to open the file and seek to the end of the file at the same time.
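
A minimal sketch of that variant (the function name is hypothetical, error handling omitted): std::ios::ate opens the stream positioned at the end, so tellg() yields the file size; then seek back and read everything in one call.

#include <fstream>
#include <string>

// Open at the end (ate), size the buffer from tellg(), then rewind
// and read the whole file at once.
std::string ReadFileAte(const std::string& filename) {
  std::ifstream in(filename.c_str(),
                   std::ios::in | std::ios::binary | std::ios::ate);
  std::string contents(static_cast<size_t>(in.tellg()), '\0');
  in.seekg(0, std::ios::beg);
  in.read(&contents[0], contents.size());
  return contents;
}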

@sguada (Contributor) commented Nov 30, 2014:

@cypof I have defined GoogleNet using modules and it is much cleaner and shorter than what I did before in #1317 #1367 #1106.
@shelhamer let's merge soon and I will upload the prototxt and modules needed to train GoogleNet.

@longjon (Contributor) commented Dec 1, 2014:

I don't want to retard this momentum, but I have some reservations about this (which I've discussed with @shelhamer previously):

  • This PR turns our net specification language from something well-defined (a protocol buffer) into something less clear (and not defined anywhere): protocol buffers, plus some file inclusion and string substitution. Why might that be a problem?
    • What happens when you copy the net definition, but forget the module files? Now: one net, one file. This PR: one net, many files.
    • What happens when you want to process a net definition with external tools? (E.g., to draw a graph, count the layers, make some kind of transformation...) Now: read it with any protobuf library. This PR: implement your own parser, or link against caffe (or some part of caffe).
    • How is this language going to evolve in the future? It's easy to ignore an unknown field in a protobuf. It's not so easy to ignore a new syntax.
  • String substitution is already straightforward using other tools (e.g., shell, perl). Do we need our own implementation, with its own limitations and bugs?
  • String substitution has its pitfalls: e.g., lack of typing pushes errors until after parsing, modules no longer have to even resemble valid syntax. What about recursive expansion? What about escaping special characters? What happens when we need to generalize so that these substitutions involve some computation? Suddenly we're writing an interpreter, and it's not a pretty sight, because it's based on string substitution in protocol buffers.

I know modularity is important, we don't currently provide it, and some are ready to use this PR right now, so I don't want to stall this effort, but I also don't feel comfortable committing to this path. Maybe we can merge to a feature branch?

(Personally my view is that rather than confining our net definition language to protobuf, or some hacks on top of protobuf, we should treat protobuf as a human-readable intermediate language, and provide interfaces ("DSLs") in real languages (at least Python) for building nets.)

@cypof (Member, Author) commented Dec 1, 2014:

@shelhamer @longjon has some good points. Where are we on the effort to define nets in Python? Does it still make sense to finish this one?

@shelhamer (Member):

@cypof I do agree that @longjon's points block merge to canonical Caffe and that the protobuf substrate + DSL on top approach is right. Sorry it took us a while to converge on this idea, but your PR helped spark the planning.

Right now one can wield the Python protobuf bindings to make a net like @kmatzen's GoogLeNet, but it's quite DIY.

My proposal is to make a caffe.model submodule of pycaffe with helpers and primitives. Reference models and the zoo could include both generation code and the serialized prototxt to not force Python. @longjon and all, thoughts?

@sguada (Contributor) commented Dec 2, 2014:

@longjon I understand your concerns, however currently there is no good alternative for avoiding redundant prototxt, which is prone to errors.
On the other hand, requiring Python to be able to build nets using modules seems like a high bar.

What about creating a tool within Caffe that could read templates and generate the prototxt files, while the Python approach comes up to speed? This would keep Caffe networks as pure prototxt, and serve as another tool to generate networks.

@longjon (Contributor) commented Dec 2, 2014:

@sguada I'm happy with the idea of providing tools to produce prototxts, as it's decoupled from defining the input language, and I'm happy with the idea of providing stopgap tools before things are eventually done the right way.

But shell is already a better tool for this than what we have here, as far as I can tell.

Witness:

#!/bin/bash

read -d '' MODULE << END
layers {
  name: 'example'
  type: BORING
}
END

function module_func {
cat << END
layers {
  name: 'functional'
  type: EXCITING
  amount_of_awesome: $1
}
END
}

cat << END
layers {
  name: 'input'
}
$MODULE
$MODULE
$(module_func "'a lot'")
END

which produces, as you would expect:

layers {
  name: 'input'
}
layers {
  name: 'example'
  type: BORING
}
layers {
  name: 'example'
  type: BORING
}
layers {
  name: 'functional'
  type: EXCITING
  amount_of_awesome: 'a lot'
}

Everyone has shell, and you can do anything you want instead of just string substitution, while the latter is still easy. Or you can do the same in Perl or Python if that's more your taste...

@sguada (Contributor) commented Dec 2, 2014:

@longjon thanks for your illustrative bash example, however I think writing scripts that know nothing about protobuf will become cumbersome pretty quickly. The chances of introducing errors are high, and such a script would easily need to become a protobuf parser.
I will try to write a simple tool within Caffe that allows only string substitution and disallows recursion, as a starting point, and anyone will be welcome to extend it or port it to other languages.

@cypof (Member, Author) commented Dec 2, 2014:

@sguada do you plan to extract the import code into a separate tool? Net.Init could share this code. Sorry for not helping finish this one; right now I'm trying to get ImageNet results on the distrib training PR.

@shelhamer (Member):

@cypof that's a good division of labor. We're gearing up to take a closer look at parallelization too.


@sguada (Contributor) commented Dec 2, 2014:

@cypof if you don't mind I will reuse pieces of your code, but rename it to TEMPLATE instead of IMPORT. However, I will make it a separate tool that doesn't interfere with Net.Init(). That way Caffe net definitions will not need to know anything about expanding templates.
So the idea is to have a prototxt with TEMPLATE layers that refer to prototxt.template files, and to use this tool to expand the templates and create a new prototxt with all the templates expanded. That new prototxt can then be used to train a model.

Usage:

expand_templates net_proto_file_in net_proto_file_out
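
A minimal I/O skeleton such a tool might have (hypothetical; the actual TEMPLATE expansion, which would parse the NetParameter and splice in each .template file with its variables substituted, is elided):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "Usage: expand_templates net_proto_file_in net_proto_file_out"
              << std::endl;
    return 1;
  }
  std::ifstream in(argv[1]);
  std::ostringstream buffer;
  buffer << in.rdbuf();  // slurp the input net definition
  std::string net = buffer.str();
  // ... expand each TEMPLATE layer here, e.g. with a ReplaceVariables-style
  //     substitution pass over the referenced .template files ...
  std::ofstream out(argv[2]);
  out << net;  // write the flattened prototxt
  return 0;
}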

@cypof I'm glad to hear that you are working on the distrib training PR. Let us know when we can take a look.

@longjon (Contributor) commented Dec 2, 2014:

@sguada:

[...] however I think writing scripts that know nothing about protobuf will become cumbersome pretty quickly. The chances of introducing errors is high and it will easily need to become a protobuf parser.

That's exactly my point. Unless I've badly misread it, the code that's doing module insertion here knows exactly as much as shell about protobuf, i.e., nothing. (Okay, it parses modules before insertion, so you can be slightly more abusive with shell in ways no reasonable person would attempt, but string substitution is performed on strings, not any parsed thing, just like shell.)

@sguada (Contributor) commented Dec 2, 2014:

@longjon partially agreed: the module insertion only needs to worry about one layer; the rest is untouched and parsed as prototxt. Also, after the string substitution the net is parsed again, which allows errors to be discovered sooner.

Just to be clear, I'm not against adding other tools to achieve similar results, and this tool could be deprecated in the future if it is superseded by others.

@futurely commented Dec 5, 2014:

To get an inception module with something similar to the Torch7 one-liner inception_module(2, 480, {{192}, {96, 208}, {16, 48}, {3, 64}}), the only thing needed is a custom module parameter.

message InceptionModuleParameter {
  repeated string bottom = 1; // the names of the bottom blobs
  repeated string top = 2; // the names of the top blobs
  optional string name = 3; // the layer name
  optional int32 depth_dim = 4;
  optional int32 input_size = 5;
  repeated int32 num_output_feature_maps = 6;
}

message NetParameter {
  ...
  repeated InceptionModuleParameter inception_modules = 1000; // a bunch of inception_modules.
}

Then, an inception module equivalent to the above Torch7 example can be defined as simply as the following.

inception_modules {
  bottom: "some_layer",
  top: "some_inception_module",
  name: "some_inception_module_name",
  depth_dim: 2,
  input_size: 480,
  num_output_feature_maps: 192,
  num_output_feature_maps: 96,
  num_output_feature_maps: 208,
  num_output_feature_maps: 16,
  num_output_feature_maps: 48,
  num_output_feature_maps: 3,  
  num_output_feature_maps: 64
}

The module definition can be expanded into more verbose layer definitions by an InceptionModuleExpander, sketched below. This is not meant to be a general-purpose solution.
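
As a sketch of what such an expander could emit (hypothetical and heavily simplified: one CONVOLUTION layer per num_output_feature_maps entry, without the reduce/pool/concat plumbing of a real inception module):

#include <sstream>
#include <string>
#include <vector>

// Emit one convolution layer per requested feature map count, all
// reading the module's bottom blob.
std::string ExpandInception(const std::string& name,
                            const std::string& bottom,
                            const std::vector<int>& num_outputs) {
  std::ostringstream out;
  for (size_t i = 0; i < num_outputs.size(); ++i) {
    std::ostringstream branch;
    branch << name << "/branch" << i;
    out << "layers {\n"
        << "  name: \"" << branch.str() << "\"\n"
        << "  type: CONVOLUTION\n"
        << "  bottom: \"" << bottom << "\"\n"
        << "  top: \"" << branch.str() << "\"\n"
        << "  convolution_param { num_output: " << num_outputs[i] << " }\n"
        << "}\n";
  }
  return out.str();
}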

@sguada (Contributor) commented Dec 19, 2014:

Using templates (#1518) one can define inception layers as:

# Inception (3a)
layers {
  name: "inception_3a"
  type: TEMPLATE
  template_param {
    source: "models/bvlc_googlenet/inception.template"
    variable { name: "input"      value: "/pool2/3x3_s2"}
    variable { name: "1x1"        value: "64"}
    variable { name: "3x3_reduce" value: "96"}
    variable { name: "3x3"        value: "128"}
    variable { name: "5x5_reduce" value: "16"}
    variable { name: "5x5"        value: "32"}
    variable { name: "pool_proj"  value: "32"}
  }
}
