
Modular model definitions #1290

Closed · wants to merge 2 commits

Conversation

@cypof (Member) commented Oct 15, 2014:

A way to import a protobuf model into another one, by specifying a layer of type IMPORT. We went for a file system analogy, to allow referencing layers and blobs from different parts of the model:

  • Imports are renamed to “name of the import layer/name”, e.g. “conv_pool1/relu”.
  • Imported layers can reference layers and blobs from the importing network as a parent folder, e.g. “../data”, or using an absolute path: “/data”.
  • By default names are relative references, so existing network definitions are fine. They will resolve all objects in the root folder.

Imports can be configured using ${variables}, which are applied during load with a simple string replace.

As an example, we modified mnist/lenet_train_test.prototxt by exporting the conv/pool part as a module that is imported twice.

In lenet_train_test.prototxt:


...
layers {
  name: "cp1"
  type: IMPORT
  import_param {
    net: "examples/mnist/lenet_conv_pool.prototxt"
    var { name: "bottom" value: "/data" }
    var { name: "num_output" value: "20" }
  }
}
layers {
  name: "cp2"
  type: IMPORT
  import_param {
    net: "examples/mnist/lenet_conv_pool.prototxt"
    var { name: "bottom" value: "../cp1/pool" }
    var { name: "num_output" value: "50" }
  }
}
…

lenet_conv_pool.prototxt:


layers {
  name: "conv"
  type: CONVOLUTION
  bottom: "${bottom}"
  top: "conv"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: ${num_output}
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool"
  type: POOLING
  bottom: "conv"
  top: "pool"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
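
The ${variable} replacement described above amounts to plain string substitution over the imported file's text before parsing. Below is a minimal sketch of such a substitution in C++; the helper name and signature are illustrative, not the PR's actual code:

#include <map>
#include <string>

// Replace every occurrence of "${name}" in `text` with its value.
// Hypothetical helper sketching the simple string replace described in
// the PR description; error handling and escaping are omitted.
std::string ReplaceVariables(std::string text,
                             const std::map<std::string, std::string>& vars) {
  for (std::map<std::string, std::string>::const_iterator it = vars.begin();
       it != vars.end(); ++it) {
    const std::string key = "${" + it->first + "}";
    std::string::size_type pos = 0;
    while ((pos = text.find(key, pos)) != std::string::npos) {
      text.replace(pos, key.size(), it->second);
      pos += it->second.size();  // continue after the replacement
    }
  }
  return text;
}

With vars = { "bottom": "/data", "num_output": "20" }, the line bottom: "${bottom}" in the module above becomes bottom: "/data".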

CHECK(parse) << "Failed to parse NetParameter file: " << import.net();
CHECK(layer.has_name() && layer.name().length() > 0)
    << "Import layer must have a name";
LoadImports(net, target, ResolveImportName(layer.name(), pwd));
Contributor:

Doesn't look like this recursion checks for self-inclusion. That is, if foo.prototxt imports bar.prototxt, and bar.prototxt imports foo.prototxt, the recursion doesn't end until the stack overflows.
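
A minimal sketch of such a guard, using a hypothetical LoadWithCycleCheck in place of the recursion in LoadImports (the imports map stands in for "which files each prototxt imports" after parsing):

#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical cycle guard: track the files on the current import chain
// and refuse to re-enter one of them.
void LoadWithCycleCheck(const std::string& file,
    const std::map<std::string, std::vector<std::string> >& imports,
    std::set<std::string>* chain) {
  if (!chain->insert(file).second) {
    throw std::runtime_error("Circular import detected: " + file);
  }
  std::map<std::string, std::vector<std::string> >::const_iterator it =
      imports.find(file);
  if (it != imports.end()) {
    for (size_t i = 0; i < it->second.size(); ++i) {
      LoadWithCycleCheck(it->second[i], imports, chain);
    }
  }
  chain->erase(file);  // fully expanded; diamond-shaped imports stay legal
}

With a check like this, foo.prototxt importing bar.prototxt importing foo.prototxt fails with an error instead of overflowing the stack.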

@jyegerlehner (Contributor):

I'm a bit confused by the variable naming scheme. How come in the lenet example, cp1 can refer to its bottom blob as just /data, where the blob name doesn't have to be qualified with the containing layer name, whereas cp2 has to specify the "path" to the blob with the layer name, as ../cp1/pool? It seems like cp1 should also have to specify the name of the containing layer, so instead of /data, it would be /mnist/data.

I don't quite get why ../cp1/pool has the ../ at the beginning. cp1 is already at the top level, not being nested below anything. So with cp1 at the top, ../cp1 wouldn't seem to be a valid path.

@cypof (Member, Author) commented Oct 21, 2014:

Using different paths for cp1 and cp2 was only to show the two ways a module can reference a parent object. Both data and cp1 are at the root, so using an absolute path (/) or the relative path (../) from a layer inside cp2 is equivalent.

I think adding modules from the same file would be great. No problem with renaming to module. Maybe have two fields instead of 'net', like 'file' and 'reference', for external and internal definitions?



@@ -252,6 +252,7 @@ message LayerParameter {
   HINGE_LOSS = 28;
   IM2COL = 11;
   IMAGE_DATA = 12;
+  IMPORT = 39;


The use case of this layer is very similar to the HTML template frameworks commonly used in web development, where variables are usually replaced by data fetched from a database. Calling it a TEMPLATE layer sounds more natural.

@futurely:

Think about how definitions are reused in C++. Header files are "include"d and namespaces are "use"d. So instead of "file" and "reference", "include" and "use" are more familiar choices.

@shelhamer (Member):

@cypof well done! I like the file system analogy for naming and the general variable substitution since it encompasses any variation to bottoms / tops, configuration fields, and weight sharing. While all-in-one nets de-duped related definitions, there is still plenty of duplication within model definitions like the convolution + pooling pairs you've highlighted.

To keep in line with unified model definitions, one could ideally define a module as (1) a group of layers in the same prototxt or (2) a separate prototxt file for import, as suggested by @jyegerlehner. (1) could be done by introducing a ModuleParameter in caffe.proto that lists layers just like NetParameter.

Naming the layer MODULE or TEMPLATE should make the purpose clearest. As you brought up, the (1) same-net module and (2) file import cases could be distinguished by the field names: use and include as suggested by @futurely, or module and file, all sound reasonable.

Thanks for bundling an example and test. To clarify the example usage, you could include comments to explain the absolute / relative naming inline in the definition.

@jeffdonahue @longjon and I are all in deadline mode for CVPR but certainly interested in richer and modular net definitions.

@@ -269,7 +269,7 @@ endif

 # Debugging
 ifeq ($(DEBUG), 1)
-	COMMON_FLAGS += -DDEBUG -g -O0
+	COMMON_FLAGS += -DDEBUG -g -O0 -DBOOST_NOINLINE='__attribute__ ((noinline))'
Member:

-DBOOST_NOINLINE='__attribute__ ((noinline))': what's the story here?

@jyegerlehner (Contributor):

@cypof

> Using different paths for cp1 and cp2 was only to show the two ways a module can reference a parent object.

I understood the example is showing both relative and absolute ways of naming. My question was about something different, but it doesn't seem to be confusing anyone else.

I pulled your dev fork and ran your modularized lenet example, and all seems well.

Not sure if the intent is to add the ModuleParameter scheme to this PR, as discussed above and in #1169. If not, I can try adding that after this PR is merged. Or, if the maintainers decide adding that is a prerequisite to merging, I could submit a PR to your fork.

@sguada (Contributor) commented Nov 28, 2014:

@cypof let's use a .module extension for all prototxt files that are intended to be imported.
@cypof could you clean up the unnecessary commits and make sure the branch can be merged?

@@ -65,6 +65,18 @@ void WriteProtoToBinaryFile(const Message& proto, const char* filename) {
   CHECK(proto.SerializeToOstream(&output));
 }

+string ReadFile(const string& filename) {
+  std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary);
Contributor:

You can use std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary | std::ios::ate); to open the file and seek to the end of the file at the same time.
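
A minimal sketch of that variant (the function name is hypothetical, error handling omitted): std::ios::ate opens the stream positioned at the end, so tellg() yields the file size; then seek back and read everything in one call.

#include <fstream>
#include <string>

// Open at the end (ate), size the buffer from tellg(), then rewind
// and read the whole file at once.
std::string ReadFileAte(const std::string& filename) {
  std::ifstream in(filename.c_str(),
                   std::ios::in | std::ios::binary | std::ios::ate);
  std::string contents(static_cast<size_t>(in.tellg()), '\0');
  in.seekg(0, std::ios::beg);
  in.read(&contents[0], contents.size());
  return contents;
}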

@sguada (Contributor) commented Nov 30, 2014:

@cypof I have defined GoogleNet using modules and it is much cleaner and shorter than what I did before in #1317 #1367 #1106.
@shelhamer let's merge soon and I will upload the prototxt and modules needed to train GoogleNet.

@longjon (Contributor) commented Dec 1, 2014:

I don't want to retard this momentum, but I have some reservations about this (which I've discussed with @shelhamer previously):

  • This PR turns our net specification language from something well-defined (a protocol buffer) into something less clear (and not defined anywhere): protocol buffers, plus some file inclusion and string substitution. Why might that be a problem?
    • What happens when you copy the net definition, but forget the module files? Now: one net, one file. This PR: one net, many files.
    • What happens when you want to process a net definition with external tools? (E.g., to draw a graph, count the layers, make some kind of transformation...) Now: read it with any protobuf library. This PR: implement your own parser, or link against caffe (or some part of caffe).
    • How is this language going to evolve in the future? It's easy to ignore an unknown field in a protobuf. It's not so easy to ignore a new syntax.
  • String substitution is already straightforward using other tools (e.g., shell, perl). Do we need our own implementation, with its own limitations and bugs?
  • String substitution has its pitfalls: e.g., lack of typing pushes errors until after parsing, modules no longer have to even resemble valid syntax. What about recursive expansion? What about escaping special characters? What happens when we need to generalize so that these substitutions involve some computation? Suddenly we're writing an interpreter, and it's not a pretty sight, because it's based on string substitution in protocol buffers.

I know modularity is important, we don't currently provide it, and some are ready to use this PR right now, so I don't want to stall this effort, but I also don't feel comfortable committing to this path. Maybe we can merge to a feature branch?

(Personally my view is that rather than confining our net definition language to protobuf, or some hacks on top of protobuf, we should treat protobuf as a human-readable intermediate language, and provide interfaces ("DSLs") in real languages (at least Python) for building nets.)

@cypof (Member, Author) commented Dec 1, 2014:

@shelhamer @longjon has some good points. Where are we on the effort to define nets in Python? Does it still make sense to finish this one?

@shelhamer (Member):

@cypof I do agree that @longjon's points block merge to canonical Caffe and that the protobuf substrate + DSL on top approach is right. Sorry it took us a while to converge on this idea, but your PR helped spark the planning.

Right now one can wield the Python protobuf bindings to make a net like @kmatzen's GoogLeNet, but it's quite DIY.

My proposal is to make a caffe.model submodule of pycaffe with helpers and primitives. Reference models and the zoo could include both generation code and the serialized prototxt to not force Python. @longjon and all, thoughts?

@sguada (Contributor) commented Dec 2, 2014:

@longjon I understand your concerns, however currently there is no good alternative for avoiding redundant prototxt, which is prone to errors.
On the other hand, requiring Python to be able to build nets using modules seems like a high bar.

What about creating a tool within Caffe that could read templates and generate the prototxt files, while the Python approach comes up to speed? This would keep Caffe networks as pure prototxt, and serve as another tool to generate networks.

@longjon (Contributor) commented Dec 2, 2014:

@sguada I'm happy with the idea of providing tools to produce prototxts, as it's decoupled from defining the input language, and I'm happy with the idea of providing stopgap tools before things are eventually done the right way.

But shell is already a better tool for this than what we have here, as far as I can tell.

Witness:

#!/bin/bash

read -d '' MODULE << END
layers {
  name: 'example'
  type: BORING
}
END

function module_func {
cat << END
layers {
  name: 'functional'
  type: EXCITING
  amount_of_awesome: $1
}
END
}

cat << END
layers {
  name: 'input'
}
$MODULE
$MODULE
$(module_func "'a lot'")
END

which produces, as you would expect:

layers {
  name: 'input'
}
layers {
  name: 'example'
  type: BORING
}
layers {
  name: 'example'
  type: BORING
}
layers {
  name: 'functional'
  type: EXCITING
  amount_of_awesome: 'a lot'
}

Everyone has shell, and you can do anything you want instead of just string substitution, while the latter is still easy. Or you can do the same in Perl or Python if that's more your taste...

@sguada (Contributor) commented Dec 2, 2014:

@longjon thanks for your illustrative bash example, however I think writing scripts that know nothing about protobuf will become cumbersome pretty quickly. The chances of introducing errors are high, and such a script would easily need to become a protobuf parser.
I will try to write a simple tool within Caffe that allows only string substitution and disallows recursion, as a starting point, and anyone will be welcome to extend it or port it to other languages.

@cypof (Member, Author) commented Dec 2, 2014:

@sguada do you plan to extract the import code into a separate tool? Net.Init could share this code. Sorry for not helping finish this one; right now I'm trying to get ImageNet results on the distrib training PR.

@shelhamer (Member):

@cypof that's a good division of labor. We're gearing up to take a closer look at parallelization too.


@sguada (Contributor) commented Dec 2, 2014:

@cypof if you don't mind I will reuse pieces of your code, but rename it to TEMPLATE instead of IMPORT. However, I will make it a separate tool that doesn't interfere with Net.Init(). That way Caffe net definitions will not need to know anything about expanding templates.
So the idea is to have a prototxt with TEMPLATE layers that refer to prototxt.template files, and to use this tool to expand the templates and create a new prototxt with all the templates expanded. That new prototxt can then be used to train a model.

Usage:

expand_templates net_proto_file_in net_proto_file_out
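
A minimal I/O skeleton such a tool might have (hypothetical; the actual TEMPLATE expansion, which would parse the NetParameter and splice in each .template file with its variables substituted, is elided):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "Usage: expand_templates net_proto_file_in net_proto_file_out"
              << std::endl;
    return 1;
  }
  std::ifstream in(argv[1]);
  std::ostringstream buffer;
  buffer << in.rdbuf();  // slurp the input net definition
  std::string net = buffer.str();
  // ... expand each TEMPLATE layer here, e.g. with a ReplaceVariables-style
  //     substitution pass over the referenced .template files ...
  std::ofstream out(argv[2]);
  out << net;  // write the flattened prototxt
  return 0;
}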

@cypof I'm glad to hear that you are working on the distrib training PR. Let us know when we can take a look.

@longjon (Contributor) commented Dec 2, 2014:

@sguada:

[...] however I think writing scripts that know nothing about protobuf will become cumbersome pretty quickly. The chances of introducing errors is high and it will easily need to become a protobuf parser.

That's exactly my point. Unless I've badly misread it, the code that's doing module insertion here knows exactly as much as shell about protobuf, i.e., nothing. (Okay, it parses modules before insertion, so you can be slightly more abusive with shell in ways no reasonable person would attempt, but string substitution is performed on strings, not any parsed thing, just like shell.)

@sguada (Contributor) commented Dec 2, 2014:

@longjon partially agreed: the module insertion only needs to worry about one layer; the rest is untouched and parsed as prototxt. Also, after the string substitution the net is parsed again, which allows errors to be discovered sooner.

Just to be clear, I'm not against adding other tools to achieve similar results, and this tool could be deprecated in the future if it is superseded by others.

@futurely commented Dec 5, 2014:

To get an inception module with something similar to the Torch7 one-liner inception_module(2, 480, {{192}, {96, 208}, {16, 48}, {3, 64}}), the only thing needed is a custom module parameter.

message InceptionModuleParameter {
  repeated string bottom = 1; // the names of the bottom blobs
  repeated string top = 2; // the names of the top blobs
  optional string name = 3; // the layer name
  optional int32 depth_dim = 4;
  optional int32 input_size = 5;
  repeated int32 num_output_feature_maps = 6;
}

message NetParameter {
  ...
  repeated InceptionModuleParameter inception_modules = 1000; // a bunch of inception_modules.
}

Then, an inception module equivalent to the above Torch7 example can be defined as simply as the following.

inception_modules {
  bottom: "some_layer",
  top: "some_inception_module",
  name: "some_inception_module_name",
  depth_dim: 2,
  input_size: 480,
  num_output_feature_maps: 192,
  num_output_feature_maps: 96,
  num_output_feature_maps: 208,
  num_output_feature_maps: 16,
  num_output_feature_maps: 48,
  num_output_feature_maps: 3,  
  num_output_feature_maps: 64
}

The module definition can be expanded into more verbose layer definitions by an InceptionModuleExpander, sketched below. This is not meant to be a general-purpose solution.
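
As a sketch of what such an expander could emit (hypothetical and heavily simplified: one CONVOLUTION layer per num_output_feature_maps entry, without the reduce/pool/concat plumbing of a real inception module):

#include <sstream>
#include <string>
#include <vector>

// Emit one convolution layer per requested feature map count, all
// reading the module's bottom blob.
std::string ExpandInception(const std::string& name,
                            const std::string& bottom,
                            const std::vector<int>& num_outputs) {
  std::ostringstream out;
  for (size_t i = 0; i < num_outputs.size(); ++i) {
    std::ostringstream branch;
    branch << name << "/branch" << i;
    out << "layers {\n"
        << "  name: \"" << branch.str() << "\"\n"
        << "  type: CONVOLUTION\n"
        << "  bottom: \"" << bottom << "\"\n"
        << "  top: \"" << branch.str() << "\"\n"
        << "  convolution_param { num_output: " << num_outputs[i] << " }\n"
        << "}\n";
  }
  return out.str();
}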

@sguada (Contributor) commented Dec 19, 2014:

Using templates (#1518) one can define inception layers as:

# Inception (3a)
layers {
  name: "inception_3a"
  type: TEMPLATE
  template_param {
    source: "models/bvlc_googlenet/inception.template"
    variable { name: "input"      value: "/pool2/3x3_s2"}
    variable { name: "1x1"        value: "64"}
    variable { name: "3x3_reduce" value: "96"}
    variable { name: "3x3"        value: "128"}
    variable { name: "5x5_reduce" value: "16"}
    variable { name: "5x5"        value: "32"}
    variable { name: "pool_proj"  value: "32"}
  }
}
