
How to use new snapshotting? #35

Open
xksteven opened this issue Jul 10, 2015 · 16 comments

Comments

@xksteven

fast-rcnn doesn't take --snapshot as an argument, so I'm not sure how to use a snapshot.

I'm asking because models/VGG16/solver.prototxt says:
"We disable standard caffe solver snapshotting and implement our own snapshot"

Thanks

@WilsonWangTHU

It's in lib/fast_rcnn/config.py
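
For anyone else looking, the snapshot-related keys there look roughly like this (quoted from memory of the upstream lib/fast_rcnn/config.py, so double-check your checkout):

    # Iterations between snapshots
    __C.TRAIN.SNAPSHOT_ITERS = 10000

    # Optional infix inserted into the snapshot filename
    __C.TRAIN.SNAPSHOT_INFIX = ''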

@xksteven
Author

xksteven commented Aug 6, 2015

In that file I can change the interval between snapshots and the snapshot infix, but there's nothing about using a snapshot during training.

Would I just change the snapshot number in solver.prototxt to reference the current snapshot?

@WilsonWangTHU

@xksteven I guess you would like to run validation during training?
I am not sure whether that's supported by the current fast-rcnn version, as all the forward passes are driven from the Python side, and I don't think there is a testing function during training for now.
I am afraid you might need to modify the code yourself.

@xksteven
Author

xksteven commented Aug 7, 2015

@WilsonWangTHU
You know how in Caffe you can pass the snapshot option, e.g. -snapshot=model_iter_xxx.solverstate, to restart training from that point? Normally in Caffe the solverstate and the caffemodel (saved as model_iter_xxx.caffemodel) are both in the same directory, but with fast-rcnn I only see the caffemodel saved in output/default/imdb_trainval. I'd like to be able to restart training using the weights stored there.

I'm running it on a cluster with a time limit, and it kills my process at certain intervals. I just want to be able to restart training from that snapshot.
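
For reference, resuming in plain Caffe via pycaffe looks roughly like this (a minimal sketch; the .solverstate filename is a hypothetical example):

    import caffe

    caffe.set_mode_gpu()
    solver = caffe.SGDSolver('solver.prototxt')
    # restore() reloads both the net weights and the solver state
    # (iteration count, momentum history) from a .solverstate file
    solver.restore('model_iter_10000.solverstate')
    solver.solve()  # continue training from where it stopped

fast-rcnn only calls net.save() from its own Python snapshot code, so no .solverstate is ever written, which is why this doesn't work out of the box.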

@kyuusaku

I have the same problem.

@kyuusaku

How do I restart training from a snapshot? Can anyone provide some tips? Thanks.

@IdiosyncraticDragon

@kyuusaku @xksteven I have run into the same problem. Have you found an effective solution? Thanks

@lynetcha

Make the following modifications and you will be able to use the --snapshot argument

In tools/train_net.py

    def parse_args():
        """
        Parse input arguments
        """
        parser = argparse.ArgumentParser(description='Train a Fast R-CNN network')
        parser.add_argument('--gpu', dest='gpu_id',
                            help='GPU device id to use [0]',
                            default=0, type=int)
        parser.add_argument('--solver', dest='solver',
                            help='solver prototxt',
                            default=None, type=str)
        parser.add_argument('--iters', dest='max_iters',
                            help='number of iterations to train',
                            default=40000, type=int)
        parser.add_argument('--weights', dest='pretrained_model',
                            help='initialize with pretrained model weights',
                            default=None, type=str)
        parser.add_argument('--snapshot', dest='previous_state',
                            help='initialize with previous state',
                            default=None, type=str) 
        parser.add_argument('--cfg', dest='cfg_file',
                            help='optional config file',
                            default=None, type=str)
        parser.add_argument('--imdb', dest='imdb_name',
                            help='dataset to train on',
                            default='voc_2007_trainval', type=str)
        parser.add_argument('--rand', dest='randomize',
                            help='randomize (do not use a fixed seed)',
                            action='store_true')
        parser.add_argument('--set', dest='set_cfgs',
                            help='set config keys', default=None,
                            nargs=argparse.REMAINDER)

        if len(sys.argv) == 1:
            parser.print_help()
            sys.exit(1)

        args = parser.parse_args()
        return args

In lib/fast_rcnn/train.py

    class SolverWrapper(object):
        """A simple wrapper around Caffe's solver.
        This wrapper gives us control over the snapshotting process, which we
        use to unnormalize the learned bounding-box regression weights.
        """

        def __init__(self, solver_prototxt, roidb, output_dir,
                     pretrained_model=None, previous_state=None):
            """Initialize the SolverWrapper."""
            self.output_dir = output_dir

            print 'Computing bounding-box regression targets...'
            self.bbox_means, self.bbox_stds = \
                    rdl_roidb.add_bbox_regression_targets(roidb)
            print 'done'

            self.solver = caffe.SGDSolver(solver_prototxt)
            if pretrained_model is not None:
                print ('Loading pretrained model '
                       'weights from {:s}').format(pretrained_model)
                self.solver.net.copy_from(pretrained_model)
            elif previous_state is not None:
                print ('Restoring solver state '
                       'from {:s}').format(previous_state)
                self.solver.restore(previous_state)

            self.solver_param = caffe_pb2.SolverParameter()
            with open(solver_prototxt, 'rt') as f:
                pb2.text_format.Merge(f.read(), self.solver_param)

            self.solver.net.layers[0].set_roidb(roidb)

    ...

    def train_net(solver_prototxt, roidb, output_dir,
                  pretrained_model=None, max_iters=40000, previous_state=None):
        """Train a Fast R-CNN network."""
        sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                           pretrained_model=pretrained_model,
                           previous_state=previous_state)

        print 'Solving...'
        sw.train_model(max_iters)
        print 'done solving'

@chrert

chrert commented Jan 19, 2016

Thanks for the code, but how do you save the solverstate during Fast R-CNN training? It looks like the method Solver::SnapshotSolverState isn't exposed to pycaffe...

@lynetcha

Did you change "snapshot: 0" to e.g. "snapshot: 10000" in your solver.prototxt? That makes Caffe save the solver state every 10000 iterations.
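
In the solver prototxt that looks something like this (the prefix is just an example):

    # models/VGG16/solver.prototxt
    snapshot: 10000
    snapshot_prefix: "vgg16_fast_rcnn"

Note that `snapshot` is an interval, so Caffe writes a .caffemodel/.solverstate pair every 10000 iterations.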

@chrert

chrert commented Jan 21, 2016

Ah, thanks! Didn't think of that...

@smichalowski

@lynetcha, one more modification:

In tools/train_net.py

    output_dir = get_output_dir(imdb)
    print 'Output will be saved to `{:s}`'.format(output_dir)

    train_net(args.solver, roidb, output_dir,
              pretrained_model=args.pretrained_model,
              max_iters=args.max_iters, previous_state=args.previous_state)

Also remember to omit the --weights param when resuming.
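
With all the changes above, resuming would look something like this (paths are hypothetical examples):

    ./tools/train_net.py --gpu 0 \
        --solver models/VGG16/solver.prototxt \
        --imdb voc_2007_trainval \
        --snapshot output/default/voc_2007_trainval/vgg16_fast_rcnn_iter_10000.solverstate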

@twmht

twmht commented Aug 27, 2016

hi @po0ya

What if I don't save the extra file with the last-layer weights? Would mAP be bad after retraining?

@po0ya

po0ya commented Aug 29, 2016

Hello @twmht

Basically it'll mess up the whole network if you want to continue training. The network is trained to regress bounding boxes with zero-mean, unit-variance targets. For convenience at test time, the weights and biases of the last layer are scaled by the stds and shifted by the means when a snapshot is saved; otherwise the predictions would have to be scaled and shifted manually. But those scaled weights are not the ones that were learned by backprop, so retraining from them would be meaningless for the network.
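
For context, this is roughly the scale/shift that fast-rcnn's SolverWrapper.snapshot() applies before writing the .caffemodel (a paraphrased sketch of lib/fast_rcnn/train.py, not the verbatim source):

    # Unnormalize for test-time convenience:
    #   W_saved = W_learned * stds   (broadcast row-wise)
    #   b_saved = b_learned * stds + means
    net.params['bbox_pred'][0].data[...] = \
        net.params['bbox_pred'][0].data * self.bbox_stds[:, np.newaxis]
    net.params['bbox_pred'][1].data[...] = \
        net.params['bbox_pred'][1].data * self.bbox_stds + self.bbox_means
    # ... net.save(filename) runs here, then the original (normalized)
    # weights are copied back so training continues unaffected.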

EDIT: Add these few lines to the end of the SolverWrapper constructor (__init__) to invert that transform after restoring a snapshot:

    # Invert the snapshot-time unnormalization so training resumes
    # with the normalized regression weights the solver expects.
    net = self.solver.net
    found = False
    for k in net.params.keys():
        if 'bbox_pred' in k:
            bbox_pred = k
            found = True
    if found:
        print('[#] Renormalizing the final layers back')
        # skip the first 4 outputs (background class)
        net.params[bbox_pred][0].data[4:, :] = \
            (net.params[bbox_pred][0].data[4:, :] *
             1.0 / self.bbox_stds[4:, np.newaxis])
        net.params[bbox_pred][1].data[4:] = \
            (net.params[bbox_pred][1].data - self.bbox_means)[4:] * \
            1.0 / self.bbox_stds[4:]
    else:
        print('Warning: layer "bbox_pred" not found')

zhangjiangqige added a commit to zhangjiangqige/py-R-FCN-multiGPU that referenced this issue May 5, 2017
…tate file (--snapshot /a/b/c.solverstate) (rbgirshick/fast-rcnn#35)

solver.cpp is modified according to the master branch of caffe; it seems that Microsoft made some changes that prevented restoring multiple solvers
zhangjiangqige added a commit to zhangjiangqige/py-R-FCN-multiGPU that referenced this issue Sep 29, 2017
…tate file (--snapshot /a/b/c.solverstate) (rbgirshick/fast-rcnn#35)

solver.cpp is modified according to the master branch of caffe; it seems that Microsoft made some changes that prevented restoring multiple solvers
@ds2268

ds2268 commented Nov 17, 2017

@po0ya but aren't the weights (*.caffemodel) saved by the default Caffe solver already normalized? They were never unnormalized, because that caffemodel was not saved through the provided snapshot function. So the produced *.solverstate is linked to a *.caffemodel that was not produced by the Faster R-CNN snapshot function. With the resume functionality you get two versions of the caffemodel: the one written by the default solver snapshot, and the one written by the Faster R-CNN snapshot function, whose weights are unnormalized before saving. So I guess the renormalization is not needed.

@misssprite

misssprite commented May 23, 2018

The net params in SolverWrapper's snapshot function are first unnormalized, saved, and then restored to the normalized version. So which version ends up in a snapshot depends on when Caffe's own snapshot is called.

I didn't dig into the Caffe code, but I think disabling snapshotting in solver.prototxt and manually calling solver.snapshot() gives better control over exactly which version gets snapshotted.

Actually, I looked into the log and found that the Caffe snapshot is called before the snapshot in SolverWrapper. Diffing the param files shows that the Caffe snapshot indeed saves a different (normalized) version than SolverWrapper. Manually invoking solver.snapshot() produced an identical (normalized) .caffemodel.

So with the Caffe snapshot we can resume from the .solverstate safely, without unnormalizing the parameters. But this produces two versions of '.caffemodel's; it's up to you which version of the parameters to snapshot.
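
A minimal sketch of that approach (assuming a pycaffe build that exposes Solver.snapshot(), as recent BVLC Caffe does):

    import caffe

    caffe.set_mode_gpu()
    solver = caffe.SGDSolver('models/VGG16/solver.prototxt')  # with snapshot: 0
    for _ in range(4):
        solver.step(10000)  # train 10k iterations at a time
        # writes a matching .caffemodel/.solverstate pair holding the
        # current (normalized) parameters, so resuming stays consistent
        solver.snapshot()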
