How to use new snapshotting? #35
It's in ./lib/fast_rcnn/config.py
In that file I can change the interval between snapshots and the snapshot infix, but there is nothing about using a snapshot during training. Would I just change the snapshot number in solver.prototxt to reference the current snapshot?
@xksteven I guess you would like to do validation during training?
@WilsonWangTHU I'm running it on a cluster with a time limit, so it kills my process at certain intervals. I just want to be able to restart training from a snapshot.
I have the same problem.
How do you restart training from a snapshot? Can anyone provide some tips? Thanks.
Make the following modifications and you will be able to use the --snapshot argument.

In tools/train_net.py:

```python
def parse_args():
    """Parse input arguments."""
    parser = argparse.ArgumentParser(description='Train a Fast R-CNN network')
    parser.add_argument('--gpu', dest='gpu_id',
                        help='GPU device id to use [0]',
                        default=0, type=int)
    parser.add_argument('--solver', dest='solver',
                        help='solver prototxt',
                        default=None, type=str)
    parser.add_argument('--iters', dest='max_iters',
                        help='number of iterations to train',
                        default=40000, type=int)
    parser.add_argument('--weights', dest='pretrained_model',
                        help='initialize with pretrained model weights',
                        default=None, type=str)
    parser.add_argument('--snapshot', dest='previous_state',
                        help='initialize with previous state',
                        default=None, type=str)
    parser.add_argument('--cfg', dest='cfg_file',
                        help='optional config file',
                        default=None, type=str)
    parser.add_argument('--imdb', dest='imdb_name',
                        help='dataset to train on',
                        default='voc_2007_trainval', type=str)
    parser.add_argument('--rand', dest='randomize',
                        help='randomize (do not use a fixed seed)',
                        action='store_true')
    parser.add_argument('--set', dest='set_cfgs',
                        help='set config keys', default=None,
                        nargs=argparse.REMAINDER)
```

In lib/fast_rcnn/train.py:

```python
class SolverWrapper(object):
    """A simple wrapper around Caffe's solver.

    This wrapper gives us control over the snapshotting process, which we
    use to unnormalize the learned bounding-box regression weights.
    """

    def __init__(self, solver_prototxt, roidb, output_dir,
                 pretrained_model=None, previous_state=None):
        """Initialize the SolverWrapper."""
        self.output_dir = output_dir

        print 'Computing bounding-box regression targets...'
        self.bbox_means, self.bbox_stds = \
                rdl_roidb.add_bbox_regression_targets(roidb)
        print 'done'

        self.solver = caffe.SGDSolver(solver_prototxt)
        if pretrained_model is not None:
            print ('Loading pretrained model '
                   'weights from {:s}').format(pretrained_model)
            self.solver.net.copy_from(pretrained_model)
        elif previous_state is not None:
            print 'Restoring solver state from {:s}'.format(previous_state)
            self.solver.restore(previous_state)

        self.solver_param = caffe_pb2.SolverParameter()
        with open(solver_prototxt, 'rt') as f:
            pb2.text_format.Merge(f.read(), self.solver_param)

        self.solver.net.layers[0].set_roidb(roidb)

    # ... (snapshot() and train_model() unchanged)

def train_net(solver_prototxt, roidb, output_dir,
              pretrained_model=None, max_iters=40000, previous_state=None):
    """Train a Fast R-CNN network."""
    sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                       pretrained_model=pretrained_model,
                       previous_state=previous_state)

    print 'Solving...'
    sw.train_model(max_iters)
    print 'done solving'
```
Thanks for the code, but how do you save the solverstate during Fast R-CNN training? It looks like the method Solver::SnapshotSolverState isn't exported to pycaffe...
Did you change "snapshot: 0" to "snapshot: 10000" in your solver.prototxt? That lets you save the state every 10000 iterations, for example.
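For reference, the change lives in models/VGG16/solver.prototxt. The excerpt below is illustrative rather than a verbatim copy of the file, and the 10000 interval is just an example value:

```
# models/VGG16/solver.prototxt (illustrative excerpt)
# The shipped file disables Caffe's built-in snapshotting:
#   snapshot: 0
# Re-enable it so Caffe periodically writes a .caffemodel/.solverstate pair:
snapshot: 10000
# snapshot_prefix (already set in the file) controls the output file names
```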
Ah, thanks! Didn't think of that...
@lynetcha, one more modification is needed in tools/train_net.py (see the sketch below).
Also remember to omit the --weights param when resuming.
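The code for that extra modification is not reproduced above. A minimal sketch, assuming it simply threads the new --snapshot value from parse_args() through to train_net() (the existing setup code of the script is elided, not removed):

```python
# tools/train_net.py, end of the __main__ block (sketch, not the verbatim patch)
if __name__ == '__main__':
    args = parse_args()
    # ... existing cfg / caffe / imdb / roidb / output_dir setup unchanged ...
    train_net(args.solver, roidb, output_dir,
              pretrained_model=args.pretrained_model,  # leave --weights unset when resuming
              previous_state=args.previous_state,      # .solverstate path from --snapshot
              max_iters=args.max_iters)
```

With that in place, resuming is a matter of passing --snapshot with the path to the saved .solverstate instead of --weights.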
Hi @po0ya, what if I don't save the extra file with the unnormalized last-layer weights? Would that give bad mAP after retraining?
Hello @twmht. Basically it'll mess up the whole network if you want to continue training. The network is trained on bbox regression targets normalized to zero mean and unit variance. For convenience at test time, the weights and bias of the last layer are scaled by the std and shifted by the mean; if that had not been done, the predictions would have to be scaled and shifted manually. So the saved weights are not the ones learned by backprop, and retraining from them would be meaningless for the network. EDIT: add these couple of lines to the end of the SolverWrapper constructor (see the sketch below).
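The couple of lines referred to in that EDIT are not shown above. A plausible reconstruction, assuming the intent is to undo the test-time unnormalization that SolverWrapper.snapshot() applies to the bbox_pred layer whenever the loaded weights were saved in that unnormalized form (self.bbox_means and self.bbox_stds are the values computed earlier in the constructor, and numpy is already imported as np in train.py):

```python
# End of SolverWrapper.__init__ (sketch, not the verbatim edit):
# re-normalize bbox_pred if the loaded weights were saved unnormalized.
net = self.solver.net
if 'bbox_pred' in net.params:
    # snapshot() saves W' = W * stds and b' = b * stds + means,
    # so invert that to recover the weights backprop actually learned
    net.params['bbox_pred'][0].data[...] = \
        net.params['bbox_pred'][0].data / self.bbox_stds[:, np.newaxis]
    net.params['bbox_pred'][1].data[...] = \
        (net.params['bbox_pred'][1].data - self.bbox_means) / self.bbox_stds
```

Note that the later comments in this thread argue this step is unnecessary when resuming from the Caffe-written .solverstate, because Caffe's own snapshot fires before the unnormalization happens.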
|
…tate file (--snapshot /a/b/c.solverstate) (rbgirshick/fast-rcnn#35). solver.cpp is modified according to the master branch of caffe; it seems Microsoft made some changes that prevented restoring multiple solvers.
@po0ya But aren't the weights (*.caffemodel) saved by the default solver snapshot already normalized, since they were never unnormalized (that caffemodel was not written by the provided snapshot function)? So the produced *.solverstate is linked to a *.caffemodel that was not produced by the Faster R-CNN snapshot function. With the resuming functionality you end up with two versions of the caffemodel: the one written by the default solver snapshot, and the one written by the Faster R-CNN snapshot function, whose weights are unnormalized before saving. So I guess the re-normalization is not needed.
In the snapshot function of SolverWrapper, the net params are first unnormalized, then saved, and then restored to the normalized version, so which version ends up in a given file depends on when the Caffe snapshot is taken relative to that. I didn't dig into the Caffe code, but I looked into the log and found that the Caffe snapshot is called before the snapshot in SolverWrapper, so the weights in the Caffe-written files are still normalized and we can resume from the *.solverstate directly.
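For context, the reason the Caffe-written files stay normalized is visible in the overall shape of SolverWrapper.snapshot(). The sketch below is abridged rather than a verbatim copy of lib/fast_rcnn/train.py, and the output filename is an illustrative placeholder:

```python
def snapshot(self):
    """Abridged sketch: unnormalize bbox_pred, save, then restore."""
    net = self.solver.net

    # keep the trainable (normalized) parameters
    orig_0 = net.params['bbox_pred'][0].data.copy()
    orig_1 = net.params['bbox_pred'][1].data.copy()

    # scale/shift for test-time convenience, then write the .caffemodel
    net.params['bbox_pred'][0].data[...] = orig_0 * self.bbox_stds[:, np.newaxis]
    net.params['bbox_pred'][1].data[...] = orig_1 * self.bbox_stds + self.bbox_means
    filename = self.output_dir + '/fast_rcnn_iter_{:d}.caffemodel'.format(self.solver.iter)  # illustrative name
    net.save(str(filename))

    # put the normalized parameters back; anything Caffe itself snapshots
    # inside solver.step() therefore only ever sees the normalized weights
    net.params['bbox_pred'][0].data[...] = orig_0
    net.params['bbox_pred'][1].data[...] = orig_1
```

Since Caffe's own snapshot is triggered inside solver.step() and SolverWrapper.snapshot() only runs afterwards, the .solverstate/.caffemodel pair that Caffe writes holds the normalized weights, which is why resuming from it needs no extra correction.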
fast-rcnn doesn't take --snapshot as an argument, so I'm not sure how to use a snapshot.
I'm asking because models/VGG16/solver.prototxt has this:
"We disable standard caffe solver snapshotting and implement our own snapshot"
Thanks