Multiple calls to .fit #161

Conversation

ravi-mosaicml
Contributor

Support multiple calls to `.fit` for partial training, as discussed in #138. Specifically, this PR:

* Moves most initialization logic to `Trainer.__init__`, so it is invoked only once
* Adds `num_epochs` and `num_batches` as optional parameters to `Trainer.fit`
* Significantly refactors the training loop to support the above
* Adds test cases (and updates the synthetic dataloader to use a deterministic generator, so the model built in each test is identical)
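To illustrate the intent of the change, here is a minimal sketch of partial training via repeated `.fit` calls. `ToyTrainer` is a hypothetical stand-in, not the real `composer` Trainer; it only shows how one-time setup in `__init__` plus an optional `num_epochs` argument lets training resume where it left off.

```python
# Hypothetical sketch -- NOT composer's actual API.
from typing import Optional


class ToyTrainer:
    def __init__(self, total_epochs: int):
        # One-time initialization, mirroring the PR's move of setup
        # logic into Trainer.__init__.
        self.total_epochs = total_epochs
        self.epochs_trained = 0

    def fit(self, num_epochs: Optional[int] = None):
        # Default: train for whatever remains of the schedule.
        if num_epochs is None:
            num_epochs = self.total_epochs - self.epochs_trained
        target = min(self.epochs_trained + num_epochs, self.total_epochs)
        while self.epochs_trained < target:
            # Stand-in for one epoch of actual training.
            self.epochs_trained += 1


trainer = ToyTrainer(total_epochs=10)
trainer.fit(num_epochs=3)  # partial training
trainer.fit(num_epochs=3)  # resumes where the previous call stopped
trainer.fit()              # finishes the remaining epochs
print(trainer.epochs_trained)  # 10
```

Because state lives on the trainer rather than inside `fit`, each call picks up exactly where the previous one ended, which is the behavior the test cases in this PR exercise.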
```diff
@@ -565,12 +635,89 @@ def eval_subset_num_batches(self):
     def eval_subset_num_batches(self, eval_subset_num_batches: Optional[int] = None):
         self._eval_subset_num_batches = eval_subset_num_batches

-    def fit(self):
+    def fit(self, num_batches: Optional[int] = None, num_epochs: Optional[int] = None):
         """Train and evaluate the model on the provided data."""
```
Contributor

I like that we're moving some params to `fit`, but I'm conflicted about `num_batches`. It doesn't tell me anything about the amount of information I'm feeding to the network. An epoch is useful because it represents a full pass through the dataset, and `train_fraction` (training on a subset of the dataset) could also be useful for debugging, learning curves, and active learning situations. I know large NLP models like tokens, so perhaps @moinnadeem has some opinions about this?

Contributor Author

`num_batches` and `num_epochs` will be replaced with a `duration` argument once #146 is implemented. That will support tokens and a fraction of the total training duration.

Contributor Author

(I'd prefer not to add that logic here as #146 is next on my agenda)

@ravi-mosaicml
Contributor Author

Closing this PR as we need to revisit multiple calls to `.fit`.

@ajaysaini725 ajaysaini725 mentioned this pull request Jan 22, 2022
@Averylamp Averylamp deleted the ravi/trainer_multiple_fit branch March 10, 2022 00:00