ExperimentHparams class; Set state.train_dataloader #966

Merged: 152 commits into mosaicml:dev from experiment_hparams on May 11, 2022

Conversation

@ravi-mosaicml (Contributor) commented Apr 28, 2022:

  • Added an `ExperimentHparams` class. This class describes how to run a training job that may include multiple calls to `Trainer.fit()` and/or `Trainer.eval()`. Specifically, `ExperimentHparams.initialize_object()` returns a `(Trainer, List[FitKwargs], List[EvalKwargs])` tuple, which the user's entrypoint can then consume (see the entrypoint sketch after this list).
    This class does not automatically train the model, nor does it include an entrypoint.
  • Added typing definitions for `FitKwargs` and `EvalKwargs`, along with test cases to ensure they stay in sync with the `Trainer` signature (a sync-check sketch also follows this list).
  • Fixed a bug introduced in Multiple calls to Trainer.fit() #948, which removed the setting of `State.train_dataloader`. Added back the lines to correctly set the train dataloader.
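A minimal sketch of an entrypoint consuming this tuple. The import path and the YAML loading via yahp's `create()` are assumptions for illustration; only `ExperimentHparams`, `FitKwargs`, and `EvalKwargs` come from this PR:

```python
# Sketch of a user entrypoint driving a multi-fit/eval experiment.
# The import path below is an assumption; adjust to wherever
# ExperimentHparams lives in the composer package.
from composer.trainer.trainer_hparams import ExperimentHparams

def main() -> None:
    # Loading hparams from a YAML file via yahp's create() is an assumption.
    experiment = ExperimentHparams.create('experiment.yaml', cli_args=False)
    trainer, fit_kwargs_list, eval_kwargs_list = experiment.initialize_object()

    # ExperimentHparams does not train the model itself; the entrypoint
    # makes the multiple fit()/eval() calls.
    for fit_kwargs in fit_kwargs_list:
        trainer.fit(**fit_kwargs)
    for eval_kwargs in eval_kwargs_list:
        trainer.eval(**eval_kwargs)

if __name__ == '__main__':
    main()
```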
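And a sketch of how such typed kwargs can be kept in sync with the `Trainer.fit()` signature; the PR's actual test may differ, and the `FitKwargs` import path is assumed:

```python
import inspect
from typing import get_type_hints

from composer.trainer import Trainer
# Assumed import path for the FitKwargs typing definition.
from composer.trainer.trainer_hparams import FitKwargs

def test_fit_kwargs_in_sync_with_trainer_fit():
    # Collect Trainer.fit()'s parameter names, excluding `self`, and
    # compare them against the typed-dict keys so the two cannot drift.
    fit_params = set(inspect.signature(Trainer.fit).parameters) - {'self'}
    assert set(get_type_hints(FitKwargs)) == fit_params
```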


1. Made the `state.dataloader` optional, since it will not be provided on `__init__` as part of mosaicml#40.
2. Bound the active dataloader to the state on `Event.FIT_START`, switching to each evaluation dataloader before `Event.EVAL_START` and restoring the previous (training) dataloader after `Event.EVAL_END`.
3. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With mosaicml#40, `eval()` will take in a dataloader, which would then require `Event.EVAL_START` and `Event.EVAL_END` to fire per call. This change also permits algorithms to modify each evaluation dataloader.
4. Moved scaling of the LR schedulers to `Trainer.fit()` before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of mosaicml#40.
5. Removed `steps_per_epoch` from the state. Instead, algorithms and callbacks can read `len(state.dataloader)` directly, as sketched below. While this change makes schedulers inaccurate when using `train_subset_num_batches`, that flag should only be used for performance measurements, so it is not necessary for SSR to behave correctly in performance runs. Added a warning for the `train_subset_num_batches` field.
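For example, a callback that previously read `state.steps_per_epoch` might now do the following. This is a sketch; the `BatchesPerEpochPrinter` name is hypothetical, and import paths may differ by version:

```python
from composer.core import Callback, State
from composer.loggers import Logger  # may be composer.core.logging in older versions

class BatchesPerEpochPrinter(Callback):  # hypothetical callback for illustration
    def epoch_start(self, state: State, logger: Logger) -> None:
        # Read the active dataloader's length directly instead of the
        # removed state.steps_per_epoch. Iterable-style dataloaders may
        # not define __len__, so guard against that.
        try:
            num_batches = len(state.dataloader)
        except TypeError:
            num_batches = None
        print(f'batches this epoch: {num_batches}')
```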

Implements the first part of mosaicml#40.
Closes mosaicml#363.
It can be useful for algorithms and callbacks to know which dataloader is active, so the `dataloader_label` was added to the state (see the sketch below). Removed `evaluators` from the state, as nothing uses them anymore.
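A sketch of an algorithm gating itself on the active dataloader; the algorithm name and the `'train'` label value are assumptions:

```python
from composer.core import Algorithm, Event, State
from composer.loggers import Logger  # may be composer.core.logging in older versions

class TrainOnlyTransform(Algorithm):  # hypothetical algorithm for illustration
    def match(self, event: Event, state: State) -> bool:
        # state.dataloader_label identifies the active dataloader; using
        # 'train' as the training label is an assumption here.
        return event == Event.AFTER_DATALOADER and state.dataloader_label == 'train'

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        # Transform the training batch here; evaluation batches are skipped
        # because match() returns False for non-training dataloaders.
        pass
```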

It is preferable to keep variables on the state object rather than as trainer members, where appropriate. Previously, `state.schedulers` was empty after `__init__()` but before `fit()`; now it contains the compiled Composer schedulers or the original PyTorch schedulers.

Restored optimizers on `Event.INIT`; rewriting algorithms to not depend on optimizers at init would be a bigger undertaking.
* Removed `precision_context` from the state.
* Switched `train_subset_num_batches` and `eval_subset_num_batches` to use `-1` as the default value instead of `None`.
@ravi-mosaicml (Author) replied to a reviewer's question:

Would using ExperimentHparams require changes to entrypoints such as https://github.com/mosaicml/composer/blob/dev/examples/run_composer_trainer.py? Currently, everything is initialized from TrainerHparams.

This is a non-breaking change; all existing trainer hparams will work as-is.

@ravi-mosaicml ravi-mosaicml requested a review from a team as a code owner May 11, 2022 15:27
@A-Jacobson (Contributor) left a comment:

improves the codebase - approved!

@ravi-mosaicml ravi-mosaicml changed the title ExperimentHparams class ExperimentHparams class; Set state.train_dataloader May 11, 2022
@ravi-mosaicml ravi-mosaicml enabled auto-merge (squash) May 11, 2022 20:25
@ravi-mosaicml ravi-mosaicml merged commit e85302b into mosaicml:dev May 11, 2022
@ravi-mosaicml ravi-mosaicml deleted the experiment_hparams branch May 11, 2022 22:12