[Eval-Only]: Optional timing and dataloader attributes on state; removed evaluators from the state. #832

Merged
ravi-mosaicml merged 29 commits into dev from ravi/optional_dataloader on Apr 22, 2022

Conversation

@ravi-mosaicml (Contributor) commented Mar 25, 2022

  1. Made `state.dataloader` and `state.max_duration` optional, since they may not be provided on `__init__` as part of the eval_only flag (#40).
  2. Bound `dataloader_len` to the state, so algorithms know how many batches to expect per epoch.
  3. Added `dataloader_label` to the state, so algorithms know which dataloader is currently running (see the callback sketch below this description).
  4. Bound the active dataloader to the state on `Event.FIT_START`, and switched the dataloader to each evaluation dataloader before `Event.EVAL_START`. The previous (training) dataloader is restored after `Event.EVAL_END`.
  5. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With the eval_only flag (#40), `eval()` will take in a dataloader, which would then call `Event.EVAL_START` and `Event.EVAL_END` for each dataloader. This change also permits algorithms that wish to modify (each) evaluation dataloader.
  6. Moved scaling of the LR schedulers into `Trainer.fit()`, before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of #40.
  7. Removed the evaluators from the state. Addresses part of Multiple Evaluator Improvements (#329). The user need not set `state.evaluators`; instead, the trainer keeps track of the evaluators.
  8. Removed `precision_context` from the state. Instead, added a helper static function to `precision.py`.

Implements the first part of #40.
Closes #363.
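To make items 1–3 concrete, here is a minimal callback sketch showing how the new state attributes could be read. The callback name and print statements are hypothetical, and import paths may vary by Composer version; only the `state` attribute names come from this PR.

```python
from composer.core import Callback, State
from composer.loggers import Logger


class DataloaderInspector(Callback):
    """Hypothetical callback that reads the dataloader attributes bound to the state."""

    def fit_start(self, state: State, logger: Logger) -> None:
        # state.dataloader and state.max_duration are now Optional, so guard for None.
        if state.dataloader is None or state.max_duration is None:
            return
        print(f"training dataloader bound; max_duration={state.max_duration}")

    def eval_start(self, state: State, logger: Logger) -> None:
        # During evaluation, state.dataloader is the active evaluation dataloader,
        # state.dataloader_label identifies it, and state.dataloader_len is the
        # number of batches to expect (None for unsized iterables).
        print(f"evaluating '{state.dataloader_label}' ({state.dataloader_len} batches)")
```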

…ps_per_epoch`.

1. Made `state.dataloader` optional, since it will not be provided on `__init__` as part of #40.
2. Bound the active dataloader to the state on `Event.FIT_START`, and switched the dataloader to each evaluation dataloader before `Event.EVAL_START`. The previous (training) dataloader is restored after `Event.EVAL_END`.
3. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators (sketched below this commit message). With #40, `eval()` will take in a dataloader, which would then require `Event.EVAL_START` and `Event.EVAL_END`. This change also permits algorithms that wish to modify (each) evaluation dataloader.
4. Moved scaling of the LR schedulers into `Trainer.fit()`, before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of #40.
5. Removed `steps_per_epoch` from the state. Instead, algorithms and callbacks can read `len(state.dataloader)` directly. While this change makes schedulers inaccurate when using `train_subset_num_batches`, that flag should only be used for performance measurements, so it is not necessary that SSR behaves correctly for performance runs. Added a warning for the `train_subset_num_batches` field.

Implements the first part of #40.
Closes #363.
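The per-evaluator event flow in item 3 can be pictured roughly as below. This is pseudocode only: the helper names (`set_dataloader`, `run_event`) and the loop structure are illustrative, not the trainer's actual implementation.

```python
def run_evaluators(state, evaluators, engine):
    """Illustrative per-evaluator eval loop: each evaluator gets its own dataloader
    bound to the state and its own EVAL_START / EVAL_END events."""
    previous_dataloader = state.dataloader
    previous_label = state.dataloader_label
    for evaluator in evaluators:
        # Switch the active dataloader to this evaluator's dataloader.
        state.set_dataloader(evaluator.dataloader, label=evaluator.label)
        engine.run_event("EVAL_START")  # fired once per evaluator, not once overall
        for batch in state.dataloader:
            ...  # forward pass and metric updates
        engine.run_event("EVAL_END")
    # Restore the previous (training) dataloader afterwards.
    state.set_dataloader(previous_dataloader, label=previous_label)
```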
@ravi-mosaicml ravi-mosaicml requested a review from jbloxham March 25, 2022 23:52
@ravi-mosaicml ravi-mosaicml changed the title [Eval-Only]: Made the state.dataloader optional; removed `state.ste… [Eval-Only]: Optional state.dataloader; removed state.steps_per_epoch. Mar 25, 2022
@hanlint (Contributor) commented Mar 26, 2022

I'm slightly concerned about item 5 here -- `subset_num_batches` is also sometimes used to ensure a model overfits on a small amount of data.

@ravi-mosaicml (Contributor, Author)

> I'm slightly concerned about item 5 here -- `subset_num_batches` is also sometimes used to ensure a model overfits on a small amount of data.

Ah ok, makes sense. I will refactor the schedulers to use `train_subset_num_batches` if it is not `None`, else `len(state.dataloader)`.
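A minimal sketch of that fallback (a hypothetical helper, not the trainer's actual scheduler-compilation code; a later commit in this PR switched the sentinel from `None` to `-1`):

```python
def steps_per_epoch(state, train_subset_num_batches=None):
    """Illustrative fallback: prefer the explicit subset size when given,
    otherwise fall back to the dataloader length, so scheduler scaling (SSR)
    stays correct when training on a subset of the data."""
    if train_subset_num_batches is not None:
        return train_subset_num_batches
    return len(state.dataloader)
```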

@ravi-mosaicml (Contributor, Author)

Upon further thought, I think it would make sense to leave `subset_num_batches` as part of the state, so an algorithm (e.g. the future benchmarker) can end an epoch early to do dynamic profiling.

@ravi-mosaicml ravi-mosaicml changed the title [Eval-Only]: Optional state.dataloader; removed state.steps_per_epoch. [Eval-Only]: Made state.dataloader optional Apr 12, 2022
It can be useful for algorithms and callbacks to know which dataloader is active, so the `dataloader_label` was added to the state. Removed `evaluators` from the state, as nothing uses them anymore.
@ravi-mosaicml ravi-mosaicml changed the title [Eval-Only]: Made state.dataloader optional [Eval-Only]: Set state.dataloader to the active dataloader; remove evaluators from State; run EVAL_START and EVAL_END for each evaluator Apr 12, 2022
@ravi-mosaicml ravi-mosaicml changed the title [Eval-Only]: Set state.dataloader to the active dataloader; remove evaluators from State; run EVAL_START and EVAL_END for each evaluator [Eval-Only]: Optional timing and dataloader attributes on state; removed evaluators from the state. Apr 13, 2022
@ajaysaini725 (Contributor) left a comment


Just a few comments; overall it looks good!

@ravi-mosaicml ravi-mosaicml force-pushed the ravi/optional_dataloader branch from 5b23840 to b5b6192 on April 14, 2022 19:55
…()`.

It is preferable to keep variables on the state object rather than as trainer members, where appropriate. Before, `state.schedulers` was empty after `__init__()` but before `fit()`; now, `state.schedulers` contains the compiled Composer schedulers or the original PyTorch schedulers.

Restored optimizers on `Event.INIT`; it would be a bigger undertaking to rewrite algorithms so they do not depend on optimizers at init.
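As a rough illustration of what this means for code that inspects the state at `Event.INIT` (hypothetical callback, and import paths may differ by Composer version):

```python
from composer.core import Callback, State
from composer.loggers import Logger


class InitInspector(Callback):
    """Hypothetical callback relying on schedulers/optimizers being on the state early."""

    def init(self, state: State, logger: Logger) -> None:
        # With this change, state.schedulers is populated right after __init__()
        # (compiled Composer schedulers or the original PyTorch schedulers), and
        # optimizers are restored on Event.INIT, so both can be inspected here.
        print(f"{len(state.schedulers)} scheduler(s), {len(state.optimizers)} optimizer(s)")
```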
@ajaysaini725 (Contributor) left a comment


Just a small note; otherwise LGTM 👍

* Removed `precision_context` from state (see the sketch below)
* Switched `train_subset_num_batches` and `eval_subset_num_batches` to use `-1` as the default value instead of `None`.
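For the `precision_context` removal (item 8 in the PR description), a standalone helper in `precision.py` could look roughly like the sketch below. The function name and behavior here are illustrative; Composer's actual helper may differ.

```python
import contextlib

import torch


@contextlib.contextmanager
def get_precision_context(precision: str):
    """Illustrative helper: yield an autocast context for AMP, otherwise a no-op,
    so callers no longer need to go through state.precision_context."""
    if precision == "amp":
        with torch.cuda.amp.autocast():
            yield
    else:
        yield
```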
@ravi-mosaicml ravi-mosaicml merged commit 0e4955e into dev Apr 22, 2022
@ravi-mosaicml ravi-mosaicml deleted the ravi/optional_dataloader branch April 22, 2022 16:04

Successfully merging this pull request may close these issues.

Replace state.train_dataloader and state.eval_dataloader with just state.dataloader