[Eval-Only]: Optional timing and dataloader attributes on state; removed evaluators from the state. #832
Conversation
Made `state.dataloader` optional; removed `state.steps_per_epoch`.

1. Made `state.dataloader` optional, since it will not be provided on `__init__` as part of #40.
2. Binding the active dataloader to the state on `Event.FIT_START`, and switching the dataloader to each evaluation dataloader before `Event.EVAL_START`. Restoring the previous (training) dataloader after `Event.EVAL_END`.
3. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With #40, `eval()` will take in a dataloader, which would then require `Event.EVAL_START` and `Event.EVAL_END`. This change also permits algorithms that wish to modify (each) evaluation dataloader.
4. Moved scaling of the LR schedulers to `Trainer.fit()` before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of #40.
5. Removed `steps_per_epoch` from the state. Instead, algorithms and callbacks can read `len(state.dataloader)` directly (see the sketch below). While this change makes schedulers no longer accurate when using `train_subset_num_batches`, that flag should only be used for performance measurements, so it is not necessary that SSR behaves correctly for performance runs. Added a warning for the `train_subset_num_batches` field.

Implements the first part of #40. Closes #363.
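For item 5, a minimal sketch of the pattern, assuming `State` is importable from `composer.core`; the helper name `batches_per_epoch` is hypothetical and not part of this PR:

```python
from typing import Optional

from composer.core import State  # assumption: State is importable from composer.core


def batches_per_epoch(state: State) -> Optional[int]:
    """Hypothetical helper: derive the epoch length from the active dataloader."""
    # state.dataloader is now Optional, so it may be unset before FIT_START.
    if state.dataloader is None:
        return None
    try:
        return len(state.dataloader)
    except TypeError:
        # Iterable-style dataloaders may not implement __len__.
        return None
```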
I'm slightly concerned about 5 here --
Ah ok, makes sense. I will refactor the schedulers to use
Upon further thought, I think it would make sense to leave
It can be useful for algorithms and callbacks to know which dataloader is active, so added the `dataloader_label` to the state. Removed `evaluators` from state, as nothing is using that anymore.
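As an illustration of how a callback or algorithm might use the new label, here is a minimal sketch; the helper name and the `"train"` label value are assumptions for the example, not guaranteed by this PR:

```python
from composer.core import State  # assumption: State is importable from composer.core


def is_training_dataloader(state: State) -> bool:
    """Hypothetical check: True only while the training dataloader is bound to the state."""
    # "train" as the training dataloader's label is an assumption for illustration.
    return state.dataloader_label == "train"
```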
Just a few comments, but overall looks good!
…()`. Preferable to keep variables on the state object rather than as trainer members, where appropriate. Before, `state.schedulers` was empty after `__init__()` but before `fit()`. Now, `state.schedulers` contains the compiled composer schedulers or the original PyTorch schedulers. Restored optimizers on `Event.INIT`; it would be a bigger issue to rewrite algorithms to not depend on optimizers at init.
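A small sketch of what that invariant would let downstream code assume; `current_lrs` is a hypothetical helper, and the assertion reflects the assumption that compiled schedulers are plain PyTorch scheduler objects:

```python
from typing import List

from torch.optim.lr_scheduler import _LRScheduler


def current_lrs(state) -> List[float]:
    """Hypothetical helper: read the current learning rates off state.schedulers."""
    lrs: List[float] = []
    for scheduler in state.schedulers:
        # Assumption: once fit() has started, every entry is a PyTorch scheduler,
        # whether it began as a composer scheduler or a stock PyTorch one.
        assert isinstance(scheduler, _LRScheduler)
        lrs.extend(scheduler.get_last_lr())
    return lrs
```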
Just a small note, otherwise LGTM 👍
* Removed `precision_context` from state.
* Switched `train_subset_num_batches` and `eval_subset_num_batches` to use `-1` as the default value instead of `None`.
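A minimal sketch of the `-1` sentinel convention described above; `resolve_num_batches` is a hypothetical helper, not code from this PR:

```python
def resolve_num_batches(dataloader, subset_num_batches: int = -1) -> int:
    """Hypothetical helper showing the -1 sentinel: -1 means use the full dataloader."""
    if subset_num_batches == -1:
        return len(dataloader)
    if subset_num_batches < 0:
        raise ValueError("subset_num_batches must be -1 or a non-negative integer")
    return min(subset_num_batches, len(dataloader))
```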
1. Made `state.dataloader` and `state.max_duration` optional, since they may not be provided on `__init__` as part of the `eval_only` flag (#40).
2. Added `dataloader_len` to the state, so algorithms can know how many batches to expect per epoch.
3. Added `dataloader_label` to the state, so algorithms know which dataloader is currently running.
4. Binding the active dataloader to the state on `Event.FIT_START`, and switching the dataloader to each evaluation dataloader before `Event.EVAL_START`. Restoring the previous (training) dataloader after `Event.EVAL_END` (sketched below).
5. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With #40, `eval()` will take in a dataloader, which would then call `Event.EVAL_START` and `Event.EVAL_END` for each dataloader. This change also permits algorithms that wish to modify (each) evaluation dataloader.
6. Moved scaling of the LR schedulers to `Trainer.fit()` before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of #40.
7. Removed `precision_context` from state. Instead, added a helper static function to precision.py.

Implements the first part of #40.
Closes #363.
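To make item 4 concrete, here is a rough sketch of the swap; the names `run_evaluators`, `set_dataloader`, `evaluator.dataloader`, `evaluator.label`, and `run_event` are illustrative assumptions, not the trainer's actual internals:

```python
def run_evaluators(state, evaluators, run_event):
    """Illustrative sketch of switching state.dataloader around evaluation."""
    # Remember the training dataloader currently bound to the state.
    train_dataloader = state.dataloader
    train_label = state.dataloader_label

    for evaluator in evaluators:
        # Bind the evaluator's dataloader before EVAL_START so algorithms
        # can inspect or modify it.
        state.set_dataloader(evaluator.dataloader, evaluator.label)
        run_event("EVAL_START")
        for batch in state.dataloader:
            ...  # evaluation forward pass
        run_event("EVAL_END")

    # Restore the previous (training) dataloader after evaluation finishes.
    state.set_dataloader(train_dataloader, train_label)
```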