Replace state.train_dataloader and state.eval_dataloader with just state.dataloader #363

Closed · 1 task
ravi-mosaicml opened this issue Feb 3, 2022 · 0 comments · Fixed by #832
Labels
enhancement: New (engineering) enhancements, such as features or API changes.

Comments

ravi-mosaicml (Contributor) commented Feb 3, 2022

#329 will result in both state.train_dataloader and state.eval_dataloader being on the State. However, algorithms and callbacks should only be interested in the currently active dataloader.

For example, callbacks should not touch the training dataloader directly, since evaluation could be running mid-epoch. For callbacks that simply want to run on every batch (like TQDM), having two fields makes the logic more complicated, since they have to check which dataloader is in use.

Instead, state.train_dataloader and state.eval_dataloader can be merged into one field, state.dataloader. The trainer would be responsible for setting this field to whichever dataloader is actively being iterated.

  • Upon init, state.dataloader should be set to the training dataloader, so dataloader augmentations can be applied.
  • Before Event.EVAL_START, the trainer should set state.dataloader to the dataloader of the evaluator being evaluated. After Event.EVAL_END, the trainer should revert state.dataloader back to the training dataloader (i.e. the value before swapping it at Event.EVAL_START). A sketch of this swap follows the list.
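A minimal sketch of the proposed swap, under the assumption of a helper around the trainer. Apart from `state.dataloader` and the `Event` names, the attribute and method names here (`trainer.engine.run_event`, `evaluator.dataloader`, `run_evaluator`) are illustrative, not a final API:

```python
from composer.core import Event

def run_evaluator(trainer, evaluator):
    """Illustrative only: how the trainer could swap ``state.dataloader`` around evaluation."""
    state = trainer.state
    train_dataloader = state.dataloader          # remember the active (training) dataloader
    state.dataloader = evaluator.dataloader      # swap in the evaluator's dataloader
    trainer.engine.run_event(Event.EVAL_START)   # callbacks/algorithms now see the eval dataloader
    try:
        for batch in state.dataloader:
            ...                                  # forward pass, metric updates, etc.
    finally:
        trainer.engine.run_event(Event.EVAL_END)
        state.dataloader = train_dataloader      # restore the training dataloader
```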

TODO:

ravi-mosaicml added the Needs Design label (Further design is required. Do not start implementation until design questions are resolved.) on Feb 3, 2022
ravi-mosaicml added this to the Backlog milestone on Feb 15, 2022
ravi-mosaicml added the enhancement label on Feb 28, 2022
ravi-mosaicml modified the milestones: Backlog, v0.5 on Feb 28, 2022
ravi-mosaicml added a commit that referenced this issue Mar 25, 2022
…ps_per_epoch`.

1. Made the `state.dataloader` optional, since it will not be provided on `__init__` as part of #40.
2. Binding the active dataloader to the state on `Event.FIT_START`, and switching the dataloader to each evaluation dataloader before `Event.EVAL_START`. Restoring the previous (training) dataloader after `Event.EVAL_END`.
3. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With #40, eval() will take in a dataloader, which would then require `Event.EVAL_START` and `Event.EVAL_END` to run per dataloader. This change also permits algorithms that wish to modify (each) evaluation dataloader.
4. Moved scaling of the LR schedulers to `Trainer.fit()` before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of #40.
5. Removed `steps_per_epoch` from the state. Instead, algorithms and callbacks can read `len(state.dataloader)` directly (see the sketch below). While this change means schedulers will no longer be accurate when using `train_subset_num_batches`, that flag should only be used for performance measurements, so it is not necessary that SSR behaves correctly for performance runs. Added a warning for the `train_subset_num_batches` field.

Implements the first part of #40.
Closes #363.
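As an illustration of point 5, a callback that previously read `state.steps_per_epoch` could instead query the active dataloader. The callback name below is made up, and the exact import paths and logger type vary by Composer version:

```python
from composer.core import Callback, State
from composer.loggers import Logger

class EpochProgress(Callback):
    """Hypothetical callback: reports the size of the active dataloader each batch."""

    def batch_end(self, state: State, logger: Logger) -> None:
        # Replaces the removed state.steps_per_epoch. Note that during evaluation,
        # state.dataloader is the *evaluation* dataloader, not the training one.
        batches_per_epoch = len(state.dataloader)
        print(f"active dataloader has {batches_per_epoch} batches")
```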
ravi-mosaicml removed the Needs Design label on Mar 30, 2022
ravi-mosaicml added a commit that referenced this issue Apr 22, 2022
…ved evaluators from the state. (#832)

1. Made the `state.dataloader` and `state.max_duration` optional, since they may not be provided on `__init__` as part of #40.
2. Bound `dataloader_len` to the state, so algorithms can know how many batches to expect per epoch.
3. Added the `dataloader_label` to the state, so algorithms know which dataloader is currently running (see the sketch below).
4. Binding the active dataloader to the state on `Event.FIT_START`, and switching the dataloader to each evaluation dataloader before `Event.EVAL_START`. Restoring the previous (training) dataloader after `Event.EVAL_END`.
5. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With #40, eval() will take in a dataloader, which would then call `Event.EVAL_START` and `Event.EVAL_END` for each dataloader. This change also permits algorithms that wish to modify (each) evaluation dataloader.
6. Moved scaling of the LR schedulers to `Trainer.fit()` before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of #40.
7. Removed the evaluators from the state. Addresses part of #329. The user need not set state.evaluators; instead, the trainer keeps track of the evaluators.
8. Removed the `precision_context` from state. Instead, added a helper static function to precision.py.

Implements the first part of #40.
Closes #363.
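For illustration of points 2 and 3, an algorithm could use `state.dataloader_label` to act only while the training dataloader is active. The class below is purely hypothetical, and the `"train"` label value is an assumption:

```python
from composer.core import Algorithm, Event, State
from composer.loggers import Logger

class TrainOnlyCheck(Algorithm):
    """Hypothetical algorithm: runs only while the training dataloader is active."""

    def match(self, event: Event, state: State) -> bool:
        # dataloader_label distinguishes the training dataloader (assumed "train")
        # from the label of whichever evaluator is currently running.
        return event == Event.FIT_START and state.dataloader_label == "train"

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        # dataloader_len reports how many batches the active dataloader yields per epoch.
        print(f"training dataloader yields {state.dataloader_len} batches per epoch")
```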