ExperimentHparams class; Set state.train_dataloader #966

Merged: 152 commits into mosaicml:dev from experiment_hparams on May 11, 2022

Conversation

@ravi-mosaicml (Contributor) commented Apr 28, 2022:

  • Added an `ExperimentHparams` class. This class describes how to run a training job that may include multiple calls to `Trainer.fit()` and/or `Trainer.eval()`. Specifically, `ExperimentHparams.initialize_object()` returns a `(Trainer, List[FitKwargs], List[EvalKwargs])` tuple, which the user's entrypoint can then consume (see the entrypoint sketch after this list).
    This class does not automatically train the model, nor does it include an entrypoint.
  • Added typing definitions for `FitKwargs` and `EvalKwargs`, along with test cases to ensure they stay in sync with the `Trainer` signature (a sync-check sketch also follows this list).
  • Fixed a bug introduced in Multiple calls to Trainer.fit() #948, which removed the setting of `State.train_dataloader`. Added back the lines to correctly set the train dataloader.
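A minimal sketch of an entrypoint consuming this tuple. The import path and the YAML loading via yahp's `create()` are assumptions for illustration; only `ExperimentHparams`, `FitKwargs`, and `EvalKwargs` come from this PR:

```python
# Sketch of a user entrypoint driving a multi-fit/eval experiment.
# The import path below is an assumption; adjust to wherever
# ExperimentHparams lives in the composer package.
from composer.trainer.trainer_hparams import ExperimentHparams

def main() -> None:
    # Loading hparams from a YAML file via yahp's create() is an assumption.
    experiment = ExperimentHparams.create('experiment.yaml', cli_args=False)
    trainer, fit_kwargs_list, eval_kwargs_list = experiment.initialize_object()

    # ExperimentHparams does not train the model itself; the entrypoint
    # makes the multiple fit()/eval() calls.
    for fit_kwargs in fit_kwargs_list:
        trainer.fit(**fit_kwargs)
    for eval_kwargs in eval_kwargs_list:
        trainer.eval(**eval_kwargs)

if __name__ == '__main__':
    main()
```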
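And a sketch of how such typed kwargs can be kept in sync with the `Trainer.fit()` signature; the PR's actual test may differ, and the `FitKwargs` import path is assumed:

```python
import inspect
from typing import get_type_hints

from composer.trainer import Trainer
# Assumed import path for the FitKwargs typing definition.
from composer.trainer.trainer_hparams import FitKwargs

def test_fit_kwargs_in_sync_with_trainer_fit():
    # Collect Trainer.fit()'s parameter names, excluding `self`, and
    # compare them against the typed-dict keys so the two cannot drift.
    fit_params = set(inspect.signature(Trainer.fit).parameters) - {'self'}
    assert set(get_type_hints(FitKwargs)) == fit_params
```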


1. Made the `state.dataloader` optional, since it will not be provided on `__init__` as part of mosaicml#40.
2. Bound the active dataloader to the state on `Event.FIT_START`, switching to each evaluation dataloader before `Event.EVAL_START` and restoring the previous (training) dataloader after `Event.EVAL_END`.
3. Moved `Event.EVAL_START` and `Event.EVAL_END` to run for each evaluator, instead of once for all evaluators. With mosaicml#40, `eval()` will take in a dataloader, which would then require `Event.EVAL_START` and `Event.EVAL_END` to fire per call. This change also permits algorithms to modify each evaluation dataloader.
4. Moved scaling of the LR schedulers to `Trainer.fit()` before `Event.FIT_START` fires. Schedulers will be passed in on `Trainer.fit()` as part of mosaicml#40.
5. Removed `steps_per_epoch` from the state. Instead, algorithms and callbacks can read `len(state.dataloader)` directly, as sketched below. While this change makes schedulers inaccurate when using `train_subset_num_batches`, that flag should only be used for performance measurements, so it is not necessary for SSR to behave correctly in performance runs. Added a warning for the `train_subset_num_batches` field.
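For example, a callback that previously read `state.steps_per_epoch` might now do the following. This is a sketch; the `BatchesPerEpochPrinter` name is hypothetical, and import paths may differ by version:

```python
from composer.core import Callback, State
from composer.loggers import Logger  # may be composer.core.logging in older versions

class BatchesPerEpochPrinter(Callback):  # hypothetical callback for illustration
    def epoch_start(self, state: State, logger: Logger) -> None:
        # Read the active dataloader's length directly instead of the
        # removed state.steps_per_epoch. Iterable-style dataloaders may
        # not define __len__, so guard against that.
        try:
            num_batches = len(state.dataloader)
        except TypeError:
            num_batches = None
        print(f'batches this epoch: {num_batches}')
```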

Implements the first part of mosaicml#40.
Closes mosaicml#363.
It can be useful for algorithms and callbacks to know which dataloader is active, so the `dataloader_label` was added to the state (see the sketch below). Removed `evaluators` from the state, as nothing uses them anymore.
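A sketch of an algorithm gating itself on the active dataloader; the algorithm name and the `'train'` label value are assumptions:

```python
from composer.core import Algorithm, Event, State
from composer.loggers import Logger  # may be composer.core.logging in older versions

class TrainOnlyTransform(Algorithm):  # hypothetical algorithm for illustration
    def match(self, event: Event, state: State) -> bool:
        # state.dataloader_label identifies the active dataloader; using
        # 'train' as the training label is an assumption here.
        return event == Event.AFTER_DATALOADER and state.dataloader_label == 'train'

    def apply(self, event: Event, state: State, logger: Logger) -> None:
        # Transform the training batch here; evaluation batches are skipped
        # because match() returns False for non-training dataloaders.
        pass
```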

It is preferable to keep variables on the state object rather than as trainer members, where appropriate. Previously, `state.schedulers` was empty after `__init__()` but before `fit()`; now it contains the compiled Composer schedulers or the original PyTorch schedulers.

Restored optimizers on `Event.INIT`; rewriting algorithms to not depend on optimizers at init would be a bigger undertaking.
* Removed `precision_context` from the state.
* Switched `train_subset_num_batches` and `eval_subset_num_batches` to use `-1` as the default value instead of `None`.
@ravi-mosaicml (Author) replied to a reviewer's question:

Would using ExperimentHparams require changes to entrypoints such as https://github.com/mosaicml/composer/blob/dev/examples/run_composer_trainer.py? Currently, everything is initialized from TrainerHparams.

This is a non-breaking change; all existing trainer hparams will work as-is.

@ravi-mosaicml ravi-mosaicml requested a review from a team as a code owner May 11, 2022 15:27
@A-Jacobson (Contributor) left a comment:

improves the codebase - approved!

@ravi-mosaicml ravi-mosaicml changed the title ExperimentHparams class ExperimentHparams class; Set state.train_dataloader May 11, 2022
@ravi-mosaicml ravi-mosaicml enabled auto-merge (squash) May 11, 2022 20:25
@ravi-mosaicml ravi-mosaicml merged commit e85302b into mosaicml:dev May 11, 2022
@ravi-mosaicml ravi-mosaicml deleted the experiment_hparams branch May 11, 2022 22:12