AutoYAHP Part 3: Refactor the models, datasets, and trainer_hparams #1072

Conversation


@ravi-mosaicml (Contributor) commented on May 20, 2022:

This PR cleans up the model hparams, dataset hparams, and trainer hparams for AutoYAHP. It does not depend on AutoYAHP itself (a future PR will remove the underlying hparam classes).

- Fixed invalid type annotations in the hparam classes (YAHP 1.1 has stricter type checking).
- Refactored the trainer hparams and the Trainer to use the same argument names. Specifically, this meant renaming `optimizer` to `optimizers` and `deepspeed` to `deepspeed_config` in the trainer hparams, and `load_strict` to `load_strict_model_weights` in the Trainer. Updated the YAMLs as appropriate.
- Removed the device and precision settings from the YAMLs, since the trainer now uses smart defaults (AMP + GPU if CUDA is available, otherwise FP32 + CPU); a sketch of this selection logic follows this list. This is necessary because AutoYAHP will parse the YAMLs directly into `DeviceGPU()` / `DeviceCPU()` instances, and `DeviceGPU()` throws an error if CUDA is not available (previously, YAHP parsed into `DeviceGPUHparams()`, which could then be replaced with `DeviceCPUHparams()`). While patching the YAML directly would also have worked, it seemed generally better to make the YAMLs more portable.
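
For illustration, here is a minimal sketch of the smart-default selection described above. It assumes PyTorch is installed; the helper name is hypothetical and is not part of Composer.

```python
# Minimal sketch (hypothetical helper, not Composer's implementation) of the
# smart defaults described above: GPU + AMP when CUDA is available,
# otherwise CPU + FP32.
import torch


def default_device_and_precision() -> tuple:
    """Return (device, precision) string shorthands based on CUDA availability."""
    if torch.cuda.is_available():
        return 'gpu', 'amp'
    return 'cpu', 'fp32'


device, precision = default_device_and_precision()
print(f'Using device={device}, precision={precision}')
```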

- Treated all Python warnings as errors in tests. Existing tests that were throwing warnings were either fixed (yay!) or marked with `@pytest.mark.filterwarnings`; an illustration of this pattern follows this list.
- In BlurPool, now throwing an error if both `replace_maxpools` and `replace_convs` are False, since that combination makes the algorithm a no-op.
- Made the default-optimizer warning in the trainer less verbose.
- Converted the BERT YAML to specify duration in terms of samples, fixing an issue where the warmup period, combined with the max duration, was a no-op.
- Moved `TestMetricSetter` to `tests/common` and renamed it `MetricSetterCallback`. Tests should not import from other test files (`tests/common` is OK).
- Removed `TestWandBLogger`, since the trainer tests do the same thing.
- Fixed progressive resizing.
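
Below is a hedged illustration of the warnings-as-errors pattern mentioned above; the test name and filter string are made up, and Composer's actual pytest configuration may differ.

```python
# Hedged illustration: pytest can promote all warnings to errors via the ini
# option `filterwarnings = error`, and individual tests can opt back out with
# the marker shown here. The test and the filter string are made up.
import warnings

import pytest


@pytest.mark.filterwarnings('ignore:.*deprecated.*:DeprecationWarning')
def test_legacy_code_path_still_works():
    # With `filterwarnings = error` configured and without the marker above,
    # this warning would fail the test.
    warnings.warn('this API is deprecated', DeprecationWarning)
```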
This PR also refactors the algorithms and tests as will be required by AutoYAHP; again, it does not depend on AutoYAHP itself (a future PR will remove the underlying hparam classes).

- Refactored the algorithm tests to not depend on hparams.
- Reformatted the factorize and selective backprop docstrings so they are parsed correctly by auto-yahp.
- Refactored `algorithm_settings.py` to not depend on hparams and to return a list of `pytest.param` objects for `pytest.mark.parametrize`. This makes it more reusable, since it now includes the markers required for each algorithm.
- Moved `TestTrainerAlgorithms` into `tests/algorithms/test_algorithms_train.py`, since it tests the individual algorithms, not the trainer, and thus should live in `tests/algorithms`.
- Added helper methods for scanning a module to discover subclass implementations, checking that the registry contains an entry, and testing that a class is constructable from YAML; a hedged sketch of the discovery helper follows this list.
- Silenced deepspeed deprecation warnings.
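
As a rough idea of what such a discovery helper could look like, here is a hedged sketch; the function names and the registry shape are illustrative and are not Composer's actual test utilities.

```python
# Hedged sketch of a subclass-discovery helper; names and registry shape are
# illustrative, not Composer's actual test utilities.
import importlib
import inspect
import pkgutil
from typing import Dict, List, Type


def find_subclasses(package_name: str, base_class: Type) -> List[Type]:
    """Walk a package and collect every concrete subclass of ``base_class``."""
    package = importlib.import_module(package_name)
    subclasses: List[Type] = []
    for module_info in pkgutil.walk_packages(package.__path__, prefix=package_name + '.'):
        module = importlib.import_module(module_info.name)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if issubclass(obj, base_class) and obj is not base_class and not inspect.isabstract(obj):
                subclasses.append(obj)
    return subclasses


def assert_registry_is_complete(registry: Dict[str, Type], package_name: str, base_class: Type) -> None:
    """Fail if a discovered subclass has no corresponding registry entry."""
    registered = set(registry.values())
    missing = [cls for cls in find_subclasses(package_name, base_class) if cls not in registered]
    assert not missing, f'Classes missing from the registry: {missing}'
```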
@A-Jacobson (Contributor) commented:

> Thoughts on how to go forward?
>
> 1. Undo the change from #948 (Multiple calls to `Trainer.fit()`) and always use CPU + FP32, in both the functional API and the hparams path, and require our YAMLs to make these explicit. (Not a fan of this option, as it would require an extra parameter for the 99% case where a GPU is available and should be used.)
> 2. Keep the devices and precisions specified in the YAMLs, but leave the smart defaults in the trainer (basically, revert those changes in this PR). This would make the YAMLs less portable, but I can see that making sense if we want to view them more as "benchmarks" rather than "examples".
> 3. Auto-default the device, but always use FP32 precision unless otherwise specified (matching what PTL does).
> 4. Leave this PR as-is.

Definitely not 2; a few of us have already run afoul of this while trying to debug runs on CPU. I don't think we can go wrong with 1, 3, or 4 as long as it's clear what is happening. My personal order of preference is 3 -> 4 -> 1.

Yes, AMP gives the best performance, but it can cause some training runs to overflow. If I'm training a model and getting NaNs, it would be important for me to at least be aware that AMP is being used so I could try toggling it off.


@A-Jacobson (Contributor) left a comment:


Left a few comments asking why the hparams classes are organized the way they are, plus my opinion on the device defaults.

Overall LGTM, especially since many of these hparams classes will be leaving us soon (rest in peace; you will not be missed).

@ravi-mosaicml (Contributor, Author) replied:

> Definitely not 2; a few of us have already run afoul of this while trying to debug runs on CPU. I don't think we can go wrong with 1, 3, or 4 as long as it's clear what is happening. My personal order of preference is 3 -> 4 -> 1.

Discussed offline; we decided to go with option 4 (leave as-is), since option 3 would require the YAMLs to specify `precision: amp`, which would defeat the portability goal.

@ravi-mosaicml changed the title from "AutoYAHP Part 3: Refactor the models, datasets, and trainer_hparams for AutoYAHP" to "AutoYAHP Part 3: Refactor the models, datasets, and trainer_hparams" on May 26, 2022.
@ravi-mosaicml merged commit f1b1f22 into mosaicml:dev on May 26, 2022.
@ravi-mosaicml deleted the refactor_model_dataset_trainer_for_autoyahp branch on May 31, 2022.