-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deterministic mode (and use it for tests); simplify checkpointing tests #203
Conversation
1. Deterministic mode on GPUs requires the CUBLAS_WORKSPACE_CONFIG environment variable to be set _before_ the training script is launched. The launch script can properly set this variable 2. Renamed composer.utils.determinism to composer.utils.reproducability to match the pytorch naming in the docs 3. Switched the testrunner to use deterministic mode 4. Renamed the run directory environment variable to COMPOSER_RUN_DIRECTORY, so it would have a parallel name to COMPOSER_USE_DETERMINISTIC_MODE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should remove deterministic as an arg to Trainer, but other than that looks fine
…et that in the hparams. Further reduced number of checkpoint tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm,
Deferring to Jamie on deepspeed tests.
Also am noticing that our root conftest.py
is getting large again, when we do have our own fixtures subfolder, so bonus points for moving deepspeed fixtures into its own deepspeed fixtures folder
… tests (mosaicml#203) 1. Deterministic mode on GPUs requires the `CUBLAS_WORKSPACE_CONFIG`. Setting this variable when configuring deterministic mode. 2. Renamed composer.utils.determinism to composer.utils.reproducibility to match the pytorch naming in the docs 3. Configured tests to use deterministic mode by default. 4. Configured checkpoints to not compress by default, since compression is slow 5. Made the gpu and deepspeed markers mutually exclusive, since they require different process initialization
CUBLAS_WORKSPACE_CONFIG
. Setting this variable when configuring deterministic mode.