Distinguish between dist and DDP #201

Merged 10 commits on Jan 6, 2022

@jbloxham jbloxham commented Jan 5, 2022

The distributed runtime and the DDP engine are distinct entities, but our code has been treating them almost as synonyms. This is already causing some confusion in parts of the DeepSpeed integration, and it will only get worse if we experiment with other parallelism techniques like model and pipeline parallelism. The purpose of this PR is to separate out DDP-specific code from anything that just deals with the distributed runtime in general.

This is purely a refactor of something that was making me unhappy.
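
To make the distinction above concrete, here is a minimal sketch, not the code from this PR. It assumes PyTorch's `torch.distributed` as the distributed runtime and `DistributedDataParallel` as the DDP engine. The helper names (`get_world_size`, `get_global_rank`, `wrap_with_ddp`) are hypothetical, chosen only for illustration. The runtime-level helpers know nothing about DDP, so another engine (DeepSpeed, pipeline parallelism) could reuse them unchanged.

```python
try:
    import torch.distributed as torch_dist
except ImportError:  # keeps the sketch runnable where PyTorch is not installed
    torch_dist = None


def _runtime_ready() -> bool:
    """True only if torch.distributed is usable and a process group is initialized."""
    return (
        torch_dist is not None
        and torch_dist.is_available()
        and torch_dist.is_initialized()
    )


# --- Runtime-level ("dist") helpers: no knowledge of DDP ---------------------

def get_world_size() -> int:
    """Number of processes in the job; 1 when no process group is initialized."""
    return torch_dist.get_world_size() if _runtime_ready() else 1


def get_global_rank() -> int:
    """Rank of this process; 0 when no process group is initialized."""
    return torch_dist.get_rank() if _runtime_ready() else 0


# --- DDP-specific code: one of several engines built on the runtime ----------

def wrap_with_ddp(module):
    """Wrap `module` in DistributedDataParallel only if a process group exists.

    Hypothetical helper: in a single-process run the module is returned as-is.
    """
    if not _runtime_ready():
        return module
    from torch.nn.parallel import DistributedDataParallel
    return DistributedDataParallel(module)
```

In a single-process run with no initialized process group, the runtime helpers fall back to a world size of 1 and a rank of 0, and `wrap_with_ddp` leaves the module untouched.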

@jbloxham jbloxham marked this pull request as ready for review January 5, 2022 23:01

@ravi-mosaicml ravi-mosaicml left a comment

LGTM, thanks for doing this!

@jbloxham jbloxham merged commit 1566ce2 into mosaicml:dev Jan 6, 2022
coryMosaicML pushed a commit to coryMosaicML/composer that referenced this pull request Feb 23, 2022
* dist, not ddp

* simplify ClosureGradScaler

* formatting

* formatting and more fixes

* that did not save

* small fixes

* dont need to worry about circular dependencies any longer

* dumb pyright fix

* woops