Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

for now, delete the float8-all-gather-only functionality from float8 … #1451

Merged
merged 1 commit into from
Dec 20, 2024

Conversation

vkuzo
Copy link
Contributor

@vkuzo vkuzo commented Dec 19, 2024

…training

Summary:

In #1093 we added a config option, off by default, to use only float8 all-gather for training and do the matrix multiply in high precision.

This seems generally useful for communication bound workloads, but we can probably think of a cleaner way to add this functionality (such as a weight wrapper tensor subclass). The current implementation adds non-trivial complexity and doesn't jive well with where we want to take this codebase.

Since no one is using this internally or externally yet and we haven't talked about it in the release notes, I think we should do a BC-breaking delete as a one-off. However, if people have concerns - let me know and we can talk about less aggressive options.

Test Plan:

./test/float8/test_everything.sh

Reviewers:

Subscribers:

Tasks:

Tags:

Copy link

pytorch-bot bot commented Dec 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1451

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 910edb3 with merge base 7618c26 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 19, 2024
@vkuzo vkuzo requested a review from y-sq December 19, 2024 23:34
@vkuzo vkuzo added the topic: bc-breaking Use this tag if this PR breaks backward compatibility label Dec 19, 2024
…training

Summary:

In #1093 we added a config option, off
by default, to use only float8 all-gather for training and do the matrix
multiply in high precision.

This seems generally useful for communication bound workloads, but we
can probably think of a cleaner way to add this functionality (such as a
weight wrapper tensor subclass). The current implementation adds
non-trivial complexity and doesn't jive well with where we want to take this
codebase.

Since no one is using this internally or externally yet and we haven't talked
about it in the release notes, I think we should do a BC-breaking delete
as a one-off.  However, if people have concerns - let me know and we can
talk about less aggressive options.

Test Plan:

```
./test/float8/test_everything.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:
@vkuzo vkuzo force-pushed the 20241219_float8_allgather_only_delete branch from 488185d to 910edb3 Compare December 19, 2024 23:37
@vkuzo vkuzo requested review from jerryzh168 and drisspg December 19, 2024 23:40
@vkuzo vkuzo merged commit 84fb834 into main Dec 20, 2024
18 checks passed
amdfaa pushed a commit that referenced this pull request Jan 10, 2025
#1451)

for now, delete the float8-all-gather-only functionality from float8 training

Summary:

In #1093 we added a config option, off
by default, to use only float8 all-gather for training and do the matrix
multiply in high precision.

This seems generally useful for communication bound workloads, but we
can probably think of a cleaner way to add this functionality (such as a
weight wrapper tensor subclass). The current implementation adds
non-trivial complexity and doesn't jive well with where we want to take this
codebase.

Since no one is using this internally or externally yet and we haven't talked
about it in the release notes, I think we should do a BC-breaking delete
as a one-off.  However, if people have concerns - let me know and we can
talk about less aggressive options.

Test Plan:

```
./test/float8/test_everything.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: bc-breaking Use this tag if this PR breaks backward compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants