for now, delete the float8-all-gather-only functionality from float8 … #1451

vkuzo · 2024-12-19T23:34:26Z

…training

Summary:

In #1093 we added a config option, off by default, to use only float8 all-gather for training and do the matrix multiply in high precision.

This seems generally useful for communication bound workloads, but we can probably think of a cleaner way to add this functionality (such as a weight wrapper tensor subclass). The current implementation adds non-trivial complexity and doesn't jive well with where we want to take this codebase.

Since no one is using this internally or externally yet and we haven't talked about it in the release notes, I think we should do a BC-breaking delete as a one-off. However, if people have concerns - let me know and we can talk about less aggressive options.

Test Plan:

./test/float8/test_everything.sh

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-12-19T23:34:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1451

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 910edb3 with merge base 7618c26 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…training Summary: In #1093 we added a config option, off by default, to use only float8 all-gather for training and do the matrix multiply in high precision. This seems generally useful for communication bound workloads, but we can probably think of a cleaner way to add this functionality (such as a weight wrapper tensor subclass). The current implementation adds non-trivial complexity and doesn't jive well with where we want to take this codebase. Since no one is using this internally or externally yet and we haven't talked about it in the release notes, I think we should do a BC-breaking delete as a one-off. However, if people have concerns - let me know and we can talk about less aggressive options. Test Plan: ``` ./test/float8/test_everything.sh ``` Reviewers: Subscribers: Tasks: Tags:

#1451) for now, delete the float8-all-gather-only functionality from float8 training Summary: In #1093 we added a config option, off by default, to use only float8 all-gather for training and do the matrix multiply in high precision. This seems generally useful for communication bound workloads, but we can probably think of a cleaner way to add this functionality (such as a weight wrapper tensor subclass). The current implementation adds non-trivial complexity and doesn't jive well with where we want to take this codebase. Since no one is using this internally or externally yet and we haven't talked about it in the release notes, I think we should do a BC-breaking delete as a one-off. However, if people have concerns - let me know and we can talk about less aggressive options. Test Plan: ``` ./test/float8/test_everything.sh ``` Reviewers: Subscribers: Tasks: Tags:

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 19, 2024

vkuzo requested a review from y-sq December 19, 2024 23:34

vkuzo added the topic: bc-breaking Use this tag if this PR breaks backward compatibility label Dec 19, 2024

vkuzo force-pushed the 20241219_float8_allgather_only_delete branch from 488185d to 910edb3 Compare December 19, 2024 23:37

vkuzo requested review from jerryzh168 and drisspg December 19, 2024 23:40

drisspg approved these changes Dec 19, 2024

View reviewed changes

y-sq approved these changes Dec 20, 2024

View reviewed changes

vkuzo merged commit 84fb834 into main Dec 20, 2024
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

for now, delete the float8-all-gather-only functionality from float8 … #1451

for now, delete the float8-all-gather-only functionality from float8 … #1451

vkuzo commented Dec 19, 2024

pytorch-bot bot commented Dec 19, 2024 •

edited

Loading

for now, delete the float8-all-gather-only functionality from float8 … #1451

for now, delete the float8-all-gather-only functionality from float8 … #1451

Conversation

vkuzo commented Dec 19, 2024

pytorch-bot bot commented Dec 19, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1451

✅ No Failures

pytorch-bot bot commented Dec 19, 2024 •

edited

Loading