
[Feature request] Request to Update Forked Megatron-LM Repository with Flash-Attention Improvement #1766

Closed
leocnj opened this issue Jul 24, 2023 · 3 comments

Comments

leocnj commented Jul 24, 2023

Dear Accelerate Developers,

I would like to express my gratitude for your continued work on and contributions to this indispensable tool.

As it currently stands, we are using a forked version of Megatron-LM (https://github.com/huggingface/Megatron-LM), which lags significantly behind the upstream repository (NVIDIA:main) by 524 commits. Among the missing updates, one commit stands out for its potential to significantly speed up Transformer training: the Flash-Attention integration from Tri Dao.

On January 11, 2023, Tri Dao's pull request (https://github.com/NVIDIA/Megatron-LM/pull/267), which integrated Flash-Attention into Megatron-LM, was merged. Tri Dao has since released the second version of Flash-Attention.

Given the efficiency gains that Flash-Attention brings to Transformer training, I believe its integration would be highly beneficial for the many Accelerate users who rely on Megatron-LM. I therefore kindly request that you consider updating the forked Megatron-LM to a more recent version that incorporates the changes from PR 267.
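
For illustration, once the fork includes PR 267, enabling Flash-Attention from Accelerate could look roughly like the sketch below. This is only a guess at the eventual usage, not a confirmed API: it assumes the forked Megatron-LM exposes the upstream `--use-flash-attn` flag and that `MegatronLMPlugin`'s `other_megatron_args` passthrough can forward it; the parallelism values are placeholders.

```python
# Minimal sketch, NOT the confirmed API of the updated fork. Assumptions:
# (1) the forked Megatron-LM exposes the `--use-flash-attn` flag introduced
#     by NVIDIA/Megatron-LM PR 267, and
# (2) Accelerate's MegatronLMPlugin forwards extra flags through its
#     `other_megatron_args` dict.
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,            # illustrative parallelism settings
    pp_degree=2,
    num_micro_batches=4,
    other_megatron_args={"use_flash_attn": True},  # hypothetical passthrough of the PR 267 flag
)

# Assumes the Megatron-LM integration is enabled (e.g. via `accelerate config`).
accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```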

Looking forward to your response and potential plan of action on this matter.

Best regards,

sgugger (Collaborator) commented Jul 24, 2023

cc @pacman100

leocnj (Author) commented Jul 27, 2023

@pacman100, after some tweaks, I have just created a PR implementing the requested functionality. Could you please take a look?

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed on Sep 2, 2023