
use '-e webgpu' to generate a model for webgpu #1278

Merged
merged 3 commits into from
Feb 26, 2025
Merged

Conversation

guschmue
Copy link
Contributor

Change the webgpu option from '-e web' to '-e webgpu' for consistency.

We now use GQA instead of MHA, and added an extra_option, "use_webgpu_fp32=1", to enable GPUs that do not support fp16.
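For context, a sketch of how the model builder might be invoked after this change. The model name and output paths are illustrative placeholders, not taken from this PR:

```shell
# Generate a model for the WebGPU EP (previously '-e web'):
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-4k-instruct \
    -o ./model-webgpu \
    -p fp16 \
    -e webgpu

# On GPUs that do not support fp16, the new extra_option
# introduced by this PR can be passed to keep fp32:
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-4k-instruct \
    -o ./model-webgpu-fp32 \
    -p fp16 \
    -e webgpu \
    --extra_options use_webgpu_fp32=1
```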

@xenova
Copy link

xenova commented Feb 26, 2025

> We now use GQA instead of MHA, and added an extra_option, "use_webgpu_fp32=1", to enable GPUs that do not support fp16.

Should we wait for microsoft/onnxruntime#22987 before merging this PR?

guschmue and others added 2 commits February 26, 2025 09:19
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
@guschmue
Copy link
Contributor Author

Should be fine without that PR. microsoft/onnxruntime#22987 fixes the issue with packed QKV and RoPE inside GQA.
But this PR intentionally does not enable those two features, because the FlashAttention-2 code in the new WebGPU EP doesn't implement them yet.
Once that all works and is well tested on both JSEP and the WebGPU EP, we will update the model builder to enable them.

@kunal-vaishnavi kunal-vaishnavi enabled auto-merge (squash) February 26, 2025 17:27
@xenova
Copy link

xenova commented Feb 26, 2025

@guschmue Makes sense! 🚀

@kunal-vaishnavi kunal-vaishnavi merged commit faefce2 into main Feb 26, 2025
14 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the gs/webgpu-builder branch February 26, 2025 18:26