Add --fast argument to enable experimental optimizations.
Optimizations that might break things/lower quality will be put behind this flag first and might be enabled by default in the future. Currently the only optimization is float8_e4m3fn matrix multiplication on 4000/ADA series Nvidia cards or later. If you have one of these cards you will see a speed boost when using fp8_e4m3fn flux for example.
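As a hedged sketch (not the actual ComfyUI code): the hardware gate described above can be expressed as a compute-capability check. The commit targets 4000/Ada-series NVIDIA cards or later; Ada reports CUDA compute capability 8.9, and newer architectures such as Hopper report 9.0 and up. The helper name below is hypothetical.

```python
def supports_fp8_matmul(compute_capability):
    """Hypothetical gate: True if (major, minor) is Ada (sm_89) or newer,
    i.e. hardware where float8_e4m3fn matrix multiplication is available."""
    major, minor = compute_capability
    return (major, minor) >= (8, 9)

# Cards mentioned in this thread: the RTX 4090 (Ada, sm_89) qualifies,
# while an Ampere card like the RTX 3090 (sm_86) would not.
print(supports_fp8_matmul((8, 9)))  # RTX 4090 -> True
print(supports_fp8_matmul((8, 6)))  # RTX 3090 -> False
print(supports_fp8_matmul((9, 0)))  # Hopper   -> True
```

In a real setup the tuple would come from `torch.cuda.get_device_capability()`.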
1 parent d1a6bd6 · commit 9953f22 · 4 changed files with 52 additions and 5 deletions.
Gave it a shot, but I get an error. RTX 4090, Win11, PyTorch 2.3.1+cu121:

RuntimeError: _scaled_mm_out_cuda is not compiled for this platform.
You might need a newer PyTorch version on Windows. I only tested it on Linux, which is why it's behind the --fast argument.
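The "newer PyTorch" requirement above can be gated with a simple version check. This is a hypothetical helper, not part of the commit; the version strings in the examples are the ones reported in this thread.

```python
import re

def torch_version_at_least(version_string, minimum):
    """Compare the numeric part of a torch.__version__-style string
    (e.g. '2.3.1+cu121' or '2.5.0.dev20240818+cu124') against a
    (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    parts = tuple(int(g) for g in match.groups())
    return parts >= minimum

# Versions reported in this thread:
print(torch_version_at_least("2.3.1+cu121", (2, 4, 0)))  # False: hit the error
print(torch_version_at_least("2.4.0", (2, 4, 0)))        # True: works
```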
Works now after running update_comfyui_and_python_dependencies.bat, which updated torch to 2.4.0 along with some other things.
Pretty significant speed bump, roughly 40%!
Just FYI, here is a post about further optimizations for Flux: https://www.reddit.com/r/StableDiffusion/comments/1ex64jj/comment/lj3v03m/?context=3
I just tested on the 4090 using Flux with a Q8 GGUF model, but the speed did not increase.
It applies to fp8_e4m3fn models only.
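A hypothetical illustration of why the Q8 GGUF test above saw no change: the fast path keys on the fp8_e4m3fn weight dtype, so integer-quantized GGUF weights never hit it. The function name and dtype strings here are illustrative, not ComfyUI internals.

```python
def fast_path_applies(weight_dtype):
    """Hypothetical gate: the --fast fp8 matmul only kicks in for
    float8_e4m3fn weights; other dtypes and quantization formats
    fall through to the normal matmul path."""
    return weight_dtype == "float8_e4m3fn"

print(fast_path_applies("float8_e4m3fn"))  # True: fp8 Flux checkpoint
print(fast_path_applies("q8_0"))           # False: Q8 GGUF quantization
```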
It works with SD1.5 and Flux models, but with SDXL models I get black images.
Not working for me, getting this:
TypeError: _scaled_mm() missing 2 required positional argument: "scale_a", "scale_b"
My setup: Windows 11, 4090, non-standalone install, Python 3.12.5, PyTorch 2.5.0.dev20240818+cu124, cuda_12.6.r12.6
Update ComfyUI; PyTorch nightly is supported now.
Yes, it works now. Thanks!