
xformers attention #1851

Merged · merged 21 commits · Oct 8, 2022

Conversation

C43H66N12O12S2
Collaborator

C43H66N12O12S2 commented Oct 7, 2022

This PR adds xformers-optimized cross-attention, a flag to disable it and use the split-attention optimization instead, and a _maybe_init function that, for some reason, seems to be necessary for xformers to work in this instance. It also enables functorch in xformers, which further increased performance on my machine.

We still need a way for easy distribution of xformers. Otherwise, this PR is good to go (barring bugs I've not been able to perceive).
cc. @Doggettx @Thomas-MMJ @ArrowM @consciencia

PS. Much thanks to @fmassa @danthe3rd @yocabon and many others for their generous efforts to bring xformers to Windows.

I've seen a 15% improvement with batch size 1, 100 steps, 512x512 and euler_a. xFormers allows me to output 2048x2048, whereas I would previously OOM.

closes #576

modules/sd_hijack.py: review comment (outdated, resolved)
@rabidcopy

I'm having trouble finding information on this, but does this inadvertently kill Linux AMD support as a new default? I'm not certain xformers can be compiled for ROCm.

@C43H66N12O12S2
Collaborator Author

Yeah, it likely would. We could add another check to the if statement for ROCm. Not sure PyTorch exposes that; I'll look into it, though.
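For what it's worth, PyTorch's ROCm builds do expose a marker one could test against. A minimal sketch (not the check that landed in this PR):

import torch

# torch.version.hip is a version string on ROCm builds of PyTorch
# and None on CUDA builds, so it can gate the xformers code path.
def has_rocm():
    return getattr(torch.version, "hip", None) is not None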

@danthe3rd

We still need a way for easy distribution of xformers

Yeah, totally agree. We are working on something here for Linux, though there are no plans at the moment for Windows. cc @bottler

I'm not certain xformers can be compiled for ROCM.

That's not something we are supporting indeed.

danthe3rd left a comment

Just added a few comments to simplify the code - this looks great otherwise :)

modules/sd_hijack_optimizations.py: review comment (outdated, resolved)
modules/sd_hijack_optimizations.py: review comment (outdated, resolved)
@SafentisFox
Contributor

Something I've seen in some Colabs is downloading a pre-compiled version of xformers. Is this a viable way to distribute xformers here too?

@C43H66N12O12S2
Collaborator Author

It is. We can't use those exact ones as they were built for Linux, but it still has to be built and distributed, something I have no experience in.

@wsippel

wsippel commented Oct 7, 2022

Has anyone tried running xFormers through hipify yet? Google gave me nothing, and I don't have CUDA set up to try myself right now.

@Thomas-MMJ

Here are Windows xformers wheels for Python 3.9: https://github.com/neonsecret/xformers/releases/tag/v0.14

To get a wheel just do

python setup.py bdist_wheel

@x02Sylvie

x02Sylvie commented Oct 7, 2022

I wonder if xformers could be combined with AITemplate #1625 for a 15% × 200% × 250% speed boost

@C43H66N12O12S2
Collaborator Author

@Thomas-MMJ I think separate wheels are needed for different GPU archs. Official builds of xformers build separate wheels. Example: https://app.circleci.com/pipelines/github/facebookresearch/xformers/2900/workflows/5c5de2be-9557-4684-9d10-34cd3835663e

I could provide the compute (build it locally) if somebody's willing to setup the workflow.

@C43H66N12O12S2
Collaborator Author

Could somebody please test this?

def xformers_attention_forward(self, x, context=None, mask=None):
    h = self.heads
    q_in = self.to_q(x)
    context = default(context, x)
    k_in = self.to_k(context)
    v_in = self.to_v(context)
    # split the projections into per-head slices: (batch, tokens, heads, dim_head)
    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b n h d', h=h), (q_in, k_in, v_in))
    del q_in, k_in, v_in
    # memory-efficient attention kernel from xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
    # merge the heads back: (batch, tokens, heads * dim_head)
    out = rearrange(out, 'b n h d -> b n (h d)', h=h)
    return self.to_out(out)

This exact same code, which produced broken images yesterday, now works for some reason... still no clue why it failed yesterday or why it suddenly works now.
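For context, the webui applies a function like this by monkey-patching CrossAttention. A rough sketch of what modules/sd_hijack.py does (the actual file also wires up the split-attention fallback and the command-line flags):

import ldm.modules.attention as attention

# replace the model's default cross-attention forward with the xformers one
attention.CrossAttention.forward = xformers_attention_forward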

@ArrowM
Contributor

ArrowM commented Oct 7, 2022

Could somebody please test this?

Works for me

@C43H66N12O12S2
Collaborator Author

I can now reach 22it/s with the newest version and batch size 8 with a 3080 12GB. Just need to find a way to distribute packages to people and we can ship this to everyone.


@Thomas-MMJ

Thomas-MMJ commented Oct 7, 2022

@Thomas-MMJ I think separate wheels are needed for different GPU archs. Official builds of xformers build separate wheels. Example: https://app.circleci.com/pipelines/github/facebookresearch/xformers/2900/workflows/5c5de2be-9557-4684-9d10-34cd3835663e

I could provide the compute (build it locally) if somebody's willing to setup the workflow.

Looks like conda will be added to their continuous integration,

https://github.com/facebookresearch/xformers/pull/466/files

@SafentisFox
Contributor

SafentisFox commented Oct 7, 2022

@C43H66N12O12S2 I can now reach 22it/s with the newest version and batch size 8 with a 3080 12GB. Just need to find a way to distribute packages to people and we can ship this to everyone.

22it/s?!? With batch size 8?! You did not misspell, right? You didn't mean batch count 8 or 2.2it/s?
Because 22it/s with batch size 8 is insane lol

@Thomas-MMJ

Thomas-MMJ commented Oct 7, 2022

This exact same code, which produced broken images yesterday, now works for some reason... still no clue why it failed yesterday or why it suddenly works now.

So that is without the init? I thought it was the lack of init that was the issue yesterday. (Of course that was pure speculation...)

@ArrowM
Contributor

ArrowM commented Oct 7, 2022

22it/s?!? With batch size 8?! You did not misspell, right? You didn't mean batch count 8 or 2.2it/s? Because 22it/s with batch size 8 is insane lol

With a 3080, they definitely meant one or the other, not both at the same time.

@C43H66N12O12S2
Collaborator Author

C43H66N12O12S2 commented Oct 8, 2022

@SafentisFox
Sorry, should've clarified 😅. It's 2.77it/s with batch size 8, which basically amounts to 22it/s in single-image terms (2.77 × 8 ≈ 22.2).

@Thomas-MMJ
Yep, without the init function.

@AUTOMATIC1111
It'd be great to get some help with this. xformers should basically build out-of-the-box now; we just need to distribute built packages. We can't have users build it themselves, as it requires the VC++ build tools and nvcc.
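For reference, the manual build that users would otherwise have to do looks roughly like this (a sketch, assuming the VC++ build tools and a CUDA toolkit providing nvcc are already on PATH):

git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install ninja              # optional, parallelizes the build considerably
python setup.py bdist_wheel    # the wheel ends up in dist/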

@C43H66N12O12S2
Collaborator Author

@htkg We limited it to Ampere as my wheels only work with Ampere. Hopefully Meta will distribute wheels for Windows, and we can remove a lot (nearly all, actually) of these checks.
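For reference, the Ampere gate can be expressed with PyTorch's compute-capability query. A sketch (the exact condition in the PR may be stricter):

import torch

# Ampere GPUs report compute capability 8.x, e.g. (8, 6) for a 3080/3090
# and (8, 0) for an A100; older cards report 7.x or lower.
major, minor = torch.cuda.get_device_capability()
if major == 8:
    print("Ampere detected, prebuilt xformers wheel should work")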

@chekaaa
Contributor

chekaaa commented Oct 8, 2022

My tests so far using a 3070:

Euler a - 20 steps

xFormers off, batch count 10: [screenshot xformer_off-10c]

xFormers off, batch count 5, batch size 2: [screenshot xformer_off-5c-2s]

xFormers on, batch count 10: [screenshot xformer_on-10c]

xFormers on, batch count 5, batch size 2: [screenshot xformer_on-5c-2s]

@C43H66N12O12S2
Collaborator Author

@chekaaa ramp up the batch size for larger gains

@chekaaa
Contributor

chekaaa commented Oct 8, 2022

xFormers off, batch count 5, batch size 6: [screenshot xformer_off-5c-6s]

xFormers on, batch count 5, batch size 6: [screenshot xformer_on-5c-6s]

@leohumnew

What do I need to do to get this to work? Just add "--xformers" to COMMANDLINE_ARGS in the .bat file?

@chekaaa
Contributor

chekaaa commented Oct 8, 2022

@leohumnew yes
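
For reference, the relevant line in webui-user.bat would then look like this (assuming an otherwise stock file):

set COMMANDLINE_ARGS=--xformers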

@kaneda2004
Contributor

Doesn't appear to auto-install xformers
GPU is RTX 3090

Console log:

venv "F:\StableD\stable-diffusion-automatic1111\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 3061cdb
Installing xformers
Installing requirements for Web UI
Launching Web UI with arguments: --xformers
Cannot import xformers
Traceback (most recent call last):
File "F:\StableD\stable-diffusion-automatic1111\modules\sd_hijack_optimizations.py", line 15, in
import xformers.ops
ModuleNotFoundError: No module named 'xformers'

@wstrinz

wstrinz commented Oct 8, 2022

Doesn't appear to auto-install xformers [... console log quoted above ...]

Same here, #1851 (comment) got it working for me

@kaneda2004
Contributor

Sounds like you've got Python 3.9 installed. That whl won't work for me, but this one did:

https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/b/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

And it's crazy fast. @C43H66N12O12S2 thank you for your work on this PR. Keep flying high :)

@leohumnew

Is there any decrease in quality with this? Or should it be equivalent to without, but just a bit faster?

@kaneda2004
Contributor

Is there any decrease in quality with this? Or should it be equivalent to without, but just a bit faster?

So far my testing shows the same quality (I've only tested a handful of samplers with it), and I'm getting approx. a 50% speedup, more if I batch smaller images together to max out my VRAM, in which case I'm seeing over a 100% speedup. (Eight 512x512 images batched take 8 seconds, i.e. 1 sec per image.)

@JustMaier
Contributor

I'm running a 3090 and noticing about a 20-30% speedup, good stuff.

I have, however, noticed a strange issue: repeat generations with the same params can give different results. I've created an issue about it (#1999). I wonder if it's just me or if others have noticed the same thing.

@ifffrt

ifffrt commented Oct 8, 2022

Is there a guide on how to DIY the wheels for this on your local computer? I'm running an outdated Maxwell GPU but I still want to try this out anyway.

@kaneda2004
Contributor

Is there a guide on how to DIY the wheels for this on your local computer? I'm running an outdated Maxwell GPU but I still want to try this out anyway.

I've built it for T4 and for P100, but only in a Colab environment.

It's literally a pip install command once you have the build environment set up. I expect it's not too different on Windows.

Hope you have a lot of time, though. It took about 45 minutes to compile, and the first time it failed lol.

@qJake

qJake commented Oct 9, 2022

Trying to get a prebuilt xformers running on Windows x64 / Python 3.9 with a 3070.

Running pip install https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/b/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl --prefer-binary yields:

ERROR: xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl is not a supported wheel on this platform.

Running pip install https://github.com/neonsecret/xformers/releases/download/v0.14/xformers-0.0.14.dev0-cp39-cp39-win_amd64.whl --prefer-binary installs the wheel successfully, but SD Web (with --xformers) does not load it.

Running import xformers in a console yields:

>>> import xformers
Could not find module 'C:\Users\USERNAME\AppData\Roaming\Python\Python39\site-packages\xformers\_C.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
WARNING:root:WARNING: Could not find module 'C:\Users\USERNAME\AppData\Roaming\Python\Python39\site-packages\xformers\_C.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
Need to compile C++ extensions to get sparse attention suport. Please run python setup.py build develop

Any fixes for this?

@ifffrt

ifffrt commented Oct 9, 2022

Is there a guide on how to DIY the wheels for this on your local computer? [...]

I've built it for T4 and for P100, but only in a Colab environment. [...]

Actually, I just found a guide for Windows on Reddit. It's a little bit more involved than that, but it sounds doable.
https://www.reddit.com/r/StableDiffusion/comments/xz26lq/automatic1111_xformers_cross_attention_with_on/

@C43H66N12O12S2
Collaborator Author

C43H66N12O12S2 commented Oct 10, 2022

could you please test this wheel? https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/c/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

@duckness

duckness commented Oct 10, 2022

could you please test this wheel? https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/c/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

it works for me (1070)

@ilcane87

@C43H66N12O12S2
Works for me (1060) after:

pip uninstall xformers
pip install xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

Still no speed difference with or without --force-enable-xformers, but that was the case even with my own built wheel.

@salieri-dev

salieri-dev commented Oct 10, 2022

@C43H66N12O12S2 I've built the wheels myself, so I can't test it, I think...

P.S. Building the wheels took 15 minutes on an RTX 2060 Super.

I got a 30-40% boost, which is awesome.

@salieri-dev

salieri-dev commented Oct 10, 2022

I'll try it when I'm back at my PC, if nobody tests on an RTX 2060 before then.

@qJake

qJake commented Oct 10, 2022

Trying to get a prebuilt xformers running on Windows x64 / Python 3.9 with a 3070. [... quoted above ...]

Closing the loop on this... ended up following the instructions to build xformers locally... was trying to avoid the 3GB of CUDA / 7GB of VC++ dev libraries, but oh well.

Worked first time after pip install -e . finished, took about 45 minutes on a 9th-gen i7.

@Thomas-MMJ

To create a wheel, do

python setup.py bdist_wheel

and a wheel will be put in your dist folder. You can share it and/or keep it around to reinstall later.

@ghost

ghost commented Oct 11, 2022

I just wanted to say that the new updates solved my problems, and I am really grateful for that. It was a frustrating experience... thanks to the devs who made it possible. If I had only waited long enough, I wouldn't have had to battle with the cmd all day long...

@Thomas-MMJ

Thomas-MMJ commented Oct 11, 2022

There are now official conda Linux xformers builds.

danthe3rd commented in huggingface/diffusers#532 (comment):

Hi, I'm a maintainer of xFormers,
Just wanted to clarify a few things:
(1) xFormers supports Linux & Windows
(2) We don't have official binaries for windows, but we now (since today) have binaries for linux! You can get them with "conda install xformers -c xformers/label/dev", but they are only available for Python 3.9 or 3.10, CUDA 11.3 or 11.6, and PyTorch 1.12.1
(3) If you don't use binaries, the build can be very long indeed - however it can be significantly faster if you install ninja before, as it can be parallelised. It still takes a dozen minutes on GPUs where we build flash attention (compute capability > 7.5)


@Renaldas111

Working with Python 3.10.8; it was not working with 3.9.5, failing with ModuleNotFoundError: No module named 'xformers'.

@Thomas-MMJ

The xformers wheel you download has to match your Python version (3.8/3.9/3.10) and your CUDA version (11.6/11.7/11.8). If either mismatches, it won't work.
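
A quick way to check both from the webui's venv (a sketch; torch.version.cuda reports the CUDA version the installed PyTorch build was compiled against):

import sys
import torch

# the wheel's cpXX tag must match this, e.g. (3, 10) -> cp310
print(sys.version_info[:2])
# the wheel's CUDA build must match this, e.g. '11.6'
print(torch.version.cuda)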

@salieri-dev

any success running xformers under WSL?

@Thomas-MMJ

Thomas-MMJ commented Oct 16, 2022

any success running xformers under WSL?

Yeah, xformers works great for me under WSL. If you don't want to build from source, you can use the official builds for some Python, CUDA, and PyTorch combinations:

Just wanted to clarify a few things: [...] we now (since today) have binaries for linux! You can get them with "conda install xformers -c xformers/label/dev", but they are only available for Python 3.9 or 3.10, CUDA 11.3 or 11.6, and PyTorch 1.12.1 [...]

Originally posted by @danthe3rd in huggingface/diffusers#532 (comment)

Note that to use DeepSpeed pinning (used for DreamBooth) under WSL you need Windows 22H2 (released a week ago) and an updated WSL (wsl --update); otherwise it is limited to pinning 2 GB of RAM (for DreamBooth it wants to pin 16 GB).


Successfully merging this pull request may close these issues:

New memory efficient cross attention (#576)