Remove model.AttnBlock and replace with attention.SpatialSelfAttention. #519

Closed
mh-dm wants to merge 1 commit into development from model.py

Conversation

mh-dm
Contributor

@mh-dm mh-dm commented Sep 12, 2022

model.AttnBlock and attention.SpatialSelfAttention perform the exact same computation, just in different ways, so delete the former. All SpatialSelfAttention optimizations then apply for free.
Also remove pointless del statements.

Please approve #518 first (the other way would lead to conflicts).
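For reference, here is a minimal standalone sketch (illustrative shapes and ops, not the exact repo code) of why the two modules compute the same thing: AttnBlock flattens the spatial dimensions with reshape/permute and uses torch.bmm, while SpatialSelfAttention expresses the same math with einops.rearrange and torch.einsum.

```python
# Illustrative sketch only: in the real modules q, k, v come from 1x1 convs;
# here we start from random q, k, v and compare the two ways the attention
# softmax(q @ k^T / sqrt(c)) @ v is computed over flattened spatial positions.
import torch
from einops import rearrange

b, c, h, w = 1, 64, 32, 32
q = torch.randn(b, c, h, w)
k = torch.randn(b, c, h, w)
v = torch.randn(b, c, h, w)

# model.AttnBlock-style path: reshape/permute + torch.bmm
q1 = q.reshape(b, c, h * w).permute(0, 2, 1)            # (b, hw, c)
k1 = k.reshape(b, c, h * w)                             # (b, c, hw)
w1 = torch.softmax(torch.bmm(q1, k1) * c ** -0.5, dim=2)
out1 = torch.bmm(v.reshape(b, c, h * w), w1.permute(0, 2, 1)).reshape(b, c, h, w)

# attention.SpatialSelfAttention-style path: einops.rearrange + torch.einsum
q2 = rearrange(q, 'b c h w -> b (h w) c')
k2 = rearrange(k, 'b c h w -> b c (h w)')
w2 = torch.softmax(torch.einsum('bij,bjk->bik', q2, k2) * c ** -0.5, dim=2)
v2 = rearrange(v, 'b c h w -> b c (h w)')
out2 = rearrange(torch.einsum('bij,bjk->bik', v2, rearrange(w2, 'b i j -> b j i')),
                 'b c (h w) -> b c h w', h=h)

print(torch.allclose(out1, out2, atol=1e-5))  # expected: True
```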

@mh-dm mh-dm marked this pull request as draft September 12, 2022 16:59

@Thomas-MMJ Thomas-MMJ left a comment


You might want to make a separate pull request for the variable cleanup and the switch from AttnBlock to SpatialSelfAttention, since they are unrelated changes.

@mh-dm
Contributor Author

mh-dm commented Sep 12, 2022

Good point, del cleanup done in #520

@mh-dm
Contributor Author

mh-dm commented Sep 12, 2022

@Any-Winter-4079 Hey, I'm doing a cleanup prompted by experimenting to get dream running on just an Intel CPU. It turns out model.AttnBlock is a duplicate of attention.SpatialSelfAttention and can just be removed. Bringing it up since you've made changes to it and may have comments/want to review.
I've tested, and looking at max VRAM, no memory optimizations seem needed beyond a quick softmax(dtype=x.dtype) to avoid autocast promotion.
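For context, a minimal sketch of that softmax dtype trick (illustrative; the tensor shapes are assumptions): under torch.autocast, softmax gets promoted to float32, so passing dtype= keeps the attention weights in the incoming half precision.

```python
# Illustrative sketch of the softmax dtype optimization; shapes are assumed and
# a CUDA device is needed for autocast to have an effect here.
import torch

q = torch.randn(1, 4096, 64, device='cuda', dtype=torch.float16)

with torch.autocast('cuda'):
    scores = torch.bmm(q, q.transpose(1, 2))               # (1, 4096, 4096), float16
    w_default = torch.softmax(scores, dim=-1)               # autocast promotes to float32
    w_kept = torch.softmax(scores, dim=-1, dtype=q.dtype)   # stays float16, half the memory

print(w_default.dtype, w_kept.dtype)  # expected: torch.float32 torch.float16
```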

@lstein
Collaborator

lstein commented Sep 12, 2022

Any code that can be deleted is a great step forward. I'll hold off on reviewing this until you've cleared draft status.

@mh-dm mh-dm requested a review from Thomas-MMJ September 12, 2022 22:52
@mh-dm mh-dm marked this pull request as ready for review September 12, 2022 22:52
@lstein
Collaborator

lstein commented Sep 13, 2022

model.AttnBlock and attention.SpatialSelfAttention perform the exact same computation, just in different ways, so delete the former. All SpatialSelfAttention optimizations then apply for free. Also remove pointless del statements.

Please approve #518 first (the other way would lead to conflicts).

I approved and merged this. Just waiting on a review from someone with appropriate expertise.

@lstein lstein requested review from Any-Winter-4079 and removed request for Thomas-MMJ September 13, 2022 14:34
@lstein
Collaborator

lstein commented Sep 13, 2022

@Any-Winter-4079 Hey, I'm doing a cleanup prompted by experimenting to get dream running on just an Intel CPU. It turns out model.AttnBlock is a duplicate of attention.SpatialSelfAttention and can just be removed. Bringing it up since you've made changes to it and may have comments/want to review. I've tested, and looking at max VRAM, no memory optimizations seem needed beyond a quick softmax(dtype=x.dtype) to avoid autocast promotion.

Interestingly, as long as --full_precision is specified, the current development branch seems to run on CPU-only Intel hardware.

@mh-dm mh-dm force-pushed the model.py branch 2 times, most recently from 2acea38 to 42441a0 Compare September 14, 2022 14:18
@mh-dm
Contributor Author

mh-dm commented Sep 14, 2022

Interestingly, as long as --full_precision is specified, the current development branch seems to run on CPU-only Intel hardware.

Yes, the final fix that enabled that was #518

@Vargol
Contributor

Vargol commented Sep 16, 2022

Need to do a few more runs, but I'm seeing some fishy results here: the second and third iters run about 80% slower (after applying the >=8 fix).


100%|███████████| 50/50 [04:31<00:00,  5.42s/it]
data: 100%|██████| 1/1 [04:33<00:00, 273.75s/it]
100%|███████████| 50/50 [07:54<00:00,  9.48s/it]
data: 100%|██████| 1/1 [08:01<00:00, 481.76s/it]
100%|███████████| 50/50 [08:04<00:00,  9.68s/it]
data: 100%|██████| 1/1 [08:11<00:00, 491.08s/it]
Sampling: 100%|█████████| 3/3 [20:46<00:00, 415.53s/it]

@mh-dm
Contributor Author

mh-dm commented Sep 16, 2022

Does it do the same at the development head (after applying the >=8 fix locally)?
Another thing I've noticed is that other programs actively running at the same time significantly affect runtimes.

@Vargol
Contributor

Vargol commented Sep 16, 2022

I knew this rang a bell; we've seen it before with changes to Attn.

See #431 (comment) and read down the conversation.

@Vargol
Contributor

Vargol commented Sep 16, 2022

The version checked out from the development branch runs normally:

100%|████████████████████████████████████████████████████| 50/50 [04:03<00:00,  4.88s/it]
100%|████████████████████████████████████████████████████| 50/50 [04:14<00:00,  5.09s/it]
100%|████████████████████████████████████████████████████| 50/50 [04:25<00:00,  5.31s/it]
Generating: 100%|█████████████████████████████████████████| 3/3 [13:18<00:00, 266.29s/it]
>> Usage stats:

Looking at the code, you seem to have replaced the memory-optimised model.AttnBlock with
something that hasn't been optimised at all.
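For readers following along, this is roughly the kind of optimisation being referred to (a generic sketch of query-chunked attention, not the repo's exact AttnBlock code): rather than materialising the full (hw x hw) attention matrix at once, the query rows are processed in slices so peak memory stays bounded.

```python
# Generic sketch of query-chunked ("sliced") attention; slice_size is an
# assumed tuning knob, not a value taken from the repo.
import torch

def sliced_attention(q, k, v, slice_size=1024):
    # q: (b, n, c), k: (b, c, n), v: (b, c, n)
    b, n, c = q.shape
    out = torch.empty(b, c, n, dtype=q.dtype, device=q.device)
    scale = c ** -0.5
    for start in range(0, n, slice_size):
        end = min(start + slice_size, n)
        # attention weights for a slice of query rows only: (b, slice, n)
        w_ = torch.softmax(torch.bmm(q[:, start:end], k) * scale, dim=-1)
        # attend to values for those rows: (b, c, slice)
        out[:, :, start:end] = torch.bmm(v, w_.transpose(1, 2))
    return out

q = torch.randn(1, 4096, 64)
k = torch.randn(1, 64, 4096)
v = torch.randn(1, 64, 4096)
print(sliced_attention(q, k, v).shape)  # torch.Size([1, 64, 4096])
```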

@mh-dm mh-dm marked this pull request as draft September 16, 2022 16:29
@mh-dm
Contributor Author

mh-dm commented Sep 16, 2022

Thanks for testing.
Moving this PR to draft, will update after landing #582

Birch-san added a commit to Birch-san/stable-diffusion that referenced this pull request Sep 18, 2022
…nd replace with attention.SpatialSelfAttention)
model.AttnBlock and attention.SpatialSelfAttention make the exact same computation but in different ways. Delete the former as model already imports from attention. Apply same softmax dtype optimization as in invoke-ai#495. Note that from testing, tensors in SpatialSelfAttention are significantly smaller so they don't need the same involved memory split optimization as attention.CrossAttention. Tested that max VRAM is unchanged.
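To put rough numbers on "significantly smaller" (back-of-envelope arithmetic; the 64x64 feature map, 8 heads, and fp16 are assumptions for illustration, not measurements from this PR): a single-head spatial attention map is several times smaller than the multi-head map a CrossAttention-style self-attention builds at the same spatial size.

```python
# Back-of-envelope attention-map sizes; the shapes below are illustrative
# assumptions (64x64 feature map, 8 heads, fp16), not values from the repo.
def attn_map_bytes(hw_tokens, heads=1, bytes_per_el=2):
    return hw_tokens * hw_tokens * heads * bytes_per_el

tokens = 64 * 64  # a 64x64 feature map flattened to 4096 spatial tokens
single_head = attn_map_bytes(tokens, heads=1)   # AttnBlock/SpatialSelfAttention-style
multi_head = attn_map_bytes(tokens, heads=8)    # CrossAttention-style self-attention

print(f"single-head map: {single_head / 2**20:.0f} MiB")  # ~32 MiB
print(f"8-head map:      {multi_head / 2**20:.0f} MiB")   # ~256 MiB
```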
@Vargol
Contributor

Vargol commented Sep 21, 2022

Still seeing a big slowdown in extra iters.

Current development

100%|███████████| 50/50 [03:33<00:00,  4.27s/it]
data: 100%|██████| 1/1 [03:42<00:00, 222.03s/it]
100%|███████████| 50/50 [03:24<00:00,  4.09s/it]
data: 100%|██████| 1/1 [03:35<00:00, 215.32s/it]
100%|███████████| 50/50 [03:20<00:00,  4.00s/it]
data: 100%|██████| 1/1 [03:31<00:00, 211.11s/it]
Sampling: 100%|██ 3/3 [10:48<00:00, 216.16s/it]

With this PR

100%|███████████| 50/50 [03:28<00:00,  4.17s/it]
data: 100%|██████| 1/1 [03:32<00:00, 212.43s/it]
100%|███████████| 50/50 [07:41<00:00,  9.22s/it]
data: 100%|██████| 1/1 [07:47<00:00, 467.49s/it]
100%|███████████| 50/50 [07:51<00:00,  9.43s/it]
data: 100%|██████| 1/1 [07:57<00:00, 477.20s/it]
Sampling: 100%|██| 3/3 [19:17<00:00, 385.71s/it]

@mh-dm
Contributor Author

mh-dm commented Sep 21, 2022

Wow, you were quick to test after the rebase. Thank you. I wasn't even sure I was going to pursue this.
There are potential benefits here (only on the last 1 out of the usual 50+1 iterations), but my guess is that landing this would take a significant amount of time (especially since I don't have an M1 to iterate on quickly).
Closing this pull request.

@mh-dm mh-dm closed this Sep 21, 2022
@mh-dm mh-dm deleted the model.py branch September 21, 2022 17:51
@Vargol
Contributor

Vargol commented Sep 21, 2022

You timed it well, gave me something to do while waiting for my supper to cook :-)

I was wondering if there was any benefit from going the other way and using the AttnBlock code in SpatialSelfAttention. I've hacked it in on the basis of "it does the same thing so I can copy and paste", and the results were not quite conclusive:

100%|██████| 50/50 [03:11<00:00,  3.84s/it]
data: 100%|█| 1/1 [03:19<00:00, 199.11s/it]
100%|██████| 50/50 [03:23<00:00,  4.07s/it]
data: 100%|█| 1/1 [03:35<00:00, 215.20s/it]
100%|██████| 50/50 [03:15<00:00,  3.91s/it]
data: 100%|█| 1/1 [03:27<00:00, 207.76s/it]
Sampling: 100%|██████████| 3/3 [10:22<00:00, 207.36s/it]

Marginally faster, but that could just be variance. Any idea if SpatialSelfAttention is called directly?

@mh-dm
Contributor Author

mh-dm commented Sep 21, 2022

SpatialSelfAttention is unused for txt2img. Probably unused in the other flows too, but I haven't checked.
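For anyone wanting to check that themselves, one quick way is to attach forward hooks and see which attention modules actually fire during a generation (a generic PyTorch sketch; `model` and the class names to match are assumptions):

```python
# Generic sketch: register forward hooks to record which attention modules are
# actually called during a run. `model` is an assumed, already-loaded module.
import torch.nn as nn

def log_attention_calls(model: nn.Module):
    called = set()
    handles = []
    for name, module in model.named_modules():
        if type(module).__name__ in ("SpatialSelfAttention", "AttnBlock", "CrossAttention"):
            handles.append(module.register_forward_hook(
                lambda mod, inp, out, name=name: called.add(name)))
    return called, handles

# Usage (assumed): called, handles = log_attention_calls(model)
# ... run one sampling step ...
# print(sorted(called)); [h.remove() for h in handles]
```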

@Vargol
Contributor

Vargol commented Sep 21, 2022

Okay, thanks for the reply.
