Remove model.AttnBlock and replace with attention.SpatialSelfAttention. #519

Closed
mh-dm wants to merge 1 commit into development from model.py

Conversation

mh-dm
Contributor

@mh-dm mh-dm commented Sep 12, 2022

model.AttnBlock and attention.SpatialSelfAttention perform the exact same computation, just in different ways, so delete the former. All SpatialSelfAttention optimizations then apply for free.
Also remove pointless del statements.

Please approve #518 first (the other way would lead to conflicts).
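For reference, here is a minimal standalone sketch (illustrative shapes and ops, not the exact repo code) of why the two modules compute the same thing: AttnBlock flattens the spatial dimensions with reshape/permute and uses torch.bmm, while SpatialSelfAttention expresses the same math with einops.rearrange and torch.einsum.

```python
# Illustrative sketch only: in the real modules q, k, v come from 1x1 convs;
# here we start from random q, k, v and compare the two ways the attention
# softmax(q @ k^T / sqrt(c)) @ v is computed over flattened spatial positions.
import torch
from einops import rearrange

b, c, h, w = 1, 64, 32, 32
q = torch.randn(b, c, h, w)
k = torch.randn(b, c, h, w)
v = torch.randn(b, c, h, w)

# model.AttnBlock-style path: reshape/permute + torch.bmm
q1 = q.reshape(b, c, h * w).permute(0, 2, 1)            # (b, hw, c)
k1 = k.reshape(b, c, h * w)                             # (b, c, hw)
w1 = torch.softmax(torch.bmm(q1, k1) * c ** -0.5, dim=2)
out1 = torch.bmm(v.reshape(b, c, h * w), w1.permute(0, 2, 1)).reshape(b, c, h, w)

# attention.SpatialSelfAttention-style path: einops.rearrange + torch.einsum
q2 = rearrange(q, 'b c h w -> b (h w) c')
k2 = rearrange(k, 'b c h w -> b c (h w)')
w2 = torch.softmax(torch.einsum('bij,bjk->bik', q2, k2) * c ** -0.5, dim=2)
v2 = rearrange(v, 'b c h w -> b c (h w)')
out2 = rearrange(torch.einsum('bij,bjk->bik', v2, rearrange(w2, 'b i j -> b j i')),
                 'b c (h w) -> b c h w', h=h)

print(torch.allclose(out1, out2, atol=1e-5))  # expected: True
```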

@mh-dm mh-dm marked this pull request as draft September 12, 2022 16:59

@Thomas-MMJ Thomas-MMJ left a comment


You might want to make a separate pull request for the variable cleanup and the switch from AttnBlock to SpatialSelfAttention, since they are unrelated changes.

@mh-dm
Contributor Author

mh-dm commented Sep 12, 2022

Good point, del cleanup done in #520

@mh-dm
Contributor Author

mh-dm commented Sep 12, 2022

@Any-Winter-4079 Hey, I'm doing a cleanup prompted by experimenting to get dream running on just an Intel CPU. It turns out model.AttnBlock is a duplicate of attention.SpatialSelfAttention and can just be removed. Bringing it up since you've made changes to it and may have comments/want to review.
I've tested, and looking at max VRAM, no memory optimizations seem needed beyond a quick softmax(dtype=x.dtype) to avoid autocast promotion.
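For context, a minimal sketch of that softmax dtype trick (illustrative; the tensor shapes are assumptions): under torch.autocast, softmax gets promoted to float32, so passing dtype= keeps the attention weights in the incoming half precision.

```python
# Illustrative sketch of the softmax dtype optimization; shapes are assumed and
# a CUDA device is needed for autocast to have an effect here.
import torch

q = torch.randn(1, 4096, 64, device='cuda', dtype=torch.float16)

with torch.autocast('cuda'):
    scores = torch.bmm(q, q.transpose(1, 2))               # (1, 4096, 4096), float16
    w_default = torch.softmax(scores, dim=-1)               # autocast promotes to float32
    w_kept = torch.softmax(scores, dim=-1, dtype=q.dtype)   # stays float16, half the memory

print(w_default.dtype, w_kept.dtype)  # expected: torch.float32 torch.float16
```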

@lstein
Collaborator

lstein commented Sep 12, 2022

Any code that can be deleted is a great step forward. I'll hold off on reviewing this until you've cleared draft status.

@mh-dm mh-dm requested a review from Thomas-MMJ September 12, 2022 22:52
@mh-dm mh-dm marked this pull request as ready for review September 12, 2022 22:52
@lstein
Collaborator

lstein commented Sep 13, 2022

model.AttnBlock and attention.SpatialSelfAttention perform the exact same computation, just in different ways, so delete the former. All SpatialSelfAttention optimizations then apply for free. Also remove pointless del statements.

Please approve #518 first (the other way would lead to conflicts).

I approved and merged this. Just waiting on a review from someone with appropriate expertise.

@lstein lstein requested review from Any-Winter-4079 and removed request for Thomas-MMJ September 13, 2022 14:34
@lstein
Collaborator

lstein commented Sep 13, 2022

@Any-Winter-4079 Hey, I'm doing a cleanup prompted by experimenting to get dream running on just an Intel CPU. It turns out model.AttnBlock is a duplicate of attention.SpatialSelfAttention and can just be removed. Bringing it up since you've made changes to it and may have comments/want to review. I've tested, and looking at max VRAM, no memory optimizations seem needed beyond a quick softmax(dtype=x.dtype) to avoid autocast promotion.

Interestingly, as long as --full_precision is specified, the current development branch seems to run on CPU-only Intel hardware.

@mh-dm mh-dm force-pushed the model.py branch 2 times, most recently from 2acea38 to 42441a0 Compare September 14, 2022 14:18
@mh-dm
Contributor Author

mh-dm commented Sep 14, 2022

Interestingly, as long as --full_precision is specified, the current development branch seems to run on CPU-only Intel hardware.

Yes, the final fix that enabled that was #518

@Vargol
Contributor

Vargol commented Sep 16, 2022

Need to do a few more runs, but I'm seeing some fishy results here: the second and third iters run about 80% slower (after applying the >=8 fix).


100%|███████████| 50/50 [04:31<00:00,  5.42s/it]
data: 100%|██████| 1/1 [04:33<00:00, 273.75s/it]
100%|███████████| 50/50 [07:54<00:00,  9.48s/it]
data: 100%|██████| 1/1 [08:01<00:00, 481.76s/it]
100%|███████████| 50/50 [08:04<00:00,  9.68s/it]
data: 100%|██████| 1/1 [08:11<00:00, 491.08s/it]
Sampling: 100%|█████████| 3/3 [20:46<00:00, 415.53s/it]

@mh-dm
Contributor Author

mh-dm commented Sep 16, 2022

Does it do the same at the development head (after applying the >=8 fix locally)?
Another thing I've noticed is that other programs actively running at the same time significantly affect runtimes.

@Vargol
Contributor

Vargol commented Sep 16, 2022

I knew this rang a bell; we've seen it before with changes to Attn.

See #431 (comment) and read down the conversation.

@Vargol
Contributor

Vargol commented Sep 16, 2022

The version checked out from the development branch runs normally:

100%|████████████████████████████████████████████████████| 50/50 [04:03<00:00,  4.88s/it]
100%|████████████████████████████████████████████████████| 50/50 [04:14<00:00,  5.09s/it]
100%|████████████████████████████████████████████████████| 50/50 [04:25<00:00,  5.31s/it]
Generating: 100%|█████████████████████████████████████████| 3/3 [13:18<00:00, 266.29s/it]
>> Usage stats:

Looking at the code, you seem to have replaced the memory-optimised model.AttnBlock with
something that hasn't been optimised at all.
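For readers following along, this is roughly the kind of optimisation being referred to (a generic sketch of query-chunked attention, not the repo's exact AttnBlock code): rather than materialising the full (hw x hw) attention matrix at once, the query rows are processed in slices so peak memory stays bounded.

```python
# Generic sketch of query-chunked ("sliced") attention; slice_size is an
# assumed tuning knob, not a value taken from the repo.
import torch

def sliced_attention(q, k, v, slice_size=1024):
    # q: (b, n, c), k: (b, c, n), v: (b, c, n)
    b, n, c = q.shape
    out = torch.empty(b, c, n, dtype=q.dtype, device=q.device)
    scale = c ** -0.5
    for start in range(0, n, slice_size):
        end = min(start + slice_size, n)
        # attention weights for a slice of query rows only: (b, slice, n)
        w_ = torch.softmax(torch.bmm(q[:, start:end], k) * scale, dim=-1)
        # attend to values for those rows: (b, c, slice)
        out[:, :, start:end] = torch.bmm(v, w_.transpose(1, 2))
    return out

q = torch.randn(1, 4096, 64)
k = torch.randn(1, 64, 4096)
v = torch.randn(1, 64, 4096)
print(sliced_attention(q, k, v).shape)  # torch.Size([1, 64, 4096])
```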

@mh-dm mh-dm marked this pull request as draft September 16, 2022 16:29
@mh-dm
Contributor Author

mh-dm commented Sep 16, 2022

Thanks for testing.
Moving this PR to draft, will update after landing #582

Birch-san added a commit to Birch-san/stable-diffusion that referenced this pull request Sep 18, 2022
…nd replace with attention.SpatialSelfAttention)
model.AttnBlock and attention.SpatialSelfAttention make the exact same computation but in different ways. Delete the former as model already imports from attention. Apply same softmax dtype optimization as in invoke-ai#495. Note that from testing, tensors in SpatialSelfAttention are significantly smaller so they don't need the same involved memory split optimization as attention.CrossAttention. Tested that max VRAM is unchanged.
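To put rough numbers on "significantly smaller" (back-of-envelope arithmetic; the 64x64 feature map, 8 heads, and fp16 are assumptions for illustration, not measurements from this PR): a single-head spatial attention map is several times smaller than the multi-head map a CrossAttention-style self-attention builds at the same spatial size.

```python
# Back-of-envelope attention-map sizes; the shapes below are illustrative
# assumptions (64x64 feature map, 8 heads, fp16), not values from the repo.
def attn_map_bytes(hw_tokens, heads=1, bytes_per_el=2):
    return hw_tokens * hw_tokens * heads * bytes_per_el

tokens = 64 * 64  # a 64x64 feature map flattened to 4096 spatial tokens
single_head = attn_map_bytes(tokens, heads=1)   # AttnBlock/SpatialSelfAttention-style
multi_head = attn_map_bytes(tokens, heads=8)    # CrossAttention-style self-attention

print(f"single-head map: {single_head / 2**20:.0f} MiB")  # ~32 MiB
print(f"8-head map:      {multi_head / 2**20:.0f} MiB")   # ~256 MiB
```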
@Vargol
Contributor

Vargol commented Sep 21, 2022

Still seeing a big slowdown in extra iters.

Current development

100%|███████████| 50/50 [03:33<00:00,  4.27s/it]
data: 100%|██████| 1/1 [03:42<00:00, 222.03s/it]
100%|███████████| 50/50 [03:24<00:00,  4.09s/it]
data: 100%|██████| 1/1 [03:35<00:00, 215.32s/it]
100%|███████████| 50/50 [03:20<00:00,  4.00s/it]
data: 100%|██████| 1/1 [03:31<00:00, 211.11s/it]
Sampling: 100%|██ 3/3 [10:48<00:00, 216.16s/it]

With this PR

100%|███████████| 50/50 [03:28<00:00,  4.17s/it]
data: 100%|██████| 1/1 [03:32<00:00, 212.43s/it]
100%|███████████| 50/50 [07:41<00:00,  9.22s/it]
data: 100%|██████| 1/1 [07:47<00:00, 467.49s/it]
100%|███████████| 50/50 [07:51<00:00,  9.43s/it]
data: 100%|██████| 1/1 [07:57<00:00, 477.20s/it]
Sampling: 100%|██| 3/3 [19:17<00:00, 385.71s/it]

@mh-dm
Contributor Author

mh-dm commented Sep 21, 2022

Wow, you were quick to test after the rebase. Thank you. I wasn't even sure I was going to pursue this.
There are potential benefits here (only on the last 1 out of the usual 50+1 iterations), but my guess is that landing this would take a significant amount of time (especially since I don't have an M1 to iterate on quickly).
Closing this pull request.

@mh-dm mh-dm closed this Sep 21, 2022
@mh-dm mh-dm deleted the model.py branch September 21, 2022 17:51
@Vargol
Contributor

Vargol commented Sep 21, 2022

You timed it well, gave me something to do while waiting for my supper to cook :-)

I was wondering if there was any benefit from going the other way and using the AttnBlock code in SpatialSelfAttention. I've hacked it in on the basis of "it does the same thing so I can copy and paste", and the results were not quite conclusive:

100%|██████| 50/50 [03:11<00:00,  3.84s/it]
data: 100%|█| 1/1 [03:19<00:00, 199.11s/it]
100%|██████| 50/50 [03:23<00:00,  4.07s/it]
data: 100%|█| 1/1 [03:35<00:00, 215.20s/it]
100%|██████| 50/50 [03:15<00:00,  3.91s/it]
data: 100%|█| 1/1 [03:27<00:00, 207.76s/it]
Sampling: 100%|██████████| 3/3 [10:22<00:00, 207.36s/it]

Marginally faster, but that could just be variance. Any idea if SpatialSelfAttention is called directly?

@mh-dm
Contributor Author

mh-dm commented Sep 21, 2022

SpatialSelfAttention is unused for txt2img. Probably unused in the other flows too, but I haven't checked.
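For anyone wanting to check that themselves, one quick way is to attach forward hooks and see which attention modules actually fire during a generation (a generic PyTorch sketch; `model` and the class names to match are assumptions):

```python
# Generic sketch: register forward hooks to record which attention modules are
# actually called during a run. `model` is an assumed, already-loaded module.
import torch.nn as nn

def log_attention_calls(model: nn.Module):
    called = set()
    handles = []
    for name, module in model.named_modules():
        if type(module).__name__ in ("SpatialSelfAttention", "AttnBlock", "CrossAttention"):
            handles.append(module.register_forward_hook(
                lambda mod, inp, out, name=name: called.add(name)))
    return called, handles

# Usage (assumed): called, handles = log_attention_calls(model)
# ... run one sampling step ...
# print(sorted(called)); [h.remove() for h in handles]
```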

@Vargol
Contributor

Vargol commented Sep 21, 2022

Okay, thanks for the reply.
