
BUG: OOMs with c78b496 #358

Closed
tildebyte opened this issue Sep 4, 2022 · 10 comments

Comments

@tildebyte
Contributor

Describe your environment

  • GPU: [cuda]
  • VRAM: [8G]
  • CPU arch: [x86]
  • OS: [Windows]
  • Python: [pip/pyenv]
  • Branch: [HEAD detached at c78b496] NOTE: the immediately previous commit (92d1ed7) does NOT OOM
  • Commit: [Merge: 92d1ed7 dd2af3f]

Describe the bug
dream.py immediately OOMs when trying to generate anything larger than 512x512

To Reproduce
Steps to reproduce the behavior:

  1. winpty python scripts/dream.py -Ak_euler_a
  2. Wait for dream> prompt
  3. Enter a prompt with any dimensions larger than 512x512 (see the example command after this list)
  4. See error
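
For example (using the same flags that appear in the comments below; the prompt text itself doesn't matter):

    dream> "banana sushi" -W512 -H768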

Expected behavior
No OOM

Additional context
Did an extensive manual bisect and found: 92d1ed7 - No OOMs, c78b496 - Always OOMs

@lstein
Collaborator

lstein commented Sep 4, 2022

Thanks for doing the bisect. It's a pain with the wait for model initialization. I have a theory about where the OOM is coming from and will look into it before bed tonight.

@blessedcoolant
Collaborator

Can confirm. There's a memory regression somewhere. My max res is generally 512x768, but I can only do 512x704 now.

@lstein
Collaborator

lstein commented Sep 4, 2022

I'm pretty sure that it's the variation code that just went in. Peak VRAM usage has jumped up. I'm trying to isolate the problem to understand it. Teaches me a lesson about announcing a release so soon after a major update.

@blessedcoolant
Collaborator

blessedcoolant commented Sep 4, 2022

I'm pretty sure that it's the variation code that just went in. Peak VRAM usage has jumped up. I'm trying to isolate the problem to understand it. Teaches me a lesson about announcing a release so soon after a major update.

Surprisingly it started working at the max res again without me doing anything. I'm trying to isolate the problem too but nothing stands out at first glance.

@tildebyte
Contributor Author

Teaches me a lesson about announcing a release so soon after a major update

Meh. Live and learn; managing software projects is hard 😀

We should probably institute some kind of testing sign-off, like ask people to volunteer to test, and require at least one ACK per platform.

@lstein
Collaborator

lstein commented Sep 4, 2022

I've been testing using the "Max VRAM" line in the usage stats, and the memory regression is definitely occurring at 4fe2657, which is where the variant code was folded in. For a 512x768 image, it took 7.10G of VRAM before the commit and 7.15G afterward. Not much, but enough to push an 8G card over the edge. The VRAM is probably also being used for system graphics, which is why it seemed to cure itself after a while.

I do not understand why the code is causing extra VRAM usage when it is not being run, but I'll get it figured out.
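
For anyone who wants to reproduce these measurements outside of dream.py's own stats line, PyTorch's built-in counters give the same kind of number. A minimal sketch, assuming a CUDA device (this is not the script's own reporting code):

    import torch

    # Clear the peak-memory counter before a single generation run.
    torch.cuda.reset_peak_memory_stats()

    # ... run one image generation here ...

    # Peak VRAM allocated by tensors during the run, in GiB.
    peak_gib = torch.cuda.max_memory_allocated() / (1024 ** 3)
    print(f"Max VRAM used for this generation: {peak_gib:.2f}G")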

@lstein
Collaborator

lstein commented Sep 4, 2022

@bakkot There is a small but significant memory usage regression that appeared when "bakkot-seed-fuzz" was merged into development. The specific commit is 2d65b03. Before the commit, the prompt "banana sushi" -W512 -H768 used 7.10G peak VRAM. After the commit the same prompt used 7.16G.

It's not a large difference, but it's enough to exhaust memory on 8G GPUs running 512x768 images. Presumably the system needs some VRAM too for its display.

I've been hunting and there's nothing obvious happening. Indeed, none of the variation code gets run unless the -v or -F options are specified. So it's mysterious, but the only other set of changes that went in at that time were web server-related ones, which shouldn't have an effect.

When you have a chance, could you see if you can find what I'm missing? Thx
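
One pattern worth checking for (purely a hypothetical illustration, not something I've confirmed in the actual commit): any tensor that gets allocated on the GPU unconditionally counts toward peak VRAM even if the branch that uses it never runs. The names below (get_noise, variation_amount) are made up for the example:

    import torch

    def get_noise(shape, device, variation_amount=0.0):
        base_noise = torch.randn(shape, device=device)
        # Pitfall: this allocation happens even when variation_amount is 0,
        # so it raises peak VRAM although the result is never used.
        variation_noise = torch.randn(shape, device=device)
        if variation_amount > 0:
            return base_noise.lerp(variation_noise, variation_amount)
        return base_noise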

@lstein
Collaborator

lstein commented Sep 4, 2022

Hey guess what? The discussion in #364 pointed me at an attention.py optimization that reduces the peak VRAM usage of my test prompt from 4.4G to 3.6G. When generating a 512x768 image, it uses 5.30G whereas previously it was using 7.16G.

I've pulled it into refactor-simplet2i if you want to stress test it.
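
For anyone curious, the general idea behind that kind of attention.py change is to process attention in query slices so the full (tokens x tokens) score matrix is never materialized all at once. A rough sketch of the technique, not the exact code that was merged:

    import math
    import torch

    def sliced_attention(q, k, v, slice_size=1024):
        # q, k, v: (batch, seq_len, dim); assumes q and v share the same dim.
        scale = 1.0 / math.sqrt(q.shape[-1])
        out = torch.empty_like(q)
        for start in range(0, q.shape[1], slice_size):
            end = min(start + slice_size, q.shape[1])
            # Scores for this slice of queries against all keys.
            scores = torch.bmm(q[:, start:end], k.transpose(-1, -2)) * scale
            out[:, start:end] = torch.bmm(scores.softmax(dim=-1), v)
        return out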

@blessedcoolant
Collaborator

blessedcoolant commented Sep 4, 2022

Hey guess what? The discussion in #364 pointed me at an attention.py optimization that reduces the peak VRAM usage of my test prompt from 4.4G to 3.6G. When generating a 512x768 image, it uses 5.30G whereas previously it was using 7.16G.

I've pulled it into refactor-simplet2i if you want to stress test it.

Wow. Max VRAM usage dropped from 7.xx GB to 5.3x GB at the moment. Any trade-offs?

However, I've noticed that the lower VRAM usage doesn't mean I can run larger resolutions. If I try to generate a larger image from the prompt, I still get an OOM.


Edit: I implemented just the attention change myself. I can now run 576x768, which is a jump up from 512x768, but anything above that still OOMs.

@bakkot
Contributor

bakkot commented Sep 4, 2022

@tildebyte #375 should fix the original regression, independently of other optimizations; sorry about that.

@lstein lstein closed this as completed Sep 6, 2022