
BUG: OOMs with c78b496 #358

Closed
tildebyte opened this issue Sep 4, 2022 · 10 comments

Comments

@tildebyte
Contributor

Describe your environment

  • GPU: [cuda]
  • VRAM: [8G]
  • CPU arch: [x86]
  • OS: [Windows]
  • Python: [pip/pyenv]
  • Branch: [HEAD detached at c78b496] NOTE: the immediately previous commit (92d1ed7) does NOT OOM
  • Commit: [Merge: 92d1ed7 dd2af3f]

Describe the bug
dream.py immediately OOMs when trying to generate anything larger than 512x512

To Reproduce
Steps to reproduce the behavior:

  1. winpty python scripts/dream.py -Ak_euler_a
  2. Wait for dream> prompt
  3. Enter a prompt with any dimensions larger than 512x512 (see the example command after this list)
  4. See error
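
For example (using the same flags that appear in the comments below; the prompt text itself doesn't matter):

    dream> "banana sushi" -W512 -H768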

Expected behavior
No OOM

Additional context
Did an extensive manual bisect and found: 92d1ed7 - No OOMs, c78b496 - Always OOMs

@lstein
Collaborator

lstein commented Sep 4, 2022

Thanks for doing the bisect. It's a pain with the wait for model initialization. I have a theory about where the OOM is coming from and will look into it before bed tonight.

@blessedcoolant
Collaborator

Can confirm. There's a memory regression somewhere. My max res is generally 512x768, but I can only do 512x704 now.

@lstein
Collaborator

lstein commented Sep 4, 2022

I'm pretty sure that it's the variation code that just went in. Peak VRAM usage has jumped up. I'm trying to isolate the problem to understand it. Teaches me a lesson about announcing a release so soon after a major update.

@blessedcoolant
Collaborator

blessedcoolant commented Sep 4, 2022

I'm pretty sure that it's the variation code that just went in. Peak VRAM usage has jumped up. I'm trying to isolate the problem to understand it. Teaches me a lesson about announcing a release so soon after a major update.

Surprisingly it started working at the max res again without me doing anything. I'm trying to isolate the problem too but nothing stands out at first glance.

@tildebyte
Contributor Author

Teaches me a lesson about announcing a release so soon after a major update

Meh. Live and learn; managing software projects is hard 😀

We should probably institute some kind of testing sign-off, like ask people to volunteer to test, and require at least one ACK per platform.

@lstein
Collaborator

lstein commented Sep 4, 2022

I've been testing using the "Max VRAM" line in the usage stats, and the memory regression is definitely occurring at 4fe2657, which is where the variant code was folded in. For a 512x768 image, it took 7.10G of VRAM before the commit and 7.15G afterward. Not much, but enough to push an 8G card over the edge. The VRAM is probably also being used for system graphics, which is why it seemed to cure itself after a while.

I do not understand why the code is causing extra VRAM usage when it is not being run, but I'll get it figured out.
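
For anyone who wants to reproduce these measurements outside of dream.py's own stats line, PyTorch's built-in counters give the same kind of number. A minimal sketch, assuming a CUDA device (this is not the script's own reporting code):

    import torch

    # Clear the peak-memory counter before a single generation run.
    torch.cuda.reset_peak_memory_stats()

    # ... run one image generation here ...

    # Peak VRAM allocated by tensors during the run, in GiB.
    peak_gib = torch.cuda.max_memory_allocated() / (1024 ** 3)
    print(f"Max VRAM used for this generation: {peak_gib:.2f}G")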

@lstein
Collaborator

lstein commented Sep 4, 2022

@bakkot There is a small but significant memory usage regression that appeared when "bakkot-seed-fuzz" was merged into development. The specific commit is 2d65b03. Before the commit, the prompt "banana sushi" -W512 -H768 used 7.10G peak VRAM. After the commit the same prompt used 7.16G.

It's not a large difference, but it's enough to exhaust memory on 8G GPUs running 512x768 images. Presumably the system needs some VRAM too for its display.

I've been hunting and there's nothing obvious happening. Indeed, none of the variation code gets run unless the -v or -F options are specified. So it's mysterious, but the only other set of changes that went in at that time were web server-related ones, which shouldn't have an effect.

When you have a chance, could you see if you can find what I'm missing? Thx
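
One pattern worth checking for (purely a hypothetical illustration, not something I've confirmed in the actual commit): any tensor that gets allocated on the GPU unconditionally counts toward peak VRAM even if the branch that uses it never runs. The names below (get_noise, variation_amount) are made up for the example:

    import torch

    def get_noise(shape, device, variation_amount=0.0):
        base_noise = torch.randn(shape, device=device)
        # Pitfall: this allocation happens even when variation_amount is 0,
        # so it raises peak VRAM although the result is never used.
        variation_noise = torch.randn(shape, device=device)
        if variation_amount > 0:
            return base_noise.lerp(variation_noise, variation_amount)
        return base_noise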

@lstein
Collaborator

lstein commented Sep 4, 2022

Hey guess what? The discussion in #364 pointed me at an attention.py optimization that reduces the peak VRAM usage of my test prompt from 4.4G to 3.6G. When generating a 512x768 image, it uses 5.30G whereas previously it was using 7.16G.

I've pulled it into refactor-simplet2i if you want to stress test it.
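
For anyone curious, the general idea behind that kind of attention.py change is to process attention in query slices so the full (tokens x tokens) score matrix is never materialized all at once. A rough sketch of the technique, not the exact code that was merged:

    import math
    import torch

    def sliced_attention(q, k, v, slice_size=1024):
        # q, k, v: (batch, seq_len, dim); assumes q and v share the same dim.
        scale = 1.0 / math.sqrt(q.shape[-1])
        out = torch.empty_like(q)
        for start in range(0, q.shape[1], slice_size):
            end = min(start + slice_size, q.shape[1])
            # Scores for this slice of queries against all keys.
            scores = torch.bmm(q[:, start:end], k.transpose(-1, -2)) * scale
            out[:, start:end] = torch.bmm(scores.softmax(dim=-1), v)
        return out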

@blessedcoolant
Collaborator

blessedcoolant commented Sep 4, 2022

Hey guess what? The discussion in #364 pointed me at an attention.py optimization that reduces the peak VRAM usage of my test prompt from 4.4G to 3.6G. When generating a 512x768 image, it uses 5.30G whereas previously it was using 7.16G.

I've pulled it into refactor-simplet2i if you want to stress test it.

Wow. Max VRAM usage dropped from 7.xx GB to 5.3x GB at the moment. Any trade-offs?

However, I've noticed that the lower VRAM usage doesn't mean I can run larger resolutions. If I try to generate a larger image from the prompt, I still get an OOM.


Edit: I implemented just the attention change myself. I can now run 576x768, which is a jump up from 512x768, but anything above that still OOMs.

@bakkot
Contributor

bakkot commented Sep 4, 2022

@tildebyte #375 should fix the original regression, independently of other optimizations; sorry about that.

@lstein lstein closed this as completed Sep 6, 2022