MPS support for doggettx-optimizations #431
Comments
If someone knows how to query the amount of free VRAM on MPS devices, we just need to replace the torch.cuda calls.
I Googled around, and there doesn't seem to be an equivalent set of memory-interrogation calls for CPU. I'm not sure how the M1 works, but if it shares main memory (i.e. RAM) you might be able to get the needed metrics using psutil.
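A minimal sketch of that idea, assuming the M1's GPU allocates out of the same pool psutil reports (the variable names here are just illustrative):

```python
import psutil

mem = psutil.virtual_memory()
# On Apple Silicon there is no dedicated VRAM; the GPU allocates out of this pool,
# so "available" system RAM is the closest substitute for a free-VRAM figure.
print(f"total: {mem.total / 2**30:.1f} GiB, available: {mem.available / 2**30:.1f} GiB")
```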
I just hacked it all out in my fork and set slice_size to 1 :-) that gets me doing 1024x1024 (very slowly) on an 8GB M1 mini.
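For context, the slicing being referred to looks roughly like this (a simplified sketch in the spirit of the Doggettx loop, not the actual fork's code; the tensor shapes and the slice_size name follow the discussion, the rest is illustrative):

```python
import torch
from torch import einsum

def sliced_attention(q, k, v, slice_size=1, scale=1.0):
    # q, k, v: (batch*heads, tokens, dim). Computing the full tokens x tokens
    # attention matrix at once is what exhausts memory, so process slice_size
    # query rows at a time; slice_size=1 is the smallest (and slowest) setting.
    out = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
    for i in range(0, q.shape[1], slice_size):
        end = i + slice_size
        attn = einsum('b i d, b j d -> b i j', q[:, i:end], k) * scale
        attn = attn.softmax(dim=-1)
        out[:, i:end] = einsum('b i j, b j d -> b i d', attn, v)
    return out
```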
It definitely works. I'll add the results below. In … I left steps at 1 and commented out …, so there's probably an improvement to be made using …. Where I did use …, it meant importing psutil.
This is looking pretty encouraging. When you are satisfied with the performance on MPS, could you make your changes conditional on the device type so that CUDA systems will work as well? Then make a PR against the doggettx-optimizations branch. Think this might be done by tonight? I'm planning a development freeze, some testing, and then pulling into main over the weekend.
That's the plan, yes, to make the changes conditional based on device type.
Okay, changes are done. I'm doing the testing.
@lstein Above are the files. Performance seems comparable to …. Regarding memory, I have to do more digging, because while this afternoon I could generate 896x896 and 1024x768 (results I couldn't generate before), now at night I'm back to memory errors. In any case, this change should benefit CUDA users while allowing MPS devices to (apparently/presumably/hopefully) function at least as well as we currently do on the development branch.
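A sketch of the kind of device-conditional memory check being described; the CUDA branch uses the statistics named later in this issue, while the function name and the MPS fallback are assumptions:

```python
import psutil
import torch

def estimate_free_memory(device: torch.device) -> int:
    """Bytes plausibly available for the attention matrix on this device."""
    if device.type == 'cuda':
        stats = torch.cuda.memory_stats(device)
        mem_active = stats['active_bytes.all.current']
        mem_reserved = stats['reserved_bytes.all.current']
        mem_free, _ = torch.cuda.mem_get_info()
        return mem_free + mem_reserved - mem_active
    # MPS (and CPU): unified memory, so available system RAM is the best proxy.
    return psutil.virtual_memory().available
```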
In the end, still awake :) The common error I think all M1 users get (…) can be worked around. It takes a long time with steps=64, but testing around, it also works with steps=32, and even steps=4 (taking much less time). Pretty nice, and it calls for some testing tomorrow. PS: I'd just merge the 2 files above and leave this "finding" for a future PR.
I can confirm that it's working and an improvement. Speedup was 2x over the plain development branch, and now I'm testing larger image sizes... Environment: development branch with the two files above swapped in. Machine: MBP 14", M1 Pro, 16GB, latest OS, running miniforge with a base of Python 3.10.6. Browser: Firefox 104.0.2.
That's odd, I've managed a very slow 1024x1024 from doggettx's optimizations on my 8GB M1: https://github.com/Vargol/stable-diffusion_m1_8gb Have you got a lot of other stuff running at the same time eating up memory?
This is my memory usage right after booting the computer, with only the model loaded + 1 VS Code tab with the code: …
and I get …
The … error appears. In this discussion https://pullanswer.com/questions/mps-mpsndarray-error-product-of-dimension-sizes-2-31 they were saying that the problem was with Metal, and that depending on the size/number of dimensions of the operation (e.g. …) it chooses different algorithms. So maybe you give it a smaller array and it fails, but feed it a bigger array and it chooses a different algorithm that doesn't have the 2**31 limit.
For example, setting the steps as you do (with a fixed value instead of calculating it), and with …
Result: with … it works. Why, it's really not apparent to me. The problem with hard-setting the steps, though, is that, as the code progresses, …
So maybe there could be a point where it fails mid-execution, because we hard-set …. The solution I'm thinking of is a mix of both techniques: setting the steps dynamically (so it doesn't run out of memory), but also setting …
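As an illustration of that mix, a sketch only: the 2**31 cap reflects the Metal limit discussed in this thread, while the multiplier and element size are assumptions:

```python
import math
import psutil

def choose_slice_size(q_shape, k_shape, bytes_per_element=4, multiplier=2.5):
    # q_shape, k_shape: (batch*heads, tokens, dim)
    b, q_tokens, _ = q_shape
    _, k_tokens, _ = k_shape
    # 1) Dynamic part: how many slices are needed to fit the attention matrix in RAM?
    mem_required = b * q_tokens * k_tokens * bytes_per_element * multiplier
    mem_free = psutil.virtual_memory().available
    steps = max(1, math.ceil(mem_required / mem_free))
    slice_size = math.ceil(q_tokens / steps)
    # 2) Hard cap: keep each (b, slice, k_tokens) einsum intermediate under 2**31
    #    elements, which otherwise appears to crash Metal regardless of free RAM.
    max_slice = (2**31 - 1) // (b * k_tokens)
    return max(1, min(slice_size, max_slice))
```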
The 2**31 seems to be einsum trying to use a Tensor with more than 2,147,483,648 values as part of its calculation. I remember having a similar issue when I simply set steps to 1 but allowed the slice_size calculation to go ahead:

```python
slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]
```

which wasn't in an older cut of the code. Is my code doing this what you tried?

And yes, I appreciate that if people try even bigger images they may run out of memory, but for me more steps just means slower renders, and 1024x1024 is already 50 s/it; as in, n_sample=50, n_iter=1 takes 40-odd minutes to generate an image.
Looks like I was monitoring the wrong thread! I'll fold in these changes this morning and freeze development for testing. Thanks so much for this.
@Vargol I can take …. Have you tried bigger …? I tried something very similar to your code, simply with larger …. For example, for … We should be able to find a sweet spot, shouldn't we?
For example, … Hopefully there's some formula we can come up with for all M1 machines (8GB to 128GB). Update: …
So with my fixed value steps = slice_size, running 1-6 steps works; 6 steps is over 5x slower than 1 step.
7-10 steps blow memory while sampling.
steps >= 11 fail with an oversized(?) buffer before sampling shows up in dream.py.
@Vargol Hmm, I'll study your case too.
Oh, this is interesting. So my computer can take …. Okay, so what …. But I tried 10700 (one more) and it fails! I'm sure there is a formula to be found (including RAM), but at least we seem to be able to hack the max slice_size for our own devices, which is awesome! Update: So, I picked a random size. I wanted a 3200x1600 image. I used the formula and …
I'll study it a bit more, but the problem with Doggettx (besides 8192 vs 8191) is that sometimes it suggests an even larger …
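For what it's worth, a formula consistent with the numbers quoted in this thread, assuming the einsum intermediate holds q.shape[0] * slice_size * tokens elements and that q.shape[0] is 16 (two conditioning batches x 8 attention heads); treat it as a guess, not the Doggettx code:

```python
def max_slice_size(width, height, batch_heads=16):
    # Latent tokens for a WxH image: (W/8) * (H/8)
    tokens = (width // 8) * (height // 8)
    # The einsum intermediate has batch_heads * slice_size * tokens elements,
    # and Metal appears to fail once that product reaches 2**31.
    return (2**31 - 1) // (batch_heads * tokens)

print(max_slice_size(1024, 1024))  # 8191, i.e. the "8192 vs 8191" boundary mentioned above
```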
@i3oc9i can you try your max slice_size for, say, 1024x1024 on your Mac with 128GB? We might be able to work out a formula including the RAM. Or someone else with a Mac other than 64GB (which is what I have).
@Any-Winter-4079 … and fail, maybe I'm missing something? Where is the new code to test?
`dream> "test" -s50 -W832 -H832 -C7.5 -Ak_lms -S12345678` (runs, but I get noise)
Could someone with a Mac please run these lines? …
That's the best way I know to detect if it's a Mac GPU, but I couldn't find what to check it against. Thanks!
Basically the command …
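One plausible set of checks, offered only as a guess at the kind of probe being discussed:

```python
import platform
import torch

print(platform.machine())                 # 'arm64' on Apple Silicon
print(torch.backends.mps.is_built())      # this PyTorch build includes MPS support
print(torch.backends.mps.is_available())  # an MPS device can actually be used
```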
I wouldn't adjust the slice_size, because then it starts running incomplete parts of the whole array. It's best to increase the multiplier, which is probably too low then. So this part:

```python
mem_required = tensor_size * 2.5
```

probably needs more than .5 extra; could try 2.6, or if you want to be safe just put it at 3. It'll just scale up the steps a bit earlier than needed, which scales down the slice_size. On a side note, it doesn't really have to step up in powers of 2, I just found that that was faster on average. You could change this part:

…

to something like

…

then it can run at any step or slice_size (even higher than 64, but you'll crash later then anyhow due to other parts running out of memory).
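As a rough illustration of the two variants being compared (variable names are assumptions; mem_required and mem_free_total stand for the quantities computed earlier in the function):

```python
import math

mem_required = 10_000_000_000   # placeholder numbers, just to make the example runnable
mem_free_total = 3_000_000_000
q_tokens = 16384

# Power-of-two stepping (roughly the current behaviour):
steps = 1
if mem_required > mem_free_total:
    steps = 2 ** math.ceil(math.log2(mem_required / mem_free_total))

# Suggested relaxation: any integer step count that covers the work is fine.
steps = max(1, math.ceil(mem_required / mem_free_total))
slice_size = math.ceil(q_tokens / steps)
print(steps, slice_size)  # 4, 4096 with the placeholder numbers
```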
Sorry, I had a Safari window open. Now 6.99GB and 1.30s/it. But 768 is very slow/doesn't work.
As a new threshold, I propose maybe 3GB?
@netsvetaev can you run …?
It takes 30 at 35-37s/it. Hm. 768 gives an error, then goes for 150s/it.
Some advice: every time you run a dream command and it completes, exit with …
In any case, have you been able to run 1024x1024 with better results with some other code?
No, my best was around 35-40, no differences with your code. Maybe 1-2s.
You know, it might be because you have a bit less RAM available than the other person with 16GB. I can't find any other reason. Architecture? PyTorch version?
macOS 13 beta, I think.
I'm running 12.5.1. No idea if there is any performance improvement/loss with the beta. I'd assume these results are more RAM-dependent than OS-version-dependent, but who knows :)
This is the version I'm planning on doing a PR with: https://github.com/lstein/stable-diffusion/discussions/457#discussioncomment-3635644 If someone experiences a downgrade in performance vs. before, let me know.
Getting better: |
Oh, so happy to read that!
Sorry, but I don't see a speed difference with this.
Not for 64-128GB. The update is to give more speed for 16-32GB Macs.
Is it OK that I have better results on the main branch? It seems less RAM-hungry and also faster: 512 at 1.48 (yours is 1.35), 1024 at 29.7, with up to 4.2GB swap.
cherry-pick @Any-Winter-4079's invoke-ai/InvokeAI#540. This is a collaboration incorporating a lot of people's contributions -- including for example @Doggettx and the original code from @neonsecret on which the Doggettx optimizations were based (see invoke-ai/InvokeAI#431, https://github.com/sd-webui/stable-diffusion-webui/pull/771#issuecomment-1239716055). Takes exactly the same amount of time to run 8 steps as the original CompVis code does (10.4 secs, ~1.25s/it).
I'm happy to add that on the latest macOS 13 beta, 1.14 main has got faster: 1.15s/it on 512px (58s total, it was always 1:15-1:25), 8.4s/it on 768 (7:18, was 10-12 mins), and still 35s/it on 1024. UPD: after a fresh install I've got 1.38s/it at 512, 6.20s/it at 768, and 20.5s/it at 1024. So it was my own problem.
On my HPC node, I also see a remarkable variation in VRAM usage in the doggettx branch as I make small adjustments to image size. Fortunately, it is pretty stable at lower image sizes where most people will be working.
Performance improvements to generate larger images in M1 invoke-ai#431: Update attention.py, added dtype=r1.dtype to softmax.
Okay, so I've seen @lstein has added

```python
x = x.contiguous() if x.device.type == 'mps' else x
```

to ldm/modules/attention.py in the doggettx-optimizations branch, but there's another error happening now:

```
KeyError: 'active_bytes.all.current'
```

and this has to do with this function in attention.py: …

which is basically the code that detects your free memory and then splits the softmax operation in steps, to allow generating larger images.

Now, because we are on Mac, I'm not sure @lstein can help us much (unless he has one around), but I'm opening this issue for anyone who wants to collaborate in porting this functionality to M1.