Merge pull request #20 from v0xie/dev
New features - CFG Scheduler, CFG Interval, PAG Start/End Step, Fix T2I-0
v0xie authored Apr 30, 2024
2 parents e23cbec + f202a57 commit 66899a7
Showing 8 changed files with 760 additions and 99 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
__pycache__/
72 changes: 63 additions & 9 deletions README.md
@@ -2,10 +2,12 @@
This extension implements multiple novel algorithms that enhance image quality, prompt following, and more.

## COMPATIBILITY NOTICES:
#### Currently incompatible with stable-diffusion-webui-forge https://github.com/lllyasviel/stable-diffusion-webui-forge
#### Currently incompatible with stable-diffusion-webui-forge
Use this extension with Forge: https://github.com/pamparamm/sd-perturbed-attention

May conflict with extensions that modify the CFGDenoiser
* Reported incompatible with Adetailer: https://github.com/v0xie/sd-webui-incantations/issues/21

* May conflict with extensions that modify the CFGDenoiser

---
## Perturbed Attention Guidance
@@ -14,6 +16,8 @@ An alternative/complementary method to CFG (Classifier-Free Guidance) that incre

#### Controls
* **PAG Scale**: Controls how strongly PAG affects the generated image.
* **PAG Start Step**: Step to start using PAG.
* **PAG End Step**: Step to stop using PAG.
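
The start and end step controls can be pictured as a simple gate on the PAG scale. The following is only a minimal sketch, not the extension's actual code: the function name `pag_scale_at_step` is hypothetical, and it assumes 0-based step indices with both bounds inclusive, which the README does not specify.

```python
def pag_scale_at_step(step: int, pag_scale: float, start_step: int, end_step: int) -> float:
    """Return the PAG scale to use at a given sampling step.

    Assumption (not confirmed by the README): steps are 0-based and
    both start_step and end_step are inclusive.
    """
    if start_step <= step <= end_step:
        return pag_scale
    # Outside the window PAG is effectively disabled.
    return 0.0


# Example: PAG active only for steps 5 through 20 of a 30-step run.
scales = [pag_scale_at_step(i, 3.0, 5, 20) for i in range(30)]
```

A scale of 0 outside the window corresponds to leaving the perturbed-attention term out of the denoising step entirely.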

#### Results
Prompt: "a puppy and a kitten on the moon"
@@ -26,8 +30,40 @@ Prompt: "a puppy and a kitten on the moon"
#### Also check out the paper authors' official project page:
- https://ku-cvlab.github.io/Perturbed-Attention-Guidance/

---
## CFG Interval / CFG Scheduler
https://arxiv.org/abs/2404.07724 and https://arxiv.org/abs/2404.13040

Constrains the usage of CFG to within a specified noise interval. Allows usage of high CFG levels (>15) without drastic alteration of composition.
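
As a rough illustration of the interval idea, the sketch below applies the full CFG scale only while the current noise level lies inside the interval and falls back to a scale of 1.0 (the plain conditional prediction) elsewhere. The helper names `cfg_scale_in_interval` and `guided` are hypothetical, the inclusive bounds are an assumption, and scalars stand in for the noise-prediction tensors; the extension's real implementation may differ.

```python
def cfg_scale_in_interval(sigma: float, cfg_scale: float, sigma_lo: float, sigma_hi: float) -> float:
    """Return cfg_scale inside [sigma_lo, sigma_hi], else 1.0 (no guidance).

    Assumption: both interval bounds are inclusive.
    """
    return cfg_scale if sigma_lo <= sigma <= sigma_hi else 1.0


def guided(uncond: float, cond: float, scale: float) -> float:
    """Standard CFG combination; scale == 1.0 returns the conditional prediction."""
    return uncond + scale * (cond - uncond)


# Example with the SDXL values mentioned in this README (0.28 and 5.42).
for sigma in (10.0, 1.0, 0.1):
    scale = cfg_scale_in_interval(sigma, 15.0, 0.28, 5.42)
    print(sigma, scale, guided(0.0, 2.0, scale))
```

Because guidance is skipped at the highest noise levels, where composition is decided, a large CFG scale in the middle of sampling alters the overall layout less.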

Adds controllable CFG schedules. For Clamp-Linear, use (c=2.0) for SD1.5 and (c=4.0) for SDXL. For PCS, use (s=1.0) for SD1.5 and (s=0.1) for SDXL.

#### Controls
* **Enable CFG Interval**: Enables the CFG Interval. PAG must be active for this to work, but the PAG scale can be set to 0.
* **CFG Noise Interval Start**: Minimum noise level at which CFG is applied. SDXL recommended value: 0.28.
* **CFG Noise Interval End**: Maximum noise level at which CFG is applied. SDXL recommended value: >5.42.
* **CFG Scheduler**: Sets the schedule used to apply CFG.
  - Constant: The default CFG method (constant value over all timesteps)
  - Clamp-Linear: Clamps the CFG to the maximum of (c, Linear)
  - Clamp-Cosine: Clamps the CFG to the maximum of (c, Cosine)
  - PCS: Powered Cosine, lower values are better
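
The schedule types listed above can be sketched as functions of sampling progress. This is one plausible reading of the schedules in https://arxiv.org/abs/2404.13040, not the extension's code: the function name `cfg_weight`, the `progress` parameterization (0 at the first step, 1 at the last), and the normalization of the linear and cosine ramps are all assumptions.

```python
import math


def cfg_weight(schedule: str, progress: float, w: float, c: float = 2.0, s: float = 1.0) -> float:
    """Guidance weight at `progress` in [0, 1] (0 = first step, 1 = last step).

    Assumed forms (hypothetical, for illustration only):
      linear  = 2 * w * progress
      cosine  = w * (1 - cos(pi * progress))
      pcs     = w * (1 - cos(pi * progress ** s)) / 2
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if schedule == "constant":
        return w
    if schedule == "clamp-linear":
        return max(c, 2.0 * w * progress)
    if schedule == "clamp-cosine":
        return max(c, w * (1.0 - math.cos(math.pi * progress)))
    if schedule == "pcs":
        return w * (1.0 - math.cos(math.pi * progress ** s)) / 2.0
    raise ValueError(f"unknown schedule: {schedule}")


# Example: compare the schedules over a 5-step run with base CFG 7.0.
for name in ("constant", "clamp-linear", "clamp-cosine", "pcs"):
    print(name, [round(cfg_weight(name, i / 4, 7.0), 2) for i in range(5)])
```

In this sketch the clamp variants never drop below `c`, which matches the "maximum of (c, Linear)" and "maximum of (c, Cosine)" descriptions in the list.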

#### Results
##### CFG Interval
Prompt: "A pointillist painting of a raccoon looking at the sea."
- SD XL
![image](./images/xyz_grid-3192-1-A%20pointillist%20painting%20of%20a%20raccoon%20looking%20at%20the%20sea.jpg)

##### CFG Schedule
Prompt: "An epic lithograph of a handsome salaryman carefully pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong Kar Wai"
- SD XL
![image](./images/xyz_grid-3380-1-An%20epic%20lithograph%20of%20a%20handsome%20salaryman%20carefully%20pouring%20coffee%20from%20a%20cup%20into%20an%20overflowing%20carafe,%204K,%20directed%20by%20Wong.jpg)
---
## Multi-Concept T2I-Zero / Attention Regulation

#### Update: 29-04-2024
The T2I-Zero algorithms were previously implemented incorrectly. The corrected implementation should be considerably more stable. See the previous result in the 'images' folder for an informal comparison between old and new.

Implements Corrections by Similarities and Cross-Token Non-Maximum Suppression from https://arxiv.org/abs/2310.07419

Also implements some methods from "Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models" https://arxiv.org/abs/2403.06381
@@ -52,7 +88,7 @@ Can error out with image dimensions which are not a multiple of 64
#### Results:
Prompt: "A photo of a lion and a grizzly bear and a tiger in the woods"
SD XL
![image](./images/xyz_grid-2660-1590472902-A%20photo%20of%20a%20lion%20and%20a%20grizzly%20bear%20and%20a%20tiger%20in%20the%20woods.jpg)
![image](./images/xyz_grid-3348-1590472902-A%20photo%20of%20a%20lion%20and%20a%20grizzly%20bear%20and%20a%20tiger%20in%20the%20woods.jpg)

#### Also check out the paper authors' official project pages:
- https://multi-concept-t2i-zero.github.io/
@@ -137,12 +173,30 @@ SD XL
}

@misc{zhang2024enhancing,
title={Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models},
author={Yang Zhang and Teoh Tze Tzun and Lim Wei Hern and Tiviatis Sim and Kenji Kawaguchi},
year={2024},
eprint={2403.06381},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

@misc{kynkäänniemi2024applying,
title={Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models},
author={Tuomas Kynkäänniemi and Miika Aittala and Tero Karras and Samuli Laine and Timo Aila and Jaakko Lehtinen},
year={2024},
eprint={2404.07724},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

@misc{wang2024analysis,
title={Analysis of Classifier-Free Guidance Weight Schedulers},
author={Xi Wang and Nicolas Dufour and Nefeli Andreou and Marie-Paule Cani and Victoria Fernandez Abrevaya and David Picard and Vicky Kalogeiton},
year={2024},
eprint={2404.13040},
archivePrefix={arXiv},
primaryClass={cs.CV}
}


53 changes: 53 additions & 0 deletions scripts/incant_utils/plot_tools.py
@@ -0,0 +1,53 @@
import torch
import matplotlib.pyplot as plt

def plot_attention_map(attention_map: torch.Tensor, title, x_label="X", y_label="Y", save_path=None, plot_type="default"):
    """ Plots an attention map using matplotlib.pyplot
        Arguments:
            attention_map: Tensor - The attention map to plot. Shape: (H, W)
            title: str - The title of the plot
            x_label: str (optional) - The x-axis label
            y_label: str (optional) - The y-axis label
            save_path: str (optional) - The path to save the plot
            plot_type: str (optional) - The type of plot to create. Default is 'default'.
                Other option is 'num' which will plot the attention map with arbitrary colors.
        Returns:
            None
    """

    # Convert attention map to numpy array
    attention_map = attention_map.detach().cpu().numpy()

    # Create figure and axis
    fig, ax = plt.subplots()

    # Plot the attention map
    if plot_type=='default':
        ax.imshow(attention_map, cmap='viridis', interpolation='nearest')
    elif plot_type == 'num':
        ax.imshow(attention_map, cmap='tab20c', interpolation='nearest')
        #for x in range(attention_map.shape[0]):
        #    for y in range(attention_map.shape[1]):
        #        fig.text(x, y, f"{attention_map[x, y]:.2f}", ha="center", va="center")
        elements = list(set(attention_map.flatten()))
        labels = [f"{x}" for x in elements]
        fig.legend(elements, labels, loc='lower left')

    # Set title and labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)

    # Save the plot if save_path is provided
    if save_path:
        plt.savefig(save_path)

    plt.close(fig)

    # Show the plot
    # plt.show()

    # Convert the plot to PIL image
    #image = Image.fromarray(np.uint8(fig.canvas.tostring_rgb()))

    #return image