Merge pull request #20 from v0xie/dev

New features - CFG Scheduler, CFG Interval, PAG Start/End Step, Fix T2I-0
v0xie · Apr 30, 2024 · 66899a7
2 parents e23cbec + f202a57
commit 66899a7
Showing 8 changed files with 760 additions and 99 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+__pycache__/
diff --git a/README.md b/README.md
@@ -2,10 +2,12 @@
 This extension implements multiple novel algorithms that enhance image quality, prompt following, and more.
 
 ## COMPATIBILITY NOTICES:
-####  Currently incompatible with stable-diffusion-webui-forge https://github.com/lllyasviel/stable-diffusion-webui-forge
+####  Currently incompatible with stable-diffusion-webui-forge 
 Use this extension with Forge: https://github.com/pamparamm/sd-perturbed-attention
 
-May conflict with extensions that modify the CFGDenoiser
+* Reported incompatible with Adetailer: https://github.com/v0xie/sd-webui-incantations/issues/21
+
+* May conflict with extensions that modify the CFGDenoiser
 
 ---
 ## Perturbed Attention Guidance
@@ -14,6 +16,8 @@ An alternative/complementary method to CFG (Classifier-Free Guidance) that incre
 
 #### Controls
 * **PAG Scale**: Controls how strongly PAG affects the generated image.  
+* **PAG Start Step**: Step to start using PAG.
+* **PAG End Step**: Step to stop using PAG. 
 
 #### Results
 Prompt: "a puppy and a kitten on the moon"
@@ -26,8 +30,40 @@ Prompt: "a puppy and a kitten on the moon"
 #### Also check out the paper authors' official project page:
 - https://ku-cvlab.github.io/Perturbed-Attention-Guidance/
 
+---
+## CFG Interval / CFG Scheduler
+https://arxiv.org/abs/2404.07724 and https://arxiv.org/abs/2404.13040 
+
+Restricts CFG to a specified noise interval, which allows high CFG scales (>15) without drastically altering the composition.  
+
+Adds controllable CFG schedules. For Clamp-Linear, use (c=2.0) for SD1.5 and (c=4.0) for SDXL. For PCS, use (s=1.0) for SD1.5 and (s=0.1) for SDXL.
+
+#### Controls
+* **Enable CFG Interval**: Enables the CFG Interval (PAG must be active! PAG scale can be set to 0.)
+* **CFG Noise Interval Start**: Minimum noise level to use CFG with. SDXL recommended value: 0.28.
+* **CFG Noise Interval End**: Maximum noise level to use CFG with. SDXL recommended value: >5.42.
+* **CFG Scheduler**: Sets the schedule type to apply CFG.
+    - Constant: The default CFG method (constant value over all timesteps)
+    - Clamp-Linear: Clamps the CFG to the maximum of (c, Linear)
+    - Clamp-Cosine: Clamps the CFG to the maximum of (c, Cosine)
+    - PCS: Powered Cosine, lower values are better
+
+#### Results
+##### CFG Interval
+Prompt: "A pointillist painting of a raccoon looking at the sea."
+- SD XL  
+![image](./images/xyz_grid-3192-1-A%20pointillist%20painting%20of%20a%20raccoon%20looking%20at%20the%20sea.jpg)
+
+##### CFG Schedule
+Prompt: "An epic lithograph of a handsome salaryman carefully pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong Kar Wai"
+- SD XL  
+![image](./images/xyz_grid-3380-1-An%20epic%20lithograph%20of%20a%20handsome%20salaryman%20carefully%20pouring%20coffee%20from%20a%20cup%20into%20an%20overflowing%20carafe,%204K,%20directed%20by%20Wong.jpg)
 ---
 ## Multi-Concept T2I-Zero / Attention Regulation
+
+#### Update: 29-04-2024
+The algorithms previously implemented for T2I-Zero were incorrect. The corrected versions should now behave much more stably. See the previous result in the 'images' folder for an informal comparison between old and new.
+
 Implements Corrections by Similarities and Cross-Token Non-Maximum Suppression from https://arxiv.org/abs/2310.07419
 
 Also implements some methods from "Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models" https://arxiv.org/abs/2403.06381
@@ -52,7 +88,7 @@ Can error out with image dimensions which are not a multiple of 64
 #### Results:
 Prompt: "A photo of a lion and a grizzly bear and a tiger in the woods"  
 SD XL  
-![image](./images/xyz_grid-2660-1590472902-A%20photo%20of%20a%20lion%20and%20a%20grizzly%20bear%20and%20a%20tiger%20in%20the%20woods.jpg)  
+![image](./images/xyz_grid-3348-1590472902-A%20photo%20of%20a%20lion%20and%20a%20grizzly%20bear%20and%20a%20tiger%20in%20the%20woods.jpg)
 
 #### Also check out the paper authors' official project pages:
 - https://multi-concept-t2i-zero.github.io/ 
@@ -137,12 +173,30 @@ SD XL
       }
 
       @misc{zhang2024enhancing,
-      title={Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models},
-      author={Yang Zhang and Teoh Tze Tzun and Lim Wei Hern and Tiviatis Sim and Kenji Kawaguchi},
-      year={2024},
-      eprint={2403.06381},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV}
+       title={Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models},
+       author={Yang Zhang and Teoh Tze Tzun and Lim Wei Hern and Tiviatis Sim and Kenji Kawaguchi},
+       year={2024},
+       eprint={2403.06381},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+      }
+
+      @misc{kynkäänniemi2024applying,
+       title={Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models}, 
+       author={Tuomas Kynkäänniemi and Miika Aittala and Tero Karras and Samuli Laine and Timo Aila and Jaakko Lehtinen},
+       year={2024},
+       eprint={2404.07724},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+      }
+
+      @misc{wang2024analysis,
+       title={Analysis of Classifier-Free Guidance Weight Schedulers}, 
+       author={Xi Wang and Nicolas Dufour and Nefeli Andreou and Marie-Paule Cani and Victoria Fernandez Abrevaya and David Picard and Vicky Kalogeiton},
+       year={2024},
+       eprint={2404.13040},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
       }
 
 

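As a reading aid for the README changes above, here is a minimal Python sketch of the three gating ideas they describe: limiting CFG to a noise interval, scheduling the CFG scale over the sampling run, and limiting PAG to a step window. It is not the extension's code. The function names, the `progress` parameter, the fallback scale of 1.0 outside the interval, the inclusive bounds, and the normalization of the linear and cosine ramps are all assumptions made for illustration. The schedule shapes only loosely follow the increasing schedules discussed in https://arxiv.org/abs/2404.13040.

```python
import math


def cfg_in_interval(sigma: float, cfg_scale: float,
                    sigma_lo: float, sigma_hi: float) -> float:
    """Return cfg_scale while sigma lies in [sigma_lo, sigma_hi].

    Assumption: outside the interval the scale falls back to 1.0, which
    reduces uncond + w * (cond - uncond) to the conditional prediction.
    """
    return cfg_scale if sigma_lo <= sigma <= sigma_hi else 1.0


def cfg_schedule(progress: float, cfg_scale: float,
                 kind: str = "constant", c: float = 2.0, s: float = 1.0) -> float:
    """Return the CFG scale for one step.

    progress is a hypothetical normalized position in the sampling run:
    0.0 at the first (noisiest) step, 1.0 at the last step.
    """
    if kind == "constant":
        return cfg_scale
    if kind == "clamp-linear":
        # Linear ramp from 0 to 2 * cfg_scale, clamped from below at c.
        return max(c, 2.0 * progress * cfg_scale)
    if kind == "clamp-cosine":
        # Cosine ramp from 0 to 2 * cfg_scale, clamped from below at c.
        return max(c, (1.0 - math.cos(math.pi * progress)) * cfg_scale)
    if kind == "pcs":
        # Powered cosine: ramp from 0 to cfg_scale, shaped by exponent s.
        return (1.0 - math.cos(math.pi * progress ** s)) / 2.0 * cfg_scale
    raise ValueError(f"unknown schedule kind: {kind!r}")


def pag_active(step: int, start_step: int, end_step: int) -> bool:
    """Return True while PAG should be applied (inclusive bounds assumed)."""
    return start_step <= step <= end_step


if __name__ == "__main__":
    total_steps = 10
    for step in range(total_steps):
        progress = step / (total_steps - 1)
        scale = cfg_schedule(progress, 7.0, kind="clamp-linear", c=4.0)
        print(step, round(scale, 3), pag_active(step, 2, 8))
```

In this sketch the interval gate and the schedule are independent, so a sampler could apply the schedule first and then pass the result through `cfg_in_interval` using the current noise level.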
diff --git a/images/xyz_grid-3192-1-A pointillist painting of a raccoon looking at the sea.jpg b/images/xyz_grid-3192-1-A pointillist painting of a raccoon looking at the sea.jpg
diff --git a/...48-1590472902-A photo of a lion and a grizzly bear and a tiger in the woods.jpg b/...48-1590472902-A photo of a lion and a grizzly bear and a tiger in the woods.jpg
diff --git a/... pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong.jpg b/... pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong.jpg
diff --git a/scripts/incant_utils/plot_tools.py b/scripts/incant_utils/plot_tools.py
@@ -0,0 +1,53 @@
+import torch
+import matplotlib.pyplot as plt
+
+def plot_attention_map(attention_map: torch.Tensor, title, x_label="X", y_label="Y", save_path=None, plot_type="default"):
+        """ Plots an attention map using matplotlib.pyplot 
+                Arguments:
+                        attention_map: Tensor - The attention map to plot. Shape: (H, W)
+                        title: str - The title of the plot
+                        x_label: str (optional) - The x-axis label
+                        y_label: str (optional) - The y-axis label
+                        save_path: str (optional) - The path to save the plot
+                        plot_type: str (optional) - The type of plot to create. Default is 'default'. 
+                            Other option is 'num' which will plot the attention map with arbitrary colors.
+                Returns:
+                        None
+        """
+
+        # Convert attention map to numpy array
+        attention_map = attention_map.detach().cpu().numpy()
+
+        # Create figure and axis
+        fig, ax = plt.subplots()
+
+        # Plot the attention map
+        if plot_type=='default':
+                ax.imshow(attention_map, cmap='viridis', interpolation='nearest')
+        elif plot_type == 'num':
+                from matplotlib.patches import Patch  # legend handles must be artists, not raw values
+                im = ax.imshow(attention_map, cmap='tab20c', interpolation='nearest')
+                #for x in range(attention_map.shape[0]):
+                #        for y in range(attention_map.shape[1]):
+                #                fig.text(x, y, f"{attention_map[x, y]:.2f}", ha="center", va="center")
+                handles = [Patch(color=im.cmap(im.norm(v)), label=f"{v}") for v in sorted(set(attention_map.flatten()))]
+                fig.legend(handles=handles, loc='lower left')
+
+        # Set title and labels
+        ax.set_title(title)
+        ax.set_xlabel(x_label)
+        ax.set_ylabel(y_label)
+
+        # Save the plot if save_path is provided
+        if save_path:
+                fig.savefig(save_path)
+
+        plt.close(fig)
+
+        # Show the plot
+        # plt.show()
+
+        # Convert the plot to PIL image
+        #image = Image.fromarray(np.uint8(fig.canvas.tostring_rgb()))
+
+        #return image