From 28a9967beca18ac24fb4ea7ce78699f04cf2cc7a Mon Sep 17 00:00:00 2001
From: h1t
Date: Fri, 8 Mar 2024 16:53:43 +0000
Subject: [PATCH 1/5] add tcd intro

---
 docs/source/en/_toctree.yml                   |   2 +
 .../inference_with_tcd_lora.md                | 433 ++++++++++++++++++
 2 files changed, 435 insertions(+)
 create mode 100644 docs/source/en/using-diffusers/inference_with_tcd_lora.md

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index ba94de59219c..2829fb9c05b9 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -104,6 +104,8 @@
         title: Latent Consistency Model-LoRA
       - local: using-diffusers/inference_with_lcm
         title: Latent Consistency Model
+      - local: using-diffusers/inference_with_tcd_lora
+        title: Trajectory Consistency Distillation-LoRA
       - local: using-diffusers/svd
         title: Stable Video Diffusion
       title: Specific pipeline examples
diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
new file mode 100644
index 000000000000..53f0c0655881
--- /dev/null
+++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
@@ -0,0 +1,433 @@

[[open-in-colab]]

# Performing inference with TCD-LoRA

Trajectory Consistency Distillation (TCD) enables a model to generate higher quality, more detailed images in fewer steps. Moreover, TCD demonstrates superior performance even at a high number of function evaluations (NFEs).

From the [official project page](https://mhh0318.github.io/tcd/), the major merits of TCD can be outlined as follows:

> ***Better than Teacher:*** TCD maintains superior generative quality at both low and high NFEs, even exceeding the performance of DPM-Solver++(2S) with the original SDXL. It is worth noting that no additional discriminator or LPIPS supervision is included during training.

> ***Flexible NFEs:*** The NFEs for TCD sampling can be varied at will without adversely affecting the quality of the results.

> ***Freely Change the Detailing:*** During inference, the level of detail in the image can be modified by adjusting a single hyperparameter, gamma. This option does not require the introduction of any additional parameters.

For more technical details of TCD, please refer to [the paper](https://arxiv.org/abs/2402.19159).

Trajectory consistency distillation can be placed directly on top of a pre-trained diffusion model as a LoRA module. Such a LoRA acts as a versatile acceleration module applicable to different fine-tuned models or LoRAs sharing the same base model, without the need for additional training.

TCD-LoRAs are available for [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), [stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).

The corresponding checkpoints can be found at [TCD-SD15](https://huggingface.co/h1t/TCD-SD15-LoRA), [TCD-SD21-base](https://huggingface.co/h1t/TCD-SD21-base-LoRA), and [TCD-SDXL](https://huggingface.co/h1t/TCD-SDXL-LoRA), respectively.


This guide shows how to perform inference with TCD-LoRAs for
- text-to-image
- inpainting
- community models
- style LoRA
- ControlNet
- IP-Adapter
- AnimateDiff

TCD-LoRA can be considered an advanced method compared with [LCM-LoRA](https://latent-consistency-models.github.io/). The general TCD-LoRA workflow is:
- Load the task-specific pipeline and model.
- Set the scheduler to [`TCDScheduler`].
- Load the TCD-LoRA weights for the model.
- Set `num_inference_steps` to a value between 4 and 50.
- Set `eta` to a value between 0 and 1. A larger `eta` in [`TCDScheduler`] leads to blurrier images.
- Perform inference with the pipeline with the usual parameters.

Let's look at how we can perform inference with TCD-LoRAs for different tasks.

First, make sure you have [peft](https://github.com/huggingface/peft) installed for better LoRA support.

```bash
pip install -U peft
```

## Text-to-image

Use the [`StableDiffusionXLPipeline`] with the [`TCDScheduler`], then load the TCD-LoRA. Together, the TCD-LoRA and the [`TCDScheduler`] enable a fast inference workflow with high-quality outputs.

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)  # swap in the TCD scheduler

pipe.load_lora_weights(tcd_lora_id)  # load and fuse the TCD-LoRA acceleration module
pipe.fuse_lora()

prompt = "Beautiful woman, bubblegum pink, lemon yellow, minty blue, futuristic, high-detail, epic composition, watercolor."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/t2i_tcd.png)



Eta (referred to as `gamma` in the paper) controls the stochasticity of every step.
A value of 0.3 often yields good results; eta = 0 is fully deterministic, while eta = 1 is equivalent to the multistep consistency sampler (as in [`LCMScheduler`]).
We recommend using a higher eta when increasing the number of inference steps.


## TCD-LoRA is Versatile for Community Models

As mentioned above, TCD-LoRA is versatile across community models and plugins. Let's first demonstrate the results with a community fine-tuned base model, [animagine-xl-3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0).

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "cagliostrolab/animagine-xl-3.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "A man, clad in a meticulously tailored military uniform, stands with unwavering resolve. The uniform boasts intricate details, and his eyes gleam with determination. Strands of vibrant, windswept hair peek out from beneath the brim of his cap."

image = pipe(
    prompt=prompt,
    num_inference_steps=8,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/animagine_xl.png)

Furthermore, TCD-LoRA also supports other style LoRAs. Here is an example with [Papercut](https://huggingface.co/TheLastBen/Papercut_SDXL). To learn more about how to combine LoRAs, refer to [this guide](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference#combine-multiple-adapters).
```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"
styled_lora_id = "TheLastBen/Papercut_SDXL"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

# load both LoRAs as named adapters and activate them together
pipe.load_lora_weights(tcd_lora_id, adapter_name="tcd")
pipe.load_lora_weights(styled_lora_id, adapter_name="style")
pipe.set_adapters(["tcd", "style"], adapter_weights=[1.0, 1.0])

prompt = "papercut of a winter mountain, snow"

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/styled_lora.png)


## Inpainting with TCD


```python
import torch
from diffusers import AutoPipelineForInpainting, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
base_model_id = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = AutoPipelineForInpainting.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"

image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=8,
    guidance_scale=0,
    eta=0.3,  # eta (the paper's gamma) controls the stochasticity of every step; 0.3 often yields good results
    strength=0.99,  # make sure to use `strength` below 1.0
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/inpainting_tcd.png)


## Compatibility with ControlNet

For this example, we'll keep using the SDXL model and the TCD-LoRA for SDXL with depth and canny ControlNets. Both variants share the same setup, sketched below before the full examples.
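As a quick orientation, here is the skeleton both ControlNet examples share. It is a minimal sketch rather than a complete example; only the ControlNet checkpoint (depth or canny) and the conditioning image differ between the two variants below.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler

# Shared TCD + ControlNet setup: pick the ControlNet checkpoint for your task.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0",  # or "diffusers/controlnet-canny-sdxl-1.0"
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)  # swap in the TCD scheduler
pipe.load_lora_weights("h1t/TCD-SDXL-LoRA")  # the TCD-LoRA is loaded exactly as in the earlier examples
pipe.fuse_lora()
```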
### Depth ControlNet
```python
import torch
import numpy as np
from PIL import Image
from transformers import DPTFeatureExtractor, DPTForDepthEstimation
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(device)
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")

def get_depth_map(image):
    # run monocular depth estimation and normalize the map to [0, 1]
    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad(), torch.autocast(device):
        depth_map = depth_estimator(image).predicted_depth

    depth_map = torch.nn.functional.interpolate(
        depth_map.unsqueeze(1),
        size=(1024, 1024),
        mode="bicubic",
        align_corners=False,
    )
    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
    image = torch.cat([depth_map] * 3, dim=1)

    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
    return image

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-depth-sdxl-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

controlnet = ControlNetModel.from_pretrained(
    controlnet_id,
    torch_dtype=torch.float16,
    variant="fp16",
).to(device)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
).to(device)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "stormtrooper lecture, photorealistic"

image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")
depth_image = get_depth_map(image)

controlnet_conditioning_scale = 0.5  # recommended for good generalization

image = pipe(
    prompt,
    image=depth_image,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([depth_image, image], rows=1, cols=2)
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/controlnet_depth_tcd.png)

### Canny ControlNet
```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
controlnet_id = "diffusers/controlnet-canny-sdxl-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

controlnet = ControlNetModel.from_pretrained(
    controlnet_id,
    torch_dtype=torch.float16,
    variant="fp16",
).to(device)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_model_id,
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
).to(device)
pipe.enable_model_cpu_offload()

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "ultrarealistic shot of a furry blue bird"

canny_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png")

controlnet_conditioning_scale = 0.5  # recommended for good generalization

image = pipe(
    prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

grid_image = make_image_grid([canny_image, image], rows=1, cols=2)
```
![](https://github.com/jabir-zheng/TCD/raw/main/assets/controlnet_canny_tcd.png)


The inference parameters in this example might not work for every case, so we recommend trying different values for the `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale`, and `cross_attention_kwargs` parameters and choosing the best ones.


## IP-Adapter

This example shows how to use the TCD-LoRA with the [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter/tree/main) and SDXL.

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler
from diffusers.utils import load_image, make_image_grid

from ip_adapter import IPAdapterXL

device = "cuda"
base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
image_encoder_path = "sdxl_models/image_encoder"  # local paths to the IP-Adapter SDXL checkpoints
ip_ckpt = "sdxl_models/ip-adapter_sdxl.bin"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device)

ref_image = load_image("https://raw.githubusercontent.com/tencent-ailab/IP-Adapter/main/assets/images/woman.png").resize((512, 512))

prompt = "best quality, high quality, wearing sunglasses"

image = ip_model.generate(
    pil_image=ref_image,
    prompt=prompt,
    scale=0.5,
    num_samples=1,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    seed=0,
)[0]

grid_image = make_image_grid([ref_image, image], rows=1, cols=2)
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/ip_adapter.png)



## AnimateDiff

[`AnimateDiff`] allows animating images using Stable Diffusion models. TCD-LoRA can substantially accelerate the process without degrading image quality, and animations produced with TCD-LoRA and AnimateDiff have a noticeably more lucid outcome.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, TCDScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5")
pipe = AnimateDiffPipeline.from_pretrained(
    "frankjoshua/toonyou_beta6",
    motion_adapter=adapter,
).to("cuda")

# set TCDScheduler
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

# load TCD LoRA and a motion LoRA as named adapters
pipe.load_lora_weights("h1t/TCD-SD15-LoRA", adapter_name="tcd")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora")

pipe.set_adapters(["tcd", "motion-lora"], adapter_weights=[1.0, 1.2])

prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
generator = torch.manual_seed(0)
frames = pipe(
    prompt=prompt,
    num_inference_steps=5,
    guidance_scale=0,
    cross_attention_kwargs={"scale": 1},
    num_frames=24,
    eta=0.3,
    generator=generator
).frames[0]
export_to_gif(frames, "animation.gif")
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/animation_example.gif)
\ No newline at end of file

From 50c89449b6b43db33d32551033734e0d28715914 Mon Sep 17 00:00:00 2001
From: h1t
Date: Sat, 9 Mar 2024 09:06:06 +0000
Subject: [PATCH 2/5] resolve repos

---
 .../inference_with_tcd_lora.md | 24 ++++++++++---------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
index 53f0c0655881..447c7e93efd7 100644
--- a/docs/source/en/using-diffusers/inference_with_tcd_lora.md
+++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
@@ -18,19 +18,19 @@ Trajectory Consistency Distillation (TCD) enables the model to generate higher

From the [Official Project Page](https://mhh0318.github.io/tcd/), the major merit of TCD can be outlined as follows:

- ***Better than Teacher:*** TCD maintains superior generative quality at both small and large inference steps, even exceeding the performance of [DPM-Solver++(2S)](https://huggingface.co/docs/diffusers/api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). It is worth noting that no additional discriminator or LPIPS supervision is included during training.

- ***Flexible NFEs:*** The NFEs for TCD sampling can be varied at will without adversely affecting the quality of the results.

- ***Freely Change the Detailing:*** During inference, the level of detail in the image can be modified by adjusting the hyperparameter gamma. This option does not require any additional parameters.

For more technical details of TCD, please refer to [the paper](https://arxiv.org/abs/2402.19159).
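To make the role of gamma (exposed as `eta` in the pipelines above) concrete, here is a schematic of a single TCD sampling step. This is an illustrative sketch only, not the actual [`TCDScheduler`] implementation; `consistency_fn` and `alpha_bar` are hypothetical stand-ins for the distilled model and the noise schedule.

```python
import torch

def tcd_step(x_t, t, t_prev, gamma, consistency_fn, alpha_bar):
    # gamma picks an intermediate timestep s between 0 and t_prev
    s = int((1.0 - gamma) * t_prev)
    # deterministic denoising jump from t down to s via the distilled consistency function
    x_s = consistency_fn(x_t, t, s)
    if gamma == 0.0 or t_prev == 0:
        return x_s  # gamma = 0: the trajectory is fully deterministic
    # gamma > 0: stochastically re-noise from s back up to t_prev
    noise = torch.randn_like(x_s)
    ratio = alpha_bar(t_prev) / alpha_bar(s)
    return ratio ** 0.5 * x_s + (1.0 - ratio) ** 0.5 * noise  # gamma = 1 recovers multistep consistency sampling
```

Intuitively, a small gamma keeps each step close to the deterministic trajectory (sharper detail), while a larger gamma injects more fresh noise per step, smoothing results over many steps at the cost of some detail.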
Trajectory consistency distillation can be directly placed on top of a pre-trained diffusion model as a [LoRA](https://huggingface.co/docs/diffusers/main/en/training/lora) module. Such a LoRA can be identified as a versatile acceleration module applicable to different fine-tuned models or LoRAs sharing the same base model, without the need for additional training.

TCD-LoRAs are available for [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), [stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).

The corresponding checkpoints can be found at [TCD-SD15](https://huggingface.co/h1t/TCD-SD15-LoRA), [TCD-SD21-base](https://huggingface.co/h1t/TCD-SD21-base-LoRA), and [TCD-SDXL](https://huggingface.co/h1t/TCD-SDXL-LoRA), respectively.


This guide shows how to perform inference with TCD-LoRAs for
- text-to-image
- inpainting
- community models
- style LoRA
- ControlNet
- IP-Adapter
- AnimateDiff

TCD-LoRA can be considered an advanced method compared with [LCM-LoRA](https://latent-consistency-models.github.io/). The main parts of the TCD-LoRA workflow are as follows:
- Load the task-specific pipeline and model.
- Set the scheduler to [`TCDScheduler`].
- Load the TCD-LoRA weights for the model.
- Set `num_inference_steps` to a value between 4 and 50.
- Set `eta` to a value between 0 and 1. A larger `eta` in [`TCDScheduler`] leads to blurrier images.
- Perform inference with the pipeline with the usual parameters.

pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```

![](https://github.com/jabir-zheng/TCD/raw/main/assets/demo_image.png)



Eta (referred to as `gamma` in the paper) controls the stochasticity of every step.
A value of 0.3 often yields good results; eta = 0 is fully deterministic, while eta = 1 is equivalent to the multistep consistency sampler (as in [`LCMScheduler`]).
We recommend using a higher eta when increasing the number of inference steps.



## TCD-LoRA is Versatile for Community Models

As mentioned above, the TCD-LoRA is versatile for community models and plugins.
To test-drive this, load a community fine-tuned base model [animagine-xl-3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0).

![](https://github.com/jabir-zheng/TCD/raw/main/assets/animagine_xl.png)

Furthermore, TCD-LoRA also supports LoRAs corresponding to other styles. Below is an example with [Papercut](https://huggingface.co/TheLastBen/Papercut_SDXL). To learn more about how to combine LoRAs, refer to [this guide](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference#combine-multiple-adapters).

## Compatibility with ControlNet

For this example, you'll keep using the SDXL model and the TCD-LoRA for SDXL with depth and canny ControlNets.

From f3c80904f2893be96c059cdcdec731a05662c5e9 Mon Sep 17 00:00:00 2001
From: Michael
Date: Tue, 12 Mar 2024 11:19:13 +0800
Subject: [PATCH 3/5] Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 .../inference_with_tcd_lora.md | 90 +++++++++----------
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
index 447c7e93efd7..d22ce9deff5a 100644
--- a/docs/source/en/using-diffusers/inference_with_tcd_lora.md
+++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
@@ -12,55 +12,51 @@ specific language governing permissions and limitations under the License.

[[open-in-colab]]

# Trajectory Consistency Distillation-LoRA

Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Additionally, TCD demonstrates superior performance even under conditions of high NFEs.

The major advantages of TCD are:

- Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training.
- Flexible NFEs: The NFEs for TCD sampling can be freely adjusted without adversely affecting the image quality.

- Freely change detail level: During inference, the level of detail in the image can be adjusted with a single hyperparameter, *gamma*.

> [!TIP]
> For more technical details of TCD, please refer to the [paper](https://arxiv.org/abs/2402.19159) or official [project page](https://mhh0318.github.io/tcd/).

For large models like SDXL, TCD is trained with [LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) to reduce memory usage. This is also useful because you can reuse LoRAs between different finetuned models, as long as they share the same base model, without further training.


This guide will show you how to perform inference with TCD-LoRAs for a variety of tasks like text-to-image and inpainting, as well as how you can easily combine TCD-LoRAs with other adapters. Choose one of the supported base models and its corresponding TCD-LoRA checkpoint from the table below to get started.

| Base model | TCD-LoRA checkpoint |
|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------|
| [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | [TCD-SD15](https://huggingface.co/h1t/TCD-SD15-LoRA) |
| [stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) | [TCD-SD21-base](https://huggingface.co/h1t/TCD-SD21-base-LoRA) |
| [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | [TCD-SDXL](https://huggingface.co/h1t/TCD-SDXL-LoRA) |

TCD-LoRA can be considered an advanced method compared with [LCM-LoRA](https://latent-consistency-models.github.io/). The main parts of the TCD-LoRA workflow are as follows:
- Load the task-specific pipeline and model.
- Set the scheduler to [`TCDScheduler`].
- Load the TCD-LoRA weights for the model.
-- Set the `num_inference_steps` between [4, 50]. -- Set `eta` from [0, 1]. Larger `eta` in [`TCDScheduler`] will lead to blurrier images. -- Perform inference with the pipeline with the usual parameters. -Let's look at how we can perform inference with TCD-LoRAs for different tasks. - -First, make sure you have [peft](https://github.com/huggingface/peft) installed, for better LoRA support. +Make sure you have [PEFT](https://github.com/huggingface/peft) installed for better LoRA support. ```bash pip install -U peft ``` -## Text-to-image +## General tasks + +In this guide, let's use the [`StableDiffusionXLPipeline`] and the [`TCDScheduler`]. Use the [`~StableDiffusionPipeline.load_lora_weights`] method to load the SDXL-compatible TCD-LoRA weights. + +A few tips to keep in mind for TCD-LoRA inference are to: -You can use the [`StableDiffusionXLPipeline`] with the scheduler: [`TCDScheduler`] and then load the TCD-LoRA. Together with the TCD-LoRA and the TCDScheduler, the pipeline enables a fast inference workflow with high quality outputs. +- Keep the `num_inference_steps` between 4 and 50 +- Set `eta` (used to control stochasticity at each step) between 0 and 1. You should use a higher `eta` when increasing the number of inference steps, but the downside is that a larger `eta` in [`TCDScheduler`] leads to blurrier images. A value of 0.3 is recommended to produce good results. + + + ```python import torch @@ -90,17 +86,10 @@ image = pipe( ![](https://github.com/jabir-zheng/TCD/raw/main/assets/demo_image.png) - - -Eta (referred to as `gamma` in the paper) is used to control the stochasticity in every step. -A value of 0.3 often yields good results, where eta = 0 means determinstic and eta = 1 is identity to Multi-step Consistency Sampler (as well as LCMScheduler). -We recommend using a higher eta when increasing the number of inference steps. - - -## TCD-LoRA is Versatile for Community Models +## Community models -As mentioned above, the TCD-LoRA is versatile for community models and plugins. To test-drive this, load a community fine-tuned base model [animagine-xl-3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0). +TCD-LoRA also works with many community finetuned models and plugins. For example, load the [animagine-xl-3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0) checkpoint which is a community finetuned version of SDXL for generating anime images. ```python import torch @@ -129,7 +118,10 @@ image = pipe( ![](https://github.com/jabir-zheng/TCD/raw/main/assets/animagine_xl.png) -Furthermore, TCD-LoRA also supports LoRAs corresponding to other styles. Below is an example with [Papercut](https://huggingface.co/TheLastBen/Papercut_SDXL). To learn more about how to combine LoRAs, refer to [this guide](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference#combine-multiple-adapters). +TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the [TheLastBen/Papercut_SDXL](https://huggingface.co/TheLastBen/Papercut_SDXL) LoRA and fuse it with the TCD-LoRA with the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method. + +> [!TIP] +> Check out the [Merge LoRAs](merge_loras) guide to learn more about efficient merging methods. 
```python import torch @@ -205,11 +197,12 @@ grid_image = make_image_grid([init_image, mask_image, image], rows=1, cols=3) ![](https://github.com/jabir-zheng/TCD/raw/main/assets/inpainting_tcd.png) -## Compatibility with ControlNet +## Adapters -For this example, you'll keep using the SDXL model and the TCD-LoRA for SDXL with depth and canny ControlNets. +TCD-LoRA is very versatile, and it can be combined with other adapter types like ControlNets, IP-Adapter, and AnimateDiff. -### Depth ControlNet + + ```python import torch import numpy as np @@ -341,7 +334,8 @@ grid_image = make_image_grid([canny_image, image], rows=1, cols=2) The inference parameters in this example might not work for all examples, so we recommend you to try different values for `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale` and `cross_attention_kwargs` parameters and choose the best one. -## IP-Adapter + + This example shows how to use the TCD-LoRA with the [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter/tree/main) and SDXL. @@ -393,7 +387,8 @@ grid_image = make_image_grid([ref_image, image], rows=1, cols=2) -## AnimateDiff + + [`AnimateDiff`] allows animating images using Stable Diffusion models. TCD-LoRA can substantially accelerate the process without degrading image quality. The quality of animation with TCD-LoRA and AnimateDiff has a more lucid outcome. @@ -432,4 +427,7 @@ frames = pipe( export_to_gif(frames, "animation.gif") ``` -![](https://github.com/jabir-zheng/TCD/raw/main/assets/animation_example.gif) \ No newline at end of file +![](https://github.com/jabir-zheng/TCD/raw/main/assets/animation_example.gif) + + + \ No newline at end of file From 329837654a495334cbfd56c33c705c1cb46e84e1 Mon Sep 17 00:00:00 2001 From: h1t Date: Tue, 12 Mar 2024 05:32:58 +0000 Subject: [PATCH 4/5] revise NFEs related --- docs/source/en/using-diffusers/inference_with_tcd_lora.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md index d22ce9deff5a..1263a2b96181 100644 --- a/docs/source/en/using-diffusers/inference_with_tcd_lora.md +++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md @@ -14,13 +14,13 @@ specific language governing permissions and limitations under the License. # Trajectory Consistency Distillation-LoRA -Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Additionally, TCD demonstrates superior performance even under conditions of high NFEs. +Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to the effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps. The major advantages of TCD are: - Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training. -- Flexible NFEs: The NFEs for TCD sampling can be freely adjusted without adversely affecting the image quality. +- Flexible Inference Steps: The inference steps for TCD sampling can be freely adjusted without adversely affecting the image quality. 
- Freely change detail level: During inference, the level of detail in the image can be adjusted with a single hyperparameter, *gamma*. @@ -203,6 +203,7 @@ TCD-LoRA is very versatile, and it can be combined with other adapter types like + ```python import torch import numpy as np From 46450f6c7a453eb07d70d74fdca690762c110ab7 Mon Sep 17 00:00:00 2001 From: h1t Date: Wed, 13 Mar 2024 03:45:09 +0000 Subject: [PATCH 5/5] change inpainting location --- .../inference_with_tcd_lora.md | 90 ++++++++++--------- 1 file changed, 47 insertions(+), 43 deletions(-) diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md index 1263a2b96181..10ad674e73ac 100644 --- a/docs/source/en/using-diffusers/inference_with_tcd_lora.md +++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md @@ -85,7 +85,52 @@ image = pipe( ![](https://github.com/jabir-zheng/TCD/raw/main/assets/demo_image.png) + + + + +```python +import torch +from diffusers import AutoPipelineForInpainting, TCDScheduler +from diffusers.utils import load_image, make_image_grid + +device = "cuda" +base_model_id = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1" +tcd_lora_id = "h1t/TCD-SDXL-LoRA" + +pipe = AutoPipelineForInpainting.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device) +pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config) + +pipe.load_lora_weights(tcd_lora_id) +pipe.fuse_lora() + +img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" +mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" + +init_image = load_image(img_url).resize((1024, 1024)) +mask_image = load_image(mask_url).resize((1024, 1024)) + +prompt = "a tiger sitting on a park bench" +image = pipe( + prompt=prompt, + image=init_image, + mask_image=mask_image, + num_inference_steps=8, + guidance_scale=0, + eta=0.3, + strength=0.99, # make sure to use `strength` below 1.0 + generator=torch.Generator(device=device).manual_seed(0), +).images[0] + +grid_image = make_image_grid([init_image, mask_image, image], rows=1, cols=3) +``` + +![](https://github.com/jabir-zheng/TCD/raw/main/assets/inpainting_tcd.png) + + + + ## Community models @@ -154,49 +199,6 @@ image = pipe( ![](https://github.com/jabir-zheng/TCD/raw/main/assets/styled_lora.png) -## Inpainting with TCD - - -```python -import torch -from diffusers import AutoPipelineForInpainting, TCDScheduler -from diffusers.utils import load_image, make_image_grid - -device = "cuda" -base_model_id = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1" -tcd_lora_id = "h1t/TCD-SDXL-LoRA" - -pipe = AutoPipelineForInpainting.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device) -pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config) - -pipe.load_lora_weights(tcd_lora_id) -pipe.fuse_lora() - -img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" - -init_image = load_image(img_url).resize((1024, 1024)) -mask_image = load_image(mask_url).resize((1024, 1024)) - -prompt = "a tiger sitting on a park bench" - -image = pipe( - prompt=prompt, - image=init_image, - mask_image=mask_image, - 
num_inference_steps=8, - guidance_scale=0, - eta=0.3, # Eta (referred to as `gamma` in the paper) is used to control the stochasticity in every step. A value of 0.3 often yields good results. - strength=0.99, # make sure to use `strength` below 1.0 - generator=torch.Generator(device=device).manual_seed(0), -).images[0] - -grid_image = make_image_grid([init_image, mask_image, image], rows=1, cols=3) -``` - -![](https://github.com/jabir-zheng/TCD/raw/main/assets/inpainting_tcd.png) - - ## Adapters TCD-LoRA is very versatile, and it can be combined with other adapter types like ControlNets, IP-Adapter, and AnimateDiff. @@ -204,6 +206,8 @@ TCD-LoRA is very versatile, and it can be combined with other adapter types like +### Depth ControlNet + ```python import torch import numpy as np