[docs] Merge LoRAs #7213
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -165,77 +147,6 @@ list_adapters_component_wise

```
{"text_encoder": ["toy", "pixel"], "unet": ["toy", "pixel"], "text_encoder_2": ["toy", "pixel"]}
```
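For context, output like the above comes from `get_list_adapters()`. A minimal sketch of how it is produced, assuming two LoRAs loaded under the adapter names "toy" and "pixel" (the repo IDs and file names below are illustrative assumptions):

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under explicit adapter names (repo IDs are illustrative).
pipeline.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Maps each pipeline component to the adapter names loaded into it, e.g.
# {"text_encoder": ["toy", "pixel"], "unet": ["toy", "pixel"], "text_encoder_2": ["toy", "pixel"]}
print(pipeline.get_list_adapters())
```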
## Compatibility with `torch.compile`
Why is this going away? 😱
Oops sorry I forgot about this section! Added it back in as a nested section under the `fuse_lora` section 😅
I am a little uncomfortable about how it's ported in the new section.
Left a comment here: https://github.com/huggingface/diffusers/pull/7213/files#r1513791539.
This looks so NEAT!
@pacman100 for awareness.
### torch.compile

[torch.compile](../optimization/torch2.0#torchcompile) can speed up your pipeline even more, but you have to make sure the LoRA weights are fused first and then unloaded.

```py
pipeline = DiffusionPipeline.from_pretrained(
    "username/fused-ikea-feng", torch_dtype=torch.float16,
).to("cuda")

pipeline.unet.to(memory_format=torch.channels_last)
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

image = pipeline("A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai", generator=torch.manual_seed(0)).images[0]
image
```
I think this section is quite incomplete:

- It doesn't load any LoRA.
- It doesn't apply fusion.
- It doesn't display the generated image.

Could we please ensure the section is ported as closely as possible? `torch.compile()` compatibility with PEFT is super important for our users.
See changes here.

About points 1-2: since the `torch.compile` section is already nested under the `fuse_lora` section, it is assumed to be a subtopic of `fuse_lora`, meaning you should read that section first. Adding the same code from the section above feels redundant. Do you think it'd be more helpful/clear if I added something saying "Make sure you read the `fuse_lora` section above first"?

About point 3: I think it'd be more impactful to show the speedup in time rather than another image. wdyt?
I feel relatively strongly about points 1-2. Even if a single step is missed, `torch.compile()` can fail during inference.

In the `fuse_lora()` section, the code shows `pipeline.unfuse_lora()`. This will make `torch.compile()` fail. The right sequence of steps would be:

```py
# provided `set_adapters()` was already called
pipe.fuse_lora()
pipe.unload_lora_weights()

pipe.unet.to(memory_format=torch.channels_last)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

So, I would appreciate it if the example code snippet for `torch.compile()` when doing LoRA fusion were as self-contained and as complete as possible.
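For reference, a fully self-contained version might look like the following. This is a sketch: the base checkpoint, LoRA repo IDs, weight file names, and adapter weights are assumptions inferred from the fused-model name and prompt used elsewhere in this doc.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the LoRAs and activate them with the desired weights
# (repo IDs and file names are illustrative assumptions).
pipeline.load_lora_weights("ostris/ikea-instructions-lora-sdxl", weight_name="ikea_instructions_xl_v1_5.safetensors", adapter_name="ikea")
pipeline.load_lora_weights("lordjia/by-feng-zikai", weight_name="fengzikai_v1.0_XL.safetensors", adapter_name="feng")
pipeline.set_adapters(["ikea", "feng"], adapter_weights=[0.7, 0.8])

# Fuse the active LoRAs into the base weights, then drop the adapter modules
# so the model is a plain static graph before compiling.
pipeline.fuse_lora()
pipeline.unload_lora_weights()

pipeline.unet.to(memory_format=torch.channels_last)
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

image = pipeline("A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai", generator=torch.manual_seed(0)).images[0]
```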
For pt. 3, a timings report would be inappropriate since the compiled pipeline needs warming up first; without that, it would give a misleading view of the timing.
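A minimal sketch of what a fair measurement might look like, assuming `pipeline` has been set up and compiled as above (the warmup iteration count is arbitrary):

```py
import time
import torch

prompt = "A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai"

# Warmup: the first calls trigger compilation and are not representative.
for _ in range(3):
    _ = pipeline(prompt, generator=torch.manual_seed(0)).images[0]

# Time a single post-warmup call, synchronizing so GPU work is included.
torch.cuda.synchronize()
start = time.perf_counter()
_ = pipeline(prompt, generator=torch.manual_seed(0)).images[0]
torch.cuda.synchronize()
print(f"Latency after warmup: {time.perf_counter() - start:.2f}s")
```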
Sounds good! I went with your suggestion to include an entirely self-contained code snippet 🙂
Thank you Steven! Appreciate it.
Let's fix the `torch.compile()` section and then we can ship!

Feel free to merge it. Looking solid, and thanks a mile for bearing with me.
Continuation of #7110 to implement option 3 for combining all merge methods into one doc and cleaning up the other docs to focus on loading or basic usage.