[docs] Merge LoRAs #7213
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -165,77 +147,6 @@ list_adapters_component_wise

```
{"text_encoder": ["toy", "pixel"], "unet": ["toy", "pixel"], "text_encoder_2": ["toy", "pixel"]}
```
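For context, output like the above comes from `get_list_adapters()`. A minimal sketch of how it is produced, assuming two LoRAs loaded under the adapter names "toy" and "pixel" (the repo IDs and file names below are illustrative assumptions):

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under explicit adapter names (repo IDs are illustrative).
pipeline.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
pipeline.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Maps each pipeline component to the adapter names loaded into it, e.g.
# {"text_encoder": ["toy", "pixel"], "unet": ["toy", "pixel"], "text_encoder_2": ["toy", "pixel"]}
print(pipeline.get_list_adapters())
```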
## Compatibility with `torch.compile`
Why is this going away? 😱
Oops sorry I forgot about this section! Added it back in as a nested section under the `fuse_lora` section 😅
I am a little uncomfortable about how it's ported in the new section.
Left a comment here: https://github.com/huggingface/diffusers/pull/7213/files#r1513791539.
This looks so NEAT!
@pacman100 for awareness.
### torch.compile

[torch.compile](../optimization/torch2.0#torchcompile) can speed up your pipeline even more, but you have to make sure the LoRA weights are fused first and then unloaded.

```py
pipeline = DiffusionPipeline.from_pretrained(
    "username/fused-ikea-feng", torch_dtype=torch.float16,
).to("cuda")

pipeline.unet.to(memory_format=torch.channels_last)
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

image = pipeline("A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai", generator=torch.manual_seed(0)).images[0]
image
```
I think this section is quite incomplete:

- It doesn't load any LoRA.
- It doesn't apply fusion.
- It doesn't display the generated image.

Could we please ensure the section is ported as closely as possible? `torch.compile()` compatibility with PEFT is super important for our users.
See changes here.

About points 1-2: since the `torch.compile` section is already nested under the `fuse_lora` section, it is assumed to be a subtopic of `fuse_lora`, meaning you should read that section first. Adding the same code from the section above feels redundant. Do you think it'd be more helpful/clear if I added something saying "Make sure you read the `fuse_lora` section above first"?

About point 3: I think it'd be more impactful to show the speedup in time rather than another image. wdyt?
I feel relatively strongly about points 1-2. Even if a single step is missed, `torch.compile()` can fail during inference.

In the `fuse_lora()` section, the code shows `pipeline.unfuse_lora()`. This will make `torch.compile()` fail. The right sequence of steps would be:

```py
# provided `set_adapters()` was already called
pipe.fuse_lora()
pipe.unload_lora_weights()

pipe.unet.to(memory_format=torch.channels_last)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

So, I would appreciate it if the example code snippet for `torch.compile()` when doing LoRA fusion were as self-contained and as complete as possible.
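For reference, a fully self-contained version might look like the following. This is a sketch: the base checkpoint, LoRA repo IDs, weight file names, and adapter weights are assumptions inferred from the fused-model name and prompt used elsewhere in this doc.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the LoRAs and activate them with the desired weights
# (repo IDs and file names are illustrative assumptions).
pipeline.load_lora_weights("ostris/ikea-instructions-lora-sdxl", weight_name="ikea_instructions_xl_v1_5.safetensors", adapter_name="ikea")
pipeline.load_lora_weights("lordjia/by-feng-zikai", weight_name="fengzikai_v1.0_XL.safetensors", adapter_name="feng")
pipeline.set_adapters(["ikea", "feng"], adapter_weights=[0.7, 0.8])

# Fuse the active LoRAs into the base weights, then drop the adapter modules
# so the model is a plain static graph before compiling.
pipeline.fuse_lora()
pipeline.unload_lora_weights()

pipeline.unet.to(memory_format=torch.channels_last)
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)

image = pipeline("A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai", generator=torch.manual_seed(0)).images[0]
```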
For pt. 3, a timings report would be inappropriate since the compiled pipeline needs warming up first; without that, it would give a misleading view of the timing.
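A minimal sketch of what a fair measurement might look like, assuming `pipeline` has been set up and compiled as above (the warmup iteration count is arbitrary):

```py
import time
import torch

prompt = "A bowl of ramen shaped like a cute kawaii bear, by Feng Zikai"

# Warmup: the first calls trigger compilation and are not representative.
for _ in range(3):
    _ = pipeline(prompt, generator=torch.manual_seed(0)).images[0]

# Time a single post-warmup call, synchronizing so GPU work is included.
torch.cuda.synchronize()
start = time.perf_counter()
_ = pipeline(prompt, generator=torch.manual_seed(0)).images[0]
torch.cuda.synchronize()
print(f"Latency after warmup: {time.perf_counter() - start:.2f}s")
```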
Sounds good! I went with your suggestion to include an entirely self-contained code snippet 🙂
Thank you Steven! Appreciate it.
Let's fix the `torch.compile()` section and then we can ship!

Feel free to merge it. Looking solid, and thanks a mile for bearing with me.
Continuation of #7110 to implement option 3 for combining all merge methods into one doc and cleaning up the other docs to focus on loading or basic usage.