
[LoRA] use the PyTorch classes wherever needed and start deprecation cycles #7204

Merged: 39 commits into main on Mar 13, 2024

Conversation

@sayakpaul (Member):

What does this PR do?

Since we have shifted to the PEFT backend for all things LoRA, there's no longer any need for us to use the LoRACompatible* classes.

We should also start the deprecation cycles for the LoRALinearLayer and LoRAConv2dLayer. This PR does that.
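
For context, here is a minimal before/after sketch of what the swap to plain PyTorch classes looks like (the module and attribute names below are illustrative, not taken from the diff):

    import torch
    import torch.nn as nn

    class FeedForwardBlock(nn.Module):
        # Illustrative module, not copied from the PR.
        def __init__(self, dim: int, inner_dim: int):
            super().__init__()
            # Before: diffusers wrapped linear layers in its own LoRA-aware class,
            # e.g. self.proj = LoRACompatibleLinear(dim, inner_dim)
            # After: with the PEFT backend injecting LoRA adapters into nn.Linear,
            # the plain PyTorch class is all that is needed.
            self.proj = nn.Linear(dim, inner_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.proj(x)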

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) left a comment:

Ohh great! Thanks so much!

We can deprecate the scale argument everywhere now, too, right?
e.g., all the attention processors

Review threads: src/diffusers/models/attention.py, src/diffusers/models/attention_processor.py
sayakpaul requested a review from yiyixuxu on March 5, 2024 at 08:46
@sayakpaul (Member, Author):

@yiyixuxu up for another review. Rigorous review appreciated!

@yiyixuxu (Collaborator) left a comment:

Thanks!
I have two main pieces of feedback:

  1. Let's deprecate the scale argument by removing it from the signature (see #7204 (comment)).
  2. I see that you deprecated the scale argument in some of the more public classes, but in some of the less public ones you simply removed it or silently ignored it. Maybe we should just deprecate it everywhere and remove it all at once later? E.g. #7204 (comment) and #7204 (comment).

Review threads: src/diffusers/models/attention.py, src/diffusers/models/resnet.py (4 threads, outdated), src/diffusers/models/unets/unet_2d_blocks.py (5 threads, 4 outdated)
@yiyixuxu (Collaborator) commented on Mar 8, 2024:

cc @BenjaminBossan, we would appreciate it if you could also give this a review.

@sayakpaul (Member, Author):

@yiyixuxu, I addressed all your comments. They were very, very helpful. Thank you!

@BenjaminBossan (Member) left a comment:

Very thorough PR, thanks a lot, Sayak. This LGTM overall.

Just a suggestion for the deprecation. Currently, the message is:

Use of scale is deprecated. Please remove the argument

As a user, I might not know what to do with that info: Is this feature removed completely or can I still use it, but have to do it differently? Also, I might get the impression that I can still pass scale and it works, it's just deprecated, when in fact the argument doesn't do anything, right? Perhaps the message could be clarified.

Moreover, if we already have an idea of which diffusers version this will be removed in (hence raising an error), it could be added to the warning. On top of that, we could add a comment like # TODO remove argument in diffusers X.Y to make it more likely that this will indeed be cleaned up when that version is released.

@sayakpaul (Member, Author) commented on Mar 11, 2024:

Thanks, Benjamin!

> As a user, I might not know what to do with that info: Is this feature removed completely or can I still use it, but have to do it differently? Also, I might get the impression that I can still pass scale and it works, it's just deprecated, when in fact the argument doesn't do anything, right? Perhaps the message could be clarified.

Very good point. I clarified that as much as I could.

> Moreover, if we already have an idea of which diffusers version this will be removed in (hence raising an error), it could be added to the warning. On top of that, we could add a comment like # TODO remove argument in diffusers X.Y to make it more likely that this will indeed be cleaned up when that version is released.

The deprecate() utility I am using will automatically take care of that. So, once we hit 1.0.0, it will raise an error on any PR unless handled accordingly. Here is an example: #6885. Does that work?
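
For reference, a rough sketch of the pattern (the function name and message below are simplified placeholders, not the exact call sites in the diff): the deprecate() helper warns today and starts raising once the installed diffusers version reaches the one passed to it.

    from diffusers.utils import deprecate

    def forward_with_deprecated_scale(hidden_states, scale=None):
        # Sketch only: warn if the deprecated argument is supplied, then ignore it.
        if scale is not None:
            deprecate(
                "scale",
                "1.0.0",
                "The `scale` argument is deprecated and will be ignored.",
            )
        return hidden_states  # placeholder for the real forward computation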

@BenjaminBossan (Member):

Regarding the error message:

deprecation_message = f"Use of scale is deprecated. Please remove the argument. Even if you pass it to the forward() of the {attn.__class__.__name__} class, it won't have any effect."

I think it's almost too detailed; users will not normally pass the argument directly to the {attn.__class__.__name__}, right? Instead, the argument was probably passed along by something higher up. I think the message could be shortened to:

deprecation_message = f"The scale is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future."

Can we also add a sentence on how to control the scale instead?

> The deprecate() utility I am using will automatically take care of that. So, once we hit 1.0.0, it will raise an error on any PR unless handled accordingly. Here is an example: #6885. Does that work?

Cool, I didn't know 👍

@sayakpaul (Member, Author):

How about this?

The scale is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. scale should be passed directly when calling the underlying pipeline component, i.e., via cross_attention_kwargs.

@BenjaminBossan (Member):

Yes, that sounds good, as it clarifies to the user what they need to do.

@yiyixuxu (Collaborator) left a comment:

Oh, thanks!
I did another round of review:

  1. I left a question about the deprecation message, and I think we should use the same message everywhere (I saw you updated it in some places but not others).
  2. Let's add a warning everywhere the scale passed via cross_attention_kwargs is ignored; we can then remove all of these warnings together in the future.

Review threads: src/diffusers/models/attention.py (outdated), src/diffusers/models/attention_processor.py, src/diffusers/models/unets/unet_2d_blocks.py (5 threads)
@sayakpaul (Member, Author):

@yiyixuxu, I have resolved your comments, except for #7204. Thank you!

@younesbelkada (Contributor) left a comment:

Thanks so much! IMO all good on the PEFT end! Great work, @sayakpaul!

Review threads: src/diffusers/models/attention_processor.py, src/diffusers/models/embeddings.py (2 threads), src/diffusers/models/attention.py (outdated), src/diffusers/models/downsampling.py (outdated)
@@ -327,31 +326,20 @@ def forward(
encoder_attention_mask = (1 - encoder_attention_mask.to(hidden_states.dtype)) * -10000.0
encoder_attention_mask = encoder_attention_mask.unsqueeze(1)

# Retrieve lora scale.
lora_scale = cross_attention_kwargs.get("scale", 1.0) if cross_attention_kwargs is not None else 1.0
yiyixuxu (Collaborator) suggested:

    if cross_attention_kwargs is not None:
        if cross_attention_kwargs.get("scale", None) is not None:
            logger.warning("Passing `scale` via `cross_attention_kwargs` is deprecated. `scale` will be ignored.")

@@ -1238,8 +1241,6 @@ def forward(
) -> Tuple[torch.FloatTensor, Tuple[torch.FloatTensor, ...]]:
output_states = ()

lora_scale = cross_attention_kwargs.get("scale", 1.0) if cross_attention_kwargs is not None else 1.0
yiyixuxu (Collaborator) suggested:

    if cross_attention_kwargs is not None:
        if cross_attention_kwargs.get("scale", None) is not None:
            logger.warning("Passing `scale` via `cross_attention_kwargs` is deprecated. `scale` will be ignored.")

@@ -2440,7 +2477,6 @@ def forward(
attention_mask: Optional[torch.FloatTensor] = None,
encoder_attention_mask: Optional[torch.FloatTensor] = None,
) -> torch.FloatTensor:
lora_scale = cross_attention_kwargs.get("scale", 1.0) if cross_attention_kwargs is not None else 1.0
yiyixuxu (Collaborator) suggested:

    if cross_attention_kwargs is not None:
        if cross_attention_kwargs.get("scale", None) is not None:
            logger.warning("Passing `scale` via `cross_attention_kwargs` is deprecated. `scale` will be ignored.")

@@ -1175,8 +1180,6 @@ def forward(
):
output_states = ()

lora_scale = cross_attention_kwargs.get("scale", 1.0) if cross_attention_kwargs is not None else 1.0
yiyixuxu (Collaborator) suggested:

    if cross_attention_kwargs is not None:
        if cross_attention_kwargs.get("scale", None) is not None:
            logger.warning("Passing `scale` via `cross_attention_kwargs` is deprecated. `scale` will be ignored.")

@@ -1355,7 +1358,6 @@ def forward(
encoder_attention_mask: Optional[torch.FloatTensor] = None,
num_frames: int = 1,
) -> torch.FloatTensor:
lora_scale = cross_attention_kwargs.get("scale", 1.0) if cross_attention_kwargs is not None else 1.0
yiyixuxu (Collaborator) suggested:

    if cross_attention_kwargs is not None:
        if cross_attention_kwargs.get("scale", None) is not None:
            logger.warning("Passing `scale` via `cross_attention_kwargs` is deprecated. `scale` will be ignored.")

@@ -1687,8 +1694,7 @@ def forward(
encoder_attention_mask: Optional[torch.FloatTensor] = None,
num_frames: int = 1,
) -> torch.FloatTensor:
lora_scale = cross_attention_kwargs.get("scale", 1.0) if cross_attention_kwargs is not None else 1.0
yiyixuxu (Collaborator) suggested:

    if cross_attention_kwargs is not None:
        if cross_attention_kwargs.get("scale", None) is not None:
            logger.warning("Passing `scale` via `cross_attention_kwargs` is deprecated. `scale` will be ignored.")

sayakpaul merged commit 531e719 into main on Mar 13, 2024
17 checks passed
sayakpaul deleted the remove-linear_cls branch on March 13, 2024 at 02:26
@panxiaoguang:

Can anyone tell me how to set the scale for a LoRA if I can't use scale anymore?

@younesbelkada (Contributor):

@panxiaoguang scale corresponds to lora_alpha / r; you simply need to make sure to pass your desired lora_alpha and r arguments in LoraConfig.
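
For concreteness, a minimal sketch of setting this at training time with PEFT (the target module names below are illustrative assumptions, not prescribed by this PR):

    from peft import LoraConfig

    # Effective LoRA scale = lora_alpha / r = 16 / 8 = 2.0
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    )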

@sayakpaul (Member, Author):

> Can anyone tell me how to set the scale for a LoRA if I can't use scale anymore?

Apart from what @younesbelkada mentioned (which applies to training only), you can definitely use cross_attention_kwargs={"scale": ...} at inference and it will all work. PR #7338 will probably clear up any confusion users may have had.
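
For example, a minimal inference-time sketch (the checkpoint IDs, LoRA path, and scale value are placeholders):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("path/to/lora")  # hypothetical LoRA checkpoint

    image = pipe(
        "a photo of an astronaut riding a horse",
        cross_attention_kwargs={"scale": 0.7},  # LoRA scale applied at inference
    ).images[0]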
