
Why is your method better than IP-Adapter in text control capabilities? #7

Open
zhangqizky opened this issue Jan 18, 2024 · 3 comments

Comments

@zhangqizky

Hello, thanks for your nice work. I'm confused about something; could you please help explain it?
IP-Adapter doesn't train the UNet either, and it adds a cross-attention layer just as you do. The big difference is that you use a landmark-guided net, while they use CLIP image embeddings to control the structure (IP-Adapter Plus V2). I don't understand why your method is better than IP-Adapter in text control capabilities. Do you plan to release some training details, e.g. the training data?
Thanks for your nice work!

@haofanwang
Member

@zhangqizky We also see a certain degree of degradation in text editing capability, mainly due to the image cross-attention. But since we introduce IdentityNet, we can maintain identity fidelity while setting a lower weight for the image cross-attention.
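The trade-off described above can be sketched with a decoupled cross-attention scheme in the style of IP-Adapter: the text branch and the image branch are computed separately, and the image branch is added with a tunable scale. A lower scale leaves more of the text prompt's influence intact. The function names and shapes below are purely illustrative, not the actual InstantID or IP-Adapter code:

```python
import numpy as np

def attention(q, k, v):
    """Standard scaled dot-product attention (single head, no masking)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def decoupled_cross_attention(q, text_kv, image_kv, image_scale=0.5):
    """Sum text and image cross-attention outputs.

    image_scale < 1 down-weights the image branch, trading some identity
    conditioning in the attention for better text controllability
    (identity fidelity would then come from a separate guidance net).
    """
    text_out = attention(q, *text_kv)
    image_out = attention(q, *image_kv)
    return text_out + image_scale * image_out

# Illustrative shapes: 4 query tokens, 8-dim features, 6 text / 3 image tokens.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
text_kv = (rng.normal(size=(6, 8)), rng.normal(size=(6, 8)))
image_kv = (rng.normal(size=(3, 8)), rng.normal(size=(3, 8)))
out = decoupled_cross_attention(q, text_kv, image_kv, image_scale=0.3)
```

With `image_scale=0` the output reduces to plain text cross-attention, which is why lowering this weight recovers text editing capability at the cost of image-based conditioning.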

@zhangqizky
Author

@haofanwang OK, thanks for the reply.

@askerlee

Not sure if it's the reason, but the XL series has better text consistency. Moreover, custom models usually have better text consistency than the base models.
