[ip-adapter] fix problem using embeds with the plus version of ip adapters #7189

Merged: 6 commits merged on Mar 3, 2024


@asomoza (Member) commented Mar 2, 2024

What does this PR do?

Allows 4D tensors to be passed as ip_adapter_image_embeds, so that embeds generated for the IP-Adapter Plus variants can be used.

Fixes #7168
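The rank matters because of how precomputed embeds are batched. The pipeline repeats each embed along the batch dimension for num_images_per_prompt, and torch.Tensor.repeat needs one repeat factor per tensor dimension. A hardcoded three-factor repeat therefore works for 3D embeds but fails on the 4D hidden-state embeds used by the Plus variants. The sketch below shows that idea; it is not the PR's actual diff.

The helper names are hypothetical, and the shapes are assumed for illustration only (pooled ViT-H embeds of width 1024, Plus hidden states of 257 tokens by 1280). The sketch works on shape tuples, so it runs without torch.

```python
def batch_repeat_factors(shape, num_images_per_prompt):
    """Repeat factors for torch.Tensor.repeat: grow dim 0, keep every other dim.

    Building the tuple from len(shape) works for any rank, unlike a hardcoded
    (num_images_per_prompt, 1, 1), which only fits 3D tensors.
    """
    return (num_images_per_prompt,) + (1,) * (len(shape) - 1)


def repeated_shape(shape, num_images_per_prompt):
    """Shape after tensor.repeat(*factors): each dim is multiplied by its factor."""
    factors = batch_repeat_factors(shape, num_images_per_prompt)
    return tuple(size * factor for size, factor in zip(shape, factors))


# Illustrative (assumed) shapes laid out as (batch, num_images, ...)
plain_embeds = (1, 1, 1024)        # 3D: one pooled embedding per image
plus_embeds = (1, 10, 257, 1280)   # 4D: per-token hidden states for 10 style images

print(batch_repeat_factors(plain_embeds, 2))  # (2, 1, 1)
print(batch_repeat_factors(plus_embeds, 2))   # (2, 1, 1, 1)
print(repeated_shape(plus_embeds, 2))         # (2, 10, 257, 1280)
```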

How to test:

import torch

from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image


pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
)

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=[
        "ip-adapter_sdxl_vit-h.safetensors",
        "ip-adapter-plus_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",
    ],
    image_encoder_folder="models/image_encoder",
)
pipeline.set_ip_adapter_scale([0.1, 0.7, 0.3])
pipeline.to("cuda")

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

prompt = "wonderwoman"
num_images_per_prompt = 1
guidance_scale = 7.5
do_classifier_free_guidance = guidance_scale > 1


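with torch.no_grad():
    # One image entry per loaded adapter, in the same order as weight_name:
    # the face image, a list of 10 style images, and the face image again.
    image_embeds = pipeline.prepare_ip_adapter_image_embeds(
        [face_image, style_images, face_image],
        None,  # no precomputed embeds yet, so compute them from the images
        "cuda",  # device
        num_images_per_prompt,
        do_classifier_free_guidance,
    )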

image = pipeline(
    prompt=prompt,
    ip_adapter_image_embeds=image_embeds,
    negative_prompt="",
    guidance_scale=guidance_scale,
    num_images_per_prompt=num_images_per_prompt,
).images[0]
image.save("result.png")
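The test script only works if three parallel lists line up:

- three weight_name entries,
- three scales in set_ip_adapter_scale,
- three entries in the image list passed to prepare_ip_adapter_image_embeds (the middle entry is itself a list of 10 style images).

The helper below is hypothetical and not part of diffusers. It sketches a pre-flight check of that alignment in plain Python, using strings as stand-ins for the loaded images.

```python
def check_ip_adapter_inputs(weight_names, scales, images):
    """Hypothetical sanity check: one scale and one image entry per adapter.

    An image entry may be a single image or a list of images (e.g. style
    references). Returns how many images each adapter receives.
    """
    if not (len(weight_names) == len(scales) == len(images)):
        raise ValueError(
            f"got {len(weight_names)} adapters, {len(scales)} scales "
            f"and {len(images)} image entries"
        )
    return [len(entry) if isinstance(entry, list) else 1 for entry in images]


weight_names = [
    "ip-adapter_sdxl_vit-h.safetensors",
    "ip-adapter-plus_sdxl_vit-h.safetensors",
    "ip-adapter-plus-face_sdxl_vit-h.safetensors",
]
face_image = "face"                               # stand-in for a PIL image
style_images = [f"style_{i}" for i in range(10)]  # stand-ins for 10 style images

counts = check_ip_adapter_inputs(
    weight_names, [0.1, 0.7, 0.3], [face_image, style_images, face_image]
)
print(counts)  # [1, 10, 1]
```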

Who can review?

@yiyixuxu @sayakpaul

Also cc: @fabiorigano because of #7186

@sayakpaul (Member) left a comment

Nice! The changes look very clean and simple to me. Thank you!

Should we maybe also add a small note about this support in the IP-Adapter guide? @yiyixuxu WDYT?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza (Member, Author) commented Mar 3, 2024

I tested all the combinations I could think of with SDXL. The image isn't that nice, but it's faster to test by using all three IP-Adapters at the same time ^^

Do the SD 1.5 versions have different dimensions? There are six of them.

@sayakpaul (Member) commented

> Do the SD 1.5 versions have different dimensions? There are six of them.

Could do a quick check on the checkpoints maybe?

@asomoza (Member, Author) commented Mar 3, 2024

> Could do a quick check on the checkpoints maybe?

I did and it worked with almost all of them except this one:
ip-adapter_sd15_vit-G.safetensors

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x1280 and 1024x3072)

This also reminded me that there's one with the big image encoder for the SDXL ones, so I tested it and it didn't work either:
ip-adapter_sdxl.safetensors

RuntimeError: mat1 and mat2 shapes cannot be multiplied (82240x1664 and 1280x1280)

But those errors are not related to this PR.
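Both errors come from the same shape rule for matrix multiplication. A linear layer multiplies an input of shape (rows x k) by a weight of shape (k x out), so the input's feature width must equal the width the adapter's projection expects:

- In the first error, the input carries 1280 features where 1024 are expected.
- In the second error, the input carries 1664 features where 1280 are expected.

The helper below is hypothetical. It works on shape tuples only and merely mimics the wording of PyTorch's message.

```python
def matmul_shape(mat1, mat2):
    """Result shape of mat1 @ mat2, or RuntimeError if the inner dims differ."""
    rows, inner1 = mat1
    inner2, cols = mat2
    if inner1 != inner2:
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{inner1} and {inner2}x{cols})"
        )
    return (rows, cols)


# Shapes taken from the two error messages above.
for mat1, mat2 in [((32, 1280), (1024, 3072)), ((82240, 1664), (1280, 1280))]:
    try:
        matmul_shape(mat1, mat2)
    except RuntimeError as err:
        print(err)

print(matmul_shape((32, 1024), (1024, 3072)))  # (32, 3072): widths agree
```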

@sayakpaul (Member) commented

> This also reminded me that there's one with the big image encoder for the SDXL ones, so I tested it and it didn't work either:
> ip-adapter_sdxl.safetensors

I see. I think that needs fixing then. Would you mind opening an issue for this and we can work on that in a separate PR?

@asomoza (Member, Author) commented Mar 3, 2024

> I see. I think that needs fixing then. Would you mind opening an issue for this and we can work on that in a separate PR?

It was an obvious mistake on my part. Both of those checkpoints use a different image encoder, and I was combining them with the normal ones; that caused the errors. Adapters that use different image encoders can't be mixed, because we load a single image encoder for all of them.

They work as expected if I use them alone.
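Because the pipeline loads one image encoder for every adapter, any combination has to come from adapters that share an encoder. The sketch below illustrates that rule.

The weight-to-encoder mapping is an assumption based on the folder layout of the h94/IP-Adapter repository. It maps the ViT-H checkpoints to models/image_encoder and the two checkpoints discussed above to sdxl_models/image_encoder. The helper itself is hypothetical.

```python
# Assumed mapping from weight file to the image-encoder folder it was trained with.
ENCODER_FOR_WEIGHT = {
    "ip-adapter_sdxl_vit-h.safetensors": "models/image_encoder",
    "ip-adapter-plus_sdxl_vit-h.safetensors": "models/image_encoder",
    "ip-adapter-plus-face_sdxl_vit-h.safetensors": "models/image_encoder",
    "ip-adapter_sdxl.safetensors": "sdxl_models/image_encoder",
    "ip-adapter_sd15_vit-G.safetensors": "sdxl_models/image_encoder",
}


def shared_image_encoder(weight_names):
    """Return the single encoder folder all adapters need, or raise if they differ."""
    folders = {ENCODER_FOR_WEIGHT[name] for name in weight_names}
    if len(folders) != 1:
        raise ValueError(f"adapters need different image encoders: {sorted(folders)}")
    return folders.pop()


print(shared_image_encoder([
    "ip-adapter_sdxl_vit-h.safetensors",
    "ip-adapter-plus_sdxl_vit-h.safetensors",
]))  # models/image_encoder
```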

@yiyixuxu (Collaborator) left a comment

thank you!

@yiyixuxu yiyixuxu merged commit 001b140 into huggingface:main Mar 3, 2024
15 checks passed
@asomoza asomoza deleted the fix-ip-adapter-plus-embeds branch March 5, 2024 05:02
Development: successfully merging this pull request may close this issue:

How can I get correct ip adapter image embeds? I got 4D tensors and I cannot use it.
4 participants