How can I get correct IP Adapter image embeds? I got 4D tensors and I cannot use them. #7168

Closed
dai-ichiro opened this issue Mar 1, 2024 · 6 comments · Fixed by #7189
Labels
bug Something isn't working

Comments

@dai-ichiro

dai-ichiro commented Mar 1, 2024

Describe the bug

IP Adapter image embeds should be 3D tensors, but I got 4D tensors.

Reproduction

import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=[
        "ip-adapter-plus_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors"
    ],
    image_encoder_folder="models/image_encoder"
)
pipeline.set_ip_adapter_scale([0.7, 0.3])
pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

image_embeds = pipeline.prepare_ip_adapter_image_embeds(
    ip_adapter_image=[style_images, face_image],
    ip_adapter_image_embeds=None,
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True
)
torch.save(image_embeds, "image_embeds.ipadpt")

print(f"type: {type(image_embeds)}")
print(f"len: {len(image_embeds)}")
for embeds in image_embeds:
    print(f"shape: {embeds.shape}")

The output is:

type: <class 'list'>
len: 2
shape: torch.Size([2, 10, 257, 1280])
shape: torch.Size([2, 1, 257, 1280])
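
One way to read these shapes, as a minimal sketch. The axis names below are assumptions inferred from the reproduction arguments (10 style images, 1 face image, `do_classifier_free_guidance=True`); they are not labels taken from diffusers, and `label_axes` is a hypothetical helper.

```python
# Hypothetical helper: pair each dimension of a printed shape with an assumed meaning.
# The axis names are an interpretation of the reproduction above, not diffusers API.
ASSUMED_AXES_4D = (
    "negative/positive pair (classifier-free guidance)",
    "images passed for this adapter",
    "image tokens from the encoder",
    "encoder hidden size",
)


def label_axes(shape):
    """Return (size, assumed meaning) pairs for a 4D embed shape."""
    if len(shape) != len(ASSUMED_AXES_4D):
        raise ValueError(f"expected a 4D shape, got {len(shape)}D")
    return list(zip(shape, ASSUMED_AXES_4D))


# The two shapes printed by the reproduction script
for shape in [(2, 10, 257, 1280), (2, 1, 257, 1280)]:
    for size, meaning in label_axes(shape):
        print(f"{size:>5}  {meaning}")
    print()
```

Under this reading, only the second axis differs between the two entries, which matches the 10 style images and the single face image.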

3D tensors are expected, but I get 4D tensors, and I cannot use them when I load them back:

import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)

pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=[
        "ip-adapter-plus_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors"
    ],
    image_encoder_folder=None
)
pipeline.set_ip_adapter_scale([0.7, 0.8])

pipeline.to("cuda")

image_embeds_fromfile = torch.load("image_embeds.ipadpt")

generator = torch.Generator(device="cpu").manual_seed(2024)
image = pipeline(
    prompt="a woman",
    ip_adapter_image_embeds=image_embeds_fromfile,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    guidance_scale=0,
    num_images_per_prompt=1,
    generator=generator,
).images[0]
image.save("result_from_image_embeds.png")

Logs

ValueError: `ip_adapter_image_embeds` has to be a list of 3D tensors but is 4D
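
The message suggests a rank check on each entry of the list. A minimal sketch of such a check follows; `check_embed_ranks` and its `allowed_ndims` parameter are hypothetical and only mimic the error above, they are not the diffusers implementation. The helper relies only on the `.ndim` attribute, which `torch.Tensor` provides, so the demo uses lightweight stand-ins instead of real tensors.

```python
from types import SimpleNamespace


def check_embed_ranks(embeds, allowed_ndims=(3,)):
    """Raise ValueError if any entry's rank is not in allowed_ndims."""
    if not isinstance(embeds, list):
        raise ValueError(f"expected a list, got {type(embeds).__name__}")
    for index, item in enumerate(embeds):
        if item.ndim not in allowed_ndims:
            raise ValueError(
                f"entry {index} is {item.ndim}D, allowed ranks: {allowed_ndims}"
            )


# Stand-ins exposing only `ndim`; real code would pass the loaded torch tensors
plus_like = [SimpleNamespace(ndim=4), SimpleNamespace(ndim=4)]

try:
    check_embed_ranks(plus_like)
except ValueError as err:
    print(err)
```

Running a check like this right after `torch.load` would surface the rank mismatch before the pipeline call.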

System Info

  • diffusers version: 0.27.0.dev0
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.11.6
  • PyTorch version (GPU?): 2.2.0+cu118 (True)
  • Huggingface_hub version: 0.21.3
  • Transformers version: 4.38.1
  • Accelerate version: 0.27.2
  • xFormers version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@sayakpaul
@yiyixuxu

@dai-ichiro dai-ichiro added the bug Something isn't working label Mar 1, 2024
@asomoza
Member

asomoza commented Mar 1, 2024

I just noticed that I didn't test the embeds with the Plus versions. This issue occurs because the shapes are different for those. In the meantime, the embeds will work only with the normal IP Adapter.
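
The shape difference can be sketched with plain tuples. This assumes the normal adapter consumes one pooled vector per image while the Plus adapters consume a token sequence per image, and that an image axis and a guidance axis are then added in front. The helper name and the size 1024 are illustrative assumptions; 257 x 1280 matches the output reported above.

```python
def batched_embed_shape(per_image_shape, num_images, use_cfg=True):
    """Shape after adding an image axis and a leading guidance axis (assumed layout)."""
    leading = 2 if use_cfg else 1  # negative + positive embeds when guidance is on
    return (leading, num_images, *per_image_shape)


# Assumed per-image outputs: a pooled vector vs. a sequence of token vectors
normal = batched_embed_shape((1024,), num_images=10)
plus = batched_embed_shape((257, 1280), num_images=10)

print(len(normal), normal)  # 3 (2, 10, 1024)
print(len(plus), plus)      # 4 (2, 10, 257, 1280)
```

Under these assumptions, the extra token axis is what pushes the Plus embeds from 3D to 4D.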

@sayakpaul
Member

Hmm the examples here (#7016) are all 3D tensors. Did we expect to support Plus @yiyixuxu?

@yiyixuxu
Collaborator

yiyixuxu commented Mar 1, 2024

@sayakpaul
Yes, we do, and it's a bug I made.

@asomoza
Member

asomoza commented Mar 2, 2024

@yiyixuxu

I have a fix for this, since I was using them for my post and wanted to try the latest changes. Should I create a PR?

@yiyixuxu
Collaborator

yiyixuxu commented Mar 2, 2024

@asomoza
Yes, sure!

@elismasilva
Contributor

How can I convert a 4D embed into a 3D tensor embed?
