
updated documentation
matt3o committed Nov 14, 2023
1 parent 4cb9e97 commit 4995dfc
Showing 2 changed files with 360 additions and 306 deletions.
17 changes: 10 additions & 7 deletions README.md
@@ -35,6 +35,8 @@ The IPAdapter are very powerful models for image-to-image conditioning. Given a

**:rocket: [Advanced features video](https://www.youtube.com/watch?v=mJQ62ly7jrg)**

**:japanese_goblin: [Attention Masking](https://www.youtube.com/watch?v=vqG1VXKteQg)**

## Installation

Download or git clone this repository inside `ComfyUI/custom_nodes/` directory.
@@ -47,6 +49,7 @@ For SD1.5 you need:
- [ip-adapter_sd15_light.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter_sd15_light.bin), use this when the text prompt is more important than the reference images
- [ip-adapter-plus_sd15.bin](https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter-plus_sd15.bin)
- [ip-adapter-plus-face_sd15.bin](https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter-plus-face_sd15.bin)
- [ip-adapter-full-face_sd15.bin](https://huggingface.co/h94/IP-Adapter/resolve/main/models/ip-adapter-full-face_sd15.bin)

For SDXL you need:
- [ip-adapter_sdxl.bin](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/ip-adapter_sdxl.bin)
@@ -81,7 +84,7 @@ Basically the IPAdapter sends two pictures for the conditioning, one is the refe
What I'm doing is sending a very noisy image instead of an empty one. The `noise` parameter determines the amount of noise that is added. A value of `0.01` adds a lot of noise (more noise == less impact because the model doesn't get it); a value of `1.0` removes most of the noise so the generated image gets conditioned more.
</details>

### Preparing the reference image

The reference image needs to be encoded by the CLIP vision model. The encoder resizes the image to 224×224 **and crops it to the center!** It's not an IPAdapter thing, it's how CLIP vision works. This means that if you use a portrait or landscape image and the main subject (e.g. the face of a character) is not in the middle, you'll likely get undesired results. Use square pictures as reference for more predictable results.
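If you want to preview what the encoder will actually "see", here is a minimal sketch using PIL (not part of this repository, and the exact resize/crop order of the real preprocessor may differ) that reproduces the square center crop:

```python
from PIL import Image

def preview_clip_crop(path: str, size: int = 224) -> Image.Image:
    """Roughly mimic the CLIP vision preprocessing: center-crop to a square,
    then resize to size x size. Anything outside the square is lost."""
    img = Image.open(path).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    return img.crop((left, top, left + side, top + side)).resize((size, size), Image.LANCZOS)

# preview_clip_crop("reference.jpg").show()  # check that the subject survives the crop
```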

@@ -93,11 +96,9 @@ In the image below you can see the difference between prepped and not prepped im

### KSampler configuration suggestions

The IPAdapter generally requires a few more `steps` than usual; if the result is underwhelming, try adding 10+ steps. The model tends to burn the images a little; if needed, lower the CFG scale.

The `noise` option generally grants better results, experiment with it.

### IPAdapter + ControlNet

@@ -111,6 +112,8 @@ IPAdapter offers an interesting model for a kind of "face swap" effect. [The wor

<img src="./examples/face_swap.jpg" width="50%" alt="face swap" />

**Note:** there's a new `full-face` model available that's arguably better.

### Masking

The most effective way to apply the IPAdapter to a region is by an [inpainting workflow](./examples/IPAdapter_inpaint.json). Remember to use a checkpoint specifically trained for inpainting, otherwise it won't work. Even if you are inpainting a face, I find that the *IPAdapter-Plus* (not the *face* one) works best.
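If you prefer to prepare the black-and-white mask outside of ComfyUI, a quick hypothetical helper (not part of this repository) could look like this; load the resulting PNG with a Load Image node and use it as the mask:

```python
from PIL import Image, ImageDraw

def make_region_mask(width: int, height: int, box: tuple) -> Image.Image:
    """Create a simple rectangular mask: white = area affected, black = untouched."""
    mask = Image.new("L", (width, height), 0)        # start fully black
    ImageDraw.Draw(mask).rectangle(box, fill=255)    # paint the conditioned region white
    return mask

# Example: mask the left half of a 1024x1024 canvas
make_region_mask(1024, 1024, (0, 0, 512, 1024)).save("mask.png")
```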
@@ -167,9 +170,9 @@ In the examples directory you'll find a couple of masking workflows: [simple](ex

You are using an old version of ComfyUI. Update and you'll be fine. **Please note** that on Windows for a full update you might need to re-download the latest standalone version.

**Tensor size mismatch: `size mismatch for proj_in.weight: copying a param with shape torch.Size([..., ...]) from checkpoint, the shape in current model is torch.Size([..., ...])`**

You are using the wrong image encoder + IPAdapter model + checkpoint combo. Remember that you need to select the CLIP encoder v1.5 for all v1.5 IPAdapter models AND for all models ending with `vit-h` (even if they are for SDXL).

**Is it true that the input reference image must have the same size as the output image?**


