We propose MM-Diff, a unified, tuning-free image personalization framework that generates high-fidelity images of both single and multiple subjects in seconds. On the left, vision-augmented text embeddings and a small set of detail-rich subject embeddings are injected into the diffusion model through multi-modal cross-attention. On the right, we illustrate how the cross-attention is implemented with LoRAs, along with the attention constraints that facilitate multi-subject generation.
conda create -n mmdiff python=3.9
conda activate mmdiff
pip install -r requirements.txt
We provide the pretrained checkpoints. Download them and place them in the root directory of this project. To run the demo, you also need to download the following models:
- stabilityai/stable-diffusion-xl-base-1.0
- madebyollin/sdxl-vae-fp16-fix
- openai/clip-vit-large-patch14
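The three models above can be fetched ahead of time from the Hugging Face Hub. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the `pretrained_models` target directory and the helper names `local_dir_for` and `download_all` are hypothetical and not part of this repository, so adjust the paths to wherever the demo scripts expect the models.

```python
from pathlib import Path

# Model IDs exactly as listed above.
BASE_MODELS = [
    "stabilityai/stable-diffusion-xl-base-1.0",
    "madebyollin/sdxl-vae-fp16-fix",
    "openai/clip-vit-large-patch14",
]


def local_dir_for(repo_id, root="pretrained_models"):
    """Map 'org/name' to '<root>/name' (hypothetical layout, not the repo's)."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root="pretrained_models"):
    """Download every model in BASE_MODELS into its own folder under root."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in BASE_MODELS:
        snapshot_download(
            repo_id=repo_id,
            local_dir=str(local_dir_for(repo_id, root)),
        )
```

Calling `download_all()` once downloads all three models; the SDXL base model is large, so expect this to take a while on a first run.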
We provide demo code for training data annotation in data_annotation. To avoid package conflicts, we recommend setting up a separate conda or Docker environment for it.
python data_labeling_imagenet.py --data_path="path_to_data"
Currently, we provide three ways to customize your images, listed below. Some reference images are also available in demo_data.
- mmdiff_demo, image generation with a single reference image.
- mmdiff_multiple_reference_demo, image generation with multiple reference images.
- mmdiff_id_mixing_demo, image generation with identity mixing.
python mmdiff_gradio_demo.py
- [2024/05/30] Fuse LoRA weights into the original weights to improve inference speed.
- [2024/05/29] Release an enhanced version of MM-Diff for portrait generation, employing face embeddings to improve subject fidelity.
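To illustrate why fusing LoRA weights speeds up inference, here is a minimal NumPy sketch of the general technique. It is not this repository's implementation; the function name `fuse_lora`, the matrix shapes, and the `scale` parameter are assumptions chosen for illustration. A LoRA layer computes the base projection plus a low-rank update, and since both are linear, the update can be folded into the base weight once so that each forward pass needs only a single matrix multiply.

```python
import numpy as np


def fuse_lora(weight, lora_down, lora_up, scale=1.0):
    """Return the fused weight W + scale * (up @ down)."""
    return weight + scale * (lora_up @ lora_down)


rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2  # illustrative sizes only

W = rng.standard_normal((d_out, d_in))   # base projection weight
A = rng.standard_normal((rank, d_in))    # LoRA down-projection
B = rng.standard_normal((d_out, rank))   # LoRA up-projection
x = rng.standard_normal((4, d_in))       # a batch of 4 input vectors

# Unfused: base path plus the low-rank path (three matmuls per call).
y_unfused = x @ W.T + (x @ A.T) @ B.T

# Fused: fold the low-rank update into W once, then one matmul per call.
W_fused = fuse_lora(W, A, B, scale=1.0)
y_fused = x @ W_fused.T

# Both paths agree up to floating-point error.
assert np.allclose(y_unfused, y_fused)
```

The trade-off is that a fused weight no longer keeps the LoRA update separate, so changing `scale` or swapping adapters requires recomputing the fusion from the original weight.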
If you find MM-Diff useful for your research, please cite our paper:
@article{wei2024mm,
title={MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration},
author={Wei, Zhichao and Su, Qingkun and Qin, Long and Wang, Weizhi},
journal={arXiv preprint arXiv:2403.15059},
year={2024}
}
This code is built on some excellent repos, including diffusers, FastComposer, PhotoMaker and IP-Adapter. Thanks for their great work!