Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
by Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, and Lijuan Wang
Built upon GPT-4V(ision), Idea2Img is a multimodal iterative self-refinement system that enhances any text-to-image (T2I) model for automatic image design and generation, enabling various new image creation functionalities together with better visual quality.
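To make "iterative self-refinement" concrete, here is a minimal sketch of one possible refinement loop. It is not the code in this repository: `refine` and the four callables it takes (`propose_prompts`, `render`, `pick_best`, `critique`) are hypothetical placeholders for the multimodal model and the T2I model, and the stopping rule (empty feedback) is an assumption made for illustration.

```python
from typing import Any, Callable, List, Tuple

History = List[Tuple[str, str]]  # (prompt used, feedback received) per round


def refine(
    idea: str,
    propose_prompts: Callable[[str, History, str], List[str]],  # hypothetical
    render: Callable[[str], Any],                                # hypothetical T2I call
    pick_best: Callable[[str, List[Any]], int],                  # hypothetical
    critique: Callable[[str, Any], str],                         # hypothetical
    max_rounds: int = 3,
) -> Tuple[Any, History]:
    """Run a draft -> select -> critique -> revise loop for a fixed budget."""
    history: History = []
    feedback = ""
    best_image = None
    for _ in range(max_rounds):
        # Write (or revise) candidate T2I prompts from the idea and past feedback.
        prompts = propose_prompts(idea, history, feedback)
        # Render one draft image per candidate prompt.
        drafts = [render(p) for p in prompts]
        # Keep the draft that best matches the idea.
        idx = pick_best(idea, drafts)
        best_prompt, best_image = prompts[idx], drafts[idx]
        # Ask what is still wrong; an empty string means "good enough" (assumption).
        feedback = critique(idea, best_image)
        history.append((best_prompt, feedback))
        if not feedback:
            break
    return best_image, history
```

Passing the models in as callables keeps the loop independent of any particular T2I backend, which mirrors the claim that the approach can sit on top of any T2I model.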
- Obtain a public OpenAI GPT-4V API key and set up T2I inference accordingly, e.g., with SDXL.
- Clone the repository:
  git clone https://github.com/zyang-ur/idea2img.git
- Inference prompts will be read from --testfile. <IMG> is a separator token inserted between image-image and image-text.
  mkdir output
  python idea2img_pipeline.py --api_key OAI_GPT4V_Key --testfile testsample.txt --fewshot --select_fewshot
- Generated results and intermediate steps will be saved to the output folder.
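As an illustration of the <IMG> separator mentioned above, the sketch below splits one interleaved prompt line into image and text segments. The exact layout of testsample.txt is not shown here, so this is an assumption: `split_prompt` is a hypothetical helper, and treating a segment as an image when it ends in a common image extension is a guess, not the repository's own parsing logic.

```python
from typing import List, Tuple

SEPARATOR = "<IMG>"
# Assumption: image segments are file paths with one of these extensions.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def split_prompt(line: str) -> List[Tuple[str, str]]:
    """Split a prompt line on <IMG> and label each piece as image or text."""
    segments = []
    for piece in line.split(SEPARATOR):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces from leading or trailing separators
        is_image = piece.lower().endswith(IMAGE_EXTENSIONS)
        segments.append(("image" if is_image else "text", piece))
    return segments


if __name__ == "__main__":
    # Hypothetical input: two images followed by a text instruction.
    print(split_prompt("dog.png<IMG>cat.jpg<IMG>a painting of both pets"))
```

The file names and instruction in the usage line are made up for the example; substitute the entries from your own test file.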
@article{yang2023idea2img,
title={Idea2img: Iterative self-refinement with gpt-4v (ision) for automatic image design and generation},
author={Yang, Zhengyuan and Wang, Jianfeng and Li, Linjie and Lin, Kevin and Lin, Chung-Ching and Liu, Zicheng and Wang, Lijuan},
journal={arXiv preprint arXiv:2310.08541},
year={2023}
}