Stable Flow: Vital Layers for Training-Free Image Editing
Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or
Abstract: Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampling. However, they exhibit limited generation diversity. In this work, we leverage this limitation to perform consistent image edits via selective injection of attention features. The main challenge is that, unlike the UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it unclear in which layers to perform the injection. Therefore, we propose an automatic method to identify "vital layers" within DiT, crucial for image formation, and demonstrate how these layers facilitate a range of controlled stable edits, from non-rigid modifications to object addition, using the same mechanism. Next, to enable real-image editing, we introduce an improved image inversion method for flow models. Finally, we evaluate our approach through qualitative and quantitative comparisons, along with a user study, and demonstrate its effectiveness across multiple applications.
Stable Flow is a training-free editing method that performs various types of image editing operations, including non-rigid editing, object addition, object removal, and global scene editing, all using the same mechanism.
Create and activate the conda virtual environment:
conda env create -f environment.yml
conda activate stable-flow
You may need to update the pytorch-cuda version to match your local CUDA installation.
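For example, assuming a CUDA 12.1 driver (the version here is only an illustration; pick the build that matches your setup), you could reinstall PyTorch inside the activated environment with:
# illustrative only: replace 12.1 with the pytorch-cuda build that matches your driver
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia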
You need to provide a list of prompts, where the first prompt describes the original scene and the rest of the prompts describe the edited scenes. For example:
[
"A photo of a dog in standing the street",
"A photo of a dog sitting in the street",
"A photo of a dog in standing and wearing a straw hat the street",
"A photo of a mink",
]
Then, you can generate the image and its edited versions in a single batch using:
python run_stable_flow.py \
--hf_token YOUR_PERSONAL_HUGGINGFACE_TOKEN \
--prompts "A photo of a dog in standing the street" \
"A photo of a dog sitting in the street" \
"A photo of a dog in standing and wearing a straw hat the street" \
"A photo of a mink"
where YOUR_PERSONAL_HUGGINGFACE_TOKEN is your personal HuggingFace user access token.
Then, the results will be saved to outputs/result.jpg, or any other path you specify under --output_path.
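For example, to save the result to a custom location (the output path below is only illustrative):
python run_stable_flow.py \
--hf_token YOUR_PERSONAL_HUGGINGFACE_TOKEN \
--prompts "A photo of a dog standing in the street" \
"A photo of a dog sitting in the street" \
--output_path outputs/dog_edits.jpg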
All the experiments in our paper were conducted using a single NVIDIA A100 GPU with 80GB of VRAM. If you are using a GPU with less VRAM and encounter CUDA out of memory errors, you can enable CPU offloading by adding the --cpu_offload flag. This reduces memory consumption but significantly increases inference time.
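For example, the same generation command with offloading enabled:
python run_stable_flow.py \
--hf_token YOUR_PERSONAL_HUGGINGFACE_TOKEN \
--prompts "A photo of a dog standing in the street" \
"A photo of a dog sitting in the street" \
--cpu_offload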
If you want to edit a real image, you still need to provide a list of prompts, where the first prompt describes the input image and the rest of the prompts describe the edited scenes. For example:
python run_stable_flow.py \
--hf_token YOUR_PERSONAL_HUGGINGFACE_TOKEN \
--input_img_path inputs/bottle.jpg \
--prompts "A photo of a bottle" \
"A photo of a bottle next to an apple"
where YOUR_PERSONAL_HUGGINGFACE_TOKEN is your personal HuggingFace user access token, and --input_img_path is the path to the input image.
Then, the results will be saved to outputs/result.jpg, or any other path you specify under --output_path.
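For example, a single run that applies several edits to the same real image and saves the result to a custom path (the extra edit prompt and the output path are only illustrative):
python run_stable_flow.py \
--hf_token YOUR_PERSONAL_HUGGINGFACE_TOKEN \
--input_img_path inputs/bottle.jpg \
--prompts "A photo of a bottle" \
"A photo of a bottle next to an apple" \
"A photo of a bottle on a wooden table" \
--output_path outputs/bottle_edits.jpg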
If you find this useful for your research, please cite the following:
@misc{avrahami2024stableflow,
title={Stable Flow: Vital Layers for Training-Free Image Editing},
author={Omri Avrahami and Or Patashnik and Ohad Fried and Egor Nemchinov and Kfir Aberman and Dani Lischinski and Daniel Cohen-Or},
year={2024},
eprint={2411.14430},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.14430},
}