
[CVPR 2025] FaithDiff for Classic Film Rejuvenation, Old Photo Revival, Social Media Restoration, Image Enhancement and AIGC Enhancement.


JyChen9811/FaithDiff


(CVPR 2025) FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution


[Project Page]   [Paper]

Junyang Chen, Jinshan Pan, Jiangxin Dong
IMAG Lab, Nanjing University of Science and Technology

If FaithDiff is helpful to you, please star the GitHub repo. Thanks!


🚩 New Features/Updates

  • ✅ April 3, 2025. The code has been integrated into Diffusers. Respect to Eliseu Silva!!!
  • ✅ April 1, 2025. Supports FP8 inference and CPU offloading, significantly reducing memory usage. Thanks Eliseu Silva!!!
  • ✅ March 28, 2025. Update a nice gradio demo.
  • ✅ March 24, 2025. Release the training code.
  • ✅ February 09, 2025. Support ultra-high-resolution (8K and above) image restoration on 24GB GPUs.
  • ✅ February 08, 2025. Release RealDeg. It includes 238 images with unknown degradations, consisting of old photographs, social media images, and classic film stills.
  • ✅ February 07, 2025. Release the testing code and pre-trained model.
  • ✅ November 25, 2024. Create the repository and the project page.

To do

  • FaithDiff-SD3-Large
  • ✅ Release the training code
  • ✅ Release the testing code and pre-trained model

📷 Real-World Enhancement Results


🌈 AIGC Enhancement Results


🎁 Gradio Demo

python gradio_demo.py

Additional parameters

You can add the following parameters to the Gradio application.

  • --cpu_offload: Offloads the weights of the pipeline components to CPU RAM. Recommended if your GPU has less than 12GB of memory.
  • --use_fp8: Changes the diffusion model precision from FP16 to FP8, significantly reducing GPU memory requirements. Combined with --cpu_offload, a 2x upscale requires only 5GB of VRAM.

```Shell
# FP8 inference and CPU offloading
python gradio_demo.py --cpu_offload --use_fp8
# FP8 inference and CPU offloading, without LLaVA
python gradio_demo.py --cpu_offload --use_fp8 --no_llava
```
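
The memory guidance for these flags can be captured in a small launcher helper. The following is only a sketch and not part of the repository: `build_demo_command` and its arguments are hypothetical names, and the only rule it encodes is the 12GB recommendation for `--cpu_offload` stated for that flag.

```python
import shlex
from typing import List


def build_demo_command(vram_gb: float, use_fp8: bool = False,
                       use_llava: bool = True) -> List[str]:
    """Assemble a gradio_demo.py command line (hypothetical helper)."""
    cmd = ["python", "gradio_demo.py"]
    if vram_gb < 12:
        # CPU offloading is recommended for GPUs with less than 12GB.
        cmd.append("--cpu_offload")
    if use_fp8:
        # FP8 further reduces GPU memory; the caller opts in explicitly.
        cmd.append("--use_fp8")
    if not use_llava:
        cmd.append("--no_llava")
    return cmd


# Example: a small GPU, FP8 enabled, LLaVA disabled.
print(shlex.join(build_demo_command(6, use_fp8=True, use_llava=False)))
# prints: python gradio_demo.py --cpu_offload --use_fp8 --no_llava
```

The helper only builds the argument list; pass it to `subprocess.run` if you want to launch the demo from Python.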



⚡ How to train

Environment

conda env create --name faithdiff -f environment.yml

Training Script

# Stage 1
bash train_stage_1.sh

# After Stage 1 training, enter the checkpoints folder.
cd ./train_FaithDiff_stage_1_offline/checkpoint-6000
python zero_to_fp32.py ./ ./pretrain.bin

# Stage 2
bash train_stage_2.sh

# After Stage 2 training, enter the checkpoints folder.
cd ./train_FaithDiff_stage_2_offline/checkpoint
python zero_to_fp32.py ./ ./FaithDiff.bin
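
The checkpoint folder name depends on the step at which training was saved (checkpoint-6000 in the Stage 1 example). Below is a minimal sketch for locating the most recent checkpoint folder before running zero_to_fp32.py. It assumes every saved checkpoint follows the `checkpoint-<step>` naming pattern; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subfolder with the highest step, or None."""
    best, best_step = None, -1
    for child in Path(output_dir).iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            # Compare numerically so that checkpoint-10000 beats checkpoint-6000.
            if step > best_step:
                best, best_step = child, step
    return best
```

For example, `latest_checkpoint("./train_FaithDiff_stage_1_offline")` would return the folder to `cd` into, or `None` if no checkpoint has been saved yet.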

Tips for Human Face data preparation

  • To quickly filter out low-quality data in the FFHQ dataset, we recommend using topiq to assess image quality. Here are the official results. We empirically selected images with a topiq score above 0.72.
  • During training, we recommend resizing face images to a resolution between 512 and 768.
  • If you need to improve the restoration performance of portrait images, Unsplash offers high-quality portrait images. You can search for different clothing names to obtain full-body portrait data.
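
The first two tips can be sketched as follows. This is an illustration under stated assumptions, not the repository's data pipeline: the scores are assumed to be precomputed (for example from the official results mentioned above), the file names are made up, and reading "between 512 and 768" as a randomly sampled side length is our interpretation.

```python
import random
from typing import Dict, List, Optional

TOPIQ_THRESHOLD = 0.72          # empirical cut-off quoted in the tips above
MIN_SIDE, MAX_SIDE = 512, 768   # recommended face resolution range


def select_high_quality(scores: Dict[str, float],
                        threshold: float = TOPIQ_THRESHOLD) -> List[str]:
    """Keep the images whose quality score is strictly above the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


def sample_face_size(rng: Optional[random.Random] = None) -> int:
    """Draw a target side length in [MIN_SIDE, MAX_SIDE] (assumed interpretation)."""
    rng = rng or random.Random()
    return rng.randint(MIN_SIDE, MAX_SIDE)  # randint includes both endpoints


# Hypothetical precomputed scores, keyed by file name.
example_scores = {"00000.png": 0.81, "00001.png": 0.65, "00002.png": 0.72}
print(select_high_quality(example_scores))  # prints: ['00000.png']
```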

🚀 How to evaluate

Download Dependent Models

Val Dataset

RealDeg: Google Drive

To evaluate the performance of our method in real-world scenarios, we collected a dataset of 238 images with unknown degradations, consisting of old photographs, social media images, and classic film stills. The old photographs include black-and-white images, faded photographs, and colorized versions. The social media images were uploaded by us to various social media platforms (e.g., WeChat, RedNote, Sina Weibo and Zhihu) and underwent one or more rounds of cross-platform processing. The classic film stills are selected from iconic films spanning the 1980s to 2000s, such as The Shawshank Redemption, Harry Potter, and Spider-Man. The images feature diverse content, including people, buildings, animals, and various natural elements. In addition, the shortest side of every image is at least 720 pixels.

Inference Script

# Script that supports two GPUs.
CUDA_VISIBLE_DEVICES=0,1 python test.py --img_dir='./dataset/RealDeg' --save_dir=./save/RealDeg --upscale=2 --guidance_scale=5 --num_inference_steps=20 --load_8bit_llava 

# Scripts that support only one GPU.
CUDA_VISIBLE_DEVICES=0 python test_generate_caption.py --img_dir='./dataset/RealDeg' --save_dir=./save/RealDeg_caption --load_8bit_llava
CUDA_VISIBLE_DEVICES=0 python test_wo_llava.py --img_dir='./dataset/RealDeg' --json_dir=./save/RealDeg_caption --save_dir=./save/RealDeg --upscale=2 --guidance_scale=5 --num_inference_steps=20
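
In the single-GPU workflow, the caption step and the restoration step are linked only through a folder: the `--save_dir` of test_generate_caption.py must be the `--json_dir` of test_wo_llava.py. The sketch below makes that dependency explicit. `single_gpu_commands` and `run_on_gpu` are hypothetical helpers that use only the flags shown above.

```python
import os
import subprocess
from typing import List


def single_gpu_commands(img_dir: str, caption_dir: str, save_dir: str,
                        upscale: int = 2, guidance_scale: int = 5,
                        num_inference_steps: int = 20) -> List[List[str]]:
    """Build the caption command and the restoration command, in run order."""
    caption_cmd = [
        "python", "test_generate_caption.py",
        f"--img_dir={img_dir}",
        f"--save_dir={caption_dir}",   # captions are written here ...
        "--load_8bit_llava",
    ]
    restore_cmd = [
        "python", "test_wo_llava.py",
        f"--img_dir={img_dir}",
        f"--json_dir={caption_dir}",   # ... and read back from the same folder
        f"--save_dir={save_dir}",
        f"--upscale={upscale}",
        f"--guidance_scale={guidance_scale}",
        f"--num_inference_steps={num_inference_steps}",
    ]
    return [caption_cmd, restore_cmd]


def run_on_gpu(commands: List[List[str]], gpu_id: int = 0) -> None:
    """Run the commands sequentially with a single visible GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    for cmd in commands:
        subprocess.run(cmd, env=env, check=True)  # stop at the first failure
```

Calling `run_on_gpu(single_gpu_commands("./dataset/RealDeg", "./save/RealDeg_caption", "./save/RealDeg"))` from the repository root would reproduce the two single-GPU commands above.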

# If attempting ultra-high-resolution image restoration, add --use_tile_vae in the scripts. The same applies to test_wo_llava.
CUDA_VISIBLE_DEVICES=0,1 python test.py --img_dir='./dataset/RealDeg' --save_dir=./save/RealDeg --use_tile_vae --upscale=8 --guidance_scale=5 --num_inference_steps=20 --load_8bit_llava 
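
To give an intuition for why tiling helps at ultra-high resolution, the sketch below splits an image into overlapping tiles so that each piece can be processed with bounded memory. It is a generic illustration of tiled processing, not the implementation behind `--use_tile_vae`: the function name, the tile size of 1024, and the overlap of 128 are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int,
               tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Cover a width x height image with overlapping tiles of at most tile x tile."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]  # a single tile already covers this axis
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the border
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


# Example: a 1280x720 input upscaled 8x becomes 10240x5760.
boxes = tile_boxes(1280 * 8, 720 * 8)
print(len(boxes), "tiles, each at most 1024x1024 pixels")
```

Memory use then scales with the tile size rather than with the full output resolution, at the cost of processing the overlapping borders more than once.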

BibTeX

@inproceedings{chen2024faithdiff,
title={FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution},
author={Chen, Junyang and Pan, Jinshan and Dong, Jiangxin},
booktitle={CVPR},
year={2025}
}

Contact

If you have any questions, please feel free to reach out to me at jychen9811@gmail.com.


Acknowledgments

Our project is based on diffusers, SUPIR, TLC and SimpleTuner. Thanks for their awesome work.