Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration (VSPBFR)
Official PyTorch implementation of VSPBFR.
Pattern Recognition 2025 🔥🔥🔥 [arXiv] [paper] [Hugging Face model card]
The required packages are listed in the requirements.txt file.
- Note that other versions of PyTorch (e.g., higher than 1.7) also work well, but you have to install the matching CUDA version.
- Tip: please make sure that CUDA, cuDNN, and PyTorch are compatible with each other; see the PyTorch installation page, and the quick check after the install commands below.
git clone
cd VSPBFR
conda create -n VSPBFR
conda activate VSPBFR
pip install -r requirements.txt
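After installation, a quick check (a minimal sketch, not part of the repository) confirms that PyTorch, CUDA, and cuDNN line up:

```python
# check_env.py -- verify that PyTorch, CUDA, and cuDNN are aligned
import torch

print("PyTorch:", torch.__version__)
print("CUDA (compiled against):", torch.version.cuda)  # None for CPU-only builds
print("cuDNN:", torch.backends.cudnn.version())        # None if cuDNN is unavailable
print("GPU available:", torch.cuda.is_available())
```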
- Training and testing code
- Pre-trained models
- Training data: our model was trained on FFHQ, obtained from the FFHQ repository. The original FFHQ images are 1024x1024; we resize them to 512x512 with bilinear interpolation (see the preprocessing sketch after the folder layout below).
- Test data: CelebA-Test, LFW-Test, WebPhoto-Test, and CelebChild-Test.
- Use all images for training. The folder structure of the training and testing data is shown below:
root/
    train/
        xxx.png
        ...
        xxz.png
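Since the original FFHQ images are 1024x1024, they must be downsampled before training. A minimal preprocessing sketch (the source folder name ffhq_1024 is an assumption; root/train matches the layout above):

```python
# resize_ffhq.py -- downsample 1024x1024 FFHQ images to 512x512 with bilinear interpolation
from pathlib import Path
from PIL import Image

src = Path("./ffhq_1024")   # assumed location of the original FFHQ images
dst = Path("./root/train")  # training folder matching the layout above
dst.mkdir(parents=True, exist_ok=True)

for img_path in sorted(src.glob("*.png")):
    img = Image.open(img_path).convert("RGB")
    img = img.resize((512, 512), Image.BILINEAR)  # bilinear, as described above
    img.save(dst / img_path.name)
```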
- Prepare the pre-trained checkpoints and put them in ./pre-train: Arcface.pth, style_encoder_decoder.pt, code_diffuser.pt, and restoration_net.pt.
- We use e4e to train our style encoder; for more details, please refer to e4e_trainer.
- We have prepared the pre-trained checkpoint for you: style_encoder_decoder.pt (put the model in ./pre-train).
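To sanity-check a downloaded checkpoint before training, a minimal sketch (what the file contains depends on how it was saved, so the printout is only a rough inspection):

```python
# Quick inspection of a downloaded checkpoint; the path follows the layout above.
import torch

ckpt = torch.load("./pre-train/style_encoder_decoder.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys())[:5])  # e.g., state_dict entries or sub-modules
else:
    print("loaded object of type:", type(ckpt))
```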
python code_diffuser_train.py \
    --path [training img folder] \
    --psp_checkpoint_path [style encoder checkpoint] \
    --arcface_path [Arcface checkpoint]
python restoration_train.py \
    --path [training img folder] \
    --batch 4 \
    --psp_checkpoint_path [style encoder checkpoint] \
    --arcface_path [Arcface checkpoint] \
    --ddpm_ckpt [code diffuser checkpoint] \
    --size 512 \
    --percept_loss_weight 0.5 \
    --iter 500000
- Prepare the pre-trained checkpoints and put them in ./pre-train: Arcface.pth, style_encoder_decoder.pt, code_diffuser.pt, and restoration_net.pt.
Note: training the code diffuser generates a ./checkpoint/recent_code_diffuser.pt file, which can be passed to --ddpm_ckpt.
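If you prefer the file layout used above, here is a minimal sketch for copying the freshly trained diffuser into ./pre-train (the destination name code_diffuser.pt simply mirrors the checkpoint list above):

```python
# collect_ckpt.py -- copy the most recent code diffuser checkpoint into ./pre-train
import shutil
from pathlib import Path

Path("./pre-train").mkdir(exist_ok=True)
# recent_code_diffuser.pt is written by code_diffuser_train.py; the destination
# name mirrors the checkpoint list above
shutil.copy("./checkpoint/recent_code_diffuser.pt", "./pre-train/code_diffuser.pt")
```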
python restoration_test.py \
    --ckpt_root [restoration network checkpoint] \
    --lq_data_list [comma-separated low-quality image folders, e.g., ./patha,./pathb,...] \
    --hq_data_list [comma-separated high-quality (ground-truth) folders, e.g., ./patha,None,...; put None where no ground-truth images exist] \
    --data_name_list [comma-separated dataset names, e.g., dataset_name_a,dataset_name_b,...] \
    --psp_checkpoint_path [style encoder checkpoint] \
    --arcface_path [Arcface checkpoint] \
    --ddpm_ckpt [code diffuser checkpoint] \
    --size 512 \
    --batch 4
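The three list arguments are plain comma-separated strings that pair up element-wise. The sketch below only illustrates that pairing (the folder names are placeholders, and this is not the repository's actual argument parsing):

```python
# Illustrative only: how the comma-separated list arguments pair up element-wise.
# The folder names below are placeholders, not paths shipped with the repository.
lq_data_list = "./celeba_test_lq,./lfw_test".split(",")
hq_data_list = "./celeba_test_hq,None".split(",")   # "None" marks a set without ground truth
data_name_list = "CelebA-Test,LFW-Test".split(",")

for lq, hq, name in zip(lq_data_list, hq_data_list, data_name_list):
    has_gt = hq != "None"
    print(f"{name}: LQ folder {lq}, ground truth: {hq if has_gt else 'none'}")
```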
- If you find our code useful, please cite our paper:
@article{LU2025111312,
  title = {Visual style prompt learning using diffusion models for blind face restoration},
  journal = {Pattern Recognition},
  volume = {161},
  pages = {111312},
  year = {2025},
  issn = {0031-3203},
  doi = {https://doi.org/10.1016/j.patcog.2024.111312},
  url = {https://www.sciencedirect.com/science/article/pii/S003132032401063X},
  author = {Wanglong Lu and Jikai Wang and Tao Wang and Kaihao Zhang and Xianta Jiang and Hanli Zhao},
  keywords = {Denoising diffusion probabilistic models, Generative adversarial networks, Blind face restoration},
  abstract = {Blind face restoration aims to recover high-quality facial images from various unidentified sources of degradation, posing significant challenges due to the minimal information retrievable from the degraded images. Prior knowledge-based methods, leveraging geometric priors and facial features, have led to advancements in face restoration but often fall short of capturing fine details. To address this, we introduce a visual style prompt learning framework that utilizes diffusion probabilistic models to explicitly generate visual prompts within the latent space of pre-trained generative models. These prompts are designed to guide the restoration process. To fully utilize the visual prompts and enhance the extraction of informative and rich patterns, we introduce a style-modulated aggregation transformation layer. Extensive experiments and applications demonstrate the superiority of our method in achieving high-quality blind face restoration.}
}
Incorporating our restoration method significantly improves facial landmark detection and face emotion recognition by enhancing the clarity of facial features in the restored images.
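For example, landmark detection can be run on degraded versus restored images to see the difference. A hedged sketch using the third-party face_alignment package (not part of this repository; the image file names are placeholders):

```python
# landmarks_demo.py -- compare landmark detection before and after restoration
# (illustrative sketch; requires the third-party face_alignment package)
import face_alignment

# On face_alignment versions before 1.4, use LandmarksType._2D instead of TWO_D.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cpu")

for tag, path in [("degraded", "lq_sample.png"), ("restored", "restored_sample.png")]:
    preds = fa.get_landmarks(path)  # list of (68, 2) landmark arrays, or None if no face is found
    count = 0 if preds is None else len(preds)
    print(f"{tag}: {count} face(s) with landmarks detected")
```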
- EXE-GAN
- pSp encoder
- RestoreFormer++