HOIGen

Official code for the ACM MM 2024 paper "Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection" (paper).

Dataset

Follow the dataset preparation process of UPT.

The downloaded files should be placed as follows. Otherwise, replace the default paths with your custom locations.

|- HOIGen
|   |- hicodet
|   |   |- hico_20160224_det
|   |       |- annotations
|   |       |- images
:   :      

Dependencies

  1. Follow the environment setup in UPT.

  2. Our code is built upon CLIP. Install the local package of CLIP:

cd CLIP && python setup.py develop && cd ..
  3. Download the CLIP weights to checkpoints/pretrained_clip.
|- HOIGen
|   |- checkpoints
|   |   |- pretrained_clip
|   |       |- ViT-B-16.pt
:   :      
  4. Download the weights of DETR and put them in checkpoints/.

| Dataset | DETR weights |
| --- | --- |
| HICO-DET | weights |

|- HOIGen
|   |- checkpoints
|   |   |- detr-r50-hicodet.pth
:   :   :
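
Before training, it can help to confirm that the dataset and checkpoints sit where the default paths expect them. The following is a minimal sketch and not part of the HOIGen code: the helper `missing_paths` and the list `EXPECTED_PATHS` are hypothetical names, and the paths are copied from the directory trees above.

```python
from pathlib import Path

# Paths copied from the directory trees in this README, relative to the HOIGen root.
# EXPECTED_PATHS and missing_paths are illustrative names, not part of the repository.
EXPECTED_PATHS = [
    "hicodet/hico_20160224_det/annotations",
    "hicodet/hico_20160224_det/images",
    "checkpoints/pretrained_clip/ViT-B-16.pt",
    "checkpoints/detr-r50-hicodet.pth",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the HOIGen root directory.
    missing = missing_paths(".")
    if missing:
        print("Missing:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected files and folders are in place.")
```

If you use custom locations, edit `EXPECTED_PATHS` to match them.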

Pre-extracted Features

Download the pre-extracted features from HERE. The downloaded files have to be placed as follows.

|- HOIGen
|   |- hicodet_pkl_files
|   |   |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
:   :      
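
To check that the feature file downloaded correctly, you can look at its top-level structure. This is only a sketch: it assumes the `.p` file is a standard Python pickle (as the folder name `hicodet_pkl_files` suggests), and `summarize_pickle` is a hypothetical helper, not a function from this repository. If the file stores tensors, the library that created them may need to be importable. Only unpickle files from sources you trust.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return a short description of its top-level object.

    Assumes `path` is a standard pickle; this helper is illustrative only.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Keep the output short by listing at most five keys.
        summary["sample_keys"] = list(obj)[:5]
    return summary


# Example call, run from the HOIGen root directory:
# summarize_pickle("hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p")
```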

Training and Testing

Feature Generation

To train the feature generator yourself, process the images and run the following commands. Otherwise, download the weights we provide and put them in checkpoints/.

python main_coop_vae.py --data hoi_data/human_data/object_data
python finetune_ship.py --data hoi_data/human_data/object_data

HICO-DET

Fully-supervised:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt 
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --eval --resume CKPT_PATH

UC:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type uc0/uc1/uc2/uc3/uc4 --eval --resume CKPT_PATH

RF-UC:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type rare_first --eval --resume CKPT_PATH

NF-UC:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type non_rare_first --eval --resume CKPT_PATH

UV:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_verb --eval --resume CKPT_PATH

UO:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_object --eval --resume CKPT_PATH
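
The commands above differ only in `--zs_type` and in the trailing `--eval --resume CKPT_PATH`, so they can be assembled from one shared argument list. The sketch below does this using only flags that appear in this README; `COMMON_ARGS` and `build_command` are hypothetical names, not part of the repository. Omitting `resume` reproduces the fully-supervised training command; that the zero-shot settings train with the same flags minus `--eval --resume` is an assumption.

```python
import shlex

# Arguments shared by every main_tip_finetune.py command in this README.
COMMON_ARGS = [
    "python", "main_tip_finetune.py",
    "--world-size", "1",
    "--pretrained", "checkpoints/detr-r50-hicodet.pth",
    "--output-dir", "checkpoints/hico",
    "--use_insadapter",
    "--num_classes", "117",
    "--use_multi_hot",
    "--file1", "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p",
    "--clip_dir_vit", "checkpoints/pretrained_clip/ViT-B-16.pt",
]


def build_command(zs_type=None, resume=None):
    """Return the argument list for one run (illustrative helper).

    zs_type: None for the fully-supervised setting, otherwise the --zs_type value.
    resume:  checkpoint path; when given, --eval --resume <path> is appended.
    """
    cmd = list(COMMON_ARGS)
    if zs_type is not None:
        cmd += ["--zs", "--zs_type", zs_type]
    if resume is not None:
        cmd += ["--eval", "--resume", resume]
    return cmd


if __name__ == "__main__":
    # Print the evaluation command for each zero-shot setting.
    # CKPT_PATH is a placeholder, as in the commands above.
    # The UC setting lists uc0/uc1/uc2/uc3/uc4; pass one of them as zs_type if that
    # is how the script expects it (an assumption, not confirmed by this README).
    for zs_type in ["rare_first", "non_rare_first", "unseen_verb", "unseen_object"]:
        print(shlex.join(build_command(zs_type=zs_type, resume="CKPT_PATH")))
```

The returned list can also be passed directly to `subprocess.run` to launch a run without going through the shell.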

Model Zoo

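| Setting | Full | Seen | Unseen | Weights |
| --- | --- | --- | --- | --- |
| UC | 33.44 | 34.23 | 30.26 | weights |
| RF-UC | 33.86 | 34.57 | 31.01 | weights |
| NF-UC | 33.08 | 32.86 | 33.98 | weights |
| UO | 33.48 | 32.90 | 36.35 | weights |
| UV | 32.34 | 34.31 | 20.27 | weights |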

Citation

If you find our paper and/or code helpful, please consider citing:

@inproceedings{
guo2024unseen,
title={Unseen No More: Unlocking the Potential of {CLIP} for Generative Zero-shot {HOI} Detection},
author={Yixin Guo and Yu Liu and Jianghao Li and Weimin Wang and Qi Jia},
booktitle={ACM Multimedia 2024},
year={2024},
url={https://openreview.net/forum?id=mAQ2fK2myX}
}

Acknowledgement

We gratefully thank the authors of UPT, ADA-CM, SHIP and CaFo for open-sourcing their code.

Tips

Because the code was open-sourced as quickly as possible, it contains a lot of redundancy and may have some bugs. I will update and fix these in subsequent releases.