Official code of the ACM MM 2024 paper *Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection*.
Prepare the dataset following the process of UPT. The downloaded files should be placed as follows; otherwise, replace the default paths with your custom locations.
```
|- HOIGen
|   |- hicodet
|   |   |- hico_20160224_det
|   |       |- annotations
|   |       |- images
:   :
```
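If the dataset already exists elsewhere on disk, a symlink is an easy way to match the expected layout without copying files. A minimal sketch, where `/path/to/hico_20160224_det` is only a placeholder for your own dataset location:

```bash
# Hedged sketch: link an existing HICO-DET copy into the expected location.
# Replace /path/to/hico_20160224_det with wherever your dataset lives.
mkdir -p hicodet
ln -s /path/to/hico_20160224_det hicodet/hico_20160224_det
```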
- Follow the environment setup in UPT.

- Our code is built upon CLIP. Install the local package of CLIP:

```bash
cd CLIP && python setup.py develop && cd ..
```
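To confirm the local CLIP package is picked up, a quick optional check (this assumes the standard `clip` package name from the official repository):

```bash
# Should print the list of available CLIP backbones, including "ViT-B/16".
python -c "import clip; print(clip.available_models())"
```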
- Download the CLIP weights to `checkpoints/pretrained_clip`.
```
|- HOIGen
|   |- checkpoints
|   |   |- pretrained_clip
|   |       |- ViT-B-16.pt
:   :
```
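If you prefer to let CLIP fetch the weights itself, `clip.load` accepts a `download_root` argument; a hedged one-liner that should place `ViT-B-16.pt` under `checkpoints/pretrained_clip`:

```bash
# Downloads the ViT-B/16 weights into checkpoints/pretrained_clip.
python -c "import clip; clip.load('ViT-B/16', device='cpu', download_root='checkpoints/pretrained_clip')"
```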
- Download the weights of DETR and put them in `checkpoints/`.
Dataset | DETR weights |
---|---|
HICO-DET | weights |
```
|- HOIGen
|   |- checkpoints
|   |   |- detr-r50-hicodet.pth
:   :   :
```
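A quick optional way to verify the DETR checkpoint is readable before training:

```bash
# Loads the checkpoint on CPU and prints its top-level keys.
python -c "import torch; ckpt = torch.load('checkpoints/detr-r50-hicodet.pth', map_location='cpu'); print(list(ckpt.keys()))"
```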
Download the pre-extracted features from HERE. The downloaded files have to be placed as follows.
```
|- HOIGen
|   |- hicodet_pkl_files
|   |   |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
:   :
```
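The pre-extracted features are a standard pickle file; its internal structure is not documented here, so the sketch below only confirms it unpickles and prints the container type:

```bash
# Sanity-check that the feature file can be unpickled.
python -c "import pickle; d = pickle.load(open('hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p', 'rb')); print(type(d))"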
If you want to train the feature generator yourself, process the images and run the following commands (choose `hoi_data`, `human_data`, or `object_data` for `--data`); otherwise, load the weights we provide and put them in `checkpoints/`.

```bash
python main_coop_vae.py --data hoi_data/human_data/object_data
python finetune_ship.py --data hoi_data/human_data/object_data
```
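Since `--data` takes one of the three values listed above, a hedged convenience loop to train the generator for all of them in turn:

```bash
# Runs both generator-training steps for each feature type listed above.
for DATA in hoi_data human_data object_data; do
    python main_coop_vae.py --data "$DATA"
    python finetune_ship.py --data "$DATA"
done
```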
Training:

```bash
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
```

Evaluation (replace `CKPT_PATH` with the path to a trained checkpoint):

```bash
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --eval --resume CKPT_PATH
```
To evaluate under the zero-shot settings, add `--zs` with the corresponding `--zs_type` (the same flags can be added to the training command above to train for a given setting):

```bash
# UC (pick one of uc0, uc1, uc2, uc3, uc4)
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type uc0/uc1/uc2/uc3/uc4 --eval --resume CKPT_PATH

# RF-UC
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type rare_first --eval --resume CKPT_PATH

# NF-UC
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type non_rare_first --eval --resume CKPT_PATH

# UV
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_verb --eval --resume CKPT_PATH

# UO
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_object --eval --resume CKPT_PATH
```
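For convenience, the zero-shot evaluations can be looped over in one script. `CKPT_PATH` remains a placeholder; in practice each setting has its own trained checkpoint, so treat this only as a sketch and adjust the path per split:

```bash
# Hedged sketch: evaluate each zero-shot split in turn.
for ZS in uc0 rare_first non_rare_first unseen_verb unseen_object; do
    python main_tip_finetune.py --world-size 1 \
        --pretrained checkpoints/detr-r50-hicodet.pth \
        --output-dir checkpoints/hico \
        --use_insadapter --num_classes 117 --use_multi_hot \
        --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p \
        --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt \
        --zs --zs_type "$ZS" --eval --resume CKPT_PATH   # use the setting-specific checkpoint here
done
```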
Setting | Full | Seen | Unseen | Weights |
---|---|---|---|---|
UC | 33.44 | 34.23 | 30.26 | weights |
RF-UC | 33.86 | 34.57 | 31.01 | weights |
NF-UC | 33.08 | 32.86 | 33.98 | weights |
UO | 33.48 | 32.90 | 36.35 | weights |
UV | 32.34 | 34.31 | 20.27 | weights |
If you find our paper and/or code helpful, please consider citing:
```bibtex
@inproceedings{guo2024unseen,
  title={Unseen No More: Unlocking the Potential of {CLIP} for Generative Zero-shot {HOI} Detection},
  author={Yixin Guo and Yu Liu and Jianghao Li and Weimin Wang and Qi Jia},
  booktitle={ACM Multimedia 2024},
  year={2024},
  url={https://openreview.net/forum?id=mAQ2fK2myX}
}
```
We gratefully thank the authors of UPT, ADA-CM, SHIP, and CaFo for open-sourcing their code.
To open-source the code as soon as possible, we released it with some redundancy and possible bugs, which will be cleaned up and fixed in subsequent releases.