NEW! Check out our most recent work on transformer-based HOI detection here.
This repository contains the official PyTorch implementation for the ICCV 2021 paper
Frederic Z. Zhang, Dylan Campbell and Stephen Gould. Spatially Conditioned Graphs for Detecting Human-Object Interactions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13319-13327, October 2021.
[paper] [supp] [preprint] [video]
If you find this repository useful for your research, please kindly cite our paper:
@inproceedings{zhang2021scg,
author = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
title = {Spatially Conditioned Graphs for Detecting Human–Object Interactions},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {13319-13327}
}
- Download the repository with
git clone https://github.com/fredzzhang/spatially-conditioned-graphs
- Install the lightweight deep learning library Pocket
- Make sure the environment you created for Pocket is activated. You are good to go!
To generate the qualitative results shown in the paper, please follow the instructions in the diagnosis package at spatially-conditioned-graphs/diagnosis/.
The HICO-DET and V-COCO repos have been incorporated as submodules for convenience. To download relevant data utilities, run the following commands.
cd /path/to/spatially-conditioned-graphs
git submodule init
git submodule update
- Download the HICO-DET dataset
- If you have not downloaded the dataset before, run the following script
cd /path/to/spatially-conditioned-graphs/hicodet
bash download.sh
- If you have previously downloaded the dataset, simply create a soft link
cd /path/to/spatially-conditioned-graphs/hicodet
ln -s /path/to/hico_20160224_det ./hico_20160224_det
- Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python preprocessing.py --partition train2015
python preprocessing.py --partition test2015
- Download fine-tuned detections
cd /path/to/spatially-conditioned-graphs/download
bash download_finetuned_detections.sh
- Generate ground truth detections (optional)
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python generate_gt_detections.py --partition test2015
- Download the train2014 and val2014 partitions of the COCO dataset
- If you have not downloaded the dataset before, run the following script
cd /path/to/spatially-conditioned-graphs/vcoco
bash download.sh
- If you have previously downloaded the dataset, simply create a soft link. Note that
cd /path/to/spatially-conditioned-graphs/vcoco
ln -s /path/to/coco ./mscoco2014
- Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/vcoco/detections
python preprocessing.py --partition trainval
python preprocessing.py --partition test
- Download the checkpoint of our trained model
cd /path/to/spatially-conditioned-graphs/download
bash download_checkpoint.sh
- Test a model
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python test.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt
By default, detections from a pre-trained detector are used. To change the source of detections, use the argument --detection-dir, e.g. --detection-dir hicodet/detections/test2015_gt to select ground truth detections. Fine-tuned detections (if you downloaded them) are available under hicodet/detections.
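The choice of detection source described above can also be scripted. The following is a minimal sketch and not part of the repository: `build_test_command` is a hypothetical helper that assembles the `test.py` invocation from the two flags documented above, and leaves out `--detection-dir` to fall back to the default pre-trained detections.

```python
import shlex


def build_test_command(model_path, detection_dir=None):
    """Assemble the argument list for test.py (hypothetical helper)."""
    cmd = ["python", "test.py", "--model-path", model_path]
    # Omitting --detection-dir keeps the default pre-trained detections
    if detection_dir is not None:
        cmd += ["--detection-dir", detection_dir]
    return cmd


checkpoint = "checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt"

# Default detections from the pre-trained detector
default_cmd = build_test_command(checkpoint)

# Ground truth detections, using the directory named above
gt_cmd = build_test_command(checkpoint, "hicodet/detections/test2015_gt")

print(shlex.join(default_cmd))
print(shlex.join(gt_cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` set in the environment as in the command above.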
- Cache detections for Matlab evaluation following HO-RCNN (optional)
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt
By default, 80 .mat files, one for each object class, will be cached in a directory named matlab. Use the --cache-dir argument to change the cache directory. To change sources of detections, refer to the use of --detection-dir in the previous section.
As a reference, the performance of the provided model is shown in the table below.
Detections | Default Setting | Known Object Setting |
---|---|---|
Pre-trained on MS COCO | (21.85, 18.11, 22.97) | (25.53, 21.79, 26.64) |
Detections provided by the DRG repo* | (31.33, 24.72, 33.31) | (34.37, 27.18, 36.52) |
Fine-tuned DETR-R101 (here) | (29.26, 24.61, 30.65) | (32.87, 27.89, 34.35) |
Ground truth detections | (51.53, 41.02, 54.67) | (51.75, 41.40, 54.84) |
*The detections provided by the DRG repo were produced by a Cascaded R-CNN with ResNeXt-152 backbone, which is not directly comparable to the commonly used object detectors in the literature.
We did not implement evaluation utilities for V-COCO, and instead use the utilities provided by Gupta. To generate the required pickle file, run the following script, specifying the path to a model with --model-path.
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --dataset vcoco --data-root vcoco \
--detection-dir vcoco/detections/test \
--cache-dir vcoco_cache --partition test \
--model-path /path/to/a/model
This will generate a file named vcoco_results.pkl under vcoco_cache in the current directory. Please refer to the v-coco repo (not to be confused with vcoco, the submodule) for further instructions. Note that loading the pickle file requires a particular class CacheTemplate, which is shown below in its entirety.
from collections import defaultdict

class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]
You can either add it into the evaluation code or save it as a separate file to import from.
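To illustrate how the class behaves once it is available to the evaluation code, here is a small sketch. The class definition is repeated so that the snippet runs on its own; the key names and values (`hold_agent`, `hold_obj`, `ride_agent`, `ride_instr`, the box coordinates) are made up for illustration and are not taken from an actual results file.

```python
from collections import defaultdict


class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v

    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]


# Illustrative entry with one action score and one <action,role> pair
entry = CacheTemplate(
    hold_agent=0.9,
    hold_obj=[12., 30., 80., 120., 0.7],
)

stored_score = entry['hold_agent']    # key is present, stored value is returned
missing_score = entry['ride_agent']   # key is absent, falls back to a zero score
missing_pair = entry['ride_instr']    # key is absent, falls back to a tiny box

print(stored_score, missing_score, missing_pair)
```

Because `__missing__` returns the fallback without storing it, looking up an absent key does not add that key to the entry.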
cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 --cache-dir checkpoints/hicodet &>log &
Specify the number of GPUs to use with the argument --world-size. The default sub-batch size is 4 (per GPU). The provided model was trained with 8 GPUs, with an effective batch size of 32. Reducing the effective batch size could result in slightly inferior performance. The default learning rate for a batch size of 32 is 0.0001. As a rule of thumb, scale the learning rate proportionally when changing the batch size, e.g. 0.00005 for a batch size of 16. It is recommended to redirect stdout and stderr to a file to save the training log (as indicated by &>log). To check the progress, run cat log | grep mAP, or alternatively go through the log with vim log. Note that the mAP logged follows a slightly different protocol and does NOT necessarily correlate with the mAP that the community reports; it only serves as a diagnostic tool. Measuring the true performance of the model requires running a separate test as shown in the previous section. By default, checkpoints will be saved under checkpoints in the current directory. For more arguments, run python main.py --help. We follow the early stopping training strategy, and have concluded (using a validation set split from the training set) that the model at epoch 7 should be picked. Training on 8 GeForce GTX TITAN X devices takes about 5 hours.
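The batch-size arithmetic and the proportional learning-rate rule of thumb can be written out as a short sketch. The function names below are hypothetical and not part of the repository. The defaults (sub-batch size 4, learning rate 0.0001 at batch size 32) are the values stated above. Check `python main.py --help` for the actual arguments that set these values.

```python
def effective_batch_size(world_size, sub_batch_size=4):
    """Effective batch size = number of GPUs x per-GPU sub-batch size."""
    return world_size * sub_batch_size


def scaled_learning_rate(batch_size, base_lr=1e-4, base_batch_size=32):
    """Scale the learning rate proportionally to the effective batch size."""
    return base_lr * batch_size / base_batch_size


for gpus in (8, 4, 2):
    batch_size = effective_batch_size(gpus)
    lr = scaled_learning_rate(batch_size)
    print(f"{gpus} GPUs: batch size {batch_size}, learning rate {lr:g}")
```

With 8 GPUs this reproduces the reference setting of batch size 32 and learning rate 0.0001, and with 4 GPUs it gives the 0.00005 quoted for batch size 16.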
cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 \
--dataset vcoco --partitions trainval val --data-root vcoco \
--train-detection-dir vcoco/detections/trainval \
--val-detection-dir vcoco/detections/trainval \
--print-interval 20 --cache-dir checkpoints/vcoco &>log &
If you have any questions regarding our paper or the repo, please post them in discussions. If you run into issues related to the code, feel free to open an issue. Alternatively, you can contact me at frederic.zhang@anu.edu.au.