Structure Inference Net: Object Detection Using Scene-level Context and Instance-level Relationships. In CVPR 2018.
Requirements for TensorFlow 1.3.0 (see: TensorFlow)
Python packages you might not have:
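Before building, it can help to see which modules are missing from the current environment. The sketch below is only an illustration: `missing_modules` is a hypothetical helper, and the names in `CANDIDATES` are taken from the TensorFlow and Cython mentions in this README rather than from the repository's full dependency list.

```python
import importlib.util

# Illustrative names only; substitute the packages the project actually needs.
CANDIDATES = ["tensorflow", "Cython"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing:", missing_modules(CANDIDATES))
```

An empty list means every candidate module can be imported; it says nothing about whether the installed versions match the ones required.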
- Clone the SIN repository
# Make sure to clone with --recursive
git clone --recursive
- Build the Cython modules
cd $SIN_ROOT/lib
After successfully completing the basic installation, you'll be ready to run the demo.
Wait ...
Download the training, validation, test data and VOCdevkit
wget wget wget
Extract all of these tars into one directory named
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
It should have this basic structure
$VOCdevkit/                           # development kit
$VOCdevkit/VOCcode/                   # VOC utility code
$VOCdevkit/VOC2007                    # image sets, annotations, etc.
# ... and several other directories ...
Create symlinks for the PASCAL VOC dataset
cd $SIN_ROOT/data
ln -s $VOCdevkit VOCdevkit
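Before training, it may be worth confirming that the link resolves and that the directories named in the basic structure exist. The following is a minimal sketch, assuming the link lives at `$SIN_ROOT/data/VOCdevkit`; `missing_voc_entries` is a hypothetical helper, not part of the repository.

```python
import os

# Entries named in the basic structure; a real devkit contains several other directories too.
EXPECTED = ["VOCcode", "VOC2007"]


def missing_voc_entries(devkit_root):
    """Return expected sub-directories absent under devkit_root (symlinks are followed)."""
    root = os.path.realpath(devkit_root)
    return [name for name in EXPECTED if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    # Falls back to the current directory if SIN_ROOT is not set.
    sin_root = os.environ.get("SIN_ROOT", ".")
    print(missing_voc_entries(os.path.join(sin_root, "data", "VOCdevkit")))
```

An empty list suggests the extraction and the symlink both worked; any name printed points at a directory that is missing or was extracted elsewhere.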
Download the pre-trained ImageNet models [Google Drive] [Dropbox]
mv VGG_imagenet.npy $SIN_ROOT/data/pretrain_model/VGG_imagenet.npy
[optional] Set learning rate and max iter
vim experiments/scripts/  # ITERS
vim lib/fast/  # LR
cd lib  # if you edited the code, it is best to rebuild
make
Set your GPU ID, then run the script to train and test the model
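One common way to pin a run to a single GPU is the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. The sketch below assumes that approach; the repository's own script may instead take the GPU ID as an argument. `gpu_env` is a hypothetical helper, and the script path in the comment is a placeholder.

```python
import os


def gpu_env(gpu_id, base_env=None):
    """Return a copy of the environment in which only the given GPU is visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


if __name__ == "__main__":
    env = gpu_env(0)
    print(env["CUDA_VISIBLE_DEVICES"])
    # A launch could then look like this (placeholder path, assumption only):
    # subprocess.run(["bash", "path/to/train_script.sh"], env=env, check=True)
```

Building a copy of the environment rather than mutating `os.environ` keeps the setting local to the launched process.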
Test your dataset
AP for aeroplane = 0.7853
AP for bicycle = 0.8045
AP for bird = 0.7456
AP for boat = 0.6657
AP for bottle = 0.6144
AP for bus = 0.8424
AP for car = 0.8663
AP for cat = 0.8894
AP for chair = 0.5803
AP for cow = 0.8466
AP for diningtable = 0.7171
AP for dog = 0.8578
AP for horse = 0.8626
AP for motorbike = 0.7802
AP for person = 0.7857
AP for pottedplant = 0.4869
AP for sheep = 0.7599
AP for sofa = 0.7351
AP for train = 0.8199
AP for tvmonitor = 0.7683
Mean AP = 0.7607
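The reported mean AP is the unweighted average of the 20 per-class values. As a quick check, the sketch below parses the lines above and recomputes it; `parse_aps` and `mean_ap` are hypothetical helper names, and the line format is assumed to match the output shown here.

```python
import re

LOG = """\
AP for aeroplane = 0.7853
AP for bicycle = 0.8045
AP for bird = 0.7456
AP for boat = 0.6657
AP for bottle = 0.6144
AP for bus = 0.8424
AP for car = 0.8663
AP for cat = 0.8894
AP for chair = 0.5803
AP for cow = 0.8466
AP for diningtable = 0.7171
AP for dog = 0.8578
AP for horse = 0.8626
AP for motorbike = 0.7802
AP for person = 0.7857
AP for pottedplant = 0.4869
AP for sheep = 0.7599
AP for sofa = 0.7351
AP for train = 0.8199
AP for tvmonitor = 0.7683
Mean AP = 0.7607
"""


def parse_aps(log_text):
    """Map each class name to its AP; the 'Mean AP' line does not match and is skipped."""
    pattern = re.compile(r"^AP for (\w+) = ([0-9.]+)$", re.MULTILINE)
    return {m.group(1): float(m.group(2)) for m in pattern.finditer(log_text)}


def mean_ap(aps):
    """Unweighted mean over all classes."""
    return sum(aps.values()) / len(aps)


if __name__ == "__main__":
    aps = parse_aps(LOG)
    print(f"{len(aps)} classes, mean AP = {mean_ap(aps):.4f}")  # 20 classes, mean AP = 0.7607
```

The same parser can be pointed at a fresh test log to compare per-class values against the numbers listed here.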
Yong Liu, Ruiping Wang, Shiguang Shan, and Xilin Chen. Structure Inference Net: Object Detection Using Scene-level Context and Instance-level Relationships. In CVPR 2018.