This repository implements the Semantic Data Augmentation (SDA) technique for animal 2D pose estimation. The SDA plugin is based on the "Adversarial Semantic Data Augmentation for Human Pose Estimation" paper [1]. See the video for a more detailed explanation.
The SDA plugin works by iterating over the dataset and cropping different body parts using the keypoint data. Each body part is masked and extracted individually, although the quality of the cropped body parts is not always perfect. Random rotation and scaling are also applied to each body part. The positioning of the body parts is currently random; in the future, adversarial positioning will be implemented, which aims to place leg parts next to the ground-truth legs to confuse the model and improve its generalization capabilities.
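As a rough illustration, the cropping step can be sketched as follows. This is a minimal sketch with hypothetical names, not the plugin's actual API; the real plugin also applies a random rotation, which is omitted here for brevity, and the scale range is an assumption:

```python
import numpy as np

def crop_body_part(image, keypoints, margin=10, rng=None):
    """Crop a rectangular patch around a body part's keypoints and
    draw a random scaling factor (hypothetical helper; the plugin's
    real implementation may differ)."""
    rng = rng or np.random.default_rng()
    kps = np.asarray(keypoints, dtype=float)  # shape (n, 2): x, y pairs
    # bounding box of the keypoints, padded by a margin and clipped to the image
    x0 = max(int(kps[:, 0].min()) - margin, 0)
    y0 = max(int(kps[:, 1].min()) - margin, 0)
    x1 = min(int(kps[:, 0].max()) + margin, image.shape[1])
    y1 = min(int(kps[:, 1].max()) + margin, image.shape[0])
    patch = image[y0:y1, x0:x1].copy()
    # keypoints expressed relative to the patch origin
    local_kps = kps - np.array([x0, y0])
    scale = rng.uniform(0.8, 1.2)  # random scaling factor (assumed range)
    return patch, local_kps, scale
```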
In the original paper [1], the technique was applied to a top-down single-person detection network. OpenPifPaf, however, is a bottom-up multi-person detection model, so the ground truth of the samples is modified: the keypoints of the added body parts are included in the ground truth of the original image.
Therefore, during the cropping phase, we extract the body parts together with their local keypoints, which are then added randomly to the samples and their ground truth during training.
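The paste-and-relabel step described above can be sketched as follows. The names and the random-placement policy are illustrative, not the plugin's actual API; the key idea is that the part's local keypoints are shifted by the paste offset before being appended to the sample's ground truth:

```python
import numpy as np

def paste_part(sample_img, sample_kps, part_img, part_kps, rng=None):
    """Paste a cropped body part at a random location and append its
    keypoints, shifted into the sample's coordinate frame, to the
    ground truth (sketch under assumed names)."""
    rng = rng or np.random.default_rng()
    H, W = sample_img.shape[:2]
    h, w = part_img.shape[:2]
    # random top-left corner such that the part fits inside the sample
    x = int(rng.integers(0, W - w + 1))
    y = int(rng.integers(0, H - h + 1))
    sample_img[y:y + h, x:x + w] = part_img
    # shift the part's local keypoints by the paste offset
    shifted = np.asarray(part_kps, dtype=float) + np.array([x, y])
    new_kps = np.concatenate([np.asarray(sample_kps, dtype=float), shifted], axis=0)
    return sample_img, new_kps, (x, y)
```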
The main goal of this approach is to enhance the model's robustness to occlusion, even when one animal occludes another.
The plugin has been tested on Paperspace.com, training the OpenPifPaf model for 200 epochs with the following hyperparameters:
- Learning rate: 0.0003
- Momentum: 0.95
- Batch size: 4 (or 8, depending on the GPUs available on Paperspace)
- Learning rate warm start at epoch 200, from the shufflenetv2k30 checkpoint
```
python3 -m openpifpaf.train --lr=0.0003 --momentum=0.95 --clip-grad-value=10.0 --b-scale=10.0 --batch-size=8 --loader-workers=12 --epochs=600 --lr-decay 280 300 --lr-decay-epochs=10 --val-interval 5 --checkpoint=shufflenetv2k30 --lr-warm-up-start-epoch=200 --dataset=custom_animal --animal-square-edge=385 --animal-upsample=2 --animal-bmin=2 --animal-extended-scale --weight-decay=1e-5
```
The dataset used is the Cross-domain Adaptation for Animal Pose Estimation dataset. The train.py script downloads, unzips, converts the dataset into COCO format, and splits the data. The SDA module then crops the dataset into body parts, and finally the training process is launched.
```
python3 train.py
```
The training was conducted on Paperspace. The virtual machine can be accessed here: Paperspace VM
Depending on the session, OpenPifPaf and pycocotools need to be installed:

```
$ pip install openpifpaf && pip install pycocotools
```
Then, by running the train.py script, we can execute the whole pipeline up until training:

```python
import subprocess

import openpifpaf

def main():
    # 1. Download the dataset
    download_dataset()
    # 2. Convert it to COCO format
    adapt_to_coco()
    # 3. Split the data into train and val sets
    split_data()
    # 4. Initialize SDA and crop the dataset, creating a body-part pool
    sda = SDA()
    sda.crop_dataset()
    # 5. Register the custom dataset plugin with OpenPifPaf
    openpifpaf.plugin.register()
    # 6. Train the model
    subprocess.run(train_cmd, shell=True)
```
Body parts look like this:
And their masks look like this:
Masking allows us to extract the parts with less background than a plain contour crop would. The SDA augmentation result looks like this:
Some results are better than others...
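The mask-based extraction can be sketched as follows. This is a minimal sketch, assuming `mask` is a binary array the size of the cropped patch; the name is illustrative, not the plugin's API:

```python
import numpy as np

def apply_mask(patch, mask):
    """Zero out background pixels using the body-part mask, so the
    extracted part carries less background than a plain bounding-box
    or contour crop (sketch under assumed names)."""
    mask = mask.astype(bool)
    out = np.zeros_like(patch)
    # copy only the pixels inside the mask; everything else stays zero
    out[mask] = patch[mask]
    return out
```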
The SDA plugin went through three versions. All models used a learning rate warm start from the shufflenetv2k30 weights at epoch 200:
- No SDA: normal training without augmentation.
- SDAv1: aggressive augmentation. No check on the image/body-part size ratio, 3 body parts added per sample, and no ground truth change, so the added body parts were purely visual and did not modify the samples' ground truth. This was hard on the model, and performance was poor.
- SDAv2: lighter augmentation. An image/body-part size ratio check avoids occlusions too heavy to make sense in a real-life deployment; 3 body parts added per sample. Still no ground truth modification, which is not ideal for bottom-up PAF models such as OpenPifPaf.
- SDAv3: lighter augmentation. Image/body-part size ratio check, 3 body parts added per sample. The ground truth is modified to include the keypoints of the cropped body parts, relative to the sample's origin axis.
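The size-ratio check used in SDAv2/v3 can be sketched as follows. The 0.3 threshold is an assumed value for illustration, not taken from the plugin:

```python
def ratio_ok(image_shape, part_shape, max_ratio=0.3):
    """Reject body parts whose area would occlude too much of the
    sample, as in SDAv2/v3 (sketch; the threshold is an assumption)."""
    img_area = image_shape[0] * image_shape[1]
    part_area = part_shape[0] * part_shape[1]
    return part_area / img_area <= max_ratio
```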
The statistics are:
- AP: Average Precision
- AR: Average Recall
The different model checkpoints and statistics can be found in this Google Drive folder: [Animal Pose Estimation](https://drive.google.com/drive/folders/1b5Vhk8N5ZT8sN4ZXlk3D1lu7IP6pWe_i?usp=sharing)
[1] Yanrui Bin, Xuan Cao, Xinya Chen, Yanhao Ge, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Changxin Gao, and Nong Sang (2020). Adversarial Semantic Data Augmentation for Human Pose Estimation. arXiv preprint arXiv:2008.00697.