Simple baselines for human pose estimation and tracking

Introduction

[ALGORITHM]

@inproceedings{xiao2018simple,
  title={Simple baselines for human pose estimation and tracking},
  author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={466--481},
  year={2018}
}

Results and models

2d Human Pose Estimation

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch	Input Size	AP	AP⁵⁰	AP⁷⁵	AR	AR⁵⁰	ckpt	log
pose_resnet_50	256x192	0.718	0.898	0.795	0.773	0.937	ckpt	log
pose_resnet_50	384x288	0.731	0.900	0.799	0.783	0.931	ckpt	log
pose_resnet_101	256x192	0.726	0.899	0.806	0.781	0.939	ckpt	log
pose_resnet_101	384x288	0.748	0.905	0.817	0.798	0.940	ckpt	log
pose_resnet_152	256x192	0.735	0.905	0.812	0.790	0.943	ckpt	log
pose_resnet_152	384x288	0.750	0.908	0.821	0.800	0.942	ckpt	log

Results on OCHuman test dataset with ground-truth bounding boxes

Following the common setting, the models are trained on the COCO train dataset and evaluated on the OCHuman dataset.

Arch	Input Size	AP	AP⁵⁰	AP⁷⁵	AR	AR⁵⁰	ckpt	log
pose_resnet_50	256x192	0.546	0.726	0.593	0.592	0.755	ckpt	log
pose_resnet_50	384x288	0.539	0.723	0.574	0.588	0.756	ckpt	log
pose_resnet_101	256x192	0.559	0.724	0.606	0.605	0.751	ckpt	log
pose_resnet_101	384x288	0.571	0.715	0.615	0.615	0.748	ckpt	log
pose_resnet_152	256x192	0.570	0.725	0.617	0.616	0.754	ckpt	log
pose_resnet_152	384x288	0.582	0.723	0.627	0.627	0.752	ckpt	log

Results on AIC val set with ground-truth bounding boxes

Arch	Input Size	AP	AP⁵⁰	AP⁷⁵	AR	AR⁵⁰	ckpt	log
pose_resnet_101	256x192	0.294	0.736	0.174	0.337	0.763	ckpt	log

Results on MHP v2.0 val set

Arch	Input Size	AP	AP⁵⁰	AP⁷⁵	AR	AR⁵⁰	ckpt	log
pose_resnet_101	256x192	0.583	0.897	0.669	0.636	0.918	ckpt	log

Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code. Please be cautious when using these results in papers.
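For readers unfamiliar with the COCO-style metric: COCO keypoint mAP is built on object keypoint similarity (OKS), which scores a predicted pose against a ground-truth pose. The sketch below follows the standard COCO OKS formula only; the function name `oks`, the per-keypoint `sigmas`, and the toy coordinates are hypothetical, and the adapted evaluation used for MHP may differ in its details.

```python
import math


def oks(pred, gt, visible, sigmas, area):
    """Object keypoint similarity in the style of the COCO keypoint metric.

    pred, gt: lists of (x, y) keypoints in pixels.
    visible:  per-keypoint visibility flags (> 0 means labeled).
    sigmas:   per-keypoint falloff constants (hypothetical values here).
    area:     ground-truth object area in pixels^2 (assumed positive).
    """
    total = 0.0
    labeled = 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        labeled += 1
    return total / labeled if labeled else 0.0


# Toy example with made-up numbers (not taken from any dataset).
gt_pose = [(50.0, 20.0), (40.0, 60.0), (60.0, 60.0)]
shifted = [(52.0, 20.0), (40.0, 63.0), (90.0, 90.0)]
flags = [2, 2, 0]            # third keypoint is unlabeled and ignored
toy_sigmas = [0.026, 0.079, 0.079]
toy_area = 4000.0

perfect_score = oks(gt_pose, gt_pose, flags, toy_sigmas, toy_area)
shifted_score = oks(shifted, gt_pose, flags, toy_sigmas, toy_area)
```

AP at a given threshold then counts a prediction as correct when its OKS with a ground-truth instance exceeds that threshold (0.50 for AP⁵⁰, 0.75 for AP⁷⁵), and AP averages over a range of thresholds.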

Results on MPII val set

Arch	Input Size	Mean	Mean@0.1	ckpt	log
pose_resnet_50	256x256	0.882	0.286	ckpt	log
pose_resnet_101	256x256	0.888	0.290	ckpt	log
pose_resnet_152	256x256	0.889	0.303	ckpt	log

Results on MPII-TRB val set

Arch	Input Size	Skeleton Acc	Contour Acc	Mean Acc	ckpt	log
pose_resnet_50	256x256	0.887	0.858	0.868	ckpt	log
pose_resnet_101	256x256	0.890	0.863	0.873	ckpt	log
pose_resnet_152	256x256	0.897	0.868	0.879	ckpt	log

Results on CrowdPose test with YOLOv3 human detector

Arch	Input Size	AP	AP⁵⁰	AP⁷⁵	AP (E)	AP (M)	AP (H)	ckpt	log
pose_resnet_50	256x192	0.637	0.808	0.692	0.739	0.650	0.506	ckpt	log
pose_resnet_101	256x192	0.647	0.810	0.703	0.744	0.658	0.522	ckpt	log
pose_resnet_101	320x256	0.661	0.821	0.714	0.759	0.671	0.536	ckpt	log
pose_resnet_152	256x192	0.656	0.818	0.712	0.754	0.666	0.532	ckpt	log

Results on PoseTrack2018 val with ground-truth bounding boxes

Arch	Input Size	Head	Shou	Elb	Wri	Hip	Knee	Ankl	Total	ckpt	log
pose_resnet_50	256x192	86.5	87.5	82.3	75.6	79.9	78.6	74.0	81.0	ckpt	log

The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.

Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector

Arch	Input Size	Head	Shou	Elb	Wri	Hip	Knee	Ankl	Total	ckpt	log
pose_resnet_50	256x192	78.9	81.9	77.8	70.8	75.3	73.2	66.4	75.2	ckpt	log

The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.

Results on Sub-JHMDB dataset

The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale or rotation testing) is used.

Normalized by Person Size

Split	Arch	Input Size	Head	Sho	Elb	Wri	Hip	Knee	Ank	Mean	ckpt	log
Sub1	pose_resnet_50	256x256	99.1	98.0	93.8	91.3	99.4	96.5	92.8	96.1	ckpt	log
Sub2	pose_resnet_50	256x256	99.3	97.1	90.6	87.0	98.9	96.3	94.1	95.0	ckpt	log
Sub3	pose_resnet_50	256x256	99.0	97.9	94.0	91.6	99.7	98.0	94.7	96.7	ckpt	log
Average	pose_resnet_50	256x256	99.2	97.7	92.8	90.0	99.3	96.9	93.9	96.0	-	-
Sub1	pose_resnet_50 (2 Deconv.)	256x256	99.1	98.5	94.6	92.0	99.4	94.6	92.5	96.1	ckpt	log
Sub2	pose_resnet_50 (2 Deconv.)	256x256	99.3	97.8	91.0	87.0	99.1	96.5	93.8	95.2	ckpt	log
Sub3	pose_resnet_50 (2 Deconv.)	256x256	98.8	98.4	94.3	92.1	99.8	97.5	93.8	96.7	ckpt	log
Average	pose_resnet_50 (2 Deconv.)	256x256	99.1	98.2	93.3	90.4	99.4	96.2	93.4	96.0	-	-

Normalized by Torso Size

Split	Arch	Input Size	Head	Sho	Elb	Wri	Hip	Knee	Ank	Mean	ckpt	log
Sub1	pose_resnet_50	256x256	93.3	83.2	74.4	72.7	85.0	81.2	78.9	81.9	ckpt	log
Sub2	pose_resnet_50	256x256	94.1	74.9	64.5	62.5	77.9	71.9	78.6	75.5	ckpt	log
Sub3	pose_resnet_50	256x256	97.0	82.2	74.9	70.7	84.7	83.7	84.2	82.9	ckpt	log
Average	pose_resnet_50	256x256	94.8	80.1	71.3	68.6	82.5	78.9	80.6	80.1	-	-
Sub1	pose_resnet_50 (2 Deconv.)	256x256	92.4	80.6	73.2	70.5	82.3	75.4	75.0	79.2	ckpt	log
Sub2	pose_resnet_50 (2 Deconv.)	256x256	93.4	73.6	63.8	60.5	75.1	68.4	75.5	73.7	ckpt	log
Sub3	pose_resnet_50 (2 Deconv.)	256x256	96.1	81.2	72.6	67.9	83.6	80.9	81.5	81.2	ckpt	log
Average	pose_resnet_50 (2 Deconv.)	256x256	94.0	78.5	69.9	66.3	80.3	74.9	77.3	78.0	-	-
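
The two Sub-JHMDB tables report the same kind of accuracy under different normalization lengths. A minimal sketch of a PCK-style score (percentage of correct keypoints) is given below to show why the torso-normalized numbers are lower. It assumes a keypoint counts as correct when its pixel error is within `threshold * norm_length`; the function name `pck`, the threshold of 0.2, and all toy values are illustrative assumptions, not the evaluation code behind the tables.

```python
import math


def pck(pred, gt, norm_length, threshold=0.2):
    """Fraction of keypoints whose error is within threshold * norm_length.

    pred, gt:     lists of (x, y) keypoints in pixels, same length.
    norm_length:  normalization length in pixels, e.g. person size or torso size.
    threshold:    tolerated error as a fraction of norm_length (assumed 0.2).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    limit = threshold * norm_length
    correct = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        if math.hypot(px - gx, py - gy) <= limit:
            correct += 1
    return correct / len(pred)


# Toy example with made-up numbers: pixel errors are 1, 6 and 15.
gt_pose = [(10.0, 10.0), (20.0, 40.0), (30.0, 80.0)]
pred_pose = [(11.0, 10.0), (26.0, 40.0), (30.0, 95.0)]

person_size = 100.0  # hypothetical person size in pixels -> 20 px tolerance
torso_size = 25.0    # hypothetical torso size in pixels  -> 5 px tolerance

pck_person = pck(pred_pose, gt_pose, person_size)
pck_torso = pck(pred_pose, gt_pose, torso_size)
```

Because the torso is shorter than the whole person, the same predictions face a tighter tolerance under torso normalization, so the score can only stay the same or drop.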
