[ALGORITHM]
```bibtex
@inproceedings{xiao2018simple,
  title={Simple baselines for human pose estimation and tracking},
  author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={466--481},
  year={2018}
}
```
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code. Please be cautious if you use these results in papers.
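COCO-style keypoint AP is built on object keypoint similarity (OKS): AP50 and AP75 are AP at OKS thresholds of 0.5 and 0.75, and AP averages over thresholds 0.50:0.05:0.95. The sketch below computes OKS for one predicted/ground-truth pair, following the formula in pycocotools' `cocoeval.py`. The function name `oks` and its argument layout are illustrative assumptions, the 17 sigmas are the COCO ones (other datasets use their own), and the full AP pipeline (score-sorted matching, precision/recall interpolation) is omitted.

```python
import math

# Per-keypoint sigmas for the 17 COCO keypoints, as used in pycocotools' cocoeval.py.
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt:   lists of (x, y) keypoint coordinates
    visibility: COCO visibility flags of the ground truth; only v > 0 is scored
    area:       ground-truth object area, playing the role of s^2 in the formula
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * sigma) ** 2  # cocoeval uses (2 * sigma)^2 as the per-keypoint constant
        total += math.exp(-d2 / (2 * area * k2))
        labeled += 1
    # Simplification: cocoeval has a special fallback when nothing is labeled.
    return total / labeled if labeled else 0.0
```

A perfect prediction scores exactly 1.0, and the score decays with the squared pixel error, more slowly for large objects and for keypoints with large sigmas (hips, ankles) than for small ones (eyes).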
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
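AP (E), AP (M) and AP (H) report CrowdPose results on easy, medium and hard subsets. In the CrowdPose benchmark these subsets are defined by a per-image Crowd Index: for each person, the number of joints belonging to *other* people that fall inside that person's bounding box, divided by that person's own joint count, averaged over the people in the image. The sketch below computes that index under simplifying assumptions (axis-aligned boxes given as `(x1, y1, x2, y2)`, only labeled joints passed in). The function name and input layout are hypothetical, and the exact easy/medium/hard cut-offs are not stated in this document.

```python
def crowd_index(people):
    """Crowd Index of one image.

    people: list of (box, joints) per person, where box = (x1, y1, x2, y2) and
            joints = list of (x, y) for that person's labeled joints.
    """
    ratios = []
    for i, ((x1, y1, x2, y2), own_joints) in enumerate(people):
        if not own_joints:
            continue  # a person without labeled joints contributes no ratio
        # Count joints of other people that fall inside this person's box.
        intruders = sum(
            1
            for j, (_, other_joints) in enumerate(people)
            if j != i
            for (x, y) in other_joints
            if x1 <= x <= x2 and y1 <= y <= y2
        )
        ratios.append(intruders / len(own_joints))
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Two well-separated people give an index of 0, while heavily overlapping people push it up, which is why AP (H) is consistently the lowest column in the table above.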
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale/rotation testing) is used.
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
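In the cited paper, the SimpleBaseline head appends three deconvolution layers (256 filters, 4x4 kernels, stride 2, each followed by batch normalization and ReLU) to the ResNet backbone, then a 1x1 convolution that outputs one heatmap per keypoint. Assuming "(2 Deconv.)" denotes the same head with only two deconvolution layers, that each deconvolution uses padding 1 and no output padding, and that input sizes are written as height x width, the heatmap resolution follows from the transposed-convolution size formula. The helper names below are hypothetical.

```python
def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def heatmap_size(input_hw, num_deconv_layers=3, backbone_stride=32):
    """Heatmap (height, width) for a ResNet backbone plus a stack of deconv layers.

    Assumes the backbone downsamples by 32 and the final 1x1 conv keeps the size.
    """
    h = input_hw[0] // backbone_stride
    w = input_hw[1] // backbone_stride
    for _ in range(num_deconv_layers):
        # With kernel 4, stride 2, padding 1 each layer exactly doubles the size.
        h, w = deconv_out(h), deconv_out(w)
    return h, w


for hw, layers in [((256, 192), 3), ((384, 288), 3), ((256, 256), 3), ((256, 256), 2)]:
    print(hw, layers, heatmap_size(hw, layers))
```

Under these assumptions a 256x192 input yields a 64x48 heatmap (one quarter of the input resolution), while the two-layer variant on 256x256 inputs yields 32x32, i.e. one eighth of the input.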
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
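The per-joint columns and the Mean column in the two tables above are PCK-style accuracies: a predicted joint counts as correct when its distance to the ground truth is below a threshold times a per-sample normalization length. The tables do not state the threshold or the normalization (bounding-box size and torso length are common choices), so both are parameters in this sketch. Following one common convention, the mean is taken over per-joint accuracies rather than pooled over all joints; the function name `pck` and its argument layout are hypothetical.

```python
import math


def pck(preds, gts, visible, norms, threshold=0.2):
    """Return (per-joint PCK, mean PCK) for a batch of poses.

    preds, gts: nested lists [sample][joint] -> (x, y)
    visible:    nested list  [sample][joint] -> bool; joints marked False are ignored
    norms:      one normalization length per sample (assumed, e.g. bbox or torso size)
    """
    num_joints = len(gts[0])
    hits = [0] * num_joints
    counts = [0] * num_joints
    for pred, gt, vis, norm in zip(preds, gts, visible, norms):
        for j in range(num_joints):
            if not vis[j]:
                continue
            counts[j] += 1
            # A joint is correct if its error is below threshold * normalization length.
            if math.dist(pred[j], gt[j]) < threshold * norm:
                hits[j] += 1
    # Joints that are never visible get NaN so they can be skipped in the mean.
    per_joint = [h / c if c else math.nan for h, c in zip(hits, counts)]
    valid = [a for a in per_joint if not math.isnan(a)]
    mean = sum(valid) / len(valid) if valid else math.nan
    return per_joint, mean
```

Computing the metric separately for each split and then averaging gives rows of the same form as the Average rows, although such rows may be derived from unrounded per-split values.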