γCVPR 2024γDelving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen, En Yu, Jinyang Li, Wenbing Tao
[ArXiv] Paper (http://arxiv.org/abs/2403.04700)
[CVPR] Paper (Delving_into_the_Trajectory_Long-tail_Distribution_for_Muti-object_Tracking_CVPR_2024_paper))
YouTube (https://www.youtube.com/watch?v=ohgIesSNgaQ)
If you have any problems with our work, please issue me. We will promptly reply it.
If you cite our method for experimental comparison, you can use the method name TLTDMOT.
Thanks for your attention! If you are interested in our work, please give us a star βοΈ.
Multiple Object Tracking (MOT) is a critical area within computer vision, with a broad spectrum of practical implementations. Current research has primarily focused on the development of tracking algorithms and enhancement of post-processing techniques. Yet, there has been a lack of thorough examination concerning the nature of tracking data it self. In this study, we pioneer an exploration into the distribution patterns of tracking data and identify a pronounced long-tail distribution issue within existing MOT datasets. We note a significant imbalance in the distribution of trajectory lengths across different pedestrians, a phenomenon we refer to as βpedestrians trajectory long-tail distributionβ. Addressing this challenge, we introduce a bespoke strategy designed to mitigate the effects of this skewed distribution. Specifically, we propose two data augmentation strategies, including Stationary Camera View Data Augmentation (SVA) and Dynamic Camera View Data Augmentation (DVA) , designed for viewpoint states and the Group Softmax (GS) module for Re-ID. SVA is to backtrack and predict the pedestrian trajectory of tail classes, and DVA is to use diffusion model to change the background of the scene. GS divides the pedestrians into unrelated groups and performs softmax operation on each group individually. Our proposed strategies can be integrated into numerous existing tracking systems, and extensive experimentation validates the efficacy of our method in reducing the influence of long-tail distribution on multi-object tracking performance.
I'm Sijia Chen. I'm very sorry. There is a small error in Figure 1 in the paper of official CVPR. Figure 1 in the paper of ArXiv is correct.
We made a mistake when submitting the camera-ready version of CVPR. Although we found this error in May 2024 and contacted the publisher immediately, we were unable to correct it because the deadline for the camera-ready version of CVPR had passed.
- (2024.06.17) The DVA code is opened!
- (2024.04.19) Our poster is selected for the 3th China3DV presentation!
- (2024.02.27) Our paper is accepted by CVPR 2024!
- Note: We use a NVIDIA GeForce RTX 3090 GPU and cuda 11.1.
- Clone this repo, and we'll call the directory that you cloned as ${Trajectory-Long-tail-Distribution-for-MOT_ROOT}
- Install dependencies. We use python 3.8 and pytorch >= 1.7.0
conda create -n TLDTMOT python=3.8
conda activate TLDTMOT
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
cd ${Trajectory-Long-tail-Distribution-for-MOT_ROOT}
pip install cython # Optional addition: -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install -r requirements.txt # Optional addition: -i https://pypi.tuna.tsinghua.edu.cn/simple/
- We use DCNv2_pytorch_1.7 in our backbone network (pytorch_1.7 branch). Previous versions can be found in DCNv2.
git clone -b pytorch_1.7 https://github.com/ifzhang/DCNv2.git
cd DCNv2
./make.sh
- In order to run the code for demos, you also need to install ffmpeg.
conda install ffmpeg
pip install ffmpy
- 2DMOT15 , MOT16, MOT17 and MOT20 2DMOT15, MOT16, MOT17 and MOT20 can be downloaded from the official webpage of MOT challenge. The After downloading, you should prepare the data in the following structure:
dataset
|
|
|ββββββMOT15
| |ββββββimages
| | βββββββtrain
| | βββββββtest
| βββββββlabels_with_ids
| βββββββtrain(empty)
|ββββββMOT16
| |ββββββimages
| | βββββββtrain
| | βββββββtest
| βββββββlabels_with_ids
| βββββββtrain(empty)
|ββββββMOT17
| |ββββββimages
| | βββββββtrain
| | βββββββtest
| βββββββlabels_with_ids
| βββββββtrain(empty)
|ββββββMOT20
|ββββββimages
| βββββββtrain
| βββββββtest
βββββββlabels_with_ids
βββββββtrain(empty)
Then, you can change the seq_root and label_root in src/gen_labels_15.py , src/gen_labels_16.py, src/gen_labels_17.py and src/gen_labels_20.py and run:
cd src
python gen_labels_15.py
python gen_labels_16.py
python gen_labels_17.py
python gen_labels_20.py
to generate the labels of 2DMOT15 , MOT16, MOT17 and MOT20. The seqinfo.ini files of 2DMOT15 can be downloaded here [Google], [Baidu],code:8o0w.
Note: Each time you run, you need to delete the labels_with_ids folder.
- CrowdHuman The CrowdHuman dataset can be downloaded from their official webpage. After downloading, you should prepare the data in the following structure:
dataset
|
|
|ββββββcrowdhuman
|ββββββimages
| βββββββtrain
| βββββββval
βββββββlabels_with_ids
| βββββββtrain(empty)
| βββββββval(empty)
β------annotation_train.odgt
β------annotation_val.odgt
If you want to pretrain on CrowdHuman (we train Re-ID on CrowdHuman), you can change the paths in src/gen_labels_crowd_id.py and run:
cd src
python gen_labels_crowd_id.py
If you want to add CrowdHuman to the MIX dataset (we do not train Re-ID on CrowdHuman), you can change the paths in src/gen_labels_crowd_det.py and run:
cd src
python gen_labels_crowd_det.py
- MIX We use the same training data as JDE in this part and we call it "MIX". Please refer to their DATA ZOO to download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.
- Pretrained models
DLA-34 official COCO pretrained model: ctdet_coco_dla_2x.pth can be downloaded here [Baidu, code:hust], [Google]. HRNetV2 ImageNet pretrained model: HRNetV2-W18 official, HRNetV2-W32 official. After downloading, you should put the pretrained models in the following structure:
${Trajectory-Long-tail-Distribution-for-MOT_ROOT}
βββββββmodels
βββββββctdet_coco_dla_2x.pth
βββββββhrnetv2_w32_imagenet_pretrained.pth
βββββββhrnetv2_w18_imagenet_pretrained.pth
- Baseline model
Our baseline FairMOT model (DLA-34 backbone) is pretrained on the CrowdHuman for 60 epochs with the self-supervised learning approach and then trained on the MIX dataset for 30 epochs. The models can be downloaded here: crowdhuman_dla34.pth [Google] [Baidu, code:ggzx ] [Onedrive]. fairmot_dla34.pth [Google] [Baidu, code:uouv] [Onedrive]. After downloading, you should put the baseline model in the following structure:
${Trajectory-Long-tail-Distribution-for-MOT_ROOT}
βββββββmodels
βββββββfairmot_dla34.pth
βββββββ...
Our processed MOT17 dataset by SVA and DVA can be downloaded here [Baidu, code:hust].
Our models can be downloaded here [Baidu, code:hust].
- Download the training data
- Change the dataset root directory 'root' in src/lib/cfg/data.json and 'data_dir' in src/lib/opts.py
- Only train on MOT15:
Baseline(+Ours):
bash experiments/MOT15_add_our_method_dla34.sh
Baseline:
bash experiments/MOT15_baseline.sh
- Only train on MOT16:
Baseline(+Ours):
bash experiments/MOT16_add_our_method_dla34.sh
Baseline:
bash experiments/MOT16_baseline.sh
- Only train on MOT17:
Baseline(+Ours):
bash experiments/MOT17_add_our_method_dla34.sh
Baseline:
bash experiments/MOT17_baseline.sh
- Only Train on MOT20:
The data annotation of MOT20 is a little different from MOT17, the coordinates of the bounding boxes are all inside the image, so we need to uncomment line 313 to 316 in the dataset file src/lib/datasets/dataset/jde.py:
#np.clip(xy[:, 0], 0, width, out=xy[:, 0])
#np.clip(xy[:, 2], 0, width, out=xy[:, 2])
#np.clip(xy[:, 1], 0, height, out=xy[:, 1])
#np.clip(xy[:, 3], 0, height, out=xy[:, 3])
Then, we can train on MOT20:
Baseline(+Ours):
bash experiments/MOT20_add_our_method_dla34.sh
Baseline:
bash experiments/MOT20_baseline.sh
- Train on MIX and MOT20: The data annotation of MOT20 is a little different from MOT17, the coordinates of the bounding boxes are all inside the image, so we need to uncomment line 313 to 316 in the dataset file src/lib/datasets/dataset/jde.py:
#np.clip(xy[:, 0], 0, width, out=xy[:, 0])
#np.clip(xy[:, 2], 0, width, out=xy[:, 2])
#np.clip(xy[:, 1], 0, height, out=xy[:, 1])
#np.clip(xy[:, 3], 0, height, out=xy[:, 3])
Then, we can train on MOT20:
Baseline(+Ours):
bash experiments/MOT20_ft_mix_add_our_method_dla34.sh
- For ablation study,
bash experiments/ablation_study.sh
- To get the txt results of the test set of MOT15 or MOT16 or MOT17 or MOT20, you should modify the '--load_model' in the sh file and run it:
MOT15:
bash experiments/MOT15_track.sh
MOT16:
bash experiments/MOT16_track.sh
MOT17:
bash experiments/MOT17_track.sh
MOT20:
bash experiments/MOT20_track.sh
- For ablation study
we evaluate on the other half of the training set of MOT17, you can run:
All classes(default):
bash experiments/ablation_study_track.sh
If you want to evaluate head classes and tail classes, you need to run tackle_module/head_tail_classes_division/val_id_num_count.py. Then you need to place the generated gt_headclasses.txt and gt_tailclasses.txt file in the corresponding gt location of the MOT17 training dataset, like below:
dataset
|
|
|ββββββMOT17
|
|ββββββimages
|
|ββββββtrain
|
|ββββββMOT17-02-SDP
| |
| |ββββββgt
| βββββββgt.txt
| βββββββgt_headclasses.txt
| βββββββgt_tailclasses.txt
|ββββββMOT17-04-SDP
| |
| |ββββββgt
| βββββββgt.txt
| βββββββgt_headclasses.txt
| βββββββgt_tailclasses.txt
|ββββββMOT17-05-SDP
| |
| |ββββββgt
| βββββββgt.txt
| βββββββgt_headclasses.txt
| βββββββgt_tailclasses.txt
|ββββββMOT17-09-SDP
| |
| |ββββββgt
| βββββββgt.txt
| βββββββgt_headclasses.txt
| βββββββgt_tailclasses.txt
|ββββββMOT17-10-SDP
| |
| |ββββββgt
| βββββββgt.txt
| βββββββgt_headclasses.txt
| βββββββgt_tailclasses.txt
|ββββββMOT17-11-SDP
| |
| |ββββββgt
| βββββββgt.txt
| βββββββgt_headclasses.txt
| βββββββgt_tailclasses.txt
|ββββββMOT17-13-SDP
|
|ββββββgt
βββββββgt.txt
βββββββgt_headclasses.txt
βββββββgt_tailclasses.txt
Then you can run:
Head classes or tail classes:
bash experiments/ablation_study_classes_track.sh
You can input a raw video and get the demo video by running src/demo.py and get the mp4 format of the demo video:
cd src
python demo.py mot --load_model ../models/fairmot_dla34.pth --conf_thres 0.4
You can change --input-video and --output-root to get the demos of your own videos. --conf_thres can be set from 0.3 to 0.7 depending on your own videos.
The part of the code are borrowed from the follow work:
Thanks for their wonderful works.
@InProceedings{Chen_2024_CVPR,
author = {Chen, Sijia and Yu, En and Li, Jinyang and Tao, Wenbing},
title = {Delving into the Trajectory Long-tail Distribution for Muti-object Tracking},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {19341-19351}
}