This repository contains the implementation of the master's thesis project Camera-Radar Sensor Fusion using Deep Learning by Johannes Kübel and Julian Brandes.
The thesis is available for download in the Chalmers Open Digital Repository.
This work is based on the frustum-proposal based radar and camera sensor fusion approach CenterFusion proposed by Nabati et al. We introduce two major changes to the existing network architecture:
- Early Fusion (EF) as a projection of the radar point cloud into the image plane. The projected radar point image features (default: depth, velocity components in x and z and RCS value) are then concatenated to the RGB image channels as a new input to the image-based backbone of the network architecture. EF introduces robustness against camera sensor failure and challenging environmental conditions (e.g. rain/night/fog).
- Learned Frustum Association (LFANet): The second major change to the architecture concerns the frustum-proposal based association between camera detections and the radar point cloud. Instead of selecting the closest radar point and associating it with the detection obtained from the backbone and primary heads, we propose a network termed LFANet that outputs an artificial radar point r* representing all the radar points in the frustum. LFANet is trained to output the depth to the center of the bounding box associated with the radar point as well as the corresponding radial velocity. The outputs of LFANet are then used as the new channels in the heatmap introduced by Nabati et al.
We combine these two changes to obtain CenterFusion++.
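The Early Fusion step described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function name `early_fusion_input`, the pinhole projection with an intrinsic matrix `K`, the one-pixel-per-point rasterization and the nearest-point-wins rule are all assumptions made for illustration.

```python
import numpy as np


def early_fusion_input(rgb, points_cam, feats, K):
    """Append projected radar channels to an RGB image (illustrative only).

    rgb:        (H, W, 3) image array
    points_cam: (N, 3) radar points in camera coordinates (z points forward)
    feats:      (N, C) per-point features, e.g. [depth, v_x, v_z, rcs]
    K:          (3, 3) camera intrinsic matrix (assumed pinhole model)
    """
    h, w, _ = rgb.shape
    radar = np.zeros((h, w, feats.shape[1]), dtype=rgb.dtype)

    # Keep only points in front of the camera.
    in_front = points_cam[:, 2] > 0
    pts, f = points_cam[in_front], feats[in_front]

    # Pinhole projection: [u, v, 1]^T ~ K [x, y, z]^T
    uvw = pts @ K.T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Discard points that fall outside the image.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, f, depth = u[inside], v[inside], f[inside], pts[inside, 2]

    # Write far points first so the nearest point wins when pixels collide.
    for i in np.argsort(-depth):
        radar[v[i], u[i]] = f[i]

    # Concatenate the radar channels to the RGB channels: (H, W, 3 + C).
    return np.concatenate([rgb, radar], axis=-1)
```

With the four default radar features, the result has seven channels, which is why the image-based backbone needs a modified input layer.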
The following figure displays the modified network architecture on a high level:
The code has been tested on Ubuntu 20.04 with Python 3.7.11, CUDA 11.3.1 and PyTorch 1.10.2.
We used conda for package management; the conda environment file is provided here.
For installation, follow these steps:
- Clone the repository with the --recursive option. We'll call the directory that you cloned the repository into CFPP_ROOT:
  git clone --recursive https://github.com/brandesjj/centerfusionpp
- Install conda, following the instructions on their website. We use Anaconda; installation can be done via:
./<Anaconda_file.sh>
- Create a new conda environment (optional):
  conda env create -f <CFPP_ROOT>/experiments/centerfusionpp.yml
  Restart the shell and activate the conda environment:
  conda activate centerfusionpp
- Build the deformable convolution library:
  cd <CFPP_ROOT>/src/lib/model/networks/DCNv2
  ./make.sh
  Note: If the DCNv2 folder does not exist in the networks directory, it can be downloaded using these commands:
  cd <CFPP_ROOT>/src/lib/model/networks
  git clone https://github.com/lbin/DCNv2/
Note that this repository uses a slightly different DCNv2 repository than CenterFusion, since the one used by CenterFusion caused problems with our CUDA/PyTorch version.
Additionally, the docker file to build a docker container with all the necessary packages is located here.
CenterFusion++ was trained and validated using the nuScenes dataset only. Previous work (e.g. CenterTrack) also uses other datasets (e.g. KITTI). These are not supported by CenterFusion++. However, the original files for converting these datasets into the correct data format have not been removed from this repository.
To download the dataset to your local machine follow these steps:
- Download the nuScenes dataset from the nuScenes website.
- Extract the downloaded files into the <CFPP_ROOT>/data/nuscenes directory. You should have the following directory structure after extraction:
    <CFPP_ROOT>
    `-- data
        `-- nuscenes
            |-- maps
            |-- samples
            |   |-- CAM_BACK
            |   |   |-- xxx.jpg
            |   |   `-- ...
            |   |-- CAM_BACK_LEFT
            |   |-- CAM_BACK_RIGHT
            |   |-- CAM_FRONT
            |   |-- CAM_FRONT_LEFT
            |   |-- CAM_FRONT_RIGHT
            |   |-- RADAR_BACK_LEFT
            |   |   |-- xxx.pcd
            |   |   `-- ...
            |   |-- RADAR_BACK_RIGHT
            |   |-- RADAR_FRONT
            |   |-- RADAR_FRONT_LEFT
            |   `-- RADAR_FRONT_RIGHT
            |-- sweeps
            |-- v1.0-mini
            |-- v1.0-test
            |-- v1.0-trainval
            `-- annotations
  In this work, not all the data available from nuScenes is required. To save disk space, you can skip the LiDAR data in /samples and /sweeps and the CAM_(..) folders in /sweeps.
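Before converting the dataset, it can help to confirm that the extracted folders match the layout listed above. The following is a minimal sketch using only the standard library; the helper name `missing_nuscenes_dirs` and its `version` parameter are hypothetical and not part of this repository.

```python
from pathlib import Path

# Sensor folders from the directory tree above (LiDAR is not needed).
CAMERAS = ["CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
           "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT"]
RADARS = ["RADAR_BACK_LEFT", "RADAR_BACK_RIGHT", "RADAR_FRONT",
          "RADAR_FRONT_LEFT", "RADAR_FRONT_RIGHT"]


def missing_nuscenes_dirs(cfpp_root, version="v1.0-trainval"):
    """Return expected directories that are absent under <cfpp_root>/data/nuscenes."""
    base = Path(cfpp_root) / "data" / "nuscenes"
    expected = [base / "maps", base / "sweeps", base / version]
    expected += [base / "samples" / name for name in CAMERAS + RADARS]
    # Report paths relative to the nuscenes folder, with forward slashes.
    return [p.relative_to(base).as_posix() for p in expected if not p.is_dir()]
```

An empty list means every expected folder was found; otherwise the returned entries name the folders that still need to be extracted.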
Now you can create the necessary annotations by running the convert_nuScenes.py script, which converts the nuScenes dataset to the required COCO format:
cd <CFPP_ROOT>/src/tools
python convert_nuScenes.py
The script exposes several settings, which are explained in the first block of its code.
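For readers unfamiliar with the COCO format, the sketch below shows the generic top-level structure of a COCO-style detection file and a small consistency check. It is an assumption-laden illustration: the example values are made up, the helper `check_coco` is hypothetical, and the files written by convert_nuScenes.py may carry additional fields that are not shown here.

```python
# Generic COCO detection layout: three top-level lists linked by ids.
coco = {
    "images": [
        {"id": 1, "file_name": "samples/CAM_FRONT/xxx.jpg",
         "width": 1600, "height": 900},
    ],
    "annotations": [
        # bbox follows the COCO convention [x, y, width, height] in pixels.
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 200.0, 50.0, 80.0], "area": 4000.0, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "car"}],
}


def check_coco(data):
    """Return a list of problems; an empty list means all id links resolve."""
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    problems = []
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {ann['id']}: bbox needs 4 values")
    return problems
```

A generated annotation file can be loaded with `json.load` and passed to the same check.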
The pre-trained models can be downloaded from the links given in the following table:
Model | GPUs | Backbone | Val NDS | Val mAP |
---|---|---|---|---|
centerfusionpp.pth | 2x NVIDIA A100 | EF | 0.4512 | 0.3209 |
centerfusion_lfa.pth | 2x NVIDIA A100 | CenterNet170 | 0.4407 | 0.3219 |
earlyfusion.pth | 2x NVIDIA A100 | DLA34 | 0.3954 | 0.3159 |
Notes:
- For the CenterNet170 backbone, we refer to the CenterFusion repository.
The scripts in <CFPP_ROOT>/experiments/ can be used to train the network. There is one script for training on 1 GPU and another for training on 2 GPUs.
cd <CFPP_ROOT>
bash experiments/train.sh
The --train_split parameter determines the training set, which can be mini_train or train. The --load_model parameter can be set to continue training from a pretrained model, or removed to start training from scratch. You can modify the parameters in the script as needed, or add more supported parameters from <CFPP_ROOT>/src/lib/opts.py.
The script creates a log folder in <CFPP_ROOT>/exp/ddd/<exp_id>/logs_<time_stamp>, where <time_stamp> is the time stamp and the default for <exp_id> is centerfusionpp.
The log folder contains an event file for TensorBoard, a log.txt with a brief summary of the training process and an opt.txt file containing the specified options.
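Since every run creates a new logs_<time_stamp> folder, a small helper can locate the most recent one. This is a sketch under stated assumptions: the function name `latest_log_dir` is hypothetical, and because the exact time stamp format is not specified here, the folders are ordered by modification time rather than by name.

```python
from pathlib import Path


def latest_log_dir(cfpp_root, exp_id="centerfusionpp"):
    """Return the most recently modified logs_* folder, or None if there is none."""
    exp_dir = Path(cfpp_root) / "exp" / "ddd" / exp_id
    if not exp_dir.is_dir():
        return None
    candidates = [p for p in exp_dir.glob("logs_*") if p.is_dir()]
    # Order by modification time, since the time stamp format is not assumed.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

The returned path can then be used to open log.txt or opt.txt, or to point TensorBoard at the event file.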
Download the pre-trained model into the <CFPP_ROOT>/models directory and use the <CFPP_ROOT>/experiments/test.sh script to run the evaluation:
cd <CFPP_ROOT>
bash experiments/test.sh
Make sure the --load_model parameter in the script provides the path to the downloaded pre-trained model. The --val_split parameter determines the validation set, which can be mini_val, val or test. You can modify the parameters in the script as needed, or add more supported parameters from <CFPP_ROOT>/src/lib/opts.py.
To cite this work, please use the following:
@mastersthesis{CenterFusionPP,
author = {K{\"u}bel, Johannes and Brandes, Julian},
title = {Camera-Radar Sensor Fusion using Deep Learning},
school = {Chalmers University of Technology},
year = 2022,
note = {Available: https://hdl.handle.net/20.500.12380/305503}}
The following works have been used by CenterFusion++.
@INPROCEEDINGS{9423268,
author={Nabati, Ramin and Qi, Hairong},
booktitle={2021 IEEE Winter Conference on Applications of Computer Vision (WACV)},
title={CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection},
year={2021},
volume={},
number={},
pages={1526-1535},
doi={10.1109/WACV48630.2021.00157}}
@inproceedings{zhou2019objects,
title={Objects as Points},
author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp},
booktitle={arXiv preprint arXiv:1904.07850},
year={2019}
}
@article{zhou2020tracking,
title={Tracking Objects as Points},
author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
journal={ECCV},
year={2020}
}
@inproceedings{nuscenes2019,
title={{nuScenes}: A multimodal dataset for autonomous driving},
author={Holger Caesar and Varun Bankiti and Alex H. Lang and Sourabh Vora and Venice Erin Liong and Qiang Xu and Anush Krishnan and Yu Pan and Giancarlo Baldan and Oscar Beijbom},
booktitle={CVPR},
year={2020}
}
CenterFusion++ is based on CenterFusion and is released under the MIT License. See NOTICE for license information on other libraries used in this project.