multi-gpu training #6
Comments
I have noticed this as well, and I consider it a bug. My guess is that multi-GPU training changes the relative weighting between batches: batches on different GPUs are simply averaged, whereas batches on the same GPU are weighted according to num_gt, and some batches are skipped. I have not tested or debugged this yet, because I am not very familiar with the multi-GPU training APIs. |
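To make the weighting guess above concrete, here is a minimal sketch in plain Python. It is not code from this repository: the function names and the toy loss values are hypothetical, and it only assumes that each GPU normalizes its loss by its own number of ground-truth objects before the results are averaged across GPUs.

```python
# Toy illustration (hypothetical names and numbers, not the repository's code).
# Each inner list holds the per-object losses that one GPU sees in one step,
# so len(inner list) plays the role of num_gt on that GPU.

def per_gpu_then_average(losses_per_gpu):
    """Normalize by num_gt on each GPU, then average the per-GPU results."""
    per_gpu = [sum(losses) / len(losses) for losses in losses_per_gpu if losses]
    return sum(per_gpu) / len(per_gpu)

def global_normalization(losses_per_gpu):
    """Normalize once by the total num_gt, as a single large batch would."""
    all_losses = [x for losses in losses_per_gpu for x in losses]
    return sum(all_losses) / len(all_losses)

# GPU 0 sees four objects with loss 1.0 each; GPU 1 sees one object with loss 4.0.
shards = [[1.0, 1.0, 1.0, 1.0], [4.0]]

multi_gpu_loss = per_gpu_then_average(shards)   # (1.0 + 4.0) / 2
single_gpu_loss = global_normalization(shards)  # 8.0 / 5

print(multi_gpu_loss, single_gpu_loss)
```

In this toy case the single object on GPU 1 carries half of the total weight under per-GPU averaging, but only one fifth under global normalization. Whether this is what actually causes the gap in this project is untested.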
I changed
into
and halved the batch size. Empirically, the gap gets smaller, but it still exists. |
Does multi-GPU training affect the training of mono_depth? |
In my tests, depth prediction works fine with multi-GPU training. |
As of the new update, which uses the distributed sampler from detectron2, we are able to train with multiple GPUs and obtain reasonable performance. Without tuning the learning rate or batch size, the results are as follows:
Car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:97.24, 86.90, 67.03
bev AP:29.68, 20.48, 15.73
3d AP:21.56, 15.00, 11.16
aos AP:96.23, 84.25, 64.92
Car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:97.24, 86.90, 67.03
bev AP:65.20, 46.35, 35.98
3d AP:58.84, 41.06, 32.49
aos AP:96.23, 84.25, 64.92 |
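For readers unfamiliar with distributed samplers, the general idea can be sketched as follows. This is a plain-Python illustration of index sharding, not detectron2's implementation; `shard_indices` and its parameters are hypothetical names. It assumes every process shuffles with the same seed and then takes a strided slice by rank, so that no sample is duplicated or dropped within an epoch.

```python
import random

def shard_indices(dataset_size, rank, world_size, seed=0):
    """Return the dataset indices that one process (rank) should load.

    All ranks build the same shuffled order from a shared seed, then each
    rank keeps every world_size-th index starting at its own rank.
    """
    rng = random.Random(seed)
    order = list(range(dataset_size))
    rng.shuffle(order)
    return order[rank::world_size]

# Example: 10 samples split across 4 GPUs.
shards = [shard_indices(10, rank, 4) for rank in range(4)]
for rank, indices in enumerate(shards):
    print(rank, indices)
```

With 10 samples and 4 processes the shards have sizes 3, 3, 2 and 2, and together they cover every index exactly once.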
Hi, thanks for your great work!
I have trained the GroundAwareYolo3D model and got the results below:
Car AP(Average Precision)@0.70, 0.70, 0.70
bbox AP: 97.29, 84.55, 64.65
bev AP: 29.53, 20.15, 15.53
3d AP: 22.90, 15.26, 11.33
aos AP: 96.52, 82.52, 63.05
This seems comparable with the result reported in the paper (23.63, 16.16, 12.06) for Car AP@0.70 on the validation set.
However, when training with multiple GPUs (e.g. 4 GPUs), we get poor results, as below:
Car AP(Average Precision)@0.70, 0.70, 0.70
bbox AP: 97.08, 86.41, 66.67
bev AP: 20.56, 15.16, 11.22
3d AP: 15.17, 10.81, 8.22
aos AP: 95.50, 83.36, 64.24
Training commands (multi-GPU, then single-GPU):
bash ./launchers/train.sh config/$CONFIG_FILE.py 0,1,2,3 multi-gpu-train
bash ./launchers/train.sh config/$CONFIG_FILE.py 0 single-gpu-train
I trained twice with multiple GPUs, and both results are similar to each other and lower than the single-GPU result. Do you have any suggestions for this case? What performance do you get with multi-GPU training?