
The paper says batch_size=48 with two TitanXp GPUs: can this complete training and reproduce the paper's metrics? #3

Open
lai-pf opened this issue Aug 16, 2022 · 4 comments

lai-pf commented Aug 16, 2022

Hello, the paper states that all models were trained with batch_size=48 on two TitanXp GPUs, but in this repo you mention that seedFormer was trained with batch_size=48 on two V100s.
Can two TitanXp cards, or a single 24 GB 3090, complete the training and reproduce the paper's metrics? A V100 has 32 GB of memory per card while a TitanXp has only 12 GB, so I'm not sure which description reflects the actual memory requirement for reproducing the experiments.
I look forward to your reply. Many thanks!

hrzhou2 (Owner) commented Aug 17, 2022

We used two V100 GPUs in the original implementation. You can change the batch size to suit your own setup.

lai-pf (Author) commented Aug 17, 2022

I tried it today. Using two 24 GB P40 GPUs, it seems I can only run with batch_size=32, and each epoch takes nearly half an hour, which means 400 epochs would need about 8 days. Could you tell me the per-epoch time on your device, or the total training time?
I really like your code style: simple and elegant. But it seems I can't follow up on this work with my devices because of the long training time.
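(For reference, the 8-day figure follows directly from the per-epoch time; a quick sanity check, assuming a steady 0.5 h/epoch:)

```python
hours_per_epoch = 0.5                  # observed on 2x P40 with batch_size=32
epochs = 400
total_hours = epochs * hours_per_epoch # 200 hours
print(total_hours / 24)                # ~8.3 days
```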

hrzhou2 (Owner) commented Aug 30, 2022

It takes about 3 days on my device with the original settings.

Changing the batch_size to 32 usually requires other modifications, e.g. to the learning rate or its decay schedule. You can train for fewer epochs since the batch size is smaller, or you can use the pretrained model.
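A common heuristic for adjusting the learning rate is the linear scaling rule. This is a generic sketch, not something prescribed by this repo; the `base_lr` value below is a placeholder, not the repo's actual config:

```python
base_lr = 1e-3          # placeholder; substitute the learning rate from the repo's config
base_batch_size = 48    # batch size the original schedule was tuned for
new_batch_size = 32

# Linear scaling rule: shrink the lr proportionally with the batch size.
scaled_lr = base_lr * new_batch_size / base_batch_size  # ~6.7e-4 here
```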

Nevak1016 commented

@hrzhou2 Hi, I've also been experimenting with this method as a SOTA baseline recently, but is it really possible to run with batch_size=48 on 2×V100? At the moment I'm using 2×3090 and it reports "CUDA out of memory".
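(If the full batch doesn't fit in memory, one generic workaround is gradient accumulation, which keeps the effective batch size at 48 while feeding smaller micro-batches. The sketch below is plain PyTorch with placeholder model, loss, and data, not this repo's actual training loop:)

```python
import torch
from torch import nn

# Placeholder stand-ins for the repo's actual model and dataloader.
model = nn.Linear(3, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
micro_batches = [torch.randn(16, 3) for _ in range(6)]  # micro-batch size 16

accumulation_steps = 3  # 3 micro-batches of 16 -> effective batch size 48
optimizer.zero_grad()
for step, batch in enumerate(micro_batches):
    loss = model(batch).pow(2).mean()        # placeholder loss
    (loss / accumulation_steps).backward()   # average gradients over micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```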
