
The paper says batch_size=48 with two TitanXp GPUs: can this complete training and reproduce the paper's metrics? #3

Open
lai-pf opened this issue Aug 16, 2022 · 4 comments

lai-pf commented Aug 16, 2022

Hello, the paper states that all models were trained with batch_size=48 on two TitanXp GPUs, but in this repo you mention that seedFormer was trained with batch_size=48 on two V100s.
Can two TitanXp cards, or a single 24 GB 3090, complete the training and reproduce the paper's metrics? A V100 has 32 GB of memory per card while a TitanXp has only 12 GB, so I'm not sure which description reflects the actual memory requirement for reproducing the experiments.
I look forward to your reply. Many thanks!

hrzhou2 (Owner) commented Aug 17, 2022

We used two V100 GPUs in the original implementation. You can change the batch size to suit your own setup.

lai-pf (Author) commented Aug 17, 2022

I tried it today. Using two 24 GB P40 GPUs, it seems I can only run with batch_size=32, and each epoch takes nearly half an hour, which means 400 epochs would need about 8 days. Could you tell me the per-epoch time on your device, or the total training time?
I really like your code style: simple and elegant. But it seems I can't follow up on this work with my devices because of the long training time.
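(For reference, the 8-day figure follows directly from the per-epoch time; a quick sanity check, assuming a steady 0.5 h/epoch:)

```python
hours_per_epoch = 0.5                  # observed on 2x P40 with batch_size=32
epochs = 400
total_hours = epochs * hours_per_epoch # 200 hours
print(total_hours / 24)                # ~8.3 days
```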

hrzhou2 (Owner) commented Aug 30, 2022

It takes about 3 days on my device with the original settings.

Changing the batch_size to 32 usually requires other modifications, e.g. to the learning rate or its decay schedule. You can train for fewer epochs since the batch size is smaller, or you can use the pretrained model.
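A common heuristic for adjusting the learning rate is the linear scaling rule. This is a generic sketch, not something prescribed by this repo; the `base_lr` value below is a placeholder, not the repo's actual config:

```python
base_lr = 1e-3          # placeholder; substitute the learning rate from the repo's config
base_batch_size = 48    # batch size the original schedule was tuned for
new_batch_size = 32

# Linear scaling rule: shrink the lr proportionally with the batch size.
scaled_lr = base_lr * new_batch_size / base_batch_size  # ~6.7e-4 here
```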

Nevak1016 commented

@hrzhou2 Hi, I've also been experimenting with this method as a SOTA baseline recently, but is it really possible to run with batch_size=48 on 2×V100? At the moment I'm using 2×3090 and it reports "CUDA out of memory".
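(If the full batch doesn't fit in memory, one generic workaround is gradient accumulation, which keeps the effective batch size at 48 while feeding smaller micro-batches. The sketch below is plain PyTorch with placeholder model, loss, and data, not this repo's actual training loop:)

```python
import torch
from torch import nn

# Placeholder stand-ins for the repo's actual model and dataloader.
model = nn.Linear(3, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
micro_batches = [torch.randn(16, 3) for _ in range(6)]  # micro-batch size 16

accumulation_steps = 3  # 3 micro-batches of 16 -> effective batch size 48
optimizer.zero_grad()
for step, batch in enumerate(micro_batches):
    loss = model(batch).pow(2).mean()        # placeholder loss
    (loss / accumulation_steps).backward()   # average gradients over micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```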
