About multi-GPU on a big dataset. #10
Comments
I'm also struggling to get multi-GPU to work at reasonable speeds. If there's a bottleneck in the dataloader, you can try increasing the number of workers with --workers, or you can preprocess your data. For training the first unet, that would involve resizing and padding the images to 64x64. If you find any other ways to improve multi-GPU, let me know.
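As a rough sketch of the offline-preprocessing idea for the first unet (not code from this repo; the input/output paths are made up, and the 64x64 target and pad-to-square choice come from the comments in this thread), the resize-and-pad could be done once up front so the dataloader only has to decode small files:

```python
# Hypothetical offline preprocessing: resize each image so it fits inside
# 64x64 while keeping aspect ratio, then pad to a square 64x64 canvas.
from pathlib import Path
from PIL import Image, ImageOps

SRC = Path("data/raw")      # assumed folder of original images
DST = Path("data/64x64")    # assumed output folder
DST.mkdir(parents=True, exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    # ImageOps.pad resizes to fit inside (64, 64) and fills the rest with black
    img = ImageOps.pad(img, (64, 64), color=(0, 0, 0))
    img.save(DST / path.name)
```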
I found that most transform methods in DMs look like this:
Padding is necessary if the images aren't already square; otherwise they will be distorted. CenterCrop also keeps them square without distortion, but you lose data.
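For reference, the two options could look roughly like this in a torchvision pipeline (a sketch, not the repo's actual ImageLabelDataset transform; the 64x64 size follows the first-unet comment above):

```python
import torchvision.transforms as T
import torchvision.transforms.functional as F

def pad_to_square(img):
    # pad the shorter side so the image becomes square, without distortion
    w, h = img.size
    side = max(w, h)
    left, top = (side - w) // 2, (side - h) // 2
    return F.pad(img, [left, top, side - w - left, side - h - top])

# Option 1: pad to square, then resize -- keeps all pixels, adds borders.
pad_pipeline = T.Compose([T.Lambda(pad_to_square), T.Resize(64), T.ToTensor()])

# Option 2: resize the shorter side, then center-crop -- no borders, but
# content outside the central square is discarded.
crop_pipeline = T.Compose([T.Resize(64), T.CenterCrop(64), T.ToTensor()])
```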
Update: I am working on switching to webdataset as an optional alternative to ImageLabelDataset. So far I have observed it is much faster, but I haven't gotten it working with multi-GPU yet. Once I do, I'll push the change (or maybe I'll push it sooner, since it's a non-default option).
Pushed webdataset. Multi-GPU now runs fast, although I'm not sure everything is working correctly: when training unet2, I see loss=0.0, which isn't right. Debugging continues...
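For anyone curious what the webdataset path can look like, here is a minimal loading sketch (the shard pattern, shuffle buffer, and transform are illustrative assumptions, not the exact code that was pushed):

```python
import torchvision.transforms as T
import webdataset as wds
from torch.utils.data import DataLoader

# Assumed shard pattern; each .tar shard holds paired "xxx.jpg" / "xxx.txt" entries.
urls = "shards/laion-{000000..000099}.tar"

to_tensor = T.Compose([T.Resize(64), T.CenterCrop(64), T.ToTensor()])

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)                       # shuffle within an in-memory buffer
    .decode("pil")                       # decode images to PIL
    .to_tuple("jpg", "txt")              # yield (image, caption) pairs
    .map_tuple(to_tensor, lambda t: t)   # image -> tensor, caption unchanged
    .batched(32)                         # batch inside the dataset pipeline
)

# batch_size=None because batching already happens in the webdataset pipeline
loader = DataLoader(dataset, batch_size=None, num_workers=8)
```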
When I reduced the dataset from 7M to 100k images, training was fast, about 0.5 h per epoch; however, it would take about 200 h for the full 7M.
Is that with webdataset or the default?
The default way. |
Hi, with 100k image-text pairs I find that the loss drops significantly during the first few epochs (from about 0.6 to 0.02). After 5 epochs the loss barely declines, but the sampling quality still improves. So when should I stop training?
My loss curve is similar to yours. With longer training the loss is still hard to reduce, even after lowering the LR, but the sampling quality keeps getting better. So besides the loss, what can be used as a criterion for convergence?
Image sample quality is the best method I know of.
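One practical way to do that is to sample a fixed list of prompts at every checkpoint and save a grid, then compare grids across epochs by eye (or with a metric such as FID if you have the compute). A hedged sketch, where sample_fn stands in for whatever sampling entry point the training script exposes (not a function from this repo):

```python
import torch
import torchvision.utils as vutils

def save_sample_grid(sample_fn, prompts, out_path):
    """Sample a fixed prompt list and save an image grid for visual comparison
    across checkpoints. `sample_fn` is assumed to return an (N, 3, H, W)
    tensor of images in [0, 1]."""
    with torch.no_grad():
        images = sample_fn(prompts)
    vutils.save_image(images, out_path, nrow=len(prompts))

# e.g. call save_sample_grid(my_sample_fn, fixed_prompts, f"samples/epoch_{epoch:03d}.png")
# after every epoch and watch how the grids evolve.
```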
Hi, thanks for your script.
When I train Imagen with multiple GPUs on a subset of LAION (about 7M pairs),
I found that launching with 'CUDA_VISIBLE_DEVICES=0,1,2,3,4 python3 imagen.py --train...'
is faster than 'accelerate launch imagen.py',
even though GPU utilization is lower in the first case than in the second.
I also found there is a bottleneck in data processing (the ImageLabelDataset class in data_generator.py): the GPUs are always waiting for data.
Right now, the training speed of both approaches is too slow. Do you have any advice?
Thanks again.
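One way to confirm that the dataloader, rather than the GPUs, is the limiting factor is to time how long each iteration waits for the next batch versus how long the training step takes. A generic sketch, not tied to this repo's training loop (model_step is a placeholder for one forward/backward pass):

```python
import time
import torch

def profile_loader(loader, model_step, device="cuda", num_iters=100):
    """Rough split of wall time between waiting on the dataloader and the
    training step itself."""
    data_time, step_time = 0.0, 0.0
    it = iter(loader)
    t0 = time.time()
    for _ in range(num_iters):
        batch = next(it)                  # time spent here = dataloader wait
        t1 = time.time()
        model_step(batch)                 # placeholder: one forward/backward pass
        torch.cuda.synchronize(device)    # make async GPU work visible to the timer
        t2 = time.time()
        data_time += t1 - t0
        step_time += t2 - t1
        t0 = t2
    print(f"data wait: {data_time:.1f}s, compute: {step_time:.1f}s over {num_iters} iters")
```

If the data-wait time dominates, more workers or offline preprocessing (as discussed above) should help more than changing the launch method.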