[Chatllama]: MultiGPU support for training #254
Comments
Hi @TejaGollapudi, thank you very much for reaching out. We are currently working on supporting the Accelerate library. You can check the updates directly on PR #233.
I added Accelerate to the code as in #233, but got an error:
@leonselina We will be releasing support for Accelerate very soon! We are currently testing the code and will keep you updated when we merge it!
When will this multi-GPU support be available? Really looking forward to it.
Also looking forward to it!
Hi everyone @bin123apple @balcklive @TejaGollapudi.
Hi @PierpaoloSorbellini, I trained LLaMA 7B with DeepSpeed but got the error: "MP=1 but world size is 2".
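For context, that error usually comes from the consistency check in Meta-style LLaMA checkpoint loading, which requires the number of checkpoint shards (the model-parallel size, MP) to match the number of launched processes. A rough, illustrative sketch of that check (not the project's exact code):

```python
# Illustrative sketch of the MP / world-size check in Meta-style LLaMA loading code.
# The 7B checkpoint ships as a single shard, so MP=1; launching 2 processes then fails.
import glob
import os

def check_mp_vs_world_size(ckpt_dir: str) -> None:
    shards = sorted(glob.glob(os.path.join(ckpt_dir, "*.pth")))
    mp_size = len(shards)                                  # one .pth file per model-parallel rank
    world_size = int(os.environ.get("WORLD_SIZE", "1"))    # set by the distributed launcher
    assert mp_size == world_size, f"MP={mp_size} but world size is {world_size}"
```

So with the original 7B weights you would either launch a single process per model replica, or convert the checkpoint to HF format and let DeepSpeed/Accelerate handle data parallelism across GPUs.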
@PierpaoloSorbellini Hey, I tried LLaMA in HF format with DeepSpeed using --num_gpus=2. The model was loaded twice, and both copies landed on the rank-0 GPU, which caused a CUDA OOM. Do you have any ideas on how to fix this?
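One common cause of this pattern (not verified against the ChatLLaMA code, just a frequent DeepSpeed pitfall): if each launched process loads the model onto the default device, every rank ends up on cuda:0. A hedged sketch of pinning each process to its own GPU before loading, with a placeholder HF-format LLaMA path:

```python
# Hedged sketch, not ChatLLaMA's actual loading code: pin each DeepSpeed process to
# its own GPU (via LOCAL_RANK) before loading, so weights are not duplicated on cuda:0.
import os
import torch
from transformers import AutoModelForCausalLM

local_rank = int(os.environ.get("LOCAL_RANK", "0"))   # set by `deepspeed --num_gpus=2`
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",            # placeholder path to the HF-format LLaMA weights
    torch_dtype=torch.float16,
)
model.to(torch.device("cuda", local_rank))
```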
I'm trying to train the actor model (BLOOM 1.5B) on a multi-GPU setup (3 V100s).
When I observe GPU usage, only GPU:0 is utilized, and I run out of memory if I increase the batch_size.
Could you add multi-GPU support using Hugging Face's Accelerate to facilitate training larger models with a larger batch size?
Thank you
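For reference, a minimal sketch of what an Accelerate-based multi-GPU training loop looks like, launched with `accelerate launch train.py`. The tiny model and random data below are placeholders for the actor model and its dataset, not ChatLLaMA's actual API:

```python
# Minimal Accelerate sketch: each GPU gets its own process and its own slice of each
# batch, instead of everything sitting on GPU:0.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # reads the multi-GPU setup chosen via `accelerate config`

model = torch.nn.Linear(128, 1)                     # placeholder for the actor model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 1))
dataloader = DataLoader(dataset, batch_size=32)

# prepare() moves everything to the right device and wraps the model for DDP.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)                      # replaces loss.backward() for gradient sync
    optimizer.step()
    optimizer.zero_grad()
```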