Hello accelerate team,
I'm looking to pretrain on a v4-32 TPU pod, using an HF dataset and the HF Trainer. I have no problems running on a single TPU.
I already found issue #501 and its answer https://github.com/huggingface/accelerate/issues/501 , but it's two years old. I successfully installed accelerate and xla on all workers; however, step 2 seems to require the file xla_dist.py, which no longer exists on the xla master branch. What are the steps to train on TPU pods, then? Thanks in advance!
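For context, here is the kind of per-worker entry point I'd expect to run on each TPU VM instead of going through xla_dist.py. This is only a minimal sketch based on my reading of the current torch_xla multiprocessing API (xmp.spawn); the script name train_xla.py and the gcloud --worker=all launch pattern are my assumptions, not documented accelerate steps:

```python
# Minimal per-worker entry point sketch (my assumption of the current
# torch_xla API, not an official accelerate recipe). The same script is
# launched on every TPU VM of the pod, e.g. with:
#   gcloud compute tpus tpu-vm ssh <tpu-name> --zone=<zone> --worker=all \
#       --command="python3 train_xla.py"
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()
    # A trivial collective to confirm all pod workers are wired together:
    # each process contributes a 1, so the sum equals the global world size.
    ones = torch.ones(1, device=device)
    total = xm.all_reduce(xm.REDUCE_SUM, ones)
    xm.mark_step()  # flush the lazy XLA graph
    print(f"ordinal {xm.get_ordinal()}: world size = {int(total.item())}")


if __name__ == "__main__":
    # Under the PJRT runtime, xmp.spawn determines the process count itself.
    xmp.spawn(_mp_fn, args=())
```

If this per-worker pattern is right, I'd assume the real training script (Trainer or accelerate launch) just replaces the body of _mp_fn, but I'd appreciate confirmation of the intended steps.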
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.