Introduce TPU pod launcher #815
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
…erate into tpu-pod-launch
85adc2f to e8b694b
Very nice work, thanks a lot for working on this!
Great Work 🤗! Left a comment.
This is a heavy POC in active development and is currently waiting on pytorch/xla#4149 to see if we can push forward. The PR is up so that the community knows it's being worked on and is almost there :)
Proposed API:
`accelerate launch` now allows for a configured pod setup through three new params/config items:
- `use_cluster`: whether to use a TPU cluster
- `vm`: this mimics `xla_dist`'s `vm` argument and is a list of single compute VM names if you are not using an instance group (generally not needed)
- `env`: a list of environment variables to set on each of the compute VM instances

Currently we only support non-Docker, as GCP doesn't support Docker yet on the larger pods.
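To make the mapping concrete, here is a minimal sketch of how the three items could translate into an `xla_dist` command line. It is not the code in this PR: the helper `build_pod_command` and its `tpu_name` parameter are hypothetical, and the `--tpu`, `--vm`, and `--env` flags are assumed to behave as repeatable `xla_dist` options.

```python
from typing import List, Optional, Sequence


def build_pod_command(
    script: str,
    use_cluster: bool = False,
    tpu_name: Optional[str] = None,  # hypothetical: not one of the three new items
    vm: Sequence[str] = (),
    env: Sequence[str] = (),
) -> List[str]:
    """Assemble an argv list for launching `script` (illustrative only)."""
    if not use_cluster:
        # Without a cluster, run the script directly on the local host.
        return ["python3", script]

    cmd = ["python3", "-m", "torch_xla.distributed.xla_dist"]
    if tpu_name:
        cmd.append(f"--tpu={tpu_name}")
    # Assumption: one --vm flag per single compute VM name.
    for name in vm:
        cmd.append(f"--vm={name}")
    # Assumption: one --env flag per KEY=VALUE pair, applied on every VM.
    for pair in env:
        cmd.append(f"--env={pair}")
    # Everything after "--" is the command run on each compute VM.
    cmd += ["--", "python3", script]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_pod_command(
        "train.py",
        use_cluster=True,
        tpu_name="my-pod",
        env=["XLA_USE_BF16=1"],
    )))
```

Returning a list instead of a single string keeps each argument separate, so the result can be passed to `subprocess.run` without shell quoting issues.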
To launch a script on a TPU pod, the command will look like this:
Fully configured:
No configuration:
TODO:
Write tests
Closes #501 and closes #471