
Can PyTorch/XLA wheel for release branch build with cxx_abi disabled? #5325

Closed
vanbasten23 opened this issue Jul 19, 2023 · 5 comments

@vanbasten23
Collaborator

Hi @mateuszlewko ,

I wonder if the new wheel build process (with Ansible) can disable cxx_abi when it builds a torch_xla wheel for a release branch (such as r2.0). We recently built a torch_xla wheel (on pt/xla branch r2.0, cuda 11.8, python=3.10). From the log, it seems cxx_abi is still enabled (I see -D_GLIBCXX_USE_CXX11_ABI=1 in the log above, which makes me think it is enabled; please correct me if I'm wrong). Building an official torch_xla wheel with cxx_abi enabled makes it incompatible with torch's wheel.

What we used to do in the release branch is to first apply a torch patch (as in this pr), then disable cxx_abi (as in this pr). So my questions are:

  1. With Ansible, does the wheel have cxx_abi enabled?
  2. If so, is it possible with Ansible to apply the torch patch first and then set the flag to false?

Thanks.

cc: @JackCaoG @miladm

@mateuszlewko
Collaborator

Hey,

First some background information.

Building process

  1. All build-related tasks are present in this role:
    https://github.com/pytorch/xla/blob/master/infra/ansible/roles/build_srcs/tasks/main.yaml.
    They should be self-descriptive, but please reach out if something is unclear.
  2. For non-nightly releases, you should look at the Ansible setup at a given tag, branch, or commit, so in this case branch r2.0: infra/ansible/roles/build_srcs/tasks/main.yaml#L32-L36. This is also the step that builds torch_xla.
  3. Each ansible.builtin.command task in Ansible gets a separate shell
    environment, i.e. previous tasks do not pollute the env vars of other tasks.
  4. Having said that, most tasks load the env_vars Ansible dict
    (https://github.com/pytorch/xla/blob/r2.0/infra/ansible/roles/build_srcs/tasks/main.yaml#L36), which is a combination of vars from the config: infra/ansible/config/env.yaml#L21-L48
    (depending on the arch and accelerator, in this case: common + amd64 + cuda). Implementation detail: the dicts are combined here.
  5. The task that builds the XLA computation client library always sets the parameter -D_GLIBCXX_USE_CXX11_ABI=1 (in addition to env_vars). This was carried over from the pre-Ansible setup and can be changed easily. This task does not run on the master branch (nightly releases) thanks to the Bazel migration.
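To make the env_vars mechanism above concrete, here is a rough sketch of what such a build task looks like; the task name, command, and paths are illustrative, not copied from the repo:

```yaml
# Illustrative sketch only; see infra/ansible/roles/build_srcs/tasks/main.yaml
# for the real tasks. Each command task runs in a fresh shell environment
# populated from the merged env_vars dict.
- name: Build torch_xla wheel          # hypothetical task name
  ansible.builtin.command:
    cmd: python setup.py bdist_wheel   # hypothetical build command
    chdir: /src/pytorch/xla            # hypothetical source directory
  environment: "{{ env_vars }}"        # common + arch + accelerator vars
```

Because each task declares its environment explicitly, changing a single key in the shared env_vars dict propagates to every task that loads it.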

To answer your first question: cxx_abi is explicitly enabled for the XLA computation client library. It's not explicitly set for PyTorch/XLA, but I see in the logs that it's set anyway (search for "Determined _GLIBCXX_USE_CXX11_ABI=1" in the logs).
I think you need to set it explicitly to 0.
Now the question is: do you want to disable it for all builds or just 2.0? If just 2.0, then push a new commit to the r2.0 branch with the following modifications:

  1. Remove -D_GLIBCXX_USE_CXX11_ABI=1 from the "Build XLA computation client
    library" task: infra/ansible/roles/build_srcs/tasks/main.yaml#L27.

  2. Add the common env var _GLIBCXX_USE_CXX11_ABI=0 in https://github.com/pytorch/xla/blob/r2.0/infra/ansible/config/env.yaml#L22.
    This will be picked up by all tasks that have environment: "{{ env_vars }}".
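Step 2 amounts to an addition along these lines in env.yaml; the surrounding key structure is a sketch, and the exact nesting in the real file may differ:

```yaml
# infra/ansible/config/env.yaml (sketch; exact surrounding keys may differ)
build_env:
  common:
    # Disable the C++11 ABI for every task that loads env_vars
    _GLIBCXX_USE_CXX11_ABI: 0
```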

Applying patches

Sure, it's easy to apply patches with Ansible. An example of applying TF
patches: https://github.com/pytorch/xla/blob/r2.0/infra/ansible/roles/fetch_srcs/tasks/main.yaml#L29-L40.
Simply add another task there with the correct directory.
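Such a task, modeled loosely on the TF patch example linked above, could look like this; the task name, patch path, and checkout directory are all hypothetical:

```yaml
- name: Apply torch patches            # hypothetical task name
  ansible.builtin.shell: |
    # torch_cxx_abi.patch is a hypothetical patch file name
    git apply /src/patches/torch_cxx_abi.patch
  args:
    chdir: /src/pytorch                # hypothetical checkout directory
```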

Testing your changes locally

You can test your changes locally (assuming you have Docker installed) by
running the same docker build command as in the cloud build step:
https://screenshot.googleplex.com/6imoM249wTWp2NF.

In the infra/ansible directory, run:

docker build -f=Dockerfile . --build-arg=accelerator=cuda \
--build-arg=arch=amd64 --build-arg=cuda_version=11.8 \
--build-arg=git_tag=v2.0.0 --build-arg=package_version=2.0 \
--build-arg=python_version=3.10 \
--build-arg=ansible_vars='{"accelerator":"cuda","arch":"amd64","cuda_version":"11.8","git_tag":"v2.0.0","package_version":"2.0","python_version":"3.10","pytorch_git_rev":"v2.0.0","xla_git_rev":"v2.0.0"}' -t=local_image
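After installing the resulting wheels, one quick sanity check for ABI compatibility is to ask torch which ABI it was compiled with (torch_xla must be built with the same setting). This assumes a Python environment where torch is already installed:

```shell
# Prints True if torch was compiled with -D_GLIBCXX_USE_CXX11_ABI=1,
# False otherwise; torch_xla must match this setting to be compatible.
python -c "import torch; print(torch.compiled_with_cxx11_abi())"
```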

Hope it helps,
Mateusz

@vanbasten23
Collaborator Author

Thanks @mateuszlewko. I'll give it a try.

@vanbasten23
Collaborator Author

The pr has been merged and we started the build, but the build seems to be failing.

@vanbasten23
Collaborator Author

vanbasten23 commented Jul 25, 2023

It looks like I need to update the tag v2.0.0, which I did. But the build still failed: log:

  • At first, it was able to check out the correct commit:
Initialized empty Git repository in /workspace/.git/
From https://github.com/pytorch/xla
 * branch            3b7798db3dd6ee1fc0550a332f13d06db3e8d169 -> FETCH_HEAD
HEAD is now at 3b7798d Disable cxx abi in ansible when building pt/xla for branch r2.0 (#5332)
BUILD
Starting Step #0 - "git_fetch"
Step #0 - "git_fetch": Already have image (with digest): gcr.io/cloud-builders/git

Notice that commit 3b7798db3dd6ee1fc0550a332f13d06db3e8d169 is the one I recently pushed to the r2.0 branch.

cc: @ManfeiBai

@vanbasten23
Collaborator Author

I'm able to create a new r2.0 wheel now.
