
xpu: Support new PyTorch XPU backend (>=2.4) #31237

Closed
dvrogozh opened this issue Jun 4, 2024 · 4 comments · Fixed by huggingface/accelerate#2825 or #31238


dvrogozh commented Jun 4, 2024

The XPU backend is a new PyTorch backend that aims to enable hardware acceleration on Intel GPUs via SYCL. It is being actively worked on at the moment: the first set of patches has landed in PyTorch upstream, and support is described in the documentation [1]. An initial version should be available starting from PyTorch 2.4, with the 2.5 release targeted as the point of maturity. The current focus of the effort is on the functional side: identifying and closing API gaps, if any, and populating the set of offloadable aten operations. Some models and scenarios can already be tried out, with the caveat of low performance due to CPU fallbacks for some operations. Overall, [2] outlines the upstreaming process for the XPU backend. Note also some relevant XPU-related issues opened on the PyTorch side [3].

Previously, Intel GPU support in PyTorch was only available via the Intel Extension for PyTorch (IPEX). Effectively, it is this support that is now being upstreamed to stock PyTorch.

Here I would like to request that Hugging Face enable the stock PyTorch XPU backend. Considering that IPEX is already enabled in the Hugging Face repos, it should be fairly trivial to extend that support to cover the XPU backend, since the latter reuses the XPU device and operation naming from the IPEX era.
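Since both IPEX and the stock backend expose the same `xpu` device string, the dispatch order used in the PRs can be sketched as below. This is an illustrative, hypothetical helper (`select_xpu_flavor` is not actual accelerate code); it only models the logic of trying IPEX first, then falling back to the stock `torch.xpu` backend.

```python
def select_xpu_flavor(ipex_importable: bool, torch_has_xpu: bool) -> str:
    """Illustrative dispatch order (hypothetical helper, not actual
    accelerate code): try IPEX first; fall back to the stock PyTorch XPU
    backend (torch >= 2.4); otherwise report that XPU is unavailable.
    Both flavors expose the same 'xpu' device string, which is what makes
    extending the existing IPEX support fairly trivial."""
    if ipex_importable:
        # intel_extension_for_pytorch provides torch.xpu when imported
        return "ipex"
    if torch_has_xpu:
        # torch.xpu is built into stock PyTorch starting from 2.4
        return "stock"
    return "unavailable"
```

Either way, downstream code can keep addressing the device as plain `"xpu"`.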

I have prototyped XPU backend support in Hugging Face. Please check these PRs:

[1] https://github.com/pytorch/pytorch?tab=readme-ov-file#intel-gpu-support
[2] pytorch/pytorch#114842
[3] https://github.com/pytorch/pytorch/issues?q=is%3Aissue+is%3Aopen+xpu+in%3Atitle

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @sywangyi @kding1

dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Jun 4, 2024
Fixes: huggingface/transformers#31237

XPU backend is available in the stock PyTorch starting from
version 2.4, see [1]. This commit extends huggingface accelerate
to support XPU from both IPEX and the stock pytorch. IPEX is being
tried first.

See: pytorch/pytorch#114842
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

dvrogozh commented Jun 4, 2024

As of pytorch/pytorch@21144ce, huggingface/accelerate@b7fa2fa, and 485d913, with the above PRs applied:

Below are my results from trying out the Hugging Face examples (https://github.com/huggingface/transformers/tree/main/examples/pytorch) with the XPU backend on ATS-M (which currently requires export OverrideDefaultFP64Settings=1 && export IGC_EnableDPEmulation=1). I tried all the samples except two: contrastive-image-text and semantic-segmentation.
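For context, a run on ATS-M looks roughly like the following. This is a hedged sketch: the two exports are the ones quoted above, while the commented-out example invocation (script path, model, and dataset names) is an illustrative assumption following the examples README, and actually running it requires an XPU-enabled PyTorch build and an Intel GPU.

```shell
# FP64 emulation is currently required on ATS-M for these examples.
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1

# Illustrative invocation (model/dataset names are assumptions);
# requires PyTorch >= 2.4 with the XPU backend and an Intel GPU:
# python examples/pytorch/image-classification/run_image_classification.py \
#   --model_name_or_path google/vit-base-patch16-224-in21k \
#   --dataset_name beans --output_dir /tmp/vit-xpu --do_train
```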

Overall, the Hugging Face examples can run on the XPU backend, with low performance at the moment due to a range of operations falling back to CPU. Effectively, one of the goals was to identify these ops for future prioritization. The only example that failed due to a missing uAPI is speech-pretraining. See details below.

The examples covered were: image-classification, image-detection, translation, token-classification, text-classification, summarization, instance-segmentation, multiple-choice, and question-answering. The CPU fallbacks observed, per aten op, with the models that hit them (the leading/trailing underscores of the in-place `_foreach_*_` ops were eaten by markdown italics in the original and are restored here):

| aten op | fallback | models hitting it |
| --- | --- | --- |
| aten::_cdist_forward | explicit | DETR |
| aten::_foreach_addcdiv_.ScalarList | manual | ViT, DETR, OPUS_MT, BERT, MRPC |
| aten::_foreach_addcmul_.Scalar | manual | ViT, DETR, OPUS_MT, BERT |
| aten::_foreach_div_.ScalarList | manual | ViT, DETR, OPUS_MT, BERT, MRPC |
| aten::_foreach_lerp_.Scalar | manual | ViT, DETR, OPUS_MT, BERT |
| aten::_foreach_mul_.Scalar | manual | ViT, DETR, OPUS_MT, BERT, MRPC |
| aten::_foreach_mul_.Tensor | manual | ViT, DETR, OPUS_MT, BERT, MRPC |
| aten::_foreach_norm.Scalar | manual | ViT, DETR, OPUS_MT, BERT, MRPC |
| aten::_foreach_sqrt | manual | ViT, DETR, OPUS_MT, BERT, MRPC |
| aten::addcdiv.out | explicit | SWIN, ROBERTA |
| aten::addcmul.out | explicit | GOOGLE-T5, SWIN, ROBERTA |
| aten::all.all_out | explicit | DETR, BERT, MRPC |
| aten::floor.out | explicit | SWIN |
| aten::grid_sampler_2d_backward | explicit | SWIN |
| aten::lerp.Scalar_out | explicit | GOOGLE-T5, SWIN, ROBERTA |
| aten::linalg_vector_norm.out | explicit | ViT, OPUS_MT, MRPC, GOOGLE-T5, SWIN, ROBERTA |
| aten::linspace.out | explicit | SWIN |
| aten::native_batch_norm | explicit | SWIN |
| aten::native_group_norm_backward | explicit | SWIN |
| aten::nll_loss2d_backward | manual | DETR, SWIN |
| aten::nll_loss2d_forward | manual | DETR, SWIN |
| aten::max_pool2d_with_indices.out | explicit | DETR |
| aten::prod.int_out | explicit | SWIN |
| aten::roll | explicit | SWIN |
| aten::sgn.out | explicit | DETR |
| aten::sigmoid.out | explicit | DETR, OPUS_MT, SWIN |
| aten::sigmoid_backward.grad_input | explicit | DETR, SWIN |
| aten::silu.out | explicit | OPUS_MT |
| aten::topk.values | explicit | SWIN |
| aten::upsample_bilinear2d.out | explicit | SWIN |
| aten::upsample_bilinear2d_backward.grad_input | explicit | SWIN |
| aten::upsample_nearest2d.out | explicit | DETR |

dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Jun 5, 2024

amyeroberts commented Jun 6, 2024

@dvrogozh Thank you for such an extensive write-up, diving into how it affects the library functionality, and opening draft PRs for enabling this ❤️

It's OK if there isn't full coverage of operations: we support the mps backend despite it not having full coverage yet either. It's great that you've investigated, so we have an idea of how much the fallbacks can slow things down.

Overall, I don't see any reason why this shouldn't be something we enable. Similar to mps, though, it's probably not something we'll test on our side at the moment.

cc @ydshieh @muellerzr

@muellerzr

Yep, agreed :) In terms of which PRs to merge when, we are working towards getting this into accelerate first, then the Trainer.

dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Jun 7, 2024
dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Jun 10, 2024

dvrogozh commented Jun 12, 2024

I filed one more issue affecting some (not all) examples and tests: the CUDA path is sometimes wrongly taken on loss.backward():

dvrogozh added a commit to dvrogozh/accelerate that referenced this issue Jun 13, 2024
ydshieh pushed a commit to dvrogozh/transformers that referenced this issue Jun 14, 2024
ydshieh pushed a commit that referenced this issue Jun 14, 2024
* xpu: support xpu backend from stock pytorch (>=2.4)

Fixes: #31237

XPU backend is available in the stock PyTorch starting from
version 2.4, see [1]. This commit extends huggingface transformers
to support XPU from both IPEX and the stock pytorch. IPEX is being
tried first.

See: pytorch/pytorch#114842
Requires: huggingface/accelerate#2825
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

* xpu: enable gpt2 and decision_transformer tests for xpu pytorch backend

Note that running xpu tests requires TRANSFORMERS_TEST_DEVICE_SPEC=spec.py
passed to the test runner:

  import torch
  DEVICE_NAME = 'xpu'
  MANUAL_SEED_FN = torch.xpu.manual_seed
  EMPTY_CACHE_FN = torch.xpu.empty_cache
  DEVICE_COUNT_FN = torch.xpu.device_count

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

---------

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
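The device-spec mechanism quoted in the commit message above can be exercised as sketched below. This is a hedged sketch: the spec contents are copied from the commit message, while the commented-out pytest target is an illustrative assumption, and actually running the tests requires an XPU-enabled PyTorch build.

```shell
# Write out the device spec quoted in the commit message above.
cat > spec.py <<'EOF'
import torch
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
EOF

# Point the transformers test runner at the spec.
export TRANSFORMERS_TEST_DEVICE_SPEC=spec.py

# Illustrative test target (an assumption); needs XPU-enabled PyTorch:
# python -m pytest tests/models/gpt2
```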