
xpu: support xpu backend from stock pytorch (>=2.4) #31238

Merged: 2 commits merged into huggingface:main, Jun 14, 2024

Conversation

@dvrogozh (Contributor) commented Jun 4, 2024

Fixes: #31237

The XPU backend is available in stock PyTorch starting from version 2.4, see [1]. This commit extends Hugging Face Transformers to support XPU from both IPEX and stock PyTorch; IPEX is tried first.
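
A minimal sketch of that detection order (IPEX probed first, then stock PyTorch's torch.xpu), assuming PyTorch >= 2.4; the function name below is illustrative rather than the actual transformers code:

import importlib.util

import torch


def xpu_is_available() -> bool:
    # Prefer IPEX when it is installed; importing it extends/patches torch.xpu.
    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        import intel_extension_for_pytorch  # noqa: F401
    # Stock PyTorch >= 2.4 ships a native torch.xpu module.
    return hasattr(torch, "xpu") and torch.xpu.is_available()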

I am raising this PR as WIP and Draft to facilitate further discussion around enabling the XPU backend in Hugging Face and to be able to communicate observed XPU issues back to PyTorch.

See: pytorch/pytorch#114842
Requires: huggingface/accelerate#2825

cc: @muellerzr, @EikanWang, @jgong5, @kding1, @sywangyi

@muellerzr (Contributor) left a comment

Great! This PR should be merged in tandem with the accelerate one here: huggingface/accelerate#2825

@ydshieh (Collaborator) commented Jun 6, 2024

Happy to see this addition 🚀! Just wondering if we should now be careful with names like PyTorch XPU vs. IPEX XPU. (And sorry if this doesn't make sense 😅.)

@dvrogozh (Contributor, Author) commented Jun 7, 2024

I exercised this PR (plus huggingface/accelerate#2825, on which it depends) as much as I could across the IPEX-CPU, IPEX-XPU, PyTorch-XPU, and PyTorch-CPU scenarios, running some tests from accelerate and transformers as well as some transformers examples. All of them engage XPU when expected. I am promoting these PRs out of draft for full review. Let me know if there are any concerns or feedback that needs to be addressed.

@dvrogozh changed the title from "[WIP] xpu: support xpu backend from stock pytorch (>=2.4)" to "xpu: support xpu backend from stock pytorch (>=2.4)" on Jun 7, 2024
@dvrogozh marked this pull request as ready for review on June 7, 2024 17:04
@dvrogozh (Contributor, Author) commented

I added one more commit to enable some tests for the xpu backend.

@dvrogozh (Contributor, Author) commented

Applied python utils/check_copies.py --fix_and_overwrite to propagate the change in gpt2 to decision_transformer. This fixes the failure noted by CI; the test for the latter passes for the xpu backend.

@muellerzr (Contributor) left a comment

Thanks! Overall this looks fine to me, and it makes sense why we need to adjust models/decision_transformer/... (to get the ipex patches in).

The PR has been merged on the accelerate side, and overall this seems good to me. However: should we limit the accelerate version required for xpu support to the new accelerate version? (to come out next month)

cc @amyeroberts


@dvrogozh (Contributor, Author) commented

however: should we limit the accelerate version required for the xpu support to the new accelerate version? (to come out next month)

@muellerzr: By bumping the accelerate version to 0.32.0?

"accelerate>=0.21.0",

@muellerzr (Contributor) commented Jun 13, 2024

By bumping accelerate version to 0.32.0?

We most certainly shouldn't do that :)

Actually, I think we're fairly okay, as accelerate will do a passthrough and IIUC this PR doesn't break old behavior, correct? (Basically, per my understanding if users run an old accelerate version nothing will break, right?)

The question is if we should have a flag for a minimum accelerate version if they are on the xpu branch/logic

@dvrogozh (Contributor, Author) commented

Actually, I think we're fairly okay, as accelerate will do a passthrough and IIUC this PR doesn't break old behavior, correct?

Yes, I think so. As long as users stay within the previous usage (with IPEX), nothing should change for them and it stays compatible.

The question is if we should have a flag for a minimum accelerate version if they are on the xpu branch/logic

New accelerate is indeed required on the xpu branch, otherwise there will be a runtime error. So a check will be useful; I will add one.

@amyeroberts (Collaborator) left a comment

Thanks for adding this!

Just one question on the availability of torch.xpu, which we might have to take care of in the testing utils.

import torch

if is_ipex_available():
    import intel_extension_for_pytorch  # noqa: F401
elif not is_accelerate_available("0.33.0.dev"):
@dvrogozh (Contributor, Author) commented:

@muellerzr: I added this check. Since Python version comparison evaluates 0.32.0.dev0 >= 0.32.0 as False, I compared against 0.32.0.dev, and I am not sure this is the correct way. Please advise.
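
For reference, this behaviour can be reproduced with the packaging library, which such version helpers typically rely on; the snippet below is purely illustrative:

from packaging import version

print(version.parse("0.32.0.dev0") >= version.parse("0.32.0"))      # False: dev releases sort before the final release
print(version.parse("0.32.0.dev0") >= version.parse("0.32.0.dev"))  # True: ".dev" normalizes to ".dev0"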

Also, I can't raise an exception here since is_torch_xpu_available is on a generic path and it would fail non-xpu cases. And without an exception, the error the end user gets looks quite similar to what they would get when running with the wrong accelerate version. Do you have an idea where to raise an exception notifying the user that the accelerate version is wrong?

@dvrogozh (Contributor, Author) commented:

OK, so I removed the accelerate check from is_torch_xpu_available(), since I thought this function already does its job and there is no need to add the check there. And I added a check that raises an exception here:

elif is_torch_xpu_available():
    device = torch.device("xpu:0")
    torch.xpu.set_device(device)
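
A plausible, self-contained shape for that guard is sketched below; the helper name _require_xpu_with_accelerate is made up, and the exact version string and error messages in the merged code may differ:

import torch
from transformers.utils import is_accelerate_available, is_torch_xpu_available


def _require_xpu_with_accelerate() -> torch.device:
    # Fail early with a clear message instead of an obscure runtime error later.
    if not is_accelerate_available("0.32.0.dev"):
        raise ImportError("XPU support via stock PyTorch requires accelerate >= 0.32.0")
    if not is_torch_xpu_available():
        raise RuntimeError("No XPU device is available")
    device = torch.device("xpu:0")
    torch.xpu.set_device(device)
    return device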

@amyeroberts (Collaborator) left a comment

LGTM - thanks for enabling this!

@muellerzr (Contributor) left a comment

Thanks! LG2M as well :) (We can revisit the 0.32.0.dev after 0.32.0 is out, I'll keep it in my notes)

@ydshieh (Collaborator) commented Jun 14, 2024

Hi @dvrogozh

Thank you for this support. As mentioned by @faaany and my comment, it would be better not to include things like

BACKEND_MANUAL_SEED["xpu"]
BACKEND_EMPTY_CACHE["xpu"]

in this PR (so far), and let the user define them when they need to use another device.

Happy to revise that design in a separate PR.
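
For context, a rough sketch of what those dispatch tables look like; the real definitions live in transformers.testing_utils and the contents here are abridged and partly assumed:

import torch

# Map a device name to the backend-specific hook used by the test helpers.
BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "cpu": torch.manual_seed, "default": torch.manual_seed}
BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}

The suggestion above is to leave the "xpu" entries out for now and let users register them through a device spec file, as shown in the next comment.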

@dvrogozh (Contributor, Author) commented

As mentioned by @faaany and #31402 (comment), it would be better not to include things like

@ydshieh: removed. I used TRANSFORMERS_TEST_DEVICE_SPEC=spec.py on my side:

import torch

DEVICE_NAME = 'xpu'

MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
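
(For reference, such a spec file is picked up by pointing the test runner at it, e.g. TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python -m pytest tests/<some test>; the test selection here is illustrative.)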

@dvrogozh (Contributor, Author) commented

FYI, I think the CI failure is unrelated to this PR. Does it need re-triggering?

FAILED examples/tensorflow/test_tensorflow_examples.py::ExamplesTests::test_run_image_classification - ValueError: The repository for hf-internal-testing/cats_vs_dogs_sample contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hf-internal-testing/cats_vs_dogs_sample.

@ydshieh (Collaborator) commented Jun 14, 2024

The failure about

contains custom code which must be executed to correctly load the dataset

is not related to this PR.

You can rebase on main to include #31407, which will make that failure disappear.

@ydshieh (Collaborator) left a comment

Thanks again. I think everything runs smoothly on your side, right? Will merge once I get a confirmation from you 💯!

@dvrogozh (Contributor, Author) commented

And you can rebase on main to include #31407 that will make that failure disappear

Hm. The code is already on top of the latest master and includes #31407.

I think everything runs smooth on your side, right?

For the xpu backend with spec.py? Yes. And I ran the non-xpu stuff as much as I could locally.

@ydshieh (Collaborator) commented Jun 14, 2024

OK, I will check CI. Thank you again for contributing 💯 !

dvrogozh added 2 commits June 14, 2024 21:00
Fixes: huggingface#31237

XPU backend is available in the stock PyTorch starting from
version 2.4, see [1]. This commit extends huggingface transformers
to support XPU from both IPEX and the stock pytorch. IPEX is being
tried first.

See: pytorch/pytorch#114842
Requires: huggingface/accelerate#2825
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Note that running xpu tests requires TRANSFORMERS_TEST_DEVICE_SPEC=spec.py
passed to the test runner:

  import torch
  DEVICE_NAME = 'xpu'
  MANUAL_SEED_FN = torch.xpu.manual_seed
  EMPTY_CACHE_FN = torch.xpu.empty_cache
  DEVICE_COUNT_FN = torch.xpu.device_count

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@ydshieh merged commit eed9ed6 into huggingface:main on Jun 14, 2024
23 checks passed
@dvrogozh (Contributor, Author) commented

Thank you for the merge. If all goes well, it should land in transformers==4.42.0, as far as I can tell.
