[tests] use `torch_device` instead of `auto` for model testing #29531

Conversation
Hi @faaany - thanks for opening this PR! Using `device_map="auto"` is necessary for this test - it's checking that beam search works when the model is split across devices. If it doesn't work with XPU, then you can add a skip with unittest, e.g.:

    if "xpu" in torch_device:
        return unittest.skip("device_map='auto' does not work with XPU devices")
Hi @amyeroberts, thanks so much for reviewing this PR! I updated my patch and used your suggestion.
Thanks for handling!
Just a small nit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
done, thx!
Thanks!
`return unittest.skip()` used in the `test_model_parallel_beam_search` skip condition for XPU did not actually mark the test as skipped when running under pytest:

* 148 passed, 1 skipped

Other tests use `self.skipTest()`. Reusing this approach and moving the condition outside the loop (since it does not depend on it) skips the test correctly for XPU:

* 148 skipped

Secondly, `device_map="auto"` is now implemented for XPU for IPEX>=2.5 and torch>=2.6, so we can now enable these tests for XPU for new IPEX/torch versions.

Fixes: 1ea3ad1 ("[tests] use `torch_device` instead of `auto` for model testing (huggingface#29531)")

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
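The difference between the two skip styles is easy to reproduce with plain `unittest`, independent of the transformers test suite. In this minimal sketch, `unittest.skip()` returns a decorator object, so `return unittest.skip(...)` inside a test body silently passes, whereas `self.skipTest()` raises `unittest.SkipTest`, which the runner records as an actual skip:

```python
import unittest

class DemoTest(unittest.TestCase):
    def test_wrong_skip(self):
        # unittest.skip(...) only *returns* a decorator object; returning it
        # from a test body does nothing, so the runner records a PASS.
        return unittest.skip("looks like a skip, but is not one")

    def test_right_skip(self):
        # self.skipTest(...) raises unittest.SkipTest, which the runner
        # turns into a real skipped result.
        self.skipTest("correctly recorded as skipped")

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest).run(result)
print(result.testsRun, len(result.skipped), len(result.failures))  # 2 1 0
```

Only `test_right_skip` lands in `result.skipped`; `test_wrong_skip` counts as a pass, which matches the "148 passed, 1 skipped" vs "148 skipped" observation above.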
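The version gating described in the commit message (XPU support for `device_map="auto"` only with IPEX>=2.5 and torch>=2.6) could be sketched as below. This is a hedged illustration: the helper and function names are hypothetical, not the actual transformers utilities, and real code would typically use `packaging.version` instead of this minimal parser.

```python
def parse_version(v: str) -> tuple:
    # Keep only the leading numeric components, e.g. "2.6.0+cpu" -> (2, 6)
    parts = []
    for p in v.split("+")[0].split("."):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts[:2])

def xpu_auto_device_map_supported(ipex_version: str, torch_version: str) -> bool:
    # Hypothetical gate mirroring the thresholds in the commit message
    return (parse_version(ipex_version) >= (2, 5)
            and parse_version(torch_version) >= (2, 6))

print(xpu_auto_device_map_supported("2.5.0", "2.6.0"))  # True
print(xpu_auto_device_map_supported("2.3.1", "2.6.0"))  # False
```

A skip condition in a test could then call such a gate instead of skipping unconditionally on `"xpu" in torch_device`.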
What does this PR do?

When running the test cases under the models folder on XPU, I found that many model tests fail at the same test, `test_model_parallel_beam_search`, e.g.:

    FAILED tests/models/bigbird_pegasus/test_modeling_bigbird_pegasus.py::BigBirdPegasusModelTest::test_model_parallel_beam_search - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and xpu:0! (when che

This is because `device_map="auto"` is used. As elaborated in this PR, the `device_map="auto"` mechanism is still not mature on XPU, causing the model to be loaded on CPU rather than on XPU. If there is no particular reason for using `auto`, I would suggest using `torch_device` instead, because `torch_device` is more specific than `auto`, and we don't need what `auto` is for (e.g. large model inference) in our tests anyway. WDYT? @ArthurZucker
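The failure mode can be illustrated without real hardware. The following is a minimal sketch in plain Python (no torch; `FakeTensor` is a toy stand-in, not a real torch type) of why an op fails when `device_map="auto"` leaves weights on CPU while inputs sit on `xpu:0`:

```python
class FakeTensor:
    """Toy stand-in for a tensor: just a value plus the device it lives on."""
    def __init__(self, data, device):
        self.data, self.device = data, device

    def __add__(self, other):
        # torch raises a similar RuntimeError when an op mixes devices
        if self.device != other.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found "
                f"at least two devices, {self.device} and {other.device}!"
            )
        return FakeTensor(self.data + other.data, self.device)

# device_map="auto" gone wrong: weights land on cpu, inputs on xpu:0
weights, inputs = FakeTensor(1.0, "cpu"), FakeTensor(2.0, "xpu:0")
try:
    inputs + weights
except RuntimeError as e:
    print(e)

# Pinning everything to one explicit torch_device avoids the mismatch
torch_device = "xpu:0"
result = FakeTensor(1.0, torch_device) + FakeTensor(2.0, torch_device)
print(result.data)  # 3.0
```

This mirrors the suggestion in the PR: loading the model onto a single explicit `torch_device` keeps all tensors co-located, while `auto` placement can split them across devices.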