Move QA model to device to allow FSDP test to pass #1829

Open
wants to merge 1 commit into
base: transformers_4_49

skaulintel
Collaborator

Allows the following pytest command to run successfully:

RUN_SLOW=true python -m pytest tests/test_fsdp_examples.py -v -s --token=*** --device gaudi3

Fixes the following error:

[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/fake_tensor.py", line 885, in merge_devices
[rank3]:     raise RuntimeError(
[rank3]: torch._dynamo.exc.BackendCompilerFailed: backend='hpu_backend' raised:
[rank3]: RuntimeError: Unhandled FakeTensor Device Propagation for aten.masked_fill.Scalar, found two different devices hpu:0, cpu

or, when the Python command is run directly, the following error:

[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/_init_utils.py", line 1035, in _move_states_to_device
[rank3]:     param.data = param.to(device_from_device_id)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 50, in __setattr__
[rank3]:     return object.__setattr__(self_, name, value)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 73, in __torch_function__
[rank3]:     new_args[0].change_device_placement(new_args[1].device)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/core/weight_sharing.py", line 42, in __getattribute__
[rank3]:     return object.__getattribute__(self_, name)
[rank3]: AttributeError: 'HabanaParameterWrapper' object has no attribute 'change_device_placement'
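
The diff itself is not shown here, so the following is only a minimal sketch of the general idea in the title: place every parameter and buffer of the model on one device before the model is wrapped or compiled, so that ops such as `masked_fill` never see a mix like `hpu:0` and `cpu`. The helper names `device_types` and `move_model_to_device` are hypothetical, and CPU stands in for the target device so the sketch runs anywhere; on Gaudi the target would presumably be `torch.device("hpu")`.

```python
import torch
from torch import nn


def device_types(model: nn.Module) -> set:
    """Return the device types that hold the model's parameters and buffers."""
    tensors = list(model.parameters()) + list(model.buffers())
    return {t.device.type for t in tensors}


def move_model_to_device(model: nn.Module, device: torch.device) -> nn.Module:
    """Move the whole model to `device` and fail early on mixed placement.

    Hypothetical helper for illustration; not part of PyTorch or of this PR.
    """
    model = model.to(device)
    stray = device_types(model) - {device.type}
    if stray:
        raise RuntimeError(
            f"expected only {device.type!r}, also found {sorted(stray)}"
        )
    return model


# Assumption: CPU is used as a stand-in target device.
target = torch.device("cpu")

# BatchNorm1d registers buffers, so both parameters and buffers are checked.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
model = move_model_to_device(model, target)
print(device_types(model))
```

Running the check before FSDP wrapping surfaces a stray CPU tensor as a short, explicit error instead of a failure deep inside the compiler backend.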
