Conftest: get_device_name() without device occupation #1826
Conversation
@uartie, would you like to have a look? I'm reviewing as well.
@dsmertin, @12010486, @regisss what if we use
conftest.py (outdated):

```python
result = subprocess.run(f"python -c '{script}'", shell=True, capture_output=True, text=True)
return result.stdout
```
return result.stdout.strip()
since the caller checks if not name (i.e. expects an empty string).
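The reason .strip() matters here: print() in the child process appends a newline, so on a host with no device stdout is "\n" rather than "". A bare newline is truthy, so an unstripped return value would defeat the caller's if not name check. A minimal illustration (the "\n" value mirrors the non-Gaudi output shown below):

```python
# What the child's stdout looks like on a non-Gaudi host: print()
# emitted an empty device name plus its trailing newline.
raw = "\n"

# Unstripped, the string is truthy, so `if not name` would NOT fire.
assert bool(raw) is True

# After .strip() it is an empty string, and the caller's check works.
assert raw.strip() == ""
```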
That is, on non-gaudi I get:
result = CompletedProcess(args="python -c 'import habana_frameworks.torch.hpu as torch_hpu\nprint(torch_hpu.get_device_name())'", returncode=0, stdout='\n', stderr='/usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead\n return isinstance(object, types.FunctionType)\n/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:135: UserWarning: Device not available\n warnings.warn("Device not available")\n')
I think that should work, yes 👍
@dsmertin please try this solution.
Meanwhile, it would be good to fix the source.
Ok, I do have a concern about using get_device_name(): when used on gaudi3, it will output gaudi3, while currently all the tests are either enabled for gaudi2 (and gaudi3 by proxy) or also cover gaudi, a subset of those. @dsmertin, could you check how a small test runs on g3?
Hold on, it seems we are now taking care of it correctly.
LGTM!
I think you'll have to run make style too (there should be a blank line after the import).
Force-pushed from 2228ddc to bb1a89c.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
On the "main" branch, we are still waiting for #1807 to be merged, since #1807 is necessary to effectively use this change. @hsubramony has already pulled #1807 into the transformers_4_49 branch via #1824.
Thanks for the explanation! At this point, it seems the PR got merged into the wrong branch @regisss, or it should at least also be merged into transformers_4_49.
It is already in the transformers_4_49 branch.
What does this PR do?
This change fixes a problem with the deepspeed test_examples.
The call to torch_hpu.get_device_name() has been moved to a separate process, because it occupies a device and does not release it. For tests that run in a separate process and need all devices, not all of them would otherwise be available, due to the occupation by the current pytest process.
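The approach above can be sketched as a small helper: the device query runs in a child Python process, so any device the query acquires is released when that process exits, and the parent pytest process never holds it. This is a minimal sketch, not the PR's exact code; the helper name is hypothetical, and the .strip() reflects the review suggestion so a non-Gaudi host (where stdout is just a newline) yields an empty string.

```python
import subprocess

def get_device_name_in_subprocess():
    """Query the HPU device name in a child process so the parent
    process never acquires (and keeps holding) a device itself."""
    script = (
        "import habana_frameworks.torch.hpu as torch_hpu\n"
        "print(torch_hpu.get_device_name())"
    )
    # Passing the interpreter and -c script as a list avoids shell quoting.
    result = subprocess.run(
        ["python", "-c", script], capture_output=True, text=True
    )
    # .strip() so a host without a device (stdout == "\n", or "" if the
    # import fails) returns an empty string the caller can test with
    # `if not name`.
    return result.stdout.strip()
```

On a machine without the Habana stack, the child process fails to import habana_frameworks, its stdout is empty, and the helper returns "", which callers treat as "no device".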
Before submitting