Immutable device ID #5

Open
jdsaund opened this issue Mar 3, 2021 · 0 comments
jdsaund commented Mar 3, 2021

Hi, thanks so much for creating these awesome bindings. You have saved us so much time!

I'm looking for a way to configure GPUs in a stable way across reboots and slot changes. Is there any way to enumerate and look up GPUs by UUID?

Existing tools I've seen such as nvoclock address GPUs by their index in Gpu::enumerate(). The Nvidia docs note that device enumeration order is not guaranteed to be consistent across reboots:
https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g4cc7ff5253d53cc97b1afb606d614888

The order in which NVML enumerates devices has no guarantees of consistency between reboots. For that reason it is recommended that devices be looked up by their PCI ids or UUID. See nvmlDeviceGetHandleByUUID() and nvmlDeviceGetHandleByPciBusId_v2().

If this is true, any tool that assumes a stable order and configures devices by their Gpu::enumerate() index could end up applying config to the wrong device after a reboot.

One option in the meantime would be to query devices by their PCI bus ID, which is currently available in nvapi-rs. The downside of using the PCI bus ID is that a slot change for any device invalidates the config.
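As a rough illustration of the PCI bus ID approach, here is a minimal sketch in which config is stored in a map keyed by bus ID rather than by enumeration index. The `Device` and `Settings` types, the `settings_for` helper, and the bus ID strings are all hypothetical stand-ins for this example; none of them are the actual nvapi-rs types or API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-device record (not an nvapi-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Device {
    /// Position in enumeration order; may differ between runs.
    pub index: usize,
    /// PCI bus ID as a string, e.g. "0000:01:00.0" (format assumed here).
    pub pci_bus_id: String,
}

/// Hypothetical per-device settings.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub power_limit_watts: u32,
}

/// Returns the settings stored under this device's PCI bus ID, if any.
/// The enumeration index is deliberately ignored.
pub fn settings_for<'a>(
    config: &'a HashMap<String, Settings>,
    device: &Device,
) -> Option<&'a Settings> {
    config.get(&device.pci_bus_id)
}
```

With this layout a change in enumeration order does not matter, but a card moved to another slot gets a new bus ID and no longer matches its stored entry, which is the downside described above.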

Ideally we would expose the UUID via nvmlDeviceGetUUID, and get the handle via nvmlDeviceGetHandleByUUID. Alternatively, we could simply attach the device UUID to GpuInfo in Gpu::info() so developers can select the correct device through iteration.
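For the second option, selecting a device through iteration could look something like the sketch below. It assumes a UUID string is available on each device record; the `GpuRecord` type and `find_by_uuid` helper are hypothetical names for this example, and GpuInfo in nvapi-rs does not necessarily have such a field today.

```rust
/// Hypothetical record pairing an enumeration index with a device UUID.
/// This is a stand-in for illustration, not the nvapi-rs GpuInfo type.
#[derive(Debug, Clone, PartialEq)]
pub struct GpuRecord {
    /// Position in enumeration order; not assumed to be stable.
    pub index: usize,
    /// Device UUID string (exact format assumed for this sketch).
    pub uuid: String,
}

/// Scans the enumerated devices and returns the one whose UUID matches
/// exactly, or None if no such device is present.
pub fn find_by_uuid<'a>(gpus: &'a [GpuRecord], uuid: &str) -> Option<&'a GpuRecord> {
    gpus.iter().find(|gpu| gpu.uuid == uuid)
}
```

A caller would store the UUID in its config, run the lookup after each enumeration, and apply settings to whatever index the match currently has. A `None` result signals that the configured device is missing rather than silently targeting a different card.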

Does this sound correct or have I missed something?

Thanks again

EDIT:
It looks like I was referring to the incorrect API documentation. The NVAPI docs say for NvAPI_EnumPhysicalGPUs:

With drivers 105.00 and up, all physical GPU handles are constant. Physical GPU handles are constant as long as the GPUs are not physically moved and the SBIOS VGA order is unchanged.

So it appears that at least we are OK to assume enumeration is stable across reboots. Still, it would be nice to get the UUID for the devices if possible.

@arcnmx arcnmx added this to the v0.2.0 milestone Jun 13, 2023