Immutable device ID #5

jdsaund · 2021-03-03T21:14:56Z

Hi, thanks so much for creating these awesome bindings. You have saved us so much time!

I'm looking for a way to configure GPUs in a stable way across reboots and slot changes. Is there any way to enumerate and look up GPUs by UUID?

Existing tools I've seen such as nvoclock address GPUs by their index in Gpu::enumerate(). The Nvidia docs note that device enumeration order is not guaranteed to be consistent across reboots:
https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g4cc7ff5253d53cc97b1afb606d614888

The order in which NVML enumerates devices has no guarantees of consistency between reboots. For that reason it is recommended that devices be looked up by their PCI ids or UUID. See nvmlDeviceGetHandleByUUID() and nvmlDeviceGetHandleByPciBusId_v2().

If this is true, any tool that assumes a stable order and configures devices by their Gpu::enumerate() index could apply config to the wrong device after a reboot.

One option in the meantime would be to query devices by their PCI Bus ID. It is currently available in nvapi-rs. The downside of using PCI Bus ID is that a slot change for any device invalidates the config.
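
To make the PCI Bus ID idea concrete, here is a minimal sketch of keying stored config by bus ID rather than by enumeration index. `DetectedGpu`, `GpuConfig`, and `config_for` are hypothetical names made up for illustration; they are not types or functions from nvapi-rs, and the bus ID is assumed to be a plain integer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an enumerated GPU (not an nvapi-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct DetectedGpu {
    /// Position in the enumeration; may differ between boots.
    pub index: usize,
    /// PCI bus ID; stable until the card is moved to another slot.
    pub pci_bus_id: u32,
}

/// Hypothetical per-device settings.
#[derive(Debug, Clone, PartialEq)]
pub struct GpuConfig {
    pub power_limit_percent: u32,
}

/// Look up config by PCI bus ID, ignoring the enumeration index entirely.
pub fn config_for<'a>(
    configs: &'a HashMap<u32, GpuConfig>,
    gpu: &DetectedGpu,
) -> Option<&'a GpuConfig> {
    configs.get(&gpu.pci_bus_id)
}
```

With this layout a reordered enumeration still finds the right config, but a card moved to a different slot gets a new bus ID and the lookup returns `None`, which is exactly the downside described above.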

Ideally we would expose the UUID via nvmlDeviceGetUUID and get the handle via nvmlDeviceGetHandleByUUID. Alternatively, we could simply attach the device UUID to GpuInfo in Gpu::info() so developers can select the correct device through iteration.
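
As a rough sketch of the second option, selecting a device through iteration could look like the following. `GpuInfoWithUuid` and `find_by_uuid` are hypothetical names for illustration only; the real GpuInfo does not currently carry a UUID, and the UUID is assumed here to be available as a string.

```rust
/// Hypothetical device info that carries a UUID (not the real nvapi-rs GpuInfo).
#[derive(Debug, Clone, PartialEq)]
pub struct GpuInfoWithUuid {
    pub name: String,
    /// Assumed to be the device UUID in string form.
    pub uuid: String,
}

/// Return the first device whose UUID matches, comparing case-insensitively.
pub fn find_by_uuid<'a>(
    gpus: &'a [GpuInfoWithUuid],
    uuid: &str,
) -> Option<&'a GpuInfoWithUuid> {
    gpus.iter().find(|gpu| gpu.uuid.eq_ignore_ascii_case(uuid))
}
```

A tool would store the UUID in its config, collect the info for every enumerated device, and call `find_by_uuid` at startup, so neither the enumeration order nor the slot matters.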

Does this sound correct or have I missed something?

Thanks again

EDIT:
It looks like I was referring to the incorrect API documentation. The NVAPI docs say for NvAPI_EnumPhysicalGPUs:

With drivers 105.00 and up, all physical GPU handles are constant. Physical GPU handles are constant as long as the GPUs are not physically moved and the SBIOS VGA order is unchanged.

So it appears that at least we are OK to assume enumeration is stable across reboots. Still, it would be nice to get the UUID for the devices if possible.


arcnmx added this to the v0.2.0 milestone Jun 13, 2023
