Hi, thanks so much for creating these awesome bindings. You have saved us so much time!
I'm looking for a way to configure GPUs in a stable way across reboots and slot changes. Is there any way to enumerate and look up GPUs by UUID?
Existing tools I've seen, such as nvoclock, address GPUs by their index in Gpu::enumerate(). The Nvidia docs note that device enumeration order is not guaranteed to be consistent across reboots: https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g4cc7ff5253d53cc97b1afb606d614888
"The order in which NVML enumerates devices has no guarantees of consistency between reboots. For that reason it is recommended that devices be looked up by their PCI ids or UUID. See nvmlDeviceGetHandleByUUID() and nvmlDeviceGetHandleByPciBusId_v2()."
If this is true, any tool that assumes a stable order and configures devices by their Gpu::enumerate() index could end up applying config to the wrong device after a reboot.
One option in the meantime would be to query devices by their PCI Bus ID, which is currently available in nvapi-rs. The downside of using the PCI Bus ID is that a slot change for any device invalidates the config.
Ideally we would expose the UUID via nvmlDeviceGetUUID and get the handle via nvmlDeviceGetHandleByUUID. Alternatively, we could simply attach the device UUID to GpuInfo in Gpu::info() so developers can select the correct device through iteration.
Does this sound correct or have I missed something?
Thanks again
EDIT:
It looks like I was referring to the incorrect API documentation. The NVAPI docs say for NvAPI_EnumPhysicalGPUs:
"With drivers 105.00 and up, all physical GPU handles are constant. Physical GPU handles are constant as long as the GPUs are not physically moved and the SBIOS VGA order is unchanged."
So it appears that we are at least OK to assume enumeration is stable across reboots. Still, it would be nice to get the UUID for the devices if possible.
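To illustrate the "select the correct device through iteration" idea, here is a rough sketch of what I have in mind. It does not use the real nvapi-rs types: GpuRecord, find_by_uuid and the UUID strings are all made up for the example, and it simply assumes each enumerated device could report a UUID string.

```rust
/// Hypothetical snapshot of one enumerated GPU. This is NOT a type from
/// nvapi-rs; it only stands in for "GpuInfo with a UUID attached".
#[derive(Debug, Clone, PartialEq)]
struct GpuRecord {
    /// Position in the enumeration; may differ between boots.
    index: usize,
    /// Device UUID; assumed to stay the same across reboots and slot changes.
    uuid: String,
    /// PCI bus ID; changes if the card is moved to another slot.
    pci_bus_id: u32,
}

/// Select a device by UUID by iterating over the enumerated devices.
fn find_by_uuid<'a>(gpus: &'a [GpuRecord], uuid: &str) -> Option<&'a GpuRecord> {
    gpus.iter().find(|g| g.uuid == uuid)
}

fn main() {
    // Placeholder UUIDs, invented for this example.
    let first_boot = vec![
        GpuRecord { index: 0, uuid: "GPU-aaaa".to_string(), pci_bus_id: 1 },
        GpuRecord { index: 1, uuid: "GPU-bbbb".to_string(), pci_bus_id: 2 },
    ];
    // Same two cards, enumerated in a different order, one moved to another slot.
    let second_boot = vec![
        GpuRecord { index: 0, uuid: "GPU-bbbb".to_string(), pci_bus_id: 5 },
        GpuRecord { index: 1, uuid: "GPU-aaaa".to_string(), pci_bus_id: 1 },
    ];

    let wanted = "GPU-bbbb";
    let before = find_by_uuid(&first_boot, wanted).expect("device present on first boot");
    let after = find_by_uuid(&second_boot, wanted).expect("device present on second boot");

    // The UUID lookup follows the card even though index and bus ID changed.
    assert_eq!(before.uuid, after.uuid);
    assert_ne!(before.index, after.index);
    assert_ne!(before.pci_bus_id, after.pci_bus_id);
    println!(
        "{}: index {} -> {}, bus {} -> {}",
        wanted, before.index, after.index, before.pci_bus_id, after.pci_bus_id
    );
}
```

A config file keyed by UUID would then survive both a reordering and a slot change, which neither the enumeration index nor the PCI Bus ID does on its own.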