This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
I have two pools of Standard_NC6 low-priority VMs. They had been running fine for some time until today, when they started failing with a start task failed error. I tried rebooting a few times but still get the same error.
rmmod: ERROR: Module nouveau is not currently loaded
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib'
and X module path '/usr/lib/xorg/modules'; these paths were not
queryable from the system. If X fails to find the NVIDIA X driver
module, please install the `pkg-config` utility and the X.Org
SDK/development package for your distribution and reinstall the
driver.
I have read this, but it does not look transient to me: I have rebooted many times and even deleted and recreated the pools.
Steps to Reproduce
It seems random to me. The pools had been running fine until they got stuck in this state.
It appears that the new nvidia-docker2 18.06 package has broken the installation. Although nvidia-docker2 is pinned, its dependency, nvidia-container-runtime, is not:
$ apt-get install nvidia-docker2=2.0.3+docker18.03.1-1
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-docker2 : Depends: nvidia-container-runtime (= 2.0.0+docker18.03.1-1) but 2.0.0+docker18.06.0-1 is to be installed
I'll have to pin the dependent package (nvidia-container-runtime) as well to work around this issue. I'll hotfix this and release it as soon as possible.
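This is a minimal sketch of that workaround, assuming an Ubuntu node where the NVIDIA package repository is already configured. It pins both packages to the Docker 18.03.1 builds named in the apt error above, then places them on hold so a later upgrade does not pull in the 18.06 builds. The `pkg_specs` helper and the `RUN_INSTALL` switch are hypothetical names used only for illustration. This is not the project's actual hotfix.

```shell
#!/bin/sh
set -eu

# Versions copied from the apt error above: nvidia-docker2 2.0.3 built for
# Docker 18.03.1 depends on exactly this nvidia-container-runtime build.
NVIDIA_DOCKER2_VERSION="2.0.3+docker18.03.1-1"
NVIDIA_CONTAINER_RUNTIME_VERSION="2.0.0+docker18.03.1-1"

# Print the "package=version" arguments that pin BOTH packages for apt-get.
pkg_specs() {
    echo "nvidia-docker2=${NVIDIA_DOCKER2_VERSION}" \
        "nvidia-container-runtime=${NVIDIA_CONTAINER_RUNTIME_VERSION}"
}

# RUN_INSTALL is a hypothetical opt-in switch, so the script can be sourced or
# inspected without touching the system. Set RUN_INSTALL=1 (as root) to apply.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    # Intentionally unquoted: split the specs into separate apt-get arguments.
    # shellcheck disable=SC2046
    apt-get install -y $(pkg_specs)
    # Hold both packages so a later apt-get upgrade keeps these versions.
    apt-mark hold nvidia-docker2 nvidia-container-runtime
fi
```

Because the nvidia-docker2 dependency is an exact `=` match, pinning only nvidia-docker2 leaves apt free to choose the newest nvidia-container-runtime, which produces the unmet-dependency error above. Naming both versions on the same command line avoids that.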
Expected Results
The pools keep running and stay stable.
Actual Results
The nodes suddenly got stuck in the start task failed state.
Additional Logs
Additional Comments
Also, why were these VMs restarted while they were running fine?