This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

cannot specify all start task failed nodes or unusable with a specific node id #249

Closed
veonua opened this issue Dec 17, 2018 · 1 comment

veonua commented Dec 17, 2018

Problem Description

File "convoy/batch.py", line 825, in wait_for_pool_ready
File "convoy/batch.py", line 680, in _block_for_nodes_ready
RuntimeError: Please inspect both the node status above and files found within the caffe-cpu//startup directory (in the current working directory) if available. If this error appears non-transient, please submit an issue on GitHub, if not you can delete these nodes with "pool nodes del --all-start-task-failed" first prior to the resize operation.
[30591] Failed to execute script shipyard

veon@ubuntu:~/work/InvoiceRecognition/services/config$ ../batch-shipyard-3.6.1-cli-linux-x86_64 pool nodes del --all-start-task-failed
Traceback (most recent call last):
File "shipyard.py", line 2812, in <module>
File "site-packages/click/core.py", line 764, in __call__
File "site-packages/click/core.py", line 717, in main
File "site-packages/click/core.py", line 1137, in invoke
File "site-packages/click/core.py", line 1137, in invoke
File "site-packages/click/core.py", line 1137, in invoke
File "site-packages/click/core.py", line 956, in invoke
File "site-packages/click/core.py", line 555, in invoke
File "site-packages/click/decorators.py", line 64, in new_func
File "site-packages/click/core.py", line 555, in invoke
File "shipyard.py", line 1815, in nodes_del
File "convoy/fleet.py", line 3576, in action_pool_nodes_del
ValueError: cannot specify all start task failed nodes or unusable with a specific node id
[30682] Failed to execute script shipyard

Batch Shipyard Version

3.6.1

Steps to Reproduce

Just tried to run my pool.

Expected Results

Be able to clean up the pool using the command suggested in the error message.

Actual Results

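The pool nodes del --all-start-task-failed command fails with "ValueError: cannot specify all start task failed nodes or unusable with a specific node id", even though no node id was given.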
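The error message implies a mutual-exclusion check: a bulk selection (all start-task-failed or unusable nodes) should be rejected only when it is combined with an explicit node id. A minimal sketch of that intended check follows. It is hypothetical: `validate_node_del_args` and its parameter names are illustrative assumptions, not the actual convoy/fleet.py code or the upstream fix, and the node id 'node-1' is made up.

```python
from typing import Optional, Sequence


def validate_node_del_args(
        all_start_task_failed: bool,
        all_unusable: bool,
        node_ids: Optional[Sequence[str]]) -> None:
    """Hypothetical check: a bulk selection flag and explicit node ids
    are mutually exclusive, but either one alone is allowed."""
    # Truthiness (not `is not None`) treats both None and an empty
    # default such as () as "no node id given".
    has_node_ids = bool(node_ids)
    bulk = all_start_task_failed or all_unusable
    if bulk and has_node_ids:
        raise ValueError(
            'cannot specify all start task failed nodes or unusable '
            'with a specific node id')


# The invocation from this issue: bulk flag only, no node id -> accepted.
validate_node_del_args(True, False, None)
validate_node_del_args(True, False, ())

# Bulk flag plus an explicit (hypothetical) node id -> rejected.
try:
    validate_node_del_args(True, False, ['node-1'])
except ValueError as exc:
    print(exc)
```

Checking truthiness rather than comparing against None keeps the guard correct whether an absent option arrives as None or as an empty sequence.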

alfpark (Collaborator) commented Dec 17, 2018

This issue was identified and fixed in master@17e26f091b92a8606bec1492f9948a0c02ba08af.

Workarounds:

  1. Get the fix via the latest master or the Docker CLI image.
  2. Delete nodes using Batch Explorer or the Portal.
  3. Delete the entire pool.
