This repository has been archived by the owner on Jan 25, 2018. It is now read-only.
Need to resubmit some cluster jobs #3
@braymp wrote:
Related to (same as?) #2
@LeeKamentsky wrote:
Having #2 fixed will tell you that the job was killed; this issue should deal with the problem of queue timeouts being per-job rather than per-task.
It looks like this was caused by an IT issue, not a queue timeout. The job failed to start because a drive was not properly mounted on one of the nodes. Since it wasn't a queue timeout, our strategy is sound and we can close this. Issue #2 will deal with reporting situations like this.
(MOVED from CellProfiler/CellProfiler#1522)
I've had to resubmit cluster jobs multiple times for Batch 17 (http://imagewebrhel6/batchprofiler/cgi-bin/ViewBatch.py?batch_id=17). Most jobs in this batch completed on the first submission, but some were left in the SUBMITTED stage overnight. After I killed and resubmitted them, more finished, but not all. A few jobs still have not completed, although after another round of killing and resubmitting they are now in the RUNNING phase.
The cluster may simply be busy, but at least a couple of times, killing and resubmitting moved more jobs into RUNNING, so this seems worth looking into. Unfortunately, I can't think of any other information that would help debug this.