This repository has been archived by the owner on Jan 25, 2018. It is now read-only.

Need to resubmit some cluster jobs #3

Closed
dlogan opened this issue Aug 28, 2015 · 5 comments

Comments

@dlogan
Member

dlogan commented Aug 28, 2015

(MOVED from CellProfiler/CellProfiler#1522)
I've had to resubmit cluster jobs multiple times for Batch 17 (http://imagewebrhel6/batchprofiler/cgi-bin/ViewBatch.py?batch_id=17). Most jobs in this batch completed initially, but some were left in the SUBMITTED stage overnight. After I killed and resubmitted them, some more finished, but not all. A few jobs still have not completed, though after another round of kill and resubmit they are now in the RUNNING phase.

It may be that the cluster is busy, but at least a couple of times, killing and resubmitting got some more jobs to move to RUNNING, so it seems like an issue worth looking into. Sorry, I can't think of any other info that would help debug this.

@dlogan
Member Author

dlogan commented Aug 28, 2015

@braymp wrote:
I've noticed this myself. The 'short' queue has a 2 hr limit; if a job exceeds that limit, it's killed, but silently, with nothing in the error logs and no change to the status.
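
A minimal sketch of how silently killed or stuck jobs like these might be flagged, assuming each job record carries a status and the time that status was last set. The tuple layout, the `find_suspect_jobs` name, and the `DONE` status are hypothetical and are not BatchProfiler's actual schema or API; only the SUBMITTED and RUNNING statuses and the 2 hr limit come from this thread.

```python
from datetime import datetime, timedelta

# Assumption: the 'short' queue limit is 2 hours, as described in the comment above.
SHORT_QUEUE_LIMIT = timedelta(hours=2)


def find_suspect_jobs(jobs, now, limit=SHORT_QUEUE_LIMIT):
    """Return ids of jobs still marked SUBMITTED or RUNNING past the limit.

    `jobs` is a hypothetical iterable of (job_id, status, last_status_change)
    tuples. A job that has sat in an unfinished state for longer than the
    queue limit may have been killed without its status being updated.
    """
    suspects = []
    for job_id, status, since in jobs:
        if status in ("SUBMITTED", "RUNNING") and now - since > limit:
            suspects.append(job_id)
    return suspects


if __name__ == "__main__":
    now = datetime(2015, 8, 28, 12, 0)
    jobs = [
        (1, "RUNNING", datetime(2015, 8, 28, 9, 0)),     # 3 h in RUNNING: suspect
        (2, "RUNNING", datetime(2015, 8, 28, 11, 0)),    # 1 h in RUNNING: within limit
        (3, "SUBMITTED", datetime(2015, 8, 27, 20, 0)),  # left overnight: suspect
        (4, "DONE", datetime(2015, 8, 27, 20, 0)),       # hypothetical finished status: ignored
    ]
    print(find_suspect_jobs(jobs, now))
```

A check like this only narrows down which jobs to inspect; it cannot tell a queue timeout apart from a job that never started for some other reason.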

@dlogan
Member Author

dlogan commented Aug 28, 2015

Related to (same as?) #2

@dlogan
Member Author

dlogan commented Aug 28, 2015

@LeeKamentsky wrote:
Filed with IT as INC0070582

@LeeKamentsky

Having #2 fixed will tell you that the job was killed - this issue should deal with the problem of queue timeouts being per-job rather than per-task.

@LeeKamentsky

It looks like this was caused by an IT issue. There was no queue timeout. The job failed to start because a drive was not properly mounted on one of the nodes. The good news is that, since it wasn't a queue timeout, we are good with our strategy and can close this. Issue #2 will deal with reporting situations like this.


2 participants