RF: improve support for queue args #328
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #328      +/-   ##
==========================================
+ Coverage   74.08%   74.31%   +0.23%
==========================================
  Files          35       35
  Lines        2643     2667      +24
==========================================
+ Hits         1958     1982      +24
  Misses        685      685
Continue to review full report at Codecov.
hm... so what is the benefit of moving to absolute imports exactly, or what problem(s) would it solve?
I left a comment on #302, but most likely I didn't fully comprehend the situation, since it most likely relates to SLURM execution... in either case, it must not be a problem of relative-vs-absolute imports but most likely an incorrect invocation. I really do not think we should replace relative imports with absolute ones to address whatever the problem is.
@mgxd - when you say "behave better across different environments", what do you mean? i do have a personal preference for relative imports in these relatively shallow projects, and in the past i have gotten bitten by using absolute imports with an installed project vs a dev project.
@satra if anything, it's good to clearly show where files and functions lie within the project, especially to new contributors. It's also recommended by PEP 8, and would fit in well given how basic our structure is (I think our max depth currently is 3).
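For illustration, a minimal sketch of the two styles under discussion, as they might look inside a module such as heudiconv/cli/run.py (the imported helper name here is hypothetical):

```python
# hypothetical lines inside heudiconv/cli/run.py

# relative import: the frame of reference is the current package
from ..utils import load_heuristic

# absolute import: spells out where the helper lives within the project
from heudiconv.utils import load_heuristic
```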
@mgxd the latest #328 does the trick, a.k.a. it does not crash but executes with arguments being properly passed to SLURM. Now, I understand that each subject will be submitted as a separate "job" to the cluster. Yet, is there any information on how to optimize the execution? For example, does it make sense to specify multiple CPUs per job or is heudiconv a single task and hence operates on one cpu per node? Likewise, would it help to increase the memory per CPU? Apologies for all these questions. I do have a 65 subject dataset for which these are all very helpful questions. I am also keeping track of all the steps and would be more than happy to contribute a working example at the end (I am also not using docker, and I have noticed an increased demand for non-docker examples). Thanks! |
@mgxd - here is a use case for relative imports (although a developer should be more careful):
In this scenario all the absolute imports would get picked up from the installed location. Personally, i like relative imports because of the frame of reference. If there are many project cross-links i can easily see how relative imports could get just as complicated (go up two and down three). i think they are both clear in shallow trees with only a few branches at the top level. here are the two selling points for me:
Please let's stick with relative imports, and concentrate on addressing the problem at hand, which smells like just an "incorrect" invocation specification (directly of a submodule instead of the
to that extent the culprit is https://github.com/nipy/heudiconv/blob/master/heudiconv/cli/run.py#L293: pyscript = op.abspath(inspect.getfile(inspect.currentframe())), from the "good old" times of heudiconv being a single script, so that is how
can't pyscript simply be
in principle it could be, I guess. I just dislike hardcoding anything ;)
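To make the exchange above concrete, here is a small sketch of the line in question and the kind of "hardcoded" alternative being floated; the alternative is illustrative only, not a quote of what the PR actually does:

```python
import inspect
import os.path as op

# current approach in heudiconv/cli/run.py: ask the interpreter, at run time,
# which file is executing and hand that absolute path to the queue submission
pyscript = op.abspath(inspect.getfile(inspect.currentframe()))
print(pyscript)

# the alternative discussed above would skip the introspection and invoke the
# CLI module by name instead, e.g. building a command such as:
#   python -m heudiconv.cli.run <arguments>
```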
Ok, I've reverted imports due to popular demand 😆 @fhopp on our cluster, I run heudiconv across a hundred subjects concurrently; heudiconv is a single-core application (currently), and I don't think I have encountered a time where I needed more than 2GB of memory.
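As a rough illustration of those numbers (one core and about 2GB per subject), a per-subject SLURM submission could look something like the sketch below; the sbatch flags are standard, but the heudiconv command line is abbreviated and purely illustrative:

```python
# hypothetical sketch: one SLURM job per subject, each requesting a single
# core and 2GB of memory; adjust the heudiconv arguments (heuristic, dicom
# template, output dir, ...) to your own setup
subjects = ["sub-001", "sub-002"]
for subj in subjects:
    cmd = [
        "sbatch", "--cpus-per-task=1", "--mem=2G",
        "--wrap", f"heudiconv -f heuristic.py -s {subj} -c dcm2niix -b",
    ]
    print(" ".join(cmd))  # print only; pass cmd to subprocess.run() on the cluster
```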
@mgxd Just out of curiosity, how long does generating the heuristic file across all subjects take?
I was also wondering whether it would be better to split the subjects into chunks of five rather than submitting the full list as done above. Finally, it appears that the above command submits just one job to each node. While I understand that heudiconv runs on a single CPU, is it possible to have each node work on multiple jobs concurrently? Say one node has 8 cores, is it possible to submit 8 jobs to that node?
@fhopp actually I think it's on our end; I see a bottleneck at heudiconv/cli/run.py lines 251 to 253 (commit 3467341).
I'm working on a fix now
@fhopp - hopefully you'll see a drastic speed-up after the last few commits. |
@mgxd Awesome, trying this now. From what I see so far, this increases the processing time before the batch job(s) are submitted. Will check and send an update on the speed-up. I still think it would be best if these job submissions could run on single cores on each node in the cluster.
@mgxd Confirmed! Job submission is almost instant. |
If you specify
@mgxd short answer: yes, please merge. I tried with your latest fix and job submission is instant. As for execution, it was in fact an issue with my SLURM config: it was set to a non-shared, exclusive mode in which CPUs and memory in the cluster were not treated as "consumable resources": https://slurm.schedmd.com/cons_res.html
Yes, that would be awesome. We have a small collection of user tutorials in our docs that would be a great place for it.
Closes #302
Changes
- `queue-args` to submission script
- `queue-args` from `queue` submission
- imports changed from relative to absolute, to behave better across different environments
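For orientation, a minimal sketch of what passing queue args through to the submission script could look like; the variable names and the whitespace splitting of the extra arguments are assumptions for illustration, not the PR's exact implementation:

```python
# hypothetical illustration: extra scheduler arguments supplied by the user
# are appended to the sbatch call before the job is submitted
queue_args = "--time=01:00:00 --partition=short"  # e.g. the value given for queue-args
cmd = ["sbatch"] + queue_args.split() + ["--wrap", "heudiconv <arguments>"]
print(cmd)
```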