slurm profile is dead #31

Closed
cmeesters opened this issue Feb 28, 2020 · 9 comments

Comments
@cmeesters

cmeesters commented Feb 28, 2020

Hi,

As the command line interface is non-functional (see #15), does not pick up the information available to slurm, and will never be able to cover all corner cases (see snakemake/snakemake#248), wouldn't it be a good idea to adopt/refactor the existing scripts to cover some minimal slurm aspects and to inform potential users of this repo that amendments have to be made?

The latter could be incorporated in specific examples.

Best,
Christian

edit: sorry for the provocative title - this way it will hopefully not be overlooked.

@percyfal
Collaborator

Hi,

yes, you are right, a refactoring is probably in order (as I also mentioned in #15). As you seem to have given this some thought, do you have any specifics in mind when it comes to minimal slurm aspects (e.g. account, log directories, partition)?

Cheers,

Per

ps - no worries about the title, although it did catch my attention ;-)

@cmeesters
Author

Hi,

well, apparently there are different philosophies: whilst one can certainly ask for a parameter like a partition in a cookiecutter setup, usually there are several partitions to consider.

As I tried to make clear, it hardly makes sense to think in terms of 'minimal aspects' (see snakemake/snakemake#248). It is equally impossible to imagine the opposite, 'maximum' coverage.

Why?

  • clusters have different partitions for scheduling reasons, hence a configuration needs to be rule-based (like it is now): SMP tools may go into an SMP partition, while tools scaling to a full node or more (e.g. MPI applications) go into a partition that covers this (a sketch follows this list). Other clusters use different naming schemes, but I think that would be the 'basic' need.
  • then there are different partitions for other reasons (e.g. architectures, accelerators, etc.)
  • then there are (sometimes) distinctions to be made between partitions due to different time constraints (to be inferred from the different rule-based run times)
  • then there may be / are different plugins and CLIs to cover -- every admin team writes different plugins for SLURM
  • then there are different I/O patterns to consider and different file systems to address in different stages prior to execution (e.g. reserving an ad hoc file system via an input flag in the submit script and subsequently staging in within the jobscript; also see the proposal to add a flag describing I/O patterns resp. behaviour, snakemake/snakemake#188)
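
To make the first point concrete, here is a minimal, hypothetical cluster-config sketch for rule-based partition assignment (the partition and rule names are made up; `__default__` is the standard fallback entry in Snakemake's cluster-config files):

```yaml
# hypothetical cluster-config.yaml -- all names are site-specific examples
__default__:              # fallback applied to every rule
  partition: smp
  time: "01:00:00"
bwa_mem:                  # a multi-threaded (SMP) rule
  partition: smp
  cpus-per-task: 16
mpi_simulation:           # a rule scaling to a full node or more
  partition: parallel
  nodes: 2
  ntasks-per-node: 64
```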

My point of view is that of a computational scientist (an admin point of view, if you wish) who wants to provide a (curated) thin wrapper and workflows -- and does not want to code the corner cases over and over again.

Hence, my conclusion is that the cookiecutter approach is to be taken with more than a grain of salt. My suggestion would be a message like: folks, download a snakemake-profile and give it a go, but consider the additional snippets, too. On the supply side (you/us) that would mean: a basic setup (essentially the existing one minus the CLI) plus an annotated set of snippets ("how to implement x and y for your cluster").

Yet, this all depends on whether Johannes drops the cluster-config entirely (see snakemake/snakemake#248). Already now, good workflows need configs, schemas and lots of boilerplate. This is cumbersome. Some things on the snakemake side need improvement, too.

Cheers,
Chris

@mwort

mwort commented Feb 29, 2020

@cmeesters I think we can all agree that slurm setups vary significantly across clusters and that the slurm-submit-advanced.py script was always an experimental option. I'm missing concrete suggestions for improvement in your answers a bit.
I have drafted a more open approach to sbatch argument parsing in #33, which also takes care of the eventuality that snakemake completely drops its --cluster-config argument. See PR #33 and the README.md. It is still a draft for now and needs testing and your opinions.
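
For illustration, the general shape of such an open pass-through is sketched below. This is a minimal sketch, not the actual code of PR #33; it assumes the usual profile layout, in which Snakemake invokes the submit script with the jobscript as its last argument:

```python
#!/usr/bin/env python3
"""Sketch: forward every key of a job's cluster configuration verbatim
as an sbatch long option, instead of hard-coding a fixed set of flags."""
import subprocess
import sys

from snakemake.utils import read_job_properties

jobscript = sys.argv[-1]
job_properties = read_job_properties(jobscript)

# "cluster" holds the per-rule values resolved from the cluster-config file
cluster = job_properties.get("cluster", {})

cmd = ["sbatch"]
for key, value in cluster.items():
    cmd.append("--{}={}".format(key, value))  # e.g. --partition=smp
cmd.append(jobscript)

# sbatch prints "Submitted batch job <id>"; Snakemake expects the job id
out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
print(out.strip().split()[-1])
```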

@cmeesters
Author

> I'm missing concrete suggestions for improvement in your answers a bit.

Admitted.

Also, your PR would get around the cluster-config issue. However, in the long run I would rather see a slurm executor in snakemake and contribute to it. That would require a cluster-config within snakemake: otherwise only (a) standard path(s) (like in your PR) can be searched. That would render the idea of having one snakemake instance per cluster impossible and would hand the task back to the user.

It might be that I am missing the point here. Is that the case?

@mwort

mwort commented Mar 2, 2020

I'm not sure I fully understand. Do you still want your users to create their own cluster-config files or not (I don't think the latter would be feasible)?
I don't see why the use of profiles would prevent you from having only one snakemake instance. You could just tell your users to generate the profile with the default_cluster_config for every project they set up (or wrap the cookiecutter command in a globally executable mini-shell script that also sets some defaults; see the sketch below), and/or you could provide a central slurm profile (see the searched paths in here) that will work for the majority of users. Precisely because all slurm setups are so heterogeneous, this kind of stuff should be left out of Snakemake (there are other features/executors I would have left out as well).
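
As a sketch of that wrapper idea, the same could also be done via cookiecutter's Python API (the template URL points at this repo; the context key and default value are hypothetical and would have to match the template's cookiecutter.json):

```python
#!/usr/bin/env python3
"""Sketch of a site-wide wrapper: create the slurm profile with local defaults."""
from cookiecutter.main import cookiecutter  # cookiecutter's Python API

cookiecutter(
    "gh:Snakemake-Profiles/slurm",  # the profile template discussed here
    no_input=True,                  # accept defaults instead of prompting
    extra_context={
        # hypothetical site default -- key must exist in cookiecutter.json
        "profile_name": "slurm.mycluster",
    },
)
```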

@percyfal
Collaborator

percyfal commented Mar 4, 2020

@cmeesters after reading your comments here and the snakemake issues #258 and #188, I think I understand better now, and I share some of the sentiments. My use of profiles was more from an end-user perspective, with job submission made easier and cleaner; it also made it possible to automate (albeit with a big hack) the readjustment of resources, in particular for jobs that fail due to lack of memory. That slurm environments are heterogeneous became painfully clear when I set up the tests and configured the slurm image; for instance, partitions can be named at will, and there's no end to the variation in constraints and hardware configurations.

I have made few adjustments to the profile as of late, and I must confess this stems from some confusion on my part about where to best configure resources and cluster-related settings. Now that cluster-config is dead (?), presumably that means using resources (?). However, as resources are limited to integers, it would not be possible to assign, e.g., specific partitions to rules (some of the changes discussed in Snakemake-Profiles/pbs-torque#2 would have to come into effect). The addition of envmodules has certainly made it easier to work with our HPC module system, but at the same time it makes workflows less portable. In the same vein, one might wonder how much should actually be put directly in rule resources rather than in a separate configuration file that caters to a specific HPC. @mwort, are these examples related to your mention of leaving out features/executors - do you have any concrete examples? In the end, this discussion is probably better suited for issue #258, but I thought I'd share some of my random thoughts here for reference.
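
To illustrate the integer restriction (the rule and file names are made up):

```python
# Sketch: rule resources accept integers only, so a partition name
# cannot be declared here and has to live in a cluster-side config.
rule align:
    input:
        "sample.fastq"
    output:
        "sample.bam"
    resources:
        mem_mb=16000,      # integers are fine
        runtime=120
        # partition="smp" would be rejected: strings are not allowed
    shell:
        "bwa mem ref.fa {input} | samtools sort -o {output} -"
```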

@cmeesters
Author

Currently I am working, albeit with very low priority, on a proposal to Johannes: a set of features I would like to cover, plus a layout, for a SlurmClusterExecutor, to be derived from the ClusterExecutor in SM. This, however, depends on the cluster-config. That config, as I will argue, should be amended and checked such that all sbatch flags plus a 'generic' string can go in there. I will contact Johannes about the implementation.
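
One hypothetical reading of such an amended config, where the keys mirror sbatch long options and a 'generic' entry carries any free-form flags the schema does not know:

```yaml
# hypothetical amended cluster-config entry -- illustration only
__default__:
  account: myaccount
  partition: smp
  mem: 4G
  time: "02:00:00"
  generic: "--constraint=skylake --qos=normal"  # free-form pass-through
```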

The same should be done for LSF, PBS/Torque, etc.: if there is support for the weirdest of clouds, the same thought should apply to HPC workload managers. A profile - from the admin point of view again - should then be the thin wrapper that provides curated setups (configs, workflows).

BTW: no, environment modules do not impair the portability of a workflow. If portability is desired, conda-specific params can be selected in addition.
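
A sketch of that combination (the module and environment names are illustrative): with --use-envmodules the module is loaded on the HPC, while --use-conda makes the same rule portable elsewhere:

```python
rule stats:
    input:
        "sample.bam"
    output:
        "sample.stats.txt"
    envmodules:
        "bio/samtools/1.10"   # hypothetical site-specific module name
    conda:
        "envs/samtools.yaml"  # portable alternative, used with --use-conda
    shell:
        "samtools stats {input} > {output}"
```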

@mwort

mwort commented Mar 4, 2020

I mainly meant that if you add too many interfaces to SM, you'll create a jack of all trades; those interfaces are rightly placed in the profiles, where they can be much better maintained. That applies to tibanna as it does to SLURM. With every new interface, SM becomes harder to maintain and more buggy, IMHO.

@percyfal
Collaborator

Closing this issue; feel free to reopen for further discussion.
