Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Improve documentation about TemplateFlow and Containers #1802

Merged
merged 9 commits into from
Oct 8, 2019

Conversation

oesteban
Copy link
Member

@oesteban oesteban commented Oct 2, 2019

This PR addresses #1801. The intent is to provide clearer documentation about how to run fMRIPrep via Singularity containers. In particular, the interaction between TemplateFlow and Singularity should be better described.

@oesteban
Copy link
Member Author

oesteban commented Oct 2, 2019

@oesteban oesteban marked this pull request as ready for review October 2, 2019 18:04
@oesteban
Copy link
Member Author

oesteban commented Oct 2, 2019

Tagging @dmd @JoffJones @utooley @oricon @a3sha2 @marcelzwiers @jooh, who have experienced issues, and proposed solutions to problems arising from Singularity + TemplateFlow. They are perfect to have a quick look and make sure the documentation now contains what they had to figure out the hard way.

@utooley
Copy link
Contributor

utooley commented Oct 2, 2019

Two thoughts on this, would it be useful to lay out the error message you might get (i.e. WARNING: Skipping user bind, non existent bind point (directory) in container) if

a) your Singularity config doesn't allow you to bind non-existent bind points, i.e. has not enabled overlay?
or b) doesn't auto-bind a directory such as \scratch that you reference in the documentation? I believe at some point I got errors trying to follow paths that I thought existed inside the container but don't by default unless your configuration enabled them.

@oesteban
Copy link
Member Author

oesteban commented Oct 2, 2019

Two thoughts on this, would it be useful to lay out the error message you might get (i.e. WARNING: Skipping user bind, non existent bind point (directory) in container) if

a) your Singularity config doesn't allow you to bind non-existent bind points, i.e. has not enabled overlay?
or b) doesn't auto-bind a directory such as \scratch that you reference in the documentation? I believe at some point I got errors trying to follow paths that I thought existed inside the container but don't by default unless your configuration enabled them.

You're right, this is not yet covered in the documentation. Your cluster is one example of these right? Do you mind if I ask you to try a few things over the next week?

@JoffJones
Copy link

Hi All,
Thank you for the documentation and examples of running fmriprep in singularity on HPCs. We are using a slightly different implementation of sbatch through the subprocess module in Jupyter Lab. Some differences include using the -C contain all flag, not using -clean-env, and I have also found that I have to bind mount /tmp:/tmp otherwise there is a persistent error with surface reconstruction (see log below). Weirdly, when I do mount this the log returns that /tmp will not be mounted as it is already mounted.

notmp2-sub-001.out.txt

Happy to share the 'subprocess' based sbatch submission - although this isn't yet running perfectly (#1798).
We could also try to get your template bash script working on our HPC as a validation if that would be useful?

Lastly just some tiny typos under 'Accessing the host’s filesystem'
$ singularity run --cleanenv -B /work:/work fmriprep.smig \

And under 'Internet access problems'
requests.exceptions.SSLError: HTTPSConnectionPool .... In this case, you container seems to be able to

@utooley
Copy link
Contributor

utooley commented Oct 3, 2019

Two thoughts on this, would it be useful to lay out the error message you might get (i.e. WARNING: Skipping user bind, non existent bind point (directory) in container) if
a) your Singularity config doesn't allow you to bind non-existent bind points, i.e. has not enabled overlay?
or b) doesn't auto-bind a directory such as \scratch that you reference in the documentation? I believe at some point I got errors trying to follow paths that I thought existed inside the container but don't by default unless your configuration enabled them.

You're right, this is not yet covered in the documentation. Your cluster is one example of these right? Do you mind if I ask you to try a few things over the next week?

@oesteban Not at all! I will say that I have no sense of how common these misconfigurations or weird default settings are across different cluster setups, so I don't know how useful (or not) it is to document them, but happy to try things out! As a side note, everything I know, I knew from running 1-4-1; haven't tested 1-5-0 myself to see if anything has resolved, but a few days ago someone at Penn ran 1-5-0 and we had similar problems with TemplateFlow.

@oesteban
Copy link
Member Author

oesteban commented Oct 3, 2019

@JoffJones

Some differences include using the -C contain all flag

Could you elaborate more? I would like to see whether this could be of help to others.

not using --cleanenv

I have to admit I've been pretty resistant to this flag but finally surrendered to its usefulness. What is preventing you from using it?

I have to bind mount /tmp:/tmp otherwise there is a persistent error with surface reconstruction (see log below). Weirdly, when I do mount this the log returns that /tmp will not be mounted as it is already mounted.

It seems to me that the FreeSurfer error is unrelated to mounting /tmp...

Happy to share the 'subprocess' based sbatch submission - although this isn't yet running perfectly (#1798).

I wonder how you deal with the environment given that you are not using --cleanenv - the python environment to run the submission is very likely to leak into the containers, isn't it?

We could also try to get your template bash script working on our HPC as a validation if that would be useful?

That would be really sweet if you can.

@oesteban
Copy link
Member Author

oesteban commented Oct 3, 2019

I'll also ping @yarikoptic here - I'm sure he will find missing points in the proposed documentation, esp. about Singularity.

@effigies
Copy link
Member

effigies commented Oct 3, 2019

I'll try to review the content in detail this afternoon, but to give you the feedback I've already got, I think this overshoots the prominence of these things by promoting them to the top level of the table of contents. I would suggest the following TOC:

The top page doesn't need to be very elaborate, but we can add the context of what we generally recommend for different use cases, e.g., fmriprep-docker for local systems, singularity for HPC, and make clear that all approaches accept the same set of arguments, and the distinctions lie in how to make sure the right resources are available to fmriprep.

@oesteban oesteban force-pushed the docs/templateflow-1801 branch from 49f1b28 to a2a30fe Compare October 3, 2019 16:59
@oesteban
Copy link
Member Author

oesteban commented Oct 3, 2019

Okay, I've tried to cover two of your points, @effigies, one moving the fmriprep-docker installation to the installation page (cross-referencing execution) and the second trying to make it more clear the command line structure and how it is modified for the case of containers.

There's perhaps some need to mention that fmriprep-docker is a bit special, but things should be getting closer to how they will look in the end.

Regarding the restructure, I agree with the feeling that giving new sections top-level relevance is an overshoot. But looking at how many of the questions in NeuroStars go around Singularity issues, I think it makes sense to cast them to top-level sections.

There's plenty of room on the left menu, and we will make it a lot more accessible.

@effigies
Copy link
Member

effigies commented Oct 3, 2019

I think we harm overall accessibility by adding to the top level, as now there are more choices to consider as you try to find what you need (and this isn't always going to be the top thing that people are looking for). I think having a clean hierarchy will be much easier to follow, even if it means one extra click.

In fact, I would probably reduce some of what is already there, such as moving SDC under processing details, and usage notes under installing and running. And I feel like the standard/non-standard spaces discussion is also a bit high up, though I'm not sure what I'd put it under. Maybe an "advanced configuration" section, or possibly we could rename "processing pipeline details" to indicate a broader consideration of processing options. But that's out of scope for this PR, certainly.

@oesteban
Copy link
Member Author

oesteban commented Oct 3, 2019

Would you agree with deferring the conversation about structure (including the new sections) to the docusprint?

@effigies
Copy link
Member

effigies commented Oct 3, 2019

Sure.

@yarikoptic
Copy link
Contributor

My problem with any suggestion of exposing system $HOME within container environment as HOME is possible side-effects for other tools. And exposing $HOME as some other directory inside the container for templateflow to use, is making it impossible to capture the state for any specific run to be able to reproduce it later on.
If reproducibility is in mind, I would recommend having per-project/study specific $HOME to reside under that study directory (lets call it $STUDY), and only directories under that study directory to be bind mounted inside. Then if $STUDY is also version controlled, e.g. with datalad, you have a chance to capture all necessary details to be able later on reproduce that specific result.
E.g. that is what we do in https://github.com/ReproNim/containers/

  • scripts/singularity_cmd is our shim to run singularity to
    • sanitize execution
    • bind mount top level directory
    • bind mount binds/HOME which could carry anything desired to be tracked/carried around (or ignored)

Or may be I am misunderstanding the situation and bind mounting $HOME is solely for truly just caching (so exact version(s) of templates are guaranteed to be the correct ones the other way)? Then may be it would be best to bind mount ~/.cache/templateflow? But then, since I trust nobody, I might still prefer to just track that cache directory within the study folder.

@oesteban
Copy link
Member Author

oesteban commented Oct 4, 2019

Or may be I am misunderstanding the situation and bind mounting $HOME is solely for truly just caching (so exact version(s) of templates are guaranteed to be the correct ones the other way)?

You got it right. There is a way of using TemplateFlow with DataLad (i.e., version-controlled) but we ended up disabling it by default since it created a pile of problems. $HOME is used just for caching, although exact versions of templates are not guaranteed. Since 1.5.0, $HOME is also used to cache the latest version of fMRIPrep in a text file under $HOME/.cache/fmriprep - but I guess that wouldn't worry you (it's there to shave off just one HTTP GET).

I'll think about this - I like the idea of the $STUDY environment variable - that is something we might want to propagate to the revision of BIDS-Apps in general (wdyt @effigies?).

If you come up with the markdown explaining your $STUDY idea to be included within the docs (in this PR or sending a separate one after this), it would be fantastic (and probably much more efficient than me trying to draft something first) - but I guess you might not have the time. Please let me know if this is too much to ask :).

@effigies
Copy link
Member

effigies commented Oct 4, 2019

I'm a little fuzzy on what exactly is being proposed. I think my brain is kind of shot...

So we should move things out of /home/fmriprep, to head off people accidentally dropping their home directory into it, and I guess make a /study directory? That seems fine. It's not entirely clear how a $STUDY environment variable comes into play.

Is there a proposal on how we're actually supposed to capture in a datalad-consumable way what specifically is being injected into the container? Or is this mostly just making it easier for people who do have a well-organized setup to capture these things with datalad?

@oesteban
Copy link
Member Author

oesteban commented Oct 7, 2019

@effigies, @yarikoptic in reference to the $STUDY environment variable I think it could make sense in a revised BIDS-App document/spec/protocol/recommendation to act as a default entrypoint for sharing resources with the host in containerized apps, and also a good default for the interface. This way, we could say that, by default, data are expected to be in $STUDY/data, derivatives in $STUDY/data/derivatives, configs/settings under $STUDY/settings, etc. The idea works on bare-metal installations (perhaps with DataLad control) and also with containers (although the recommendation of having derivatives under the dataset breaks potential read-only mounts of only $STUDY/data). For singularity users without overlay enabled (@yarikoptic should correct me if I'm wrong here) having a bind-mount point available in the image (e.g., $STUDY is /study unless otherwise defined) could be very useful to avoid the WARNING: Skipping user bind, non existent bind point (directory) in container @utooley mentioned above.

@utooley - made some changes following your suggestions in 2ef6980 - can you check?

@JoffJones can you comment on some questions I left in #1802 (comment)? - thanks!

@oesteban
Copy link
Member Author

oesteban commented Oct 7, 2019

May I ask some quick eyeballing from @rwblair and @franklin-feingold

I'd like to get this merged soon. Please let me know if there are any blocking points.

@JoffJones
Copy link

Hi @oesteban,

Sorry for the delay in responding. We have been using the -C flag as standard for any singularity images, @jooh would be able to elaborate but he is away at the moment. I think that we don't use --cleanenv because we are running things in a neuroimaging environment on the HPC, but I haven't tried it without.

Apologies I can't elaborate any more!

@oesteban
Copy link
Member Author

oesteban commented Oct 8, 2019

I'll merge - I think it is better to test this with real users and revise during the docusprint.

@oesteban oesteban merged commit 278fb84 into nipreps:master Oct 8, 2019
@oesteban oesteban deleted the docs/templateflow-1801 branch October 8, 2019 21:20
@utooley
Copy link
Contributor

utooley commented Oct 10, 2019

Also sorry for the late reply @oesteban ! I looked over your initial changes after I commented and they made sense to me (although I didn't know about docker2singularity), happy to go over again and revise down the line. Sorry again for the delay!

@JoffJones
Copy link

JoffJones commented Oct 17, 2019

Hi @oesteban,

Thank you very much for the updated slurm script, I have been doing some testing with it and the only trouble so far is with creating the freesurfer-6.0.1 directory for future re-use. I'm currently testing it without this, but would be grateful for your input.

bash script:
fmriprep1.5.0-new.txt

output:
fmriprep-27211-2.out.txt

Update:
Two subjects finished with no significant errors:
fmriprep-27216-6-err.txt
fmriprep-27216-6-out.txt
fmriprep-27216-8-out.txt
fmriprep-27216-8-err.txt

One subject finished with this error:
Node Name: fmriprep_wf.single_subject_007_wf.anat_preproc_wf.anat_derivatives_wf.ds_t1_template_transforms
And the segmentation looked poor
fmriprep-27216-7-err.txt
fmriprep-27216-7-out.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants