Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning block to readme about singularity image downloads #2338

Closed
wants to merge 4 commits into from

Conversation

fellen31
Copy link
Contributor

Since this only affects nf-core download it can give a person headache when everything else works.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

@codecov
Copy link

codecov bot commented Jun 22, 2023

Codecov Report

Merging #2338 (aea7354) into dev (760e92e) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##              dev    #2338   +/-   ##
=======================================
  Coverage   73.09%   73.09%           
=======================================
  Files          78       78           
  Lines        8780     8780           
=======================================
  Hits         6418     6418           
  Misses       2362     2362           
Impacted Files Coverage Δ
nf_core/download.py 61.68% <ø> (ø)

@MatthiasZepper
Copy link
Member

MatthiasZepper commented Jun 22, 2023

Can you please elaborate which module/pipeline/revision prompted your PR?

Furthermore, I believe that the README is not really the place to change anything, really. Don't you think that the module template or the nf-core lint would be more appropriate to ensure that other developers do not encounter the same frustration with single vs. double quotes?

@mashehu
Copy link
Contributor

mashehu commented Jun 22, 2023

Thanks for your contribution. For newer pipelines this should already be taken care of with nf-core lint #2085.

Maybe we should have this as a note in the readme that for older pipeline version there might be an issue and give some troubleshooting advice (but I am afraid it is mostly up to pipeline developers to fix this)

@fellen31
Copy link
Contributor Author

Hi, thanks for the comments. I was using nf-core download to download a workflow including a local module where only a docker image is available (like the deepvariant module). In this module I used container 'xxx' instead of container "xxx". Posted on slack and was asked to make a PR about it.

But, I realise this might be an edge case, and the README already states how the containers are scraped, so I absolutely understand if this addition is not necessary. I think the modules template with single quotes surrounding the actual adress, compared to the need of using double quotes around the whole thing is what confused me when only a docker container is available.

A minimal example:

  1. Create a new pipeline using nf-core tools 2.8
  2. Patch nf-core/fastqc to look like container 'quay.io/biocontainers/fastqc:0.11.9--0'
  3. Lints fine, runs fine, but nf-core download will not find the container.

While on the subject, how can I use nf-core lint with local modules? Do I need to upload the module to a repository and then install with nf-core modules?

@MatthiasZepper
Copy link
Member

MatthiasZepper commented Jun 22, 2023

In this module I used container 'xxx' instead of container "xxx". Posted on slack and was asked to make a PR about it.

Thanks for linking the Slack thread. I now got a better idea how, why and what the reason for this PR is. I am still inclined to disagree with Phil on this issue, because in my perception nf-core download is meant for people using the pipelines (possibly for the first time).

Hence, I fear that even the current level of technical peculiarities stated in the Readme rather adds to the confusion that helps to clarify it. I think a pipeline/module developer will very rarely consult the Readme of Download to understand how a container needs to be specified.

But, I realise this might be an edge case, and the README already states how the containers are scraped, so I absolutely understand if this addition is not necessary. I think the modules template with single quotes surrounding the actual adress, compared to the need of using double quotes around the whole thing is what confused me when only a docker container is available.

A minimal example:

1. Create a new pipeline using nf-core tools 2.8

2. Patch nf-core/fastqc to look like `container 'quay.io/biocontainers/fastqc:0.11.9--0'`

3. Lints fine, runs fine, but `nf-core download` will not find the container.

That is quite unfortunate. Maybe we should then have another look at the regexes? In the current dev version of tools, I have already introduced a new regex to handle yet another edge case (when the container definition is made in a separate variable) for which the quote issue also popped up:

tools/nf_core/download.py

Lines 725 to 743 in 4f32c07

Some modules declare the container as separate variable. This entails that " instead of ' is used,
so the above regex will match, but end prematurely before the container name is captured.
Therefore, we need to repeat the search over the contents, extract the variable name, and use it inside a new regex.
To get the variable name ( ${container_id} in above example ), we match the literal word "container" and use lookbehind (reset the match).
Then we skip [^${}]+ everything that is not $ or curly braces. The next capture group is
${ followed by any characters that are not curly braces [^{}]+ and ended by a closing curly brace (}),
but only if it's not followed by any other curly braces (?![^{]*}). The latter ensures we capture the innermost
variable name.
"""
container_definition = re.search(
r"(?<=container)[^\${}]+\${([^{}]+)}(?![^{]*})", contents
)
if bool(container_definition) and bool(container_definition.group(1)):
pattern = re.escape(container_definition.group(1))
# extract the quoted string(s) following the variable assignment
container_names = re.findall(r"%s\s*=\s*[\"\']([^\"\']+)[\"\']" % pattern, contents)

In the new regex, I already accounted for both notations [\"\'], but haven't changed the existing ones. Maybe (not tested) fixing you problem for good just means replacing r"container\s*\"([^\"]*)\"" with r"container\s*[\"\']([^\"\']*)[\"\']" in line 705:

matches = re.findall(r"container\s*\"([^\"]*)\"", contents, re.S)

But it could also cause prematurely abridged matches?

If you like, you could test that: Install the dev version into a virtual environment with pip install --upgrade -r requirements-dev.txt -e . after cloning the current dev branch, change the regex accordingly and then test that both on your own module and some old revisions of pipelines?

@MatthiasZepper
Copy link
Member

@fellen31: We noticed that a bigger refactor of the container detection was anyway required, so I have rewritten those parts of the code. In the course of this, I have also included the proposed regex changes. I would highly appreciate it, if you could test this code with your custom module asap, since there will likely be a tools release still this week!

@ewels
Copy link
Member

ewels commented Jun 29, 2023

Closing as this seems to be resolved. Please open a new issue if this isn't the case 👍🏻

@ewels ewels closed this Jun 29, 2023
@fellen31
Copy link
Contributor Author

fellen31 commented Jun 29, 2023

@MatthiasZepper I can't seem to get it working with the updated code, but I'm not 100% sure I've installed the correct version.

If you want to double check, see if you can download the fastqc container with nf-core download fellen31/nf-core-test -r test4 where container 'biocontainers/fastqc:0.11.9--0' is used, compared to nf-core download fellen31/nf-core-test -r test3 where container "biocontainers/fastqc:0.11.9--0" is used.

@fellen31
Copy link
Contributor Author

If you change

module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']", search_space, re.S)

to

module_container = re.findall(r"container\s*[\"\']([^\"\']*)[\"\']", search_space, re.S)

it does work, but then as you said it causes the other matches to stop prematurely (like ['${ workflow.containerEngine == ']).

Don't know if it's possible to find a way here, but I'm fine with just using "" going forward.. :)

@MatthiasZepper
Copy link
Member

If you change

module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']", search_space, re.S)

to

module_container = re.findall(r"container\s*[\"\']([^\"\']*)[\"\']", search_space, re.S)

it does work, but then as you said it causes the other matches to stop prematurely (like ['${ workflow.containerEngine == ']).

Don't know if it's possible to find a way here, but I'm fine with just using "" going forward.. :)

Thanks for testing and pointing out this regression! I have wasted many hours today, but I think I got you covered now...

@fellen31
Copy link
Contributor Author

Awesome! I think so too, everything seems to work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants