Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nanopore support #134

Merged
merged 11 commits into from
Jul 20, 2021
Merged

Nanopore support #134

merged 11 commits into from
Jul 20, 2021

Conversation

hoelzer
Copy link
Contributor

@hoelzer hoelzer commented Apr 18, 2021

I added basic nanopore support as described in #131

  • minimap2 instead of hisat2
  • nanoplot instead of fastqc
  • add -L flag for long reads for featureCounts
  • sortmerna still active, not sure how well it works w/ long reads (maybe better to deactivate per default when --nanopore is used

known issues:

  • multiqc reporting does not work bc/ minimap2 does not write stats that can be used. Also no support for nanoplot afaik
    • I think we should adjust the multiqc input channel a bit in case of --nanopore so that a report is still generated
    • for now, this process does simply not start so now output report is generated

Also, this is really a basic hack to get nanopore data working. For sure, there are more specialized tools. But it's also interesting to see how well RNAflow performs w/ that simple changes on long-read nanopore data.

Tested w/ -profile test,local,docker --nanopore --> works except the nanoplots are failing likely because the test data is actually not nanopore long reads. But the general workflow runs through.

@hoelzer hoelzer requested a review from MarieLataretu April 18, 2021 11:27
@AggressiveHayBale
Copy link

Great that you adding Nanopore to the workflow! I will test it today and compare the results with the pipeline developed by nf-core guys(their version is quite basic so I am looking forward to using rnaflow).

@hoelzer
Copy link
Contributor Author

hoelzer commented Apr 19, 2021

Great that you adding Nanopore to the workflow! I will test it today and compare the results with the pipeline developed by nf-core guys(their version is quite basic so I am looking forward to using rnaflow).

Hi! Cool, would be great if you can give it a try. As I mentioned in the PR it's really basic and there might be more long-read specialized tools out there. And we also have to perform some comparisons on our end - but basically it should work and long-reads should be properly mapped and counted. The subsequent DEG calc part is the same as for Illumina at the moment.

Let us know if you run into any issues!

@AggressiveHayBale
Copy link

AggressiveHayBale commented Apr 19, 2021

Hey!

First, if you have problem with minimap2 output of stats, maybe you can use samtools flagstat? It provides all of the mapping statistics.

Now the pipeline is working, the issue was my annotation file. So I tested it and I get similar result as in nf-core pipeline. No bugs for this moment. Thanks for adapting it to Nanopore!

Old comment:
Now I encounter an error that is probably not related to nanopore implementation but stopped me from getting the results.
I am using Prokka anotation file and as you pointed in #116 I used additional parameters --featurecounts_additional_params '-t transcript -g transcript_id'. However now my count tsv files have only 0. The same points deseq2 log file:

Error in DESeqDataSet(se, design = design, ignoreRank) : all samples have 0 counts for all genes. check the counting script. Calls: DESeqDataSetFromHTSeqCount -> DESeqDataSetFromMatrix -> DESeqDataSet Execution halted

@hoelzer
Copy link
Contributor Author

hoelzer commented Apr 25, 2021

@AggresiveHayBale sorry for the late reply!

First, if you have problem with minimap2 output of stats, maybe you can use samtools flagstat? It provides all of the mapping statistics.

Yes, this might work, although I don't know if multiQC can take samtools output directly...

Now the pipeline is working, the issue was my annotation file. So I tested it and I get similar result as in nf-core pipeline. No bugs for this moment. Thanks for adapting it to Nanopore!

Awesome, thanks for testing and replying! The input annotation file is always a bit tricky: and bc/ we started to develop RNAflow with human/mice/rat RNA-Seq data in mind we focused on the integration of "proper" GTF files like the ones from Ensembl that have the full gene-transcript-exon structure. Of course, for bacteria, such a structure is not really necessary but still recommended to work with most tools straight away.

I am just curious: what was the problem w/ your initial input annotation file? Missing transcript entries?

After @MarieLataretu approves, we will merge this PR and make another release for basic Nanopore support.

@AggressiveHayBale
Copy link

@hoelzer The issue was inconsistent naming scheme between fasta and annotation file. Stupid mistake from my part. After correcting everything started to work.

About MultiQC, I don't have that big coding experience so I can't say it for sure but according to their website samtools are fully implemented. The only problematic thing would be Nanoplot then. However as the output is still saved this is not a big problem.

Anyway good luck and thanks for the answer!

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 14, 2021

@MarieLataretu when you are basically happy w/ this PR we could also merge it and do a rnaflow release w/ (basic!) nanopore support.

@MarieLataretu
Copy link
Collaborator

I'll look into it tomorrow!

@MarieLataretu
Copy link
Collaborator

MarieLataretu commented Jul 16, 2021

  • I updated the changes from the main branch
  • MultiQC at least collects featurecounts, tmp and sortmerna, even if there is a empty input channel

Nanoplot was not working for me. I'm not sure if it's because of only one long read as input. I can look into that when the fiberglass problem is solved at work... I would like to have that tested before merging!


@MultiQC the new NanoStats MultiQC module did not work for NanoPlot output; at least when I tried to include it in CLEAN (the stats files are different)

For CLEAN we already have the full htlm report in MultiQC - looks not that nice, but I think, it's good enough (reminder issue #140)

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 19, 2021

@MarieLataretu okay great! I agree regarding MultiQC and NanoPlot... then it is like this. I hoped that now the NanoPlot output can be also summarized in MultiQC... but if not it's nothing to invest loads of time

Please feel free to merge and release once you were able to check Nanoplot again! For me it worked but maybe it depends on your example input

@MarieLataretu
Copy link
Collaborator

It was actually a problem in the Nanoplot conda env (see wdecoster/NanoPlot#222, wdecoster/NanoPlot#20), which is solved now by downgrading seaborn.

I assume you used the Docker container:

Tested w/ -profile test,local,docker --nanopore --> works except the nanoplots are failing likely because the test data is actually not nanopore long reads. But the general workflow runs through.

If the error is AttributeError: 'PolyCollection' object has no property 'stat_func' we'll have to check the Docker container too

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 20, 2021

@MarieLataretu ahh so this might be solved then with a newer version of Nanoplot in the container?

We use right now

withLabel: nanoplot { container = 'nanozoo/nanoplot:1.32.0--1ae6f5d' }`

but I can also provide a newer container for latest Nanoplot version if that helps:

docker pull nanozoo/nanoplot:1.38.0--f6b9f72

@MarieLataretu
Copy link
Collaborator

I tested it with Singularity and it works!

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 20, 2021

ah, okay! fine. I mean anyway we could update to the latest version

@MarieLataretu
Copy link
Collaborator

In the new Nanoplot Docker container (nanozoo/nanoplot:1.38.0--f6b9f72) zip does not exist.
Nanoplot works but the last zipping command fails (.command.sh: line 6: zip: command not found)

@hoelzer Can you fix it easily or should we modify the expected output in Nextflow?

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 20, 2021

@MarieLataretu ah strange... I did not touch the other dependencies... but I can easily fix that, one sec

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 20, 2021

FROM continuumio/miniconda3
ENV VERSION 1.38.0 
ENV TOOL nanoplot

RUN apt-get update && apt install -y procps wget gzip zip && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN conda config --add channels conda-forge && \
    conda config --add channels bioconda && \
    conda config --add channels default

RUN conda install $TOOL=$VERSION && conda clean -a

I am rebuilding the container w/ this recipe right now.

@hoelzer
Copy link
Contributor Author

hoelzer commented Jul 20, 2021

here it is:

docker pull nanozoo/nanoplot:1.38.0--11e5b96

I deleted the other one

@MarieLataretu
Copy link
Collaborator

Thanks! It works now 👍

I'm merging now

@MarieLataretu MarieLataretu merged commit 6767e4c into master Jul 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants