-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RNA pipeline with adapter clipping #662
Conversation
Remember to squash merge! |
@gbggrant @mmorgantaylor @kcibul Now that we convert the files back to fastq whether the input is a ubam or fastqs, the variables platform_unit etc. are no longer optional. Will that cause problems either upstream or downstream of this wdl? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good on first pass! Let me know whatever help you need with docker stuff.
Also we have a style guide on how we like to format our runtime blocks -> here
If you have time it would be appreciated if you modify your tasks to that style, if not, then I can find some time to do it, thanks!
@@ -20,7 +20,7 @@ import "../../../tasks/broad/RNAWithUMIsTasks.wdl" as tasks | |||
|
|||
workflow RNAWithUMIsPipeline { | |||
|
|||
String pipeline_version = "1.0.3" | |||
String pipeline_version = "1.0.4" | |||
|
|||
input { | |||
File? bam |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will want to remove this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, trying to reference the bam as input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still want to support both ubam and fastqs as input
String? read_group_name | ||
String? sequencing_center = "BI" | ||
# sato: make a note of this change---no longer optional | ||
String platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the 'parameter meta' for all these values we can remove the 'only required when using fastq files as input blurb'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
tasks/broad/RNAWithUMIsTasks.wdl
Outdated
File fastq1 | ||
File fastq2 | ||
String output_prefix | ||
File adapter_fasta = "gs://broad-dsde-methods-takuto/RNA/resources/Illumina_adapters.fasta" # sato: move to a public place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something we may want to be configurable in the future i.e. as top level wdl input? Its not necessary just something to consider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't change but we will probably need to move it somewhere.
@gbggrant where should it go? All the other resources files are in gs://gcp-public-data--broad-references/
but it might not be appropriate there since it's not part of the reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question - still will probably put it in the Google public references bucket.
tasks/broad/RNAWithUMIsTasks.wdl
Outdated
|
||
|
||
runtime { | ||
docker: "biocontainers/fastp:v0.20.1_cv1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this tool and is it regularly maintained/updated? If this docker image is something that is an official release we can leave it as is, otherwise we will want to move it into a warp managed image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/OpenGene/fastp is the tool and the docker comes form biocontainer...definitely not an official release, it was the best I could find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@takutosato looks like fastp is on 0.23.1 do you want me to update that when i create the new docker image or do you need this specific version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I tested with 0.20.1 so we should stick with that. It should be safe to use 0.23.1 since all the updates since 0.20.1 are not related to adapter trimming, but just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay sounds good!
tasks/broad/RNAWithUMIsTasks.wdl
Outdated
File ubam | ||
String output_basename | ||
File gatk_jar = "gs://broad-dsde-methods-takuto/RNA/gatk_transfer_read_tags.jar" | ||
String docker = "us.gcr.io/broad-gotc-prod/picard-cloud:2.26.11" # sato: replace with a new gatk docker release, as needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im guessing the functionality from this JAR will eventually be folded into GATK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
@takutosato Thanks - I don't think that making those parameters non-optional will break anything upstream or downstream. We'll need to update a couple of the tests to now supply those tag values (where the inputs are bams). |
Remember to squash merge! |
Thank you for reviewing @wjdingman1 @gbggrant. I addressed your comments and updated the runtime blocks to conform to the style guide. |
input { | ||
File unmapped_bam | ||
Float? mem = 4 | ||
String docker = "us.gcr.io/tag-public/tag-tools:1.0.0" # sato: This likely needs to be made public |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the story on this one as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a docker used by TAG team from GP; should I just ask for a docker file from them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if its something they regularly maintain and do security scans on then its okay to leave as is, otherwise we'll take care of bringing it to warp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@takutosato pinging on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjdingman1 Sorry for the delay. It's updated about once every year and I doubt they do any security scans. I can share the docker file. It looks like the docker installs extraneous tools, it might be good to create a new docker (not necessarily in this PR)---could I get your help trimming it down?
call tasks.Fastp { | ||
input: | ||
fastq1 = SamToFastq.fastq1, | ||
fastq2 = SamToFastq.fastq1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug!
fastq2 = SamToFastq.fastq1, | |
fastq2 = SamToFastq.fastq2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@takutosato I fixed this - I'm 99.999999% certain it was a bug, but please verify
HI @takutosato a couple more comments / minor fixes. Found one bug. |
Remember to squash merge! |
@takutosato I added the fastp docker image changes to this PR. Unrelated but you need to update the changelog.md to be consistent with the current pipeline version |
@@ -1,3 +1,8 @@ | |||
# 1.0.6 | |||
2022-03-29 (Date of Last Commit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has the wdl pipeline version been updated?
Remember to squash merge! |
Fix bug in call to 'Fastp' Update test data files for bam inputs to set required tags
Remember to squash merge! |
Remember to squash merge! |
Remember to squash merge! |
Remember to squash merge! |
tasks/broad/RNAWithUMIsTasks.wdl
Outdated
@@ -970,7 +970,7 @@ task PostprocessTranscriptomeForRSEM { | |||
} | |||
|
|||
output { | |||
File output_bam = "~{prefix}.bam" | |||
File output_bam = "~{prefix}_gatk.bam" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gbggrant I removed _gatk
as I thought it wasn't descriptive; did it cause an error without it? Perhaps we should use _RSEM_post_processed
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@takutosato I added it back because that's what you are naming the file on line 969 - I guess you should remove it there in addition.
GB -> GiB Fix one of the plumbing tests.
Remember to squash merge! |
round fastqc_percent_reads_with_adapter to 5 digits
Remember to squash merge! |
Remember to squash merge! |
@gbggrant Do we want to update plumbing based off what we have here or does it not matter because of the non-deteminism |
Remember to squash merge! |
@wjdingman1 we'll definitely need to update plumbing truth for this PR. There's some new plumbing data, plus some changes in the pipeline. Looks like this is ready to merge, can update the truth once merged. |
@gbggrant sounds good |
(just waiting for this last smart test to run to completion). |
After this change the RNA pipeline will do adapter clipping (using fastp) prior to alignment. This will help to retain small insert reads (which are fairly commonplace in FFPE) that would otherwise be discarded before running RSEM.
Related but not required tasksOne last task: