Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RNA pipeline with adapter clipping #662

Merged
merged 27 commits into from
Apr 13, 2022
Merged

RNA pipeline with adapter clipping #662

merged 27 commits into from
Apr 13, 2022

Conversation

takutosato
Copy link
Contributor

@takutosato takutosato commented Apr 4, 2022

After this change the RNA pipeline will do adapter clipping (using fastp) prior to alignment. This will help to retain small insert reads (which are fairly commonplace in FFPE) that would otherwise be discarded before running RSEM.

Related but not required tasksOne last task:

@github-actions
Copy link

github-actions bot commented Apr 4, 2022

Remember to squash merge!

@takutosato
Copy link
Contributor Author

@gbggrant @mmorgantaylor @kcibul Now that we convert the files back to fastq whether the input is a ubam or fastqs, the variables platform_unit etc. are no longer optional. Will that cause problems either upstream or downstream of this wdl?

@takutosato
Copy link
Contributor Author

image

Copy link
Contributor

@wjdingman1 wjdingman1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on first pass! Let me know whatever help you need with docker stuff.

Also we have a style guide on how we like to format our runtime blocks -> here

If you have time it would be appreciated if you modify your tasks to that style, if not, then I can find some time to do it, thanks!

pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl Outdated Show resolved Hide resolved
@@ -20,7 +20,7 @@ import "../../../tasks/broad/RNAWithUMIsTasks.wdl" as tasks

workflow RNAWithUMIsPipeline {

String pipeline_version = "1.0.3"
String pipeline_version = "1.0.4"

input {
File? bam
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will want to remove this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, trying to reference the bam as input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still want to support both ubam and fastqs as input

String? read_group_name
String? sequencing_center = "BI"
# sato: make a note of this change---no longer optional
String platform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the 'parameter meta' for all these values we can remove the 'only required when using fastq files as input blurb'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl Outdated Show resolved Hide resolved
File fastq1
File fastq2
String output_prefix
File adapter_fasta = "gs://broad-dsde-methods-takuto/RNA/resources/Illumina_adapters.fasta" # sato: move to a public place
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we may want to be configurable in the future i.e. as top level wdl input? Its not necessary just something to consider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't change but we will probably need to move it somewhere.

@gbggrant where should it go? All the other resources files are in gs://gcp-public-data--broad-references/ but it might not be appropriate there since it's not part of the reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good question - still will probably put it in the Google public references bucket.



runtime {
docker: "biocontainers/fastp:v0.20.1_cv1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this tool and is it regularly maintained/updated? If this docker image is something that is an official release we can leave it as is, otherwise we will want to move it into a warp managed image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/OpenGene/fastp is the tool and the docker comes form biocontainer...definitely not an official release, it was the best I could find.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takutosato looks like fastp is on 0.23.1 do you want me to update that when i create the new docker image or do you need this specific version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I tested with 0.20.1 so we should stick with that. It should be safe to use 0.23.1 since all the updates since 0.20.1 are not related to adapter trimming, but just to be safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay sounds good!

File ubam
String output_basename
File gatk_jar = "gs://broad-dsde-methods-takuto/RNA/gatk_transfer_read_tags.jar"
String docker = "us.gcr.io/broad-gotc-prod/picard-cloud:2.26.11" # sato: replace with a new gatk docker release, as needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im guessing the functionality from this JAR will eventually be folded into GATK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@wjdingman1
Copy link
Contributor

looks like womtool is complaining about inputs for BroadInternalRNAWithUMIs
image

pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl Outdated Show resolved Hide resolved
pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl Outdated Show resolved Hide resolved
pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl Outdated Show resolved Hide resolved
tasks/broad/RNAWithUMIsTasks.wdl Outdated Show resolved Hide resolved
tasks/broad/RNAWithUMIsTasks.wdl Outdated Show resolved Hide resolved
@gbggrant
Copy link
Contributor

gbggrant commented Apr 4, 2022

@takutosato Thanks - I don't think that making those parameters non-optional will break anything upstream or downstream. We'll need to update a couple of the tests to now supply those tag values (where the inputs are bams).

@github-actions
Copy link

github-actions bot commented Apr 4, 2022

Remember to squash merge!

@takutosato
Copy link
Contributor Author

Thank you for reviewing @wjdingman1 @gbggrant. I addressed your comments and updated the runtime blocks to conform to the style guide.

input {
File unmapped_bam
Float? mem = 4
String docker = "us.gcr.io/tag-public/tag-tools:1.0.0" # sato: This likely needs to be made public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the story on this one as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a docker used by TAG team from GP; should I just ask for a docker file from them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its something they regularly maintain and do security scans on then its okay to leave as is, otherwise we'll take care of bringing it to warp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takutosato pinging on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjdingman1 Sorry for the delay. It's updated about once every year and I doubt they do any security scans. I can share the docker file. It looks like the docker installs extraneous tools, it might be good to create a new docker (not necessarily in this PR)---could I get your help trimming it down?

call tasks.Fastp {
input:
fastq1 = SamToFastq.fastq1,
fastq2 = SamToFastq.fastq1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug!

Suggested change
fastq2 = SamToFastq.fastq1,
fastq2 = SamToFastq.fastq2,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takutosato I fixed this - I'm 99.999999% certain it was a bug, but please verify

@gbggrant
Copy link
Contributor

gbggrant commented Apr 6, 2022

HI @takutosato a couple more comments / minor fixes. Found one bug.
I'll put that adapters file somewhere publicly accessible.

@github-actions
Copy link

github-actions bot commented Apr 6, 2022

Remember to squash merge!

@wjdingman1
Copy link
Contributor

wjdingman1 commented Apr 6, 2022

@takutosato I added the fastp docker image changes to this PR.

Unrelated but you need to update the changelog.md to be consistent with the current pipeline version 1.0.4 and note the changes that you have made in there

@@ -1,3 +1,8 @@
# 1.0.6
2022-03-29 (Date of Last Commit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has the wdl pipeline version been updated?

@github-actions
Copy link

github-actions bot commented Apr 6, 2022

Remember to squash merge!

Fix bug in call to 'Fastp'
Update test data files for bam inputs to set required tags
@github-actions
Copy link

github-actions bot commented Apr 8, 2022

Remember to squash merge!

@github-actions
Copy link

github-actions bot commented Apr 8, 2022

Remember to squash merge!

@github-actions
Copy link

github-actions bot commented Apr 9, 2022

Remember to squash merge!

@github-actions
Copy link

Remember to squash merge!

@@ -970,7 +970,7 @@ task PostprocessTranscriptomeForRSEM {
}

output {
File output_bam = "~{prefix}.bam"
File output_bam = "~{prefix}_gatk.bam"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbggrant I removed _gatk as I thought it wasn't descriptive; did it cause an error without it? Perhaps we should use _RSEM_post_processed instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takutosato I added it back because that's what you are naming the file on line 969 - I guess you should remove it there in addition.

GB -> GiB
Fix one of the plumbing tests.
@github-actions
Copy link

Remember to squash merge!

@github-actions
Copy link

Remember to squash merge!

@github-actions
Copy link

Remember to squash merge!

@wjdingman1
Copy link
Contributor

@gbggrant Do we want to update plumbing based off what we have here or does it not matter because of the non-deteminism

@github-actions
Copy link

Remember to squash merge!

@gbggrant
Copy link
Contributor

@wjdingman1 we'll definitely need to update plumbing truth for this PR. There's some new plumbing data, plus some changes in the pipeline. Looks like this is ready to merge, can update the truth once merged.

@wjdingman1
Copy link
Contributor

@gbggrant sounds good

@gbggrant
Copy link
Contributor

(just waiting for this last smart test to run to completion).

@gbggrant gbggrant merged commit 000b278 into develop Apr 13, 2022
@gbggrant gbggrant deleted the ts_clipping branch April 13, 2022 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants