RNA pipeline with adapter clipping #662

Merged on Apr 13, 2022 (27 commits)
Changes from 4 commits
d83d0e3
enable adapter clipping
takutosato Apr 4, 2022
87841a0
remove duplicate flag in markduplicates etc
takutosato Apr 4, 2022
7905871
Wes comments
takutosato Apr 4, 2022
d1536e1
done
takutosato Apr 4, 2022
38de775
Bring fastp tool into warp managed docker images
wjdingman1 Apr 6, 2022
6d2cc68
Bring fastp tool into warp managed docker images
wjdingman1 Apr 6, 2022
a87d4a2
fix merge conflicts
wjdingman1 Apr 6, 2022
4bf7a7e
fastQC as output to pipeline
takutosato Apr 6, 2022
6e29ac0
complete a setence
takutosato Apr 6, 2022
6c6e45a
Update to public version of Illumina_adapters
gbggrant Apr 7, 2022
e4aa176
add new outputs to TDR, update docker tag for ingest script
mmorgantaylor Apr 6, 2022
a0ce50c
Merge pull request #666 from broadinstitute/mmt_ts_clipping_tdr
mmorgantaylor Apr 7, 2022
99ccecd
temporarily add monitoring script to fastp
takutosato Apr 7, 2022
2b171bf
Added back input verification.
gbggrant Apr 8, 2022
02b181b
sort unmapped just in case and fastp disable length filtering
takutosato Apr 8, 2022
67a627d
post processing
takutosato Apr 8, 2022
f16953b
placeholder for transcriptome index
takutosato Apr 8, 2022
fee29a3
Fixing version/changelogs.
gbggrant Apr 8, 2022
a8eaa4e
Merge branch 'develop' into ts_clipping
gbggrant Apr 8, 2022
457cf9b
gatk docker
takutosato Apr 8, 2022
34f7ee3
Fix 2 little bugs
gbggrant Apr 9, 2022
1e11b49
Added back the SM-K4Y2X plumbing tests.
gbggrant Apr 11, 2022
b89d21d
Get rid of monitoring script
gbggrant Apr 11, 2022
06d55c7
round fastqc_percent_reads_with_adapter to 5 digits
mmorgantaylor Apr 12, 2022
e53da69
Merge pull request #667 from broadinstitute/mmt_ts_clipping_round
mmorgantaylor Apr 12, 2022
eec8e39
Remove monitoring_log as task output
gbggrant Apr 12, 2022
eab6757
Merge remote-tracking branch 'origin/develop' into ts_clipping
gbggrant Apr 13, 2022
@@ -1,3 +1,8 @@
+ # 1.0.6
+ 2022-03-29 (Date of Last Commit)
[Review comment, Contributor] Has the WDL pipeline version been updated?
+
+ * Clip adapter bases pre-alignment
+
# 1.0.5
2022-03-29 (Date of Last Commit)

5 changes: 5 additions & 0 deletions pipelines/broad/rna_seq/RNAWithUMIsPipeline.changelog.md
@@ -1,3 +1,8 @@
+ # 1.0.4
+ 2022-03-29 (Date of Last Commit)
+
+ * Clip adapter bases pre-alignment
+
# 1.0.3
2022-03-29 (Date of Last Commit)

97 changes: 62 additions & 35 deletions pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl
@@ -20,7 +20,7 @@ import "../../../tasks/broad/RNAWithUMIsTasks.wdl" as tasks

workflow RNAWithUMIsPipeline {

- String pipeline_version = "1.0.3"
+ String pipeline_version = "1.0.4"

input {
File? bam
[Review comment, Contributor] We will want to remove this, correct?
[Reply, Contributor] Sorry, I was trying to reference the bam as input.
[Reply, Contributor Author] I think we still want to support both uBAM and FASTQs as input.
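
The point made in this thread, accepting either a uBAM or a FASTQ pair, could be sketched in WDL roughly as follows. This is a minimal sketch under stated assumptions, not the PR's code: `FastqToUbamStub` and `InputSelectionSketch` are hypothetical names, and the stub task only writes a placeholder file instead of performing a real conversion. `defined()` and `select_first()` are standard WDL 1.0 functions, and `bam_to_use` mirrors the variable name that appears later in the diff.

```wdl
version 1.0

# Hypothetical stand-in for tasks.FastqToUbam: it only creates an empty
# placeholder file so the selection logic can be exercised on its own.
task FastqToUbamStub {
  input {
    File r1_fastq
    File r2_fastq
    String bam_filename
  }
  command <<<
    touch ~{bam_filename}.u.bam
  >>>
  output {
    File unmapped_bam = "~{bam_filename}.u.bam"
  }
}

workflow InputSelectionSketch {
  input {
    File? bam
    File? r1_fastq
    File? r2_fastq
    String output_basename
  }

  # Convert only when a FASTQ pair was supplied.
  if (defined(r1_fastq) && defined(r2_fastq)) {
    call FastqToUbamStub {
      input:
        r1_fastq = select_first([r1_fastq]),
        r2_fastq = select_first([r2_fastq]),
        bam_filename = output_basename
    }
  }

  # Prefer the caller's uBAM; otherwise fall back to the converted one.
  # Outside the if block, FastqToUbamStub.unmapped_bam has type File?.
  File bam_to_use = select_first([bam, FastqToUbamStub.unmapped_bam])

  output {
    File selected_bam = bam_to_use
  }
}
```

If neither a uBAM nor a FASTQ pair is provided, `select_first` fails at run time, which is one reason an explicit input-verification step can still be useful.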

@@ -30,12 +30,11 @@ workflow RNAWithUMIsPipeline {
String read2Structure
String output_basename

- # The following inputs are only required if fastqs are given as input.
- String? platform
- String? library_name
- String? platform_unit
- String? read_group_name
- String? sequencing_center = "BI"
+ String platform
[Review comment, Contributor] In the 'parameter_meta' entries for all these values, we can remove the 'only required when using FASTQ files as input' blurb.
[Reply, Contributor Author] Yes.

+ String library_name
+ String platform_unit
+ String read_group_name
+ String sequencing_center = "BI"

File starIndex
File gtf
@@ -60,11 +59,11 @@ workflow RNAWithUMIsPipeline {
starIndex: "TAR file containing genome indices used for the STAR aligner"
output_basename: "String used as a prefix in workflow output files"
gtf: "Gene annotation file (GTF) used for the rnaseqc tool"
- platform: "String used to describe the sequencing platform; only required when using FASTQ files as input"
- library_name: "String used to describe the library; only required when using FASTQ files as input"
- platform_unit: "String used to describe the platform unit; only required when using FASTQ files as input"
- read_group_name: "String used to describe the read group name; only required when using FASTQ files as input"
- sequencing_center: "String used to describe the sequencing center; only required when using FASTQ files as input; default is set to 'BI'"
+ platform: "String used to describe the sequencing platform"
+ library_name: "String used to describe the library"
+ platform_unit: "String used to describe the platform unit"
+ read_group_name: "String used to describe the read group name"
+ sequencing_center: "String used to describe the sequencing center; default is set to 'BI'"
ref: "FASTA file used for metric collection with Picard tools"
refIndex: "FASTA index file used for metric collection with Picard tools"
refDict: "Dictionary file used for metric collection with Picard tools"
@@ -75,29 +74,18 @@ workflow RNAWithUMIsPipeline {
population_vcf_index: "Population VCF index file used for contamination estimation"
}

- call tasks.VerifyPipelineInputs {
- input:
- bam = bam,
- r1_fastq = r1_fastq,
- r2_fastq = r2_fastq,
- library_name = library_name,
- platform = platform,
- platform_unit = platform_unit,
- read_group_name = read_group_name,
- sequencing_center = sequencing_center
- }
-
- if (VerifyPipelineInputs.fastq_run) {
+ # Assume
+ if (defined(r1_fastq)) {
call tasks.FastqToUbam {
input:
r1_fastq = select_first([r1_fastq]),
r2_fastq = select_first([r2_fastq]),
bam_filename = output_basename,
- library_name = select_first([library_name]),
- platform = select_first([platform]),
- platform_unit = select_first([platform_unit]),
- read_group_name = select_first([read_group_name]),
- sequencing_center = select_first([sequencing_center])
+ library_name = library_name,
+ platform = platform,
+ platform_unit = platform_unit,
+ read_group_name = read_group_name,
+ sequencing_center = sequencing_center
}
}

@@ -110,9 +98,43 @@ workflow RNAWithUMIsPipeline {
read2Structure = read2Structure
}

- call tasks.STAR {
+ # Convert SAM to fastq for adapter clipping
+ # This step also removes reads that fail platform/vendor quality checks
+ call tasks.SamToFastq {
input:
bam = ExtractUMIs.bam_umis_extracted,
+ output_prefix = output_basename
+ }
+
+ # Adapter clipping
+ call tasks.Fastp {
+ input:
+ fastq1 = SamToFastq.fastq1,
+ fastq2 = SamToFastq.fastq1,
[Review comment, Contributor] Bug!
Suggested change:
- fastq2 = SamToFastq.fastq1,
+ fastq2 = SamToFastq.fastq2,
[Reply, Contributor] @takutosato I fixed this. I'm 99.999999% certain it was a bug, but please verify.

+ output_prefix = output_basename + ".adapter_clipped"
+ }
+
+ # Back to SAM before alignment
+ call tasks.FastqToUbam as FastqToUbamAfterClipping {
+ input:
+ r1_fastq = Fastp.fastq1_clipped,
+ r2_fastq = Fastp.fastq2_clipped,
+ bam_filename = output_basename + ".adapter_clipped",
+ library_name = library_name,
+ platform = platform,
+ platform_unit = platform_unit,
+ read_group_name = read_group_name,
+ sequencing_center = sequencing_center
+ }
+
+ call tasks.FastQC {
+ input:
+ unmapped_bam = FastqToUbamAfterClipping.unmapped_bam
+ }
+
+ call tasks.STAR {
+ input:
+ bam = FastqToUbamAfterClipping.unmapped_bam,
starIndex = starIndex
}
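
The first two steps of this hunk (uBAM to FASTQ, then adapter clipping) could look roughly like the sketch below. It is a hedged illustration, not the tasks from `RNAWithUMIsTasks.wdl`: the task and workflow names, the output file names, the `docker` inputs, and the `adapter_fasta` input are all assumptions. The tool options used (`--INPUT`, `--FASTQ`, `--SECOND_END_FASTQ`, `--INCLUDE_NON_PF_READS` for Picard SamToFastq; `--in1`, `--in2`, `--out1`, `--out2`, `--adapter_fasta`, `--disable_length_filtering` for fastp) are real options of those tools, but whether the PR uses exactly these is not shown in this diff. The wiring in the workflow passes read 2 to `fastq2`, which is the fix proposed in the review thread.

```wdl
version 1.0

# Hypothetical sketch of the uBAM-to-FASTQ step.
task SamToFastqSketch {
  input {
    File bam
    String output_prefix
    String docker  # assumption: an image with the `gatk` launcher on the PATH
  }
  command <<<
    set -e
    # INCLUDE_NON_PF_READS defaults to false in Picard; it is spelled out here
    # to show where reads failing platform/vendor checks are dropped.
    gatk SamToFastq \
      --INPUT ~{bam} \
      --FASTQ ~{output_prefix}_1.fastq \
      --SECOND_END_FASTQ ~{output_prefix}_2.fastq \
      --INCLUDE_NON_PF_READS false
  >>>
  runtime {
    docker: docker
  }
  output {
    File fastq1 = "~{output_prefix}_1.fastq"
    File fastq2 = "~{output_prefix}_2.fastq"
  }
}

# Hypothetical sketch of the adapter-clipping step.
task FastpSketch {
  input {
    File fastq1
    File fastq2
    String output_prefix
    File adapter_fasta  # assumption: FASTA of adapter sequences to clip
    String docker       # assumption: an image with `fastp` on the PATH
  }
  command <<<
    set -e
    # Length filtering is disabled so every read pair survives clipping.
    fastp \
      --in1 ~{fastq1} --in2 ~{fastq2} \
      --out1 ~{output_prefix}_read1.fastq.gz \
      --out2 ~{output_prefix}_read2.fastq.gz \
      --adapter_fasta ~{adapter_fasta} \
      --disable_length_filtering
  >>>
  runtime {
    docker: docker
  }
  output {
    File fastq1_clipped = "~{output_prefix}_read1.fastq.gz"
    File fastq2_clipped = "~{output_prefix}_read2.fastq.gz"
  }
}

workflow AdapterClippingSketch {
  input {
    File unmapped_bam
    File adapter_fasta
    String output_basename
    String gatk_docker
    String fastp_docker
  }

  call SamToFastqSketch {
    input:
      bam = unmapped_bam,
      output_prefix = output_basename,
      docker = gatk_docker
  }

  call FastpSketch {
    input:
      fastq1 = SamToFastqSketch.fastq1,
      fastq2 = SamToFastqSketch.fastq2,  # read 2, not fastq1 a second time
      output_prefix = output_basename + ".adapter_clipped",
      adapter_fasta = adapter_fasta,
      docker = fastp_docker
  }

  output {
    File fastq1_clipped = FastpSketch.fastq1_clipped
    File fastq2_clipped = FastpSketch.fastq2_clipped
  }
}
```

Keeping length filtering off matters here because the clipped FASTQs are converted back to a uBAM afterwards, and the duplicate-marking calls later in the diff also take the original `ExtractUMIs.bam_umis_extracted` as `unaligned_bam`.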

@@ -125,17 +147,20 @@ workflow RNAWithUMIsPipeline {
call UmiMD.UMIAwareDuplicateMarking {
input:
aligned_bam = STAR.aligned_bam,
- output_basename = output_basename
+ unaligned_bam = ExtractUMIs.bam_umis_extracted,
+ output_basename = output_basename,
+ remove_duplicates = false
}

+ # We set remove dupli
call UmiMD.UMIAwareDuplicateMarking as UMIAwareDuplicateMarkingTranscriptome {
input:
aligned_bam = CopyReadGroupsToHeader.output_bam,
- output_basename = output_basename + ".transcriptome"
+ unaligned_bam = ExtractUMIs.bam_umis_extracted,
+ output_basename = output_basename + ".transcriptome",
+ remove_duplicates = true
}

- ### PLACEHOLDER for CROSSCHECK ###
-
call tasks.GetSampleName {
input:
bam = bam_to_use
@@ -208,6 +233,8 @@ workflow RNAWithUMIsPipeline {
File picard_quality_distribution_pdf = CollectMultipleMetrics.quality_distribution_pdf
Float contamination = CalculateContamination.contamination
Float contamination_error = CalculateContamination.contamination_error
+ File fastqc_html_report = FastQC.fastqc_html
+ Float fastqc_adapter_content = FastQC.adapter_content # sato: might be good to have this one too.
}
}
