Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seperate ends #57

Merged
merged 27 commits into from
Jan 13, 2025
Merged

Seperate ends #57

merged 27 commits into from
Jan 13, 2025

Conversation

rroutsong
Copy link
Collaborator

Separate rule hierarchy for paired and single end, incorporating cfCHiP in as another hierarchical import in Snakemake

@skchronicles skchronicles self-requested a review December 11, 2024 20:36
@rroutsong
Copy link
Collaborator Author

Waiting for test data sets to finish running before review, all data stored at /data/OpenOmics/dev/datasets/outputs

@rroutsong rroutsong self-assigned this Dec 11, 2024
@rroutsong
Copy link
Collaborator Author

rroutsong commented Dec 23, 2024

Using the command

tree -I 'logfiles|resources|workflow|bin|config|config.json|failed_jobs_*|job_information_*|*.R*.fastq.gz|*.krona.html.files|*.sh|COMPLETED' $DIRONE > $DIRONE.tree
tree -I 'logfiles|resources|workflow|bin|config|config.json|failed_jobs_*|job_information_*|*.R*.fastq.gz|*.krona.html.files|*.sh|COMPLETED' $DIRTWO > $DIRTWO.tree
diff $DIRONE.tree $DIRTWO.tree

I identified that some insert size files are missing in the wo_blocking and w_blocking toy data sets.

1c1
< wo_blocking
---
> wo_blocking_PREV
237a238,239
> │   ├── 27_IFN0h_1_Sp100_S15.Q5DD.insert_size_histogram.pdf
> │   ├── 27_IFN0h_1_Sp100_S15.Q5DD.insert_size_metrics.txt
242a245,246
> │   ├── 35_IFN24h_1_Sp100_S6.Q5DD.insert_size_histogram.pdf
> │   ├── 35_IFN24h_1_Sp100_S6.Q5DD.insert_size_metrics.txt
247a252,253
> │   ├── 39_IFN0h_2_Sp100_S46.Q5DD.insert_size_histogram.pdf
> │   ├── 39_IFN0h_2_Sp100_S46.Q5DD.insert_size_metrics.txt
252a259,260
> │   ├── 47_IFN24h_2_Sp100_S43.Q5DD.insert_size_histogram.pdf
> │   ├── 47_IFN24h_2_Sp100_S43.Q5DD.insert_size_metrics.txt
417c425
< w_blocking
---
> w_blocking_PREV
245a246,247
> │   ├── 27_IFN0h_1_Sp100_S15.Q5DD.insert_size_histogram.pdf
> │   ├── 27_IFN0h_1_Sp100_S15.Q5DD.insert_size_metrics.txt
250a253,254
> │   ├── 35_IFN24h_1_Sp100_S6.Q5DD.insert_size_histogram.pdf
> │   ├── 35_IFN24h_1_Sp100_S6.Q5DD.insert_size_metrics.txt
255a260,261
> │   ├── 39_IFN0h_2_Sp100_S46.Q5DD.insert_size_histogram.pdf
> │   ├── 39_IFN0h_2_Sp100_S46.Q5DD.insert_size_metrics.txt
260a267,268
> │   ├── 47_IFN24h_2_Sp100_S43.Q5DD.insert_size_histogram.pdf
> │   ├── 47_IFN24h_2_Sp100_S43.Q5DD.insert_size_metrics.txt
425c433

@rroutsong
Copy link
Collaborator Author

tree -I 'logfiles|resources|workflow|bin|config|config.json|failed_jobs_*|job_information_*|*.R*.fastq.gz|*.krona.html.files|*.sh|COMPLETED'

Commit 27709bd fixes insert size metric diff problem

1c1
< w_blocking
---
> w_blocking_PREV

Copy link
Collaborator

@tovahmarkowitz tovahmarkowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I just looked through your changes in the PE-specific rules as your current test runs shouldn't affect them. I noticed a few small things that I would like addressed. Thanks for all the hard work.
Tovah

workflow/rules/paired/peakcall.smk Show resolved Hide resolved
workflow/rules/paired/trim_align_dedup.smk Show resolved Hide resolved
@rroutsong
Copy link
Collaborator Author

@tovahmarkowitz all testing outputs are run and stored /data/OpenOmics/dev/datasets/outputs diffs only show files that changed names

1c1
< wo_blocking_SE
---
> wo_blocking_SE_PREV
52,53d51
< │       ├── 27_IFN0h_1_Sp100_S15.Q5DD.pdf
< │       ├── 27_IFN0h_1_Sp100_S15.Q5DD.ppqt.txt
54a53
> │       ├── 27_IFN0h_1_Sp100_S15.Q5DD_tagAlign.ppqt
56a56
> │       ├── 27_IFN0h_1_Sp100_S15.sorted.ppqt
58,59d57
< │       ├── 35_IFN24h_1_Sp100_S6.Q5DD.pdf
< │       ├── 35_IFN24h_1_Sp100_S6.Q5DD.ppqt.txt
60a59
> │       ├── 35_IFN24h_1_Sp100_S6.Q5DD_tagAlign.ppqt
62a62
> │       ├── 35_IFN24h_1_Sp100_S6.sorted.ppqt
64,65d63
< │       ├── 39_IFN0h_2_Sp100_S46.Q5DD.pdf
< │       ├── 39_IFN0h_2_Sp100_S46.Q5DD.ppqt.txt
66a65
> │       ├── 39_IFN0h_2_Sp100_S46.Q5DD_tagAlign.ppqt
68a68
> │       ├── 39_IFN0h_2_Sp100_S46.sorted.ppqt
70,71d69
< │       ├── 47_IFN24h_2_Sp100_S43.Q5DD.pdf
< │       ├── 47_IFN24h_2_Sp100_S43.Q5DD.ppqt.txt
72a71
> │       ├── 47_IFN24h_2_Sp100_S43.Q5DD_tagAlign.ppqt
74a74
> │       ├── 47_IFN24h_2_Sp100_S43.sorted.ppqt

@skchronicles
Copy link
Contributor

@tovahmarkowitz Did you have a chance to review everything? Is this PR ready to merge?

Copy link
Collaborator

@tovahmarkowitz tovahmarkowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ryan,

This looks great! I just have a few minor cleanup requests so things are a little nicer before we do the actual pull request. I have a few other smaller things we still need to change to finish up this dealing with separate ends mess, but they would involve a few rules that need added/removed so they should probably wait until after this major update! Thank you so much!

Tovah

workflow/rules/peakcall.smk Outdated Show resolved Hide resolved
workflow/rules/single/qc.smk Show resolved Hide resolved
workflow/scripts/grouping.py Show resolved Hide resolved
workflow/rules/paired/trim_align_dedup.smk Show resolved Hide resolved
workflow/rules/single/trim_align_dedup.smk Show resolved Hide resolved
workflow/rules/single/trim_align_dedup.smk Show resolved Hide resolved
@rroutsong
Copy link
Collaborator Author

@tovahmarkowitz the changes you requested have broke the single end pipeline I will need to spend some time fixing it @skchronicles please do not merge yet

@rroutsong
Copy link
Collaborator Author

@tovahmarkowitz the changes you requested have broke the single end pipeline I will need to spend some time fixing it @skchronicles please do not merge yet

it was the {ext} wildcard in the ppqt single end rules, fixed now, please review updated code

ready for merge if all is well @skchronicles

@skchronicles
Copy link
Contributor

@rroutsong Can you update bin/ppqt_process.py to write the parse fragment length to a file (similar to how it was setup before)? We don't want to print it to standard output because occasionally this file will need to be manually updated when we run into weird edge cases.

After taking a peek, it looks like different single-end rules that use this PPQT file as an input use slightly different file name that point to output from the different PPQT rules.

Here are some example(s):
image
image
image

Can we just use the BAM PPQT output file instead (i.e. {name}.Q5DD.ppqt.txt)? This would ensure we are using a consistent set of files across the different rules. With that being said, can you update this rule below:
image

@rroutsong rroutsong closed this Jan 10, 2025
@rroutsong rroutsong reopened this Jan 10, 2025
@rroutsong
Copy link
Collaborator Author

alright we have to re-run all the single end data due to the functional change of rules

@rroutsong
Copy link
Collaborator Author

rroutsong commented Jan 13, 2025

all changes have been addressed and single end workflow was ran successfully, see:

  1. /data/OpenOmics/dev/datasets/outputs/w_blocking_SE
  2. /data/OpenOmics/dev/datasets/outputs/wo_blocking_SE

please complete review if you guys are satisfied @skchronicles @tovahmarkowitz

Copy link
Contributor

@skchronicles skchronicles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rroutsong for these awesome changes! Seperating the SE vs. PE logic really makes each of these rules more readable and easier to understand/follow.

@skchronicles skchronicles merged commit 348e6eb into main Jan 13, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants