Thops466 ercc #39

Merged
merged 20 commits into from
Nov 15, 2022

@e-t-k commented Nov 15, 2022

Implements changes per https://github.com/UCSC-Treehouse/operations/issues/466.

Adds an 'ercc' option to fab process that runs an ERCC-aware pipeline.

General Changes

  • The Makefile uses a regex to locate the input FASTQ files. This regex has been updated.
    Previously it matched only samples where R1 or R2 was not followed by any digits.
    Now it also matches samples that end in R1_001 and R2_001.

  • Removed TOIL's 'debug' output spam from the expression step

  • _STARtmp directories are no longer downloaded when the fusion step crashes

  • The end of the log now lists every sample ID that failed the fusion step, if any

  • Fusion now runs before jfkm instead of after.

  • Added some troubleshooting guidance to the treeshop page
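
To illustrate the FASTQ naming change described above, here is a minimal Python sketch. It is not the Makefile's actual regex: the pattern, the `read_number` helper, and the assumed `.fastq`/`.fq` extensions (optionally gzipped) are all hypothetical, chosen only to show the "R1 or R2, optionally followed by _001" rule.

```python
import re

# Hypothetical pattern, NOT the Makefile's real regex. It accepts a read marker
# R1 or R2, optionally followed by "_001", then an assumed FASTQ extension.
FASTQ_READ = re.compile(r"R(?P<read>[12])(?:_001)?\.(?:fastq|fq)(?:\.gz)?$")


def read_number(filename):
    """Return 1 or 2 if filename looks like a paired-end FASTQ, else None."""
    match = FASTQ_READ.search(filename)
    return int(match.group("read")) if match else None


if __name__ == "__main__":
    for name in ["s_R1.fastq.gz", "s_R2_001.fastq.gz", "s_R1_002.fastq.gz"]:
        print(name, read_number(name))
```

Under these assumptions, names with any other numeric suffix (for example R1_002) are still rejected, which matches the description that only the _001 form was added.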

Changes to text in:

  • Treeshop's list of output files
  • expression step of makefile
  • error message in _setup()
  • README

ERCC-aware pipeline

New ercc.md page documenting how to use the ERCC pipeline

Makefile:

  • new reference_ercc option
  • new expression_ercc option
  • new qc_ercc option

Fabfile:

  • new reference_ercc() method
  • pipeline steps now have an "ercc" bool parameter that changes the output dir when the ERCC-aware pipeline is run
  • fab process() is ERCC-aware when run with ercc=True. The ERCC-aware pipeline does not run pizzly, fusions, jfkm, or variants.
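
The commit notes below mention that the ercc argument arrived "stringly typed" and had to be converted to a real bool. A minimal sketch of that idea follows, assuming task arguments reach the function as strings. The helper names `to_bool` and `output_dir`, the accepted spellings, and the `_ercc` directory suffix are illustrative assumptions, not the fabfile's actual code or layout.

```python
def to_bool(value):
    """Interpret a command-line task argument as a real boolean."""
    if isinstance(value, bool):
        return value
    # Assumed set of "truthy" spellings; anything else counts as False.
    return str(value).strip().lower() in ("1", "true", "yes", "y")


def output_dir(base, step, ercc=False):
    """Hypothetical layout: ERCC runs write to a '<step>_ercc' directory."""
    name = step + "_ercc" if to_bool(ercc) else step
    return base.rstrip("/") + "/" + name


if __name__ == "__main__":
    # The pitfall: any non-empty string is truthy, including "False".
    print(bool("False"), to_bool("False"))
    print(output_dir("outputs", "expression", ercc="True"))
```

Converting once at the top of the task keeps every later `if ercc:` check from silently treating the string "False" as true.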

e-t-k added 20 commits October 16, 2022 23:33
add erccexpression step to Makefile (tested, works)
and fabfile (untested).

currently output files that are not ideal are:
- kallisto file
- rsem_genes.hugo.results
(see issue)
Added qc step to Makefile (currently running)
and fabfile (fully untested)
it's stringly typed in the process() signature!
Convert it to an actual bool. hilarious.
added ERCC option for pizzly, fusion, jfkm, variants
mostly just changes the output dir, a few of them
that drop files in primary / derived need to change the bam names too

totally untested, not even executed.
so the previous version is broken because I forgot the pipe character
but I tried putting it in - so the last line is
| grep -v "DEBUG toil"

and it's not successfully filtering the lines.

I'm not sure if the pipe is running inside or outside the docker
and I'm not sure whether docker is sending things to stdout or stderr or what.
So for now I just totally remove it.

(so no, there is not a committed version with the pipe in --
I tried running it without committing and it did run but didn't filter the lines. )
still in progress, not tested.

> can't hardlink some bams because they are owned by root.
but can move them because ubuntu owns the parent dir. so just move them to a name with
ERCC in them, download, and move back instead.

> fixed longstanding typo "Unable find any fastqs or bams..."
- add the --logInfo flag to expression_ercc docker to hopefully
get rid of debug output for real

- fix qc_ercc - wasn't properly giving it the path to the reference file
removed a wayward do_ercc (should be ercc) that caused pizzly to crash
and 1 more thing.
mostly works.
(this change applies to both standard and ERCC-transcript runs)

fix a situation where the pipeline would hang indefinitely if fusion didn't generate proper output and instead left behind a _STARtmp folder with a named pipe inside it. fab would try to download the pipe and the download would never report that it was done.

With this version -- if it doesn't find any fusion output files at all, it will accept that and continue on with variants and jfkm before moving to the next sample.
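
One way to avoid this kind of hang is to build the download list from regular files only. The sketch below is a local-filesystem analogue, not the fabfile's remote download code. The function name `downloadable_files` and the suffix-based directory match are assumptions for illustration.

```python
import os
import stat


def downloadable_files(root, skip_suffix="_STARtmp"):
    """Yield regular files under root, skipping scratch dirs and named pipes."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if not d.endswith(skip_suffix)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode):  # excludes FIFOs, sockets, and symlinks
                yield path
```

Reading from a named pipe blocks until a writer appears, so filtering on file type guards against pipes even if one turns up outside a _STARtmp directory.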

this is the version of the fabfile i am testing right now
- ERCC - run expression and QC only - skip pizzly, fusions, jfkm, variants.
(However the ERCC toggles are still within those steps if we change and want to run them.)

Non-ERCC Change:
If fusion fails, the pipeline will continue onward and make a note at the end
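
The continue-and-report behaviour can be sketched as follows. This is an illustrative outline under assumptions, not the fabfile's code: `run_all`, `run_fusion`, and `run_rest` are hypothetical names, and the real pipeline may detect failure differently (for example by checking for missing output files rather than catching an exception).

```python
def run_all(sample_ids, run_fusion, run_rest):
    """Run every sample; record fusion failures instead of aborting."""
    failed_fusion = []
    for sample_id in sample_ids:
        try:
            run_fusion(sample_id)
        except Exception as error:
            print("fusion failed for {}: {}".format(sample_id, error))
            failed_fusion.append(sample_id)
        run_rest(sample_id)  # remaining steps still run for this sample
    if failed_fusion:
        print("Samples that failed fusion: " + ", ".join(failed_fusion))
    return failed_fusion
```

Collecting the failures in a list and printing them once at the end keeps the summary visible even when the per-sample log is long.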
ERCC-aware pipeline: add documentation (thops#466)
add notes about acceptable fastq names
separated out make from git clone to hopefully clarify that it's not mandatory
@e-t-k merged commit 74e4034 into master Nov 15, 2022