-
Notifications
You must be signed in to change notification settings - Fork 83
CNV consensus (1 of n): Split large files into small sample files #288
CNV consensus (1 of n): Split large files into small sample files #288
Conversation
I think this will require that we run this from the root. I'm testing that out on your branch.
Hi @fingerfen the changes I added in 773c134 are somethings we've found helpful during development in this project. Specifically, what I've added in lines 6-11 of the shell script will make it such that the working directory of the script is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few suggestions on switching parsing to argparse. I'm going to pull from your branch and make the changes. I also had a question around why you're filtering/making empty files for those with more than 2500 CNVs. Do you think those are bad, or just time consuming to process?
## Nhat Duong | ||
## November, 22 2019 | ||
|
||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small pep8 issue: import order should be standard first, then third party, then local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cgreene for the 2500 CNVs question, we are considering these to be noisy/poor quality samples (this came from what GISTIC uses as a cutoff for noisy samples).
import os | ||
|
||
######### ASSUMPTIONS ######## | ||
# 1) Files are feed into stdout as manta, cnvkit, then freec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is using the command line arguments so not stdin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize, I will fix that.
# 5) pval to filter out for freec is 0.01 | ||
|
||
|
||
## Get the list of file names from stdin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on order can be brittle and opaque. argparse
was designed for this.
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Show resolved
Hide resolved
|
||
|
||
## Make the Snakemake config file. Write all of the sample names into the config file | ||
with open('../../scratch/config_snakemake.yaml', 'w') as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code could be more broadly useful with no downsides if this gets printed to stdout and then your bash script could redirect it to a file. This would let you point it elsewhere if you used this code in the future. An alternative would be to make this path a commandline argument.
file.write('size_cutoff: 3000' + '\n') | ||
file.write('freec_pval: 0.01' + '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to earlier comment, can we make these command line args?
Sorry for the late reply.
I also have a question, what is the procedure for resubmitting my code changes? Is it prefered that I make the changes here or do I make the changes locally and resubmit another pull request? Thank you |
I made a PR into your repo. If you accept that this will update and be
ready to go.
…On Sun, Nov 24, 2019, 10:20 PM Nhat Duong ***@***.***> wrote:
Sorry for the late reply.
Below are the changes that I will make to the script:
1. Change import orders
2. Change the comment on line 11
3. Use argparse to parse the command-line arguments
4. Line 99, change the output of the snakemake config file to output
to stdout
5. Have CNV number < 2500, CNVs size cutoff, freec pval, and orders of
input files as command line args
I also have a question, what is the procedure for resubmitting my code
changes? Is it prefered that I make the changes here or do I make the
changes locally and resubmit another pull request?
Thank you
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#288?email_source=notifications&email_token=AAEEPM7KJ43ILT4V4SUBWPLQVNAAVA5CNFSM4JQXKJH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFA7EUY#issuecomment-557970003>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAEEPMZXXT3NN4UDHTAZGJDQVNAAVANCNFSM4JQXKJHQ>
.
|
In this case, you can accept @cgreene's PR, and you will be ready to go. That will make the changes in the branch associated with the PR on github. You will then have to In general, the procedure for making changes in a pull request is to make the changes in your local branch, Did I explain that well enough? Let me know if you have any followup questions. |
pep8 / argparse / constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
Just some very minor edits, to keep line lengths shorter and switch to use os.path.join()
throughout.
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Outdated
Show resolved
Hide resolved
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Outdated
Show resolved
Hide resolved
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Outdated
Show resolved
Hide resolved
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Outdated
Show resolved
Hide resolved
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Outdated
Show resolved
Hide resolved
analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving / please @fingerfen accept @jashapiro's suggested changes and then we will merge!
Co-Authored-By: jashapiro <jashapiro@gmail.com>
Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>
Alright @fingerfen, I am going to get this merged. Congratulations on getting your first pull request through and thank you for this contribution! In preparation for your next pull request, you will want to create a new branch from your updated master branch. The instructions for keeping your master branch synced are in step 4 of this section of the contributing guidelines: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#filing-a-pull-request-from-your-own-branch If you have any questions as you are preparing your next pull request, please don’t hesitate to reach out. |
Purpose/implementation Section
To take in CNVs calls from Manta, Cnvkit, and FreeC and split them into their own directories. Each caller directory would have a file of CNVs for each sample that the caller has.
Also, to compose a Snakemake config file for down stream analysis.
What was your approach?
Use python3 and Pandas to split big files into tiny files.
What GitHub issue does your pull request address?
Issue #128