CNV consensus (1 of n): Split large files into small sample files #288

nhatduongnn · 2019-11-22T23:54:17Z

Purpose/implementation Section

Take in CNV calls from Manta, CNVkit, and FreeC and split them into per-caller directories. Each caller directory contains one CNV file for each sample that the caller has.
Also compose a Snakemake config file for downstream analysis.

What was your approach?

Use Python 3 and pandas to split the large merged files into small per-sample files.

What GitHub issue does your pull request address?

Issue #128

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

I think this will require that we run this from the root. I'm testing that out on your branch.

jaclyn-taroni · 2019-11-23T11:40:36Z

Hi @fingerfen, the changes I added in 773c134 are some things we've found helpful during development in this project. Specifically, what I've added in lines 6-11 of the shell script sets the working directory of the script to analyses/copy_number_consensus_call. Some background on the other additions can be found here.

cgreene

I have a few suggestions on switching parsing to argparse. I'm going to pull from your branch and make the changes. I also had a question around why you're filtering/making empty files for those with more than 2500 CNVs. Do you think those are bad, or just time consuming to process?

cgreene · 2019-11-23T14:14:57Z

analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py

+## Nhat Duong
+## November, 22 2019
+
+import numpy as np


Small pep8 issue: import order should be standard first, then third party, then local.

jharenza · 2019-11-23

@cgreene for the 2500 CNVs question, we are considering these to be noisy/poor quality samples (this came from what GISTIC uses as a cutoff for noisy samples).

cgreene · 2019-11-23T14:15:18Z

analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py

+import os
+
+######### ASSUMPTIONS ########
+# 1) Files are feed into stdout as manta, cnvkit, then freec


It looks like this is using the command line arguments so not stdin.

nhatduongnn · 2019-11-25

I apologize, I will fix that.

cgreene · 2019-11-23T14:17:23Z

analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py

+# 5) pval to filter out for freec is 0.01
+
+
+## Get the list of file names from stdin


Relying on order can be brittle and opaque. argparse was designed for this.
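As a minimal sketch of what named arguments could look like here: the flag names `--manta`, `--cnvkit`, and `--freec` and the example file names are assumptions for illustration, not necessarily what the merged script uses.

```python
import argparse


def parse_args(argv=None):
    """Parse named input files so callers cannot mix up their order."""
    parser = argparse.ArgumentParser(
        description="Split merged CNV files into per-sample files."
    )
    # Flag names are hypothetical; one named flag per caller.
    parser.add_argument("--manta", required=True, help="merged Manta CNV file")
    parser.add_argument("--cnvkit", required=True, help="merged CNVkit CNV file")
    parser.add_argument("--freec", required=True, help="merged FreeC CNV file")
    return parser.parse_args(argv)


# The order on the command line no longer matters.
args = parse_args(
    ["--freec", "freec.tsv", "--manta", "manta.tsv", "--cnvkit", "cnvkit.tsv"]
)
print(args.manta, args.cnvkit, args.freec)
```

With named flags, each file is identified by its flag rather than its position, and `--help` documents the expected inputs.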

cgreene · 2019-11-23T14:20:40Z

analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py

+
+
+## Make the Snakemake config file. Write all of the sample names into the config file
+with open('../../scratch/config_snakemake.yaml', 'w') as file:


I think this code could be more broadly useful with no downsides if this gets printed to stdout and then your bash script could redirect it to a file. This would let you point it elsewhere if you used this code in the future. An alternative would be to make this path a commandline argument.
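A rough sketch of the stdout option, assuming a hypothetical `write_config` helper and a hypothetical `samples` key (the real config layout may differ):

```python
import io
import sys


def write_config(samples, out=None):
    """Write a small YAML config to a text stream (stdout by default)."""
    if out is None:
        out = sys.stdout
    # "samples" is a hypothetical key name, used only for illustration.
    out.write("samples:\n")
    for sample in samples:
        out.write("  - " + sample + "\n")


# Default: print to stdout so the calling shell script picks the destination.
write_config(["sample_A", "sample_B"])

# The same function can write elsewhere, for example to an in-memory buffer.
buffer = io.StringIO()
write_config(["sample_A", "sample_B"], out=buffer)
```

The bash script would then redirect the output with `>` to whichever path it wants, such as the `../../scratch/config_snakemake.yaml` path currently hard-coded in the script.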

jashapiro · 2019-11-23T15:03:01Z

analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py

+    file.write('size_cutoff: 3000' + '\n')
+    file.write('freec_pval: 0.01' + '\n')


Similar to earlier comment, can we make these command line args?
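Something along these lines might work, as a sketch only. The flag names are hypothetical; the defaults mirror the values hard-coded in the diff above (3000 and 0.01) and the 2500 CNV limit discussed earlier in this thread.

```python
import argparse


def parse_thresholds(argv=None):
    """Parse filtering thresholds, falling back to the current constants."""
    parser = argparse.ArgumentParser(description="CNV filtering thresholds.")
    # Flag names are hypothetical; defaults match the hard-coded values.
    parser.add_argument("--size-cutoff", type=int, default=3000,
                        help="CNV size cutoff")
    parser.add_argument("--freec-pval", type=float, default=0.01,
                        help="p-value cutoff for FreeC calls")
    parser.add_argument("--max-cnvs", type=int, default=2500,
                        help="samples with more CNVs are treated as noisy")
    return parser.parse_args(argv)


# With no flags, the existing behaviour is preserved.
defaults = parse_thresholds([])

# A single threshold can be overridden without editing the script.
custom = parse_thresholds(["--freec-pval", "0.05"])
config_lines = [
    f"size_cutoff: {custom.size_cutoff}",
    f"freec_pval: {custom.freec_pval}",
]
print("\n".join(config_lines))
```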

nhatduongnn · 2019-11-25T03:20:42Z

Sorry for the late reply.
Below are the changes that I will make to the script:

1. Change the import order
2. Change the comment on line 11
3. Use argparse to parse the command-line arguments
4. Change the Snakemake config file output (line 99) to write to stdout
5. Make the CNV count limit (< 2500), CNV size cutoff, FreeC p-value, and input file order command-line arguments

I also have a question: what is the procedure for resubmitting my code changes? Is it preferred that I make the changes here, or do I make the changes locally and resubmit another pull request?

Thank you

cgreene · 2019-11-25T09:50:01Z

I made a PR into your repo. If you accept that, this will update and be ready to go.


jashapiro · 2019-11-25T12:18:42Z

> I also have a question: what is the procedure for resubmitting my code changes? Is it preferred that I make the changes here, or do I make the changes locally and resubmit another pull request?

In this case, you can accept @cgreene's PR, and you will be ready to go. That will make the changes in the branch associated with the PR on github. You will then have to pull those back down to your local branch to keep everything in sync.

In general, the procedure for making changes in a pull request is to make the changes in your local branch, commit them, then push those to the GitHub copy of the branch. The PR will then automatically be updated to the most current version of the branch. (You can see that in the history of @cgreene's PR to your branch, where he made a few commits after the initial filing of the PR, and they are reflected in the version you would merge.)

Did I explain that well enough? Let me know if you have any followup questions.

jashapiro

Thanks for the updates!
Just some very minor edits, to keep line lengths shorter and switch to use os.path.join() throughout.
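For reference, a small sketch of the `os.path.join()` pattern. The `sample_path` helper, the directory layout, and the `.bed` extension are hypothetical and only illustrate the idea.

```python
import os


def sample_path(out_dir, caller, sample_id, ext=".bed"):
    """Build a per-caller, per-sample output path without manual separators."""
    # Hypothetical layout: <out_dir>/<caller>/<sample_id><ext>
    return os.path.join(out_dir, caller, sample_id + ext)


path = sample_path("scratch", "manta", "sample_A")
print(path)
```

Joining components this way avoids doubled or missing separators that string concatenation can introduce.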

analyses/copy_number_consensus_call/run_consensus_call.sh

analyses/copy_number_consensus_call/src/scripts/merged_to_individual_files.py

analyses/copy_number_consensus_call/run_consensus_call.sh

cgreene

I am approving / please @fingerfen accept @jashapiro's suggested changes and then we will merge!

jaclyn-taroni · 2019-12-03T11:33:05Z

Alright @fingerfen, I am going to get this merged. Congratulations on getting your first pull request through and thank you for this contribution!

In preparation for your next pull request, you will want to create a new branch from your updated master branch. The instructions for keeping your master branch synced are in step 4 of this section of the contributing guidelines: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#filing-a-pull-request-from-your-own-branch

If you have any questions as you are preparing your next pull request, please don’t hesitate to reach out.

add files

27c7249

nhatduongnn changed the title ~~add files~~ Split large files into small sample files Nov 22, 2019

jaclyn-taroni requested a review from jashapiro November 23, 2019 00:14

cgreene and others added 2 commits November 23, 2019 05:53

test path change

f57bdf3

I think this will require that we run this from the root. I'm testing that out on your branch.

Run shell script as if called from dir it is in

773c134

cgreene reviewed Nov 23, 2019

jashapiro reviewed Nov 23, 2019

cgreene added 2 commits November 23, 2019 10:04

pep8 / argparse / constants

844ace7

switch to writing files

7e44f25

jashapiro changed the title ~~Split large files into small sample files~~ CNV consensus (1 of n): Split large files into small sample files Nov 23, 2019

cgreene and others added 3 commits November 26, 2019 04:45

update program description

cac1592

Merge pull request #1 from cgreene/pr/288

dcca0a8

pep8 / argparse / constants

Merge branch 'master' into merge_to_individual

60174c3

jashapiro approved these changes Nov 27, 2019

cgreene approved these changes Nov 27, 2019

nhatduongnn and others added 10 commits November 27, 2019 20:17

Merge branch 'master' into merge_to_individual

887d4a7

Update analyses/copy_number_consensus_call/run_consensus_call.sh

729001d

Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/run_consensus_call.sh

de6c735

Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/src/scripts/merged_to_indi…

1733254

…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/src/scripts/merged_to_indi…

24110b4

…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/src/scripts/merged_to_indi…

4ab9654

…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/src/scripts/merged_to_indi…

891e57d

…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/src/scripts/merged_to_indi…

8fd0487

…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/copy_number_consensus_call/src/scripts/merged_to_indi…

e8b2151

…vidual_files.py Co-Authored-By: jashapiro <jashapiro@gmail.com>

Merge branch 'master' into merge_to_individual

eddbe48

jaclyn-taroni merged commit 9880be8 into AlexsLemonade:master Dec 3, 2019

jharenza mentioned this pull request Jan 13, 2020

Proposed Analysis: Copy number consensus calls #128

Closed
