Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

CNV consensus (1 of n): Split large files into small sample files #288

Merged
merged 18 commits into from
Dec 3, 2019
Merged

CNV consensus (1 of n): Split large files into small sample files #288

merged 18 commits into from
Dec 3, 2019

Conversation

nhatduongnn
Copy link
Contributor

Purpose/implementation Section

To take in CNVs calls from Manta, Cnvkit, and FreeC and split them into their own directories. Each caller directory would have a file of CNVs for each sample that the caller has.
Also, to compose a Snakemake config file for down stream analysis.

What was your approach?

Use python3 and Pandas to split big files into tiny files.

What GitHub issue does your pull request address?

Issue #128

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

@nhatduongnn nhatduongnn changed the title add files Split large files into small sample files Nov 22, 2019
cgreene and others added 2 commits November 23, 2019 05:53
I think this will require that we run this from the root. I'm testing that out on your branch.
@jaclyn-taroni
Copy link
Member

jaclyn-taroni commented Nov 23, 2019

Hi @fingerfen the changes I added in 773c134 are somethings we've found helpful during development in this project. Specifically, what I've added in lines 6-11 of the shell script will make it such that the working directory of the script is analyses/copy_number_consensus_call. Some background on the other additions can be found here.

Copy link
Collaborator

@cgreene cgreene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few suggestions on switching parsing to argparse. I'm going to pull from your branch and make the changes. I also had a question around why you're filtering/making empty files for those with more than 2500 CNVs. Do you think those are bad, or just time consuming to process?

## Nhat Duong
## November, 22 2019

import numpy as np
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small pep8 issue: import order should be standard first, then third party, then local.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgreene for the 2500 CNVs question, we are considering these to be noisy/poor quality samples (this came from what GISTIC uses as a cutoff for noisy samples).

import os

######### ASSUMPTIONS ########
# 1) Files are feed into stdout as manta, cnvkit, then freec
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is using the command line arguments so not stdin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize, I will fix that.

# 5) pval to filter out for freec is 0.01


## Get the list of file names from stdin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on order can be brittle and opaque. argparse was designed for this.



## Make the Snakemake config file. Write all of the sample names into the config file
with open('../../scratch/config_snakemake.yaml', 'w') as file:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code could be more broadly useful with no downsides if this gets printed to stdout and then your bash script could redirect it to a file. This would let you point it elsewhere if you used this code in the future. An alternative would be to make this path a commandline argument.

Comment on lines 113 to 114
file.write('size_cutoff: 3000' + '\n')
file.write('freec_pval: 0.01' + '\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to earlier comment, can we make these command line args?

@jashapiro jashapiro changed the title Split large files into small sample files CNV consensus (1 of n): Split large files into small sample files Nov 23, 2019
@nhatduongnn
Copy link
Contributor Author

Sorry for the late reply.
Below are the changes that I will make to the script:

  1. Change import orders
  2. Change the comment on line 11
  3. Use argparse to parse the command-line arguments
  4. Line 99, change the output of the snakemake config file to output to stdout
  5. Have CNV number < 2500, CNVs size cutoff, freec pval, and orders of input files as command line args

I also have a question, what is the procedure for resubmitting my code changes? Is it prefered that I make the changes here or do I make the changes locally and resubmit another pull request?

Thank you

@cgreene
Copy link
Collaborator

cgreene commented Nov 25, 2019 via email

@jashapiro
Copy link
Member

I also have a question, what is the procedure for resubmitting my code changes? Is it prefered that I make the changes here or do I make the changes locally and resubmit another pull request?

In this case, you can accept @cgreene's PR, and you will be ready to go. That will make the changes in the branch associated with the PR on github. You will then have to pull those back down to your local branch to keep everything in sync.

In general, the procedure for making changes in a pull request is to make the changes in your local branch, commit them, thenpush those to the github copy of the branch. The PR will then automatically be updated to the most current version of the branch. (You can see that in the history of @cgreene's PR to your branch, where he made a few commits after the initial filing of the PR, and they are reflected in the version you would merge)

Did I explain that well enough? Let me know if you have any followup questions.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!
Just some very minor edits, to keep line lengths shorter and switch to use os.path.join() throughout.

Copy link
Collaborator

@cgreene cgreene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving / please @fingerfen accept @jashapiro's suggested changes and then we will merge!

nhatduongnn and others added 10 commits November 27, 2019 20:17
Co-Authored-By: jashapiro <jashapiro@gmail.com>
Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py

Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py

Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py

Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py

Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py

Co-Authored-By: jashapiro <jashapiro@gmail.com>
…vidual_files.py

Co-Authored-By: jashapiro <jashapiro@gmail.com>
@jaclyn-taroni
Copy link
Member

Alright @fingerfen, I am going to get this merged. Congratulations on getting your first pull request through and thank you for this contribution!

In preparation for your next pull request, you will want to create a new branch from your updated master branch. The instructions for keeping your master branch synced are in step 4 of this section of the contributing guidelines: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#filing-a-pull-request-from-your-own-branch

If you have any questions as you are preparing your next pull request, please don’t hesitate to reach out.

@jaclyn-taroni jaclyn-taroni merged commit 9880be8 into AlexsLemonade:master Dec 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants