This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

CNV consensus (7 of 6): Remove duplicated coordinates and NULLs #416

Merged 76 commits on Jan 10, 2020

Conversation

nhatduongnn
Contributor

Purpose/implementation Section

Remove duplicated coordinates and NULLs in the result file
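
As a rough illustration of the cleanup described above, here is a minimal sketch. It assumes each coordinate is a `start:end` string and that missing values appear as the literal string `NULL`. The actual layout of the result file is not shown in this conversation, so the function name and the sample data are hypothetical.

```python
def clean_coordinates(coords):
    """Drop NULL entries and duplicates, then sort numerically by (start, end).

    Assumes each entry is a "start:end" string (a hypothetical format).
    """
    # A set comprehension removes exact duplicates and skips NULL/empty entries.
    unique = {c for c in coords if c and c != "NULL"}
    # Sort on integer values rather than strings, so "900:950" precedes "1000:2000".
    return sorted(unique, key=lambda c: tuple(int(x) for x in c.split(":")))


raw = ["1000:2000", "NULL", "900:950", "1000:2000"]
print(clean_coordinates(raw))  # ['900:950', '1000:2000']
```

Sorting on integers rather than on the raw strings avoids the lexicographic ordering that would place `1000:2000` before `900:950`.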

What GitHub issue does your pull request address?

#128

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.
  • This analysis is recorded in the table in analyses/README.md.

Duong and others added 30 commits December 18, 2019 03:51
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…t_calling_updated.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…t_calling_updated.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Member

@jashapiro jashapiro left a comment

Hi @fingerfen. This looks good. I suggested sorting the coordinates before writing, and a minor reversion to fix, but that's about it.

Thank you for your contributions!

nhatduongnn and others added 7 commits January 8, 2020 21:01
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…L_entries.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…L_entries.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…L_entries.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…L_entries.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
@nhatduongnn
Contributor Author

@jashapiro Thanks for catching the reversion; it was not intentional. Thank you for your effort and quick responses. Also, since we are finished here, I want to revisit the third PR (3 of n) that I made. You wanted to see whether Bedtools Subtract could be a quicker alternative to my Python script. Do you think we should go back now and make that change? If I recall correctly, that is the only part of this pipeline left to improve.

Member

@jashapiro jashapiro left a comment

This looks good, with just a fix for a potential sorting error to add (and a format suggestion). The sorting fix would mean rerunning the last step, though, to generate new output.

On the topic of updating step 3, yes, I think that is worth doing (in a separate PR). I looked back at some test code I had written, and it looks like you should be able to use:

"bedtools subtract -N -a {input.bedfile} -b {input.bad_list} -f 0.5 > {output.filtered}"

in place of your get_rid_bad_segments.py script, assuming the input files are actually bedfiles, which should just mean replacing space delimiters with tabs in the intermediate files (making them true bedfiles rather than pseudo-bed).
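
The delimiter change mentioned above could look something like the following sketch. It assumes the intermediate files are whitespace-delimited with no spaces inside any field. The function name `pseudo_bed_to_bed` and the sample rows are hypothetical and are not taken from the pipeline.

```python
import io


def pseudo_bed_to_bed(src, dst):
    """Rewrite whitespace-delimited lines from src as tab-delimited lines in dst.

    Assumes no field contains internal spaces; blank lines are skipped.
    """
    for line in src:
        fields = line.split()  # split on any run of whitespace
        if fields:
            dst.write("\t".join(fields) + "\n")


# In-memory example; real use would pass open file handles instead.
src = io.StringIO("chr1 100 200 BS_X\nchr2 300 400 BS_Y\n")
dst = io.StringIO()
pseudo_bed_to_bed(src, dst)
print(dst.getvalue(), end="")
```

With tab-delimited intermediates, the `bedtools subtract` command quoted above should be able to read the files as ordinary BED input.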

nhatduongnn and others added 4 commits January 9, 2020 12:28
…L_entries.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
…L_entries.py

Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
Co-Authored-By: jashapiro <josh.shapiro@ccdatalab.org>
@jaclyn-taroni
Member

@fingerfen - it looks like the last addition before you merged in the master branch (7b500c4) failed on the copy number consensus step: https://circleci.com/gh/AlexsLemonade/OpenPBTA-analysis/2390

Here's the full output: build_2390_step_116_container_0.txt

@nhatduongnn
Contributor Author

I just tested the line break fix that @jashapiro just made, and it worked on my local machine. I am not sure why it is failing on CI at the moment.

@jashapiro
Member

> I just tested the line break fix that @jashapiro just made, and it worked on my local machine. I am not sure why it is failing on CI at the moment.

It didn't really fail... I just merged in master before it finished running.

@jashapiro
Member

@fingerfen I am just waiting on the update to cnv_consensus.tsv to approve this and get it merged in.

@jashapiro
Member

@fingerfen I'm guessing the last commit (f72938a) was to respond to my request for the updated version, but that one doesn't seem to be the latest... it still uses sample instead of Biospecimen; I'm guessing it is just because you didn't rerun the merge step that sets the header names.

@nhatduongnn
Contributor Author

@jashapiro I apologize. I hadn't re-run the merging step. I just re-ran it, and this new file should have Biospecimen instead of sample.

Member

@jashapiro jashapiro left a comment

Looks good! We'll merge as soon as tests finish again.

@jaclyn-taroni jaclyn-taroni merged commit e7009a8 into AlexsLemonade:master Jan 10, 2020