This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Reproducing copy number excluded regions #438
Labels
cnv
Related to or requires CNV data
data
in progress
Someone is working on this issue, but feel free to propose an alternative approach!
updated analysis
Comments
From @xiehongbo via email:
Using hg18->hg38 liftover, these correspond to:
These are the regions found in the provided file. I will update the analysis to use them as a starting point.
jashapiro added a commit to jashapiro/OpenPBTA-analysis that referenced this issue (Jan 22, 2020)
These regions are the ones defined by @hongboxie here: AlexsLemonade#438 (comment) Converted from hg18 to hg38
jaclyn-taroni added a commit that referenced this issue (Jan 25, 2020)
* add to Snakefile
* updating fork
* changed output path and name
* implement segmean
* implement segmean
* add result file
* add result files
* add trailing line
* fix .py
* change Snakefile comment
* change README.md
* change README.md
* Updates to file organization
  Removing `src` directory to unnest `scripts` and adding `ref` directory for genomic info files.
* add alternative segdup generation
  Link and script to process downloaded file for segmental duplications.
* Updates to blacklist generation
* Add IG regions
  These regions are the ones defined by @hongboxie here: #438 (comment) Converted from hg18 to hg38
* Add step to potentially fix overlapping dup del segments.
* Notebook to look at consensus calls for overlaps
* Add overlap pruning
* Update output files
  Note that ordering has changed, but the actual differences between these files should be relatively small other than that. There are changes to the cnv_consensus.tsv file where segments that are not contained within the defined CNV are discarded but might have been retained before.
* update readme
* Add telomere definition file
* Update blacklist generation script
* Remove accidentally included notebook
* Tried to clarify complicated bedtools step.
* Update analyses/copy_number_consensus_call/scripts/remove_dup_NULL_overlap_entries.py
  Co-Authored-By: Candace Savonen <cansav09@gmail.com>
* Update analyses/copy_number_consensus_call/scripts/remove_dup_NULL_overlap_entries.py
  Co-Authored-By: Candace Savonen <cansav09@gmail.com>
* Add more clarifying comments
* Add full exclusion list and remove outdated files
* Update readmes
* Updated output files.
* Re-add previous blacklist
* More descriptive excluded file name
* Update filename

Co-authored-by: Candace Savonen <cansav09@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
jaclyn-taroni added a commit that referenced this issue (Jan 27, 2020)
* add to Snakefile
* updating fork
* changed output path and name
* implement segmean
* implement segmean
* add result file
* add result files
* add trailing line
* fix .py
* change Snakefile comment
* change README.md
* change README.md
* Updates to file organization
  Removing `src` directory to unnest `scripts` and adding `ref` directory for genomic info files.
* add alternative segdup generation
  Link and script to process downloaded file for segmental duplications.
* Updates to blacklist generation
* Add IG regions
  These regions are the ones defined by @hongboxie here: #438 (comment) Converted from hg18 to hg38
* Add step to potentially fix overlapping dup del segments.
* Notebook to look at consensus calls for overlaps
* Add overlap pruning
* Update output files
  Note that ordering has changed, but the actual differences between these files should be relatively small other than that. There are changes to the cnv_consensus.tsv file where segments that are not contained within the defined CNV are discarded but might have been retained before.
* update readme
* Add telomere definition file
* Update blacklist generation script
* Remove accidentally included notebook
* Tried to clarify complicated bedtools step.
* Update analyses/copy_number_consensus_call/scripts/remove_dup_NULL_overlap_entries.py
  Co-Authored-By: Candace Savonen <cansav09@gmail.com>
* Update analyses/copy_number_consensus_call/scripts/remove_dup_NULL_overlap_entries.py
  Co-Authored-By: Candace Savonen <cansav09@gmail.com>
* Add more clarifying comments
* Add full exclusion list and remove outdated files
* Update readmes
* Updated output files.
* Re-add previous blacklist
* Add chromosome lengths file
* Create file of neutral regions
* Use hg.38.chrom.sizes
* More descriptive excluded file name
* Update filename
* Sort chromosomes and remove alt from callable.
* Fix sed command
* Finish the rule to combine neutral regions.
* Add output of bad callers
* Bad caller summary notebook
* Add output of neutral segments to the seg file
  Neutral segments (copy number 2) are included if they fall within a "callable region" which is one not covered by a large excluded region. When we add these back, we still exclude specimens where more than two callers 'failed' with high numbers of segments
* remove working notebooks
* Bug fixes
* Unset X and Y copy number calls
* Update README
* Add callable regions to analyses/README.md
* Simplify output file description in readme
* Simplify file reading
  we don't need data types here, so keeping everything as strings simplifies, and removes potential errors from unexpected conversions from int to float
* comment out status message
* Move segfile step into snakemake
* Fix filename in snakemake
* Update results.
* Update scratch dir handling
  Put all intermediate files in a defined scratch sub directory.
* Update analyses/copy_number_consensus_call/scripts/bed_to_segfile.R
  Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
* remove unused option.

Co-authored-by: Candace Savonen <cansav09@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
What data file(s) does this issue pertain to?
https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/copy_number_consensus_call/src/scripts/IGLL_telo_centromeric_region.txt
Put your question or report your issue here.
I am trying to reproduce the generation of the `IGLL_telo_centromeric_region.txt` file that is used for Copy number consensus generation #128. This file lists regions that are excluded from analysis because of high error rates. The telomeres and centromeres can be reproduced from UCSC data files, but I am confused by the immunoglobulin regions. The documentation points to http://penncnv.openbioinformatics.org/en/latest/misc/faq/, but it is not clear where the coordinates of those regions come from. Moreover, the regions defined there are for hg18:

However, the regions in the `IGLL_telo_centromeric_region.txt` file do not seem to correspond to a liftOver of those regions to hg38. Applying liftOver hg18->hg38, I get the following regions:

The nearest equivalent regions in `IGLL_telo_centromeric_region.txt` seem to be these:

(There are only three here, presumably because the other chr14 region falls near a telomere and is excluded that way?)

In addition, `IGLL_telo_centromeric_region.txt` includes the region `chr21:3100000-7000000`, which is listed as a `stalk` (acrocentric arm) in the UCSC cytoband file, but no other `stalk` regions are excluded, so I was not sure why this one was.

Can @fingerfen or @xiehongbo provide some additional information?
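For anyone who wants to repeat the `stalk` check, here is a minimal Python sketch of one way to do it. It assumes the cytoband file uses the UCSC cytoBand layout (tab-separated chrom, chromStart, chromEnd, name, gieStain) and that coordinates are half-open intervals. The helper names (`parse_region`, `bands_with_stain`, `uncovered`) and the `toy1`/`toy2` rows are hypothetical placeholders for illustration only; they are not real coordinates and not part of the repository.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start-end' string such as 'chr21:3100000-7000000'."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return Region(chrom, int(start), int(end))


def bands_with_stain(lines: Iterable[str], stain: str) -> Iterator[Region]:
    """Yield bands from cytoBand-style rows whose gieStain column equals `stain`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header or comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip rows that do not have all five columns
        chrom, start, end, _name, gie_stain = fields[:5]
        if gie_stain == stain:
            yield Region(chrom, int(start), int(end))


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def uncovered(bands: Iterable[Region], excluded: List[Region]) -> List[Region]:
    """Return the bands that no excluded region overlaps."""
    return [b for b in bands if not any(overlaps(b, e) for e in excluded)]


# Made-up placeholder rows (NOT real cytoband data), just to exercise the code.
TOY_CYTOBAND = [
    "toy1\t0\t100\tp13\tgvar",
    "toy1\t100\t200\tp12\tstalk",
    "toy1\t200\t300\tp11\tacen",
    "toy2\t50\t150\tp12\tstalk",
]
TOY_EXCLUDED = [parse_region("toy1:90-210")]

stalks = list(bands_with_stain(TOY_CYTOBAND, "stalk"))
missing = uncovered(stalks, TOY_EXCLUDED)
for band in missing:
    print(f"stalk band not excluded: {band.chrom}:{band.start}-{band.end}")
```

To run this against real inputs, the toy lists would be replaced by the lines of a downloaded cytoband file and by the regions parsed from the exclusion file. Any `stalk` bands it reports are the ones the exclusion list does not touch.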