HolistIC: Unambiguous prediction of multiple double minute chromosome architectures

Usage: python holistic.py <in.amplicons.bed> <in.gothic.csv> <float: q-value threshold> <autosomes only Y|N>
		in.amplicons.bed:      BED file containing coordinates of identified eccDNA segments (amplicons). Column 4 should uniquely identify the contig to which the amplicon belongs.
		in.gothic.csv:         CSV file containing GOTHiChicup output dumped from R data frame.
		q-value threshold:     Count interaction as significant if q-value is less than or equal this value. (Recommended: 0.05).
		autosomes?:            If Y, HolistIC will look for interactions on the whole genome including X and Y chromosomes.

Prerequisites

Tested with Python 2.7 but Python 3+ should work too
Python Networkx: https://networkx.github.io/documentation/stable/install.html
matplotlib: https://matplotlib.org/users/installing.html#installation-guide
rpy2: https://pypi.org/project/rpy2/

Quick Start

The sample BED file is provided in this repository. The sample CSV file can be downloaded here: https://s3.amazonaws.com/meh52-B1/Simulated.hic.gothic.results.csv.gz

python holistic.py dmfinder.HG38.wgs.results.SORTED.bed Simulated.hic.gothic.results.csv 0.05 Y

Input

The BED file contains WGS-based coordinates of contiguous regions of copy number amplification, organized by contig. The CSV file is the dumped output from the GOTHiChicup function of the GOTHiC package. The input to HolistIC is data assumed to come from the same individual.

BED file

The BED file must be a sequence of records of the following format:

chr	start	end	ID

Assume that we have the following NGS-based predicted double minutes (i.e. where we know the breakpoints of the amplicons and their adjacencies):

The BED file will contain the following records:

chr5	3000000		3300000		DM1
chr8	28000000	28500000	DM1
chr19	40000000	40200000	DM1
chr4	2000000		2300000		DM2
chr11	6000000		6800000		DM2
chr18	20000000	20200000	DM2

The BED file can be created using the output from programs that reconstruct circular eccDNA architectures (like AmpliconArchitect or CouGaR). HolistIC does not assume that all connected amplicons are within a circular contig, so incomplete predicted architectures can be considered too.

WARNING: Within each unique ID in the 4th column, the records must first be sorted in lexicographic order by chromosome, and then in ascending numeric order by the interval start position. This will not preserve the adjacencies provided in the BED file, but this does not adversely affect HolistIC's ability to segregate amplicons by contig. The input BED file can be sorted with the following command:

sort -k4,4 -k1,1 -k2,2 in.bed > in.sorted.bed

CSV file

We assume that significant chromatin interactions are determined with GOTHiC. The CSV file is the dumped output from an R dataframe generated by the GOTHiChicup function:

https://master.bioconductor.org/packages/release/bioc/manuals/GOTHiC/man/GOTHiC.pdf

Output

Assume that we have the following double minutes:

Due to ambiguities inherent in WGS-only predictions, the above double minutes could have resulted in the following input BED file, where all amplicons are erroneously assumed to be part of the same double minute (remember that the records are lexicographically sorted by chromosome and then sorted ascending by start position of interval):

chr1	8000000		8900000		DM1
chr3	7000000		7400000		DM1
chr5	3000000		3300000		DM1
chr6	17000000	17800000	DM1
chr18	12000000	13000000	DM1

This is due to ambiguity induced by the red amplicon since the WGS data shows a red-blue-yellow-red-orange-green-red cycle.

If the CSV file of Hi-C interactions is provided, HolistIC will attempt to segregate each amplicon (i.e. row in BED file) into its true double minute. In the ideal case, HolistIC would have the following output for the above input BED file (written to [input file name].disambig.bed):

chr1	8000000		8900000		DM1
chr3	7000000		7400000		DM1
chr5	3000000		3300000		DM1
chr6	17000000	17800000	DM2
chr18	12000000	13000000	DM2
chr5	3000000		3300000		DM2

Furthermore, HolistIC will create PNG image files for each identified maximal clique, which ostensibly refer to separate double minutes.

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
README.md		README.md
dmfinder.HG38.wgs.results.SORTED.bed		dmfinder.HG38.wgs.results.SORTED.bed
holistic.py		holistic.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HolistIC: Unambiguous prediction of multiple double minute chromosome architectures

Prerequisites

Quick Start

Input

BED file

CSV file

Output

About

Releases

Packages

Languages

mhayes20/HolistIC

Folders and files

Latest commit

History

Repository files navigation

HolistIC: Unambiguous prediction of multiple double minute chromosome architectures

Prerequisites

Quick Start

Input

BED file

CSV file

Output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages