Repository containing code and analyses for the paper "Human systemic epigenetic variants are implicated in neurodevelopmental and metabolic disorders".
-
probe_collection/
- Scripts to search PubMed articles and extract probes for disease categoriespubmed_search_pipeline.ipynb
- Main script to collect, scrape, and extract probes from PubMed articleshandle_zip.py
- Helper script to handle zip filesget_probe_supplementary.py
- Helper script to extract probes from articles' supplementary files
-
permutation_testing/
- Suite for permutation testing to evaluate the statistical significance of CoRSIV enrichment, refer topermutation_testing/README.md
for more details. -
corsiv_regions/
- Scripts and data to merge previously identified SIV regions into a unified list of CoRSIVs.CoRSIV_annotation.ipynb
- Main script to annotate and merge SIV regions.SIV.hg38.bed
/ME.hg38.bed
/ESS.hg38.bed
/corsiv2019.txt
- input SIV, ME, ESS, and CoRSIV regions.
-
controls/
- Scripts to generate control regions / probes.generate_lookup.ipynb
- Generate lookup table on a chromosome-by-chromosome basis that are later used to sample control regions from.sample_from_lookup.ipynb
- Sample control regions from lookup table based on CoRSIV metrics.process_control.ipynb
- Post-processing of control regions.
-
GSEA.ipynb
- Scripts to perform gene-set enrichment analysis on CoRSIVs. -
median_iir_icc.ipynb
- Scripts to calculate median IRR and ICC values for different region types and categories. -
main_figure_code.ipynb
- Code to generate main figures. -
supplementary_figure_code.ipynb
- Code to generate supplementary figures. -
util.py
- Utility functions used in the above scripts.
- Python 3.7+
- High-performance computing environment recommended for probe extraction and permutation testing.