Code to reproduce the results from the GECCO'24 paper on generating synthetic datasets in genomics

PedroBarbosa/Synthetic_datasets_generation

Generation of synthetic genomics datasets

This repository contains the analysis steps associated with the paper "Semantically Rich Local Dataset Generation for Explainable AI in Genomics", accepted at GECCO 2024, which introduces a new approach for generating local synthetic genomic datasets based on grammars of sequence perturbations.
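To give a rough feel for what "a grammar of sequence perturbations" can mean, here is a minimal toy sketch. It is not the grammar or the search procedure from the paper: the rule set (`SNV`, `INS`, `DEL`), the function names, and the uniform random sampling are all assumptions made for illustration only.

```python
import random

NUCLEOTIDES = "ACGT"


def sample_perturbation(rng, seq):
    """Expand a toy rule: <perturbation> ::= SNV pos nuc | INS pos nuc | DEL pos.

    This rule set is hypothetical; it is not the grammar defined in the paper.
    """
    kind = rng.choice(["SNV", "INS", "DEL"])
    pos = rng.randrange(len(seq))
    if kind == "DEL":
        return (kind, pos, None)
    if kind == "SNV":
        # Pick a nucleotide different from the one already at this position.
        nuc = rng.choice([n for n in NUCLEOTIDES if n != seq[pos]])
        return (kind, pos, nuc)
    return (kind, pos, rng.choice(NUCLEOTIDES))


def apply_perturbation(seq, perturbation):
    """Return a new sequence with a single perturbation applied."""
    kind, pos, nuc = perturbation
    if kind == "SNV":
        return seq[:pos] + nuc + seq[pos + 1:]
    if kind == "INS":
        return seq[:pos] + nuc + seq[pos:]
    return seq[:pos] + seq[pos + 1:]  # DEL


def generate_local_dataset(seq, n_samples, seed=0):
    """Build (perturbation, perturbed sequence) pairs around one input sequence."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        perturbation = sample_perturbation(rng, seq)
        dataset.append((perturbation, apply_perturbation(seq, perturbation)))
    return dataset


if __name__ == "__main__":
    for perturbation, perturbed in generate_local_dataset("ACGTACGTAC", 5, seed=1):
        print(perturbation, perturbed)
```

Each generated sequence stays paired with the perturbation that produced it, so every sample in the local dataset carries a human-readable description of how it differs from the original sequence.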

The code used to generate the figures of the manuscript is available in the figures.ipynb notebook.

Each section contains a README file with the steps needed to reproduce the analysis. Note that our evolutionary searches were run on a specific GPU model (NVIDIA GeForce RTX 3090) in a local server environment, with a time limit of 5 minutes per experiment. Because the number of evaluations that fit within that limit depends on the hardware, results may not replicate exactly on a different setup.
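The hardware dependence comes from using a wall-clock budget as the stopping criterion. The sketch below illustrates this with a plain random search rather than the evolutionary algorithm used in the paper. All names here (`budgeted_random_search`, `make_stepping_clock`) are hypothetical, and the injected clock is only a stand-in that simulates faster or slower hardware.

```python
import random
import time


def budgeted_random_search(fitness, sample, budget_seconds, seed=0,
                           clock=time.monotonic):
    """Random search that stops when a wall-clock budget is exhausted.

    Illustrative only: this is not the search procedure from the paper.
    """
    rng = random.Random(seed)
    deadline = clock() + budget_seconds
    best, best_score, evaluations = None, float("-inf"), 0
    while clock() < deadline:
        candidate = sample(rng)
        score = fitness(candidate)
        evaluations += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, evaluations


def make_stepping_clock(seconds_per_call):
    """Fake clock that advances a fixed amount on every call.

    A smaller step mimics faster hardware (less time per evaluation).
    """
    now = [0.0]

    def clock():
        now[0] += seconds_per_call
        return now[0]

    return clock


if __name__ == "__main__":
    def fitness(x):
        return -abs(x - 0.5)  # toy objective, maximal at x = 0.5

    def sample(rng):
        return rng.random()

    for label, step in [("slow", 1.0), ("fast", 0.5)]:
        _, score, n_evals = budgeted_random_search(
            fitness, sample, budget_seconds=5, seed=0,
            clock=make_stepping_clock(step))
        print(label, n_evals, score)
```

With the same seed and the same time budget, the simulated faster machine completes more evaluations, so its best result can differ from the slower one. Fixing the random seed alone is therefore not enough for exact replication under a time limit.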

A full copy of this repository and the generated datasets are available at Zenodo.
