This repository contains the analysis steps associated with the paper "Semantically Rich Local Dataset Generation for Explainable AI in Genomics", accepted at GECCO 2024, which introduces a new approach for generating local synthetic genomic datasets based on grammars of sequence perturbations.
The code used to generate the figures in the manuscript is available in the figures.ipynb notebook.
To reproduce the analysis, follow the steps in the README file within each section. Note that our evolutionary searches were run on a specific GPU model (NVIDIA GeForce RTX 3090) in a local server environment. Because each experiment was limited to 5 minutes, the amount of search completed within that budget depends on the hardware, so results obtained on a different setup may not match ours exactly.
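To illustrate why a wall-clock limit ties results to hardware, here is a minimal toy sketch. It is not the search code used in this repository: the function `time_budgeted_search`, the `fitness` function, and all parameters are hypothetical and chosen only for illustration. The loop stops when the time budget runs out, so a faster machine completes more generations and may reach a different final solution even with the same random seed.

```python
import random
import time


def fitness(individual):
    # Hypothetical objective: higher is better, maximum at the all-zero vector.
    return -sum(x * x for x in individual)


def time_budgeted_search(budget_seconds, evaluate, seed=0, pop_size=20, genome_len=8):
    """Toy evolutionary loop that stops on wall-clock time, not on a generation count."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    best = max(population, key=evaluate)
    generations = 0
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        # Mutate every individual with small Gaussian noise.
        offspring = [[g + rng.gauss(0, 0.1) for g in ind] for ind in population]
        # Elitist selection: keep the fittest pop_size among parents and offspring.
        population = sorted(population + offspring, key=evaluate, reverse=True)[:pop_size]
        best = population[0]
        generations += 1
    return best, generations


if __name__ == "__main__":
    # The budget here is shortened to a fraction of a second for the demo;
    # the experiments in the paper used 5 minutes per run.
    best, generations = time_budgeted_search(0.2, fitness)
    print(f"generations completed: {generations}, best fitness: {fitness(best):.4f}")
```

The number of generations printed will differ between machines and between runs, which is the same effect that can make results obtained on other hardware deviate from those reported in the paper.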
A full copy of this repository and the generated datasets are available at Zenodo.