EVOSeq

Overview

EVOSeq is an evolutionary algorithm designed to search for common nucleotide motifs in given FASTA sequences. It provides a flexible, modular approach to motif discovery, allowing users to customize key parameters such as population size, mutation probability, and reproduction operators.

What is an Evolutionary Algorithm?

An evolutionary algorithm is a type of optimization algorithm inspired by the principles of natural selection. It iteratively improves a population of candidate solutions (here, motifs) through selection, reproduction, and mutation, favoring solutions that better match the given dataset.
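
The loop described above can be sketched in a few lines of Python. This is a toy illustration, not EVOSeq's implementation: the function names, the fixed motif length, and the fitness function (the number of sequences containing the motif) are all assumptions made for the example.

```python
import random

ALPHABET = "ACGT"

def fitness(motif, sequences):
    # Toy fitness (assumption): number of sequences that contain the motif.
    return sum(motif in seq for seq in sequences)

def tournament(population, sequences, k, rng):
    # Selection: draw k random candidates and keep the fittest one.
    contenders = rng.sample(population, k)
    return max(contenders, key=lambda m: fitness(m, sequences))

def crossover(a, b, point):
    # Reproduction: head of one parent joined to the tail of the other.
    return a[:point] + b[point:]

def mutate(motif, prob, rng):
    # Mutation: with probability `prob`, replace one random position.
    if rng.random() < prob:
        i = rng.randrange(len(motif))
        motif = motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]
    return motif

def evolve(sequences, pop_size=20, motif_len=5, generations=30, seed=0):
    rng = random.Random(seed)
    # Start from a random population of fixed-length motifs.
    population = ["".join(rng.choice(ALPHABET) for _ in range(motif_len))
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = []
        for _ in range(pop_size):
            p1 = tournament(population, sequences, 3, rng)
            p2 = tournament(population, sequences, 3, rng)
            next_gen.append(mutate(crossover(p1, p2, 2), 0.1, rng))
        population = next_gen
    # Return the distinct surviving motifs, best first.
    return sorted(set(population), key=lambda m: fitness(m, sequences), reverse=True)

sequences = ["TTGACGTAA", "CCGACGTGG", "AAGACGTCC"]
best = evolve(sequences)
print(best[0], fitness(best[0], sequences))
```

Each generation is built entirely from children of tournament winners, so fitter motifs contribute more offspring while mutation keeps some variation in the population.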

Features

Easy to read, flexible, and relatively fast
User-customizable parameters, including population size, individual motif length, mutation rate, and reproduction operators
Modular implementation allows easy modifications and extensions
Supports optional comparative analysis, distinguishing motifs between different organisms

Input & Output

Input

FASTA file of sequences (positive set) – the algorithm searches for common motifs in these sequences.
(Optional) FASTA file for comparison (negative set) – used to check how discovered motifs differentiate between datasets.
(Optional) FASTA file for initial population generation – if not provided, motifs are generated randomly.
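
For readers unfamiliar with the format, the following is a minimal sketch of reading FASTA records in plain Python. The function name `parse_fasta` is hypothetical and EVOSeq may load its inputs differently; the example only shows what a positive or negative set looks like once parsed.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}

example = """>seq1 example
ACGTAC
gtacgt
>seq2
TTGACA
""".splitlines()

positive = parse_fasta(example)
print(positive)  # {'seq1 example': 'ACGTACGTACGT', 'seq2': 'TTGACA'}
```

The function accepts any iterable of lines, so an open file handle can be passed directly. Records that share a header overwrite each other in this sketch.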

Output

SVG Report – visualizes the distribution of discovered motifs across positive and (optionally) negative sets.
CSV File – lists the discovered motifs with their scores.

Parameter Description

Key Algorithm Parameters

population_size – Number of motifs per generation. Smaller values (<50) speed up execution but may lose diversity, while larger values (≥100) improve diversity but increase computation time.
len_range – Tuple (min_length, max_length) defining the allowed range of motif lengths. Smaller values may overfit data, while larger values may struggle to generalize. Recommended starting value: (5, 8).
match_reward – Score assigned when a motif appears in the dataset (default: 10). Higher values increase the likelihood of motif survival in future generations.
len_reward – Rewards longer motifs to counterbalance the natural prevalence of shorter motifs. Options: quadratic (default), linear, linear_additive, or custom implementations.
mutation_prob – Probability of single-point mutation per generation (0-1, default: 0.1). Prevents premature convergence by introducing random variation.
crossover_point – Integer value defining the split point for crossover reproduction. Must be smaller than the minimum motif length.
tournament_size – Number of motifs participating in selection tournaments. Smaller values allow weaker motifs to survive; larger values accelerate convergence but may reduce diversity.
evaluation_size – Number of randomly selected sequences used for motif evaluation. Larger values improve accuracy but slow down computation. Default: 100.
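
To make the interaction between match_reward, len_reward, and evaluation_size more concrete, here is a hedged sketch of a scoring function. How EVOSeq actually combines these values is not documented here, so the formula (matches × match_reward × length reward) and the two length-reward definitions are assumptions. The linear_additive option is left out because its exact meaning is not described.

```python
import random

# Assumed length-reward functions. The names mirror the documented options,
# but the formulas are guesses made for illustration.
LEN_REWARDS = {
    "quadratic": lambda n: n ** 2,
    "linear": lambda n: n,
}

def score_motif(motif, sequences, match_reward=10, len_reward="quadratic",
                evaluation_size=100, rng=random):
    # Evaluate on a random subset of at most `evaluation_size` sequences.
    k = min(evaluation_size, len(sequences))
    sample = rng.sample(sequences, k)
    matches = sum(motif in seq for seq in sample)
    # Assumption: each match earns match_reward, scaled by the length reward.
    return matches * match_reward * LEN_REWARDS[len_reward](len(motif))

sequences = ["TTGACGTAA", "CCGACGTGG", "AAGACGTCC", "TTTTTTTTT"]
print(score_motif("GACGT", sequences))                       # 3 * 10 * 25 = 750
print(score_motif("GACGT", sequences, len_reward="linear"))  # 3 * 10 * 5 = 150
```

Under this assumed formula, a 5-mer found in three sequences outscores a 3-mer found in the same three sequences (750 versus 270 with the quadratic reward), which is the counterbalancing effect that len_reward is meant to provide.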

Installation

Prerequisites

Python 3.12.2
Required Python libraries (install using pip):
```
pip install -r requirements.txt
```

Clone the repository:

```
git clone https://github.com/yourusername/EVOSeq.git
cd EVOSeq
```

Execution

Modify parameters and provide input files in main.py.
Run the algorithm:
```
python3 main.py
```
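
Because parameters are edited directly in main.py, a quick consistency check before running can catch invalid combinations. The sketch below is an assumption-laden illustration: the `params` dictionary, the `check_params` helper, and the example values for crossover_point and tournament_size are hypothetical and not part of EVOSeq. Only the 0-1 range for mutation_prob and the rule that crossover_point must be smaller than the minimum motif length come from the parameter descriptions.

```python
def check_params(params):
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []
    min_len, max_len = params["len_range"]
    if not 0 < min_len <= max_len:
        problems.append("len_range must satisfy 0 < min_length <= max_length")
    if not 0.0 <= params["mutation_prob"] <= 1.0:
        problems.append("mutation_prob must be between 0 and 1")
    # Documented rule: crossover_point < minimum motif length.
    # Assumption: it should also be positive, so both parents contribute.
    if not 0 < params["crossover_point"] < min_len:
        problems.append("crossover_point must be positive and smaller than min_length")
    # Assumption: a tournament cannot draw more motifs than the population holds.
    if not 0 < params["tournament_size"] <= params["population_size"]:
        problems.append("tournament_size must be between 1 and population_size")
    return problems

# Illustrative values; crossover_point and tournament_size are not documented defaults.
params = {
    "population_size": 100,
    "len_range": (5, 8),
    "match_reward": 10,
    "mutation_prob": 0.1,
    "crossover_point": 3,
    "tournament_size": 5,
    "evaluation_size": 100,
}
print(check_params(params))  # []
```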

karatedava/EVOSeq
