Protein Engineering with Large Language Models

This repository contains the source code developed as part of my diploma thesis, available at https://dspace.cvut.cz/handle/10467/115759.

The three proposed MLDE methods are implemented in code/de.
The two traditional DE benchmarks are implemented in code/single_mutation_walk and code/recombine_mutation.
Folder code/llm contains code from initial exploratory experiments, which is not essential to any of the methods.
Folders code/plot and code/dimred contain the code used to generate the included graphics.

Installation

To start using the proposed MLDE methods implemented in code/de, please follow these steps:

1. Install Julia.
2. Install Conda.
3. In code/de/setup_pycall.jl, set CONDA_PATH to the path of the Conda root folder.
4. In bash, run:

cd code/de
source setup.sh
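
Before running the commands above, it can help to confirm that both prerequisites are reachable from the shell. The following is a minimal sketch and not part of the repository: `have_tool` is a hypothetical helper name, and the check assumes that `julia` and `conda` are expected on your `PATH`. It only reports what it finds and does not modify anything.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check (an assumption, not shipped with this repository).

# have_tool: hypothetical helper; succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether each prerequisite from the installation steps is available.
for tool in julia conda; do
    if have_tool "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool (install it before running setup.sh)" >&2
    fi
done
```

If Conda is installed but not on `PATH`, the check reports it as missing even though setting CONDA_PATH in code/de/setup_pycall.jl may still be enough for the setup script.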

Acknowledgements

All of the proposed MLDE methods require a pre-trained protein language model as an embedding extractor.
If you use the ESM-1b model, cite the original paper.

ESM-1b: Alexander Rives et al. “Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences”. In: Proceedings of the National Academy of Sciences 118.15 (2021), e2016239118.

The datasets used are included in data. If you use them, don't forget to cite the original papers.
The correct citation for each dataset is provided in a CITE_AS.txt file, together with a corresponding BibTeX template.

GB1: Nicholas C Wu et al. “Adaptation in protein fitness landscapes is facilitated by indirect paths”. In: eLife 5 (2016), e16965.
PhoQ: Anna I Podgornaia and Michael T Laub. “Pervasive degeneracy and epistasis in a protein-protein interface”. In: Science 347.6222 (2015), pp. 673–677.
