Merge pull request #27 from gbouras13/dev
v0.1.2
gbouras13 authored Mar 6, 2024
2 parents 7f7b492 + 973580a commit d017ef9
Showing 14 changed files with 190 additions and 102 deletions.
7 changes: 7 additions & 0 deletions HISTORY.md
@@ -1,5 +1,12 @@
# History

0.1.2 (2024-03-06)
------------------

* Fixes `phold compare` cds_id issue where input file was FASTA
* Fixes issues with `phold remote` where input file was FASTA
* Improved documentation with conda/mamba install

0.1.1 (2024-03-05)
------------------

49 changes: 28 additions & 21 deletions README.md
@@ -1,4 +1,9 @@
# phold - phage annotation using protein structures
[![Anaconda-Server Badge](https://anaconda.org/bioconda/phold/badges/version.svg)](https://anaconda.org/bioconda/phold)
[![Bioconda Downloads](https://img.shields.io/conda/dn/bioconda/phold)](https://img.shields.io/conda/dn/bioconda/phold)
[![PyPI version](https://badge.fury.io/py/phold.svg)](https://badge.fury.io/py/phold)
[![Downloads](https://static.pepy.tech/badge/phold)](https://pepy.tech/project/phold)

# phold - Phage Annotation using Protein Structures

`phold` is a sensitive annotation tool for bacteriophage genomes and metagenomes using protein structural homology.

@@ -16,7 +21,7 @@ Check out the `phold` tutorial at [https://phold.readthedocs.io/en/latest/tutori

# Table of Contents

- [phold - phage annotation using protein structures](#phold---phage-annotation-using-protein-structures)
- [phold - Phage Annotation using Protein Structures](#phold---phage-annotation-using-protein-structures)
- [Tutorial](#tutorial)
- [Table of Contents](#table-of-contents)
- [Documentation](#documentation)
@@ -33,27 +38,29 @@ Check out the full documentation at [https://phold.readthedocs.io](https://phold

# Installation

The only way to install `phold` is from source for now.
For more details (particularly if you are using a non-NVIDIA GPU), check out the [installation documentation](https://phold.readthedocs.io/en/latest/install/).

PyPI and conda installations will be available soon.
The best way to install `phold` is using [mamba](https://github.com/conda-forge/miniforge), as this will install [Foldseek](https://github.com/steineggerlab/foldseek) (the only non-Python dependency) along with the Python dependencies.

The only required non-Python dependency is `foldseek`. To install `phold` in a conda environment using [mamba](https://github.com/conda-forge/miniforge):
To install `phold` using [mamba](https://github.com/conda-forge/miniforge):

```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold
```
mamba create -n pholdENV -c conda-forge -c bioconda pip foldseek python=3.11
conda activate pholdENV
git clone https://github.com/gbouras13/phold.git
cd phold
pip install -e .
```

To utilise `phold` with GPU, a GPU compatible version of `pytorch` must be installed.
To utilise `phold` with GPU, a GPU compatible version of `pytorch` must be installed. By default conda/mamba will install a CPU-only version.

Therefore, if you have an NVIDIA GPU, please try:

If it is not automatically installed via the pip installation, please see [this link](https://pytorch.org) for more instructions on how to install `pytorch`. If you have an older version of CUDA installed, then you might find [this link useful](https://pytorch.org/get-started/previous-versions/).
```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
```

If you are having trouble with `pytorch` see [this link](https://pytorch.org) for more instructions. If you have an older version of CUDA installed, then you might find [this link useful](https://pytorch.org/get-started/previous-versions/).

Once `phold` is installed, to download and install the database run:

```
```bash
phold install
```

@@ -64,25 +71,25 @@ phold install
* `phold` takes a GenBank format file output from [pharokka](https://github.com/gbouras13/pharokka) as its input by default.
* If you are running `phold` on a local work station with GPU available, using `phold run` is recommended. It runs both `phold predict` and `phold compare`

```
```bash
phold run -i tests/test_data/NC_043029.gbk -o test_output_phold -t 8
```

* If you do not have a GPU available, add `--cpu`
* If you do not have a GPU available, add `--cpu`.
* `phold run` will run in a reasonable time for small datasets with CPU only (e.g. <5 minutes for a 50kbp phage).
* However, `phold predict` will complete much faster if a GPU is available, and is necessary for large metagenomic datasets to run in a reasonable time.

* In a cluster environment, it is most efficient to run `phold` in 2 steps for optimal resource usage.

1. Predict the 3Di sequences with ProstT5 using `phold predict`. This is massively accelerated if a GPU is available.

```
```bash
phold predict -i tests/test_data/NC_043029.gbk -o test_predictions
```

2. Compare the 3Di sequences to the `phold` structure database with Foldseek using `phold compare`. This does not utilise a GPU.

```
```bash
phold compare -i tests/test_data/NC_043029.gbk --predictions_dir test_predictions -o test_output_phold -t 8
```
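
The two steps above could be chained in a small wrapper, sketched below. This is only an illustration: `run_or_print`, `phold_two_step` and the `DRY_RUN` variable are hypothetical names that are not part of `phold`, and only flags shown in the commands above are used. On a real cluster the two steps would more likely be submitted as separate jobs (GPU node for step 1, CPU node for step 2).

```bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the two phold steps.
# Arguments: input file, predictions directory, output directory, threads (default 8).
phold_two_step() {
  local input="$1"
  local predictions_dir="$2"
  local output_dir="$3"
  local threads="${4:-8}"

  # Step 1: predict 3Di sequences with ProstT5 (benefits from a GPU).
  run_or_print phold predict -i "$input" -o "$predictions_dir" || return 1

  # Step 2: compare against the phold database with Foldseek (CPU only).
  run_or_print phold compare -i "$input" --predictions_dir "$predictions_dir" \
    -o "$output_dir" -t "$threads"
}
```

For example, `DRY_RUN=1 phold_two_step tests/test_data/NC_043029.gbk test_predictions test_output_phold 8` prints both commands without running them, which is a cheap way to check paths before submitting a job.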

@@ -96,7 +103,7 @@ phold compare -i tests/test_data/NC_043029.gbk --predictions_dir test_prediction

# Usage

```
```bash
Usage: phold [OPTIONS] COMMAND [ARGS]...

Options:
@@ -114,7 +121,7 @@ Commands:
run phold predict then compare all in one - GPU recommended
```

```
```bash
Usage: phold run [OPTIONS]

phold predict then compare all in one - GPU recommended
@@ -154,7 +161,7 @@ Options:

`phold plot` will allow you to create Circos plots with [pyCirclize](https://github.com/moshi4/pyCirclize) for all your phage(s). For example:

```
```bash
phold plot -i tests/test_data/NC_043029_phold_output.gbk -o NC_043029_phold_plots -t '${Stenotrophomonas}$ Phage SMA6'
```

71 changes: 48 additions & 23 deletions docs/install.md
@@ -1,12 +1,36 @@
# Installation

The only way to install `phold` is from source for now.
The best way to install `phold` is using [mamba](https://github.com/conda-forge/miniforge), as this will install [Foldseek](https://github.com/steineggerlab/foldseek) (the only non-Python dependency) along with the Python dependencies.

Pypi and (hopefully) conda installations will be available soon.
To install `phold` using [mamba](https://github.com/conda-forge/miniforge):

The only required non-Python dependency is [Foldseek](https://github.com/steineggerlab/foldseek). To install `phold` in a conda environment using [mamba](https://github.com/conda-forge/miniforge):
```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold
```

To utilise `phold` with GPU, a GPU compatible version of `pytorch` must be installed. By default conda/mamba will install a CPU-only version.

Therefore, if you have an NVIDIA GPU, please try:

```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
```

## Pip

You can also install `phold` using pip.

```bash
pip install phold
```

You will need to have [Foldseek](https://github.com/steineggerlab/foldseek) installed and available in the $PATH.
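
A quick way to confirm that Foldseek is visible on `$PATH` after a pip install is sketched below. The helper name `check_on_path` is hypothetical and not part of `phold`; it relies only on the standard shell builtin `command -v`.

```bash
# Hypothetical helper: report whether each named command is on $PATH.
# Returns 0 only if every command was found.
check_on_path() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_on_path foldseek phold` prints one `found:` or `missing:` line per command and returns a non-zero status if either is absent.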

## Source

You can install the latest version of `phold` with potentially untested and unreleased changes into a conda environment using [mamba](https://github.com/conda-forge/miniforge) as follows:

```bash
mamba create -n pholdENV pip foldseek python=3.11
conda activate pholdENV
git clone https://github.com/gbouras13/phold.git
@@ -18,29 +42,29 @@ pip install -e .

To utilise `phold` with GPU, a GPU compatible version of `pytorch` must be installed.

If it is not automatically installed via the pip/conda installation, please see [this link](https://pytorch.org) for more instructions on how to install `pytorch`.
If it is not automatically installed via the installation methods above, please see [this link](https://pytorch.org) for more instructions on how to install `pytorch`.

If you have an older version of the CUDA driver installed on your NVIDIA GPU, then you might find [this link useful](https://pytorch.org/get-started/previous-versions/).

Phold has been tested on NVIDIA GPUs (A100, RTX4090) and AMD GPUs (Radeon).

Installation on AMD GPUs requires `torch` compatible with rocm e.g.
Installation on AMD GPUs requires a version of `torch` compatible with rocm e.g.

```
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
```
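
Whichever route was used to install `pytorch`, it can be worth checking whether the installed build actually sees a GPU. The sketch below assumes that `python3` on `$PATH` is the interpreter of the environment `phold` was installed into; the function name `torch_device_status` and the status strings are made up for illustration. It uses `torch.cuda.is_available()`, which PyTorch documents as also being the interface used by its ROCm builds, although that has not been verified here for `phold`.

```bash
# Hypothetical helper: print one of
#   cuda          - torch imports and reports a usable GPU
#   cpu-only      - torch imports but no GPU is visible
#   torch-missing - torch is not installed in this interpreter
#   check-failed  - python3 is unavailable or the check crashed
torch_device_status() {
  python3 - <<'EOF' 2>/dev/null || echo "check-failed"
try:
    import torch
except ImportError:
    print("torch-missing")
else:
    print("cuda" if torch.cuda.is_available() else "cpu-only")
EOF
}
```

If `torch_device_status` prints `cpu-only` on a machine that does have a GPU, the installed `pytorch` is probably a CPU-only build and the GPU installation steps above are worth revisiting.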

# Database Installation

To download and install the `phold` database

```
```bash
phold install
```

If you would like to specify a particular location for the database, please use `-d`
If you would like to specify a particular location for the database (e.g. if you use `phold` on a shared server), please use `-d`

```
```bash
phold install -d <path/to/database_dir>
```

@@ -60,35 +84,36 @@ Please follow the instructions at the links to install based on your computer ar

After your installation is complete, you should add the following channels to your conda configuration:

```
```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

We would recommend installing `phold` into a fresh environment. Assuming you installed miniforge, to create a environment called `pholdENV` with `phold` installed:
We would recommend installing `phold` into a fresh environment. Assuming you installed miniforge, to create an environment called `pholdENV` with `phold` installed (assuming you have an NVIDIA GPU):

* To create a conda environment called `pholdENV` with foldseek installed

```
conda create -n pholdENV foldseek pip
```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
```

* To activate the environment
If you don't have a GPU:

```
conda activate pholdENV
```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold
```

* To install `phold`
Then activate the environment

```
pip install phold
```bash
conda activate pholdENV
```

* Once that has finished downloading and installing, you can check installation worked using:
You can then check installation worked and download the `phold` databases:

```
```bash
phold -h
phold install
```
```

See the [tutorial](https://phold.readthedocs.io/en/latest/tutorial/) for more information on how to run `phold`.
17 changes: 8 additions & 9 deletions docs/tutorial.md
@@ -29,21 +29,18 @@ conda deactivate

## Step 3 Installing `phold`

* To install phold from source, replace the pip step with `pip install -e .`
* `phold` should work with Python v3.8-3.11. The below uses 3.11
* To install `phold` with mamba from bioconda (assuming you have an NVIDIA GPU available):

```bash
mamba create -n pholdENV foldseek pip python=3.11
mamba create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
conda activate pholdENV
pip install phold
phold install
```

## Step 4 Running `phold`

* If you skipped step 2, replace `NC_043029_pharokka_output/pharokka.gbk` with `tests/test_data/NC_043029.fasta`
* If you do not have a GPU available, remove `pytorch=*=cuda*`
* For more installation options, see the [installation documentation](https://phold.readthedocs.io/en/latest/install/).

* If you have a GPU available:
## Step 4 Running `phold`

```bash
phold run -i NC_043029_pharokka_output/pharokka.gbk -o NC_043029_phold_output -t 8 -p NC_043029
@@ -55,10 +52,12 @@ phold run -i NC_043029_pharokka_output/pharokka.gbk -o NC_043029_phold_output -t
phold run -i NC_043029_pharokka_output/pharokka.gbk -o NC_043029_phold_output -t 8 -p NC_043029 --cpu
```
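
If the same script has to run on machines with and without a GPU, the `--cpu` flag could be added conditionally, as in the rough sketch below. The helper `phold_cpu_flag` is hypothetical, and the test it uses (whether `nvidia-smi` is on `$PATH`) is only a heuristic: it says nothing about AMD GPUs or about whether the installed `pytorch` is a GPU build.

```bash
# Hypothetical helper: echo "--cpu" when no NVIDIA driver tool is visible
# on $PATH, and nothing otherwise. Heuristic only; AMD GPUs are not detected.
phold_cpu_flag() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    : # an NVIDIA GPU is probably present, so add no flag
  else
    echo "--cpu"
  fi
}
```

It would be used unquoted so that an empty result expands to nothing, e.g. `phold run -i NC_043029_pharokka_output/pharokka.gbk -o NC_043029_phold_output -t 8 -p NC_043029 $(phold_cpu_flag)`.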

* If you skipped step 2, replace `NC_043029_pharokka_output/pharokka.gbk` with `tests/test_data/NC_043029.fasta`

## Step 5 Running `phold plot`

* `phold` can generate Circos plots of your phage(s)
* The plot will be saves in the `NC_043029_phold_plots` directory. See the [documentation](https://phold.readthedocs.io/en/latest/run/#phold-plot) for more parameter details
* The plot will be saved in the `NC_043029_phold_plots` directory. See the [documentation](https://phold.readthedocs.io/en/latest/run/#phold-plot) for more parameter details
* `phold plot` provides .png and .svg outputs

```bash
3 changes: 0 additions & 3 deletions environment.yml
@@ -21,9 +21,6 @@ dependencies:
- pytorch >=2.1.2
- numpy >=1.20
- pycirclize >=0.3.1
- just
- poetry
- ripgrep



2 changes: 1 addition & 1 deletion pyproject.toml
@@ -5,7 +5,7 @@ requires = ["setuptools>=61.0", "wheel>=0.37.1"]
[project]
# https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
name = "phold"
version = "0.1.1" # change VERSION too
version = "0.1.2" # change VERSION too
description = "Phage Annotations using Protein Structures"
readme = "README.md"
requires-python = ">=3.8, <3.12"