The phruta R package is designed to simplify the basic phylogenetic pipeline. All the code is run within the same program, and data from intermediate steps can optionally be saved in independent folders. phruta retrieves gene sequences, combines newly downloaded and local gene sequences, performs sequence alignments, and runs basic phylogenetic inference.
The main functions in the phruta R package allow for quick mining and curation of GenBank sequences. This package is designed for students and researchers interested in generating species-level genetic datasets for particular sets of taxa. Specifically, if you have a clade or group of species in mind, phruta will help you assemble a molecular dataset with information available in GenBank.
phruta simplifies the phylogenetic pipeline, increases reproducibility, and helps organize the information used to infer molecular phylogenies.
phruta has two core functions. The main applications of these functions are briefly outlined below:
- sq.retrieve.direct() and sq.retrieve.indirect(): These functions download sequences from GenBank (nucleotide database) for particular taxa (taxonomic groups or particular species) and a list of genes.
- sq.curate(): After sequences are downloaded from GenBank, this function curates sequences within each of the examined genes by detecting sequence outliers and by using taxonomic information.
In addition to these two main functions, users will be able to align the downloaded sequences, infer phylogenetic trees, and calibrate phylogenies using additional functions in phruta.
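As a rough illustration of how the two core functions fit together, the sketch below retrieves sequences for a few example taxa and then curates them. The taxa, the genes, and the argument names (clades, species, genes, filterTaxonomicCriteria, kingdom, folder, removeOutliers) reflect my reading of the package documentation and should be treated as assumptions; please check ?sq.retrieve.direct and ?sq.curate before relying on them. The run_download flag is a hypothetical switch added here so that nothing is downloaded by accident.

```r
# Example taxa and genes, chosen purely for illustration.
retrieve_args <- list(
  clades  = c("Felis", "Vulpes", "Phoca"),
  species = "Manis_pentadactyla",
  genes   = c("ADORA3", "CYTB")
)

# Hypothetical switch: set to TRUE to query GenBank (needs internet access).
run_download <- FALSE

if (run_download && requireNamespace("phruta", quietly = TRUE)) {
  # Step 1: download sequences for the target taxa and genes.
  # Argument names are assumed from the package documentation.
  do.call(phruta::sq.retrieve.direct, retrieve_args)

  # Step 2: curate the downloaded sequences using taxonomic information.
  # "0.Sequences" is assumed to be the folder written by the previous step.
  phruta::sq.curate(
    filterTaxonomicCriteria = "Felis|Vulpes|Phoca|Manis",
    kingdom = "animals",
    folder = "0.Sequences",
    removeOutliers = FALSE
  )
}
```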
phruta is currently only available through GitHub. It can be easily installed using the following code.
library(devtools)
install_github("ropensci/phruta")
Alternatively, you can install phruta
using:
install.packages("phruta", repos = "https://ropensci.r-universe.dev")
Please make sure that the R packages msa, DECIPHER, Biostrings, and odseq are correctly installed. If you are interested in using the development version of phruta, please install it using the following code:
library(devtools)
install_github("ropensci/phruta", ref = "dev")
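One way to confirm that msa, DECIPHER, Biostrings, and odseq are correctly installed is a small check like the one below. This is only a sketch: missing_packages() is a helper name made up for this example, and it does nothing more than report which packages fail to load.

```r
# Return the subset of `pkgs` whose namespaces cannot be loaded.
# missing_packages() is a hypothetical helper, not part of phruta.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("msa", "DECIPHER", "Biostrings", "odseq")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

To my knowledge these four packages are distributed through Bioconductor, so BiocManager::install() is one possible way to install any that are reported as missing.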
I have constructed a Shiny app that hosts phruta and enables users to run the basic functions in a less code-intensive environment. The app, salphycon, is currently available in the following GitHub repo. The shiny app will be live at some point in 2023.
On macOS, RAxML can be easily installed to the PATH using one of the two lines below in conda:
conda install -c bioconda/label/cf201901 raxml
conda install -c bioconda raxml
For other operating systems (Windows, Linux), please follow the instructions listed on the official RAxML website.
Once RAxML has been installed on your computer, open R and make sure that the following line doesn't throw an error.
system("raxmlHPC")
Depending on how RAxML was installed, you may want to check whether RAxML is called from the terminal using raxmlHPC or another executable name (for example, raxmlHPC-PTHREADS). This string needs to be passed to tree.raxml using the argument raxml_exec. Please note that this argument corresponds to the exec argument in ips::raxml.
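To find out which executable name is visible from R, something like the sketch below may help. It relies only on base R's Sys.which(); find_raxml() is a helper name invented for this example, and the list of candidate names is an assumption based on common RAxML builds, so adjust it to match your installation.

```r
# Return the first candidate executable found on the PATH, or NA if none is.
# find_raxml() is a hypothetical helper, not part of phruta.
find_raxml <- function(candidates = c("raxmlHPC",
                                      "raxmlHPC-PTHREADS",
                                      "raxmlHPC-SSE3")) {
  paths <- Sys.which(candidates)        # "" for names that are not found
  found <- names(paths)[nzchar(paths)]
  if (length(found) == 0) NA_character_ else found[1]
}

raxml_exec <- find_raxml()
if (is.na(raxml_exec)) {
  message("No RAxML executable was found on the PATH seen by R.")
} else {
  message("RAxML executable found: ", raxml_exec)
}
```

If a name is found, that string is the value to pass through the raxml_exec argument of tree.raxml.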
Finally, note that RStudio sometimes has trouble finding programs on the PATH when using system(). If you're using macOS, try starting RStudio from the command line by running the following line:
open /Applications/RStudio.app
VS Code does not suffer from the same issues. On other operating systems, it might be better to simply avoid using RStudio if you're interested in running the phylogenetic functions in phruta.
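If the PATH seen by your R session is missing the directory where an external program lives, a possible workaround is to extend the PATH from within R for the current session. The sketch below uses only base R; add_to_path() is a helper name made up for this example, and the directory in the commented usage line is hypothetical.

```r
# Append `dir` to the PATH of the current R session if it is not already there.
# add_to_path() is a hypothetical helper, not part of phruta.
add_to_path <- function(dir) {
  sep <- .Platform$path.sep
  current <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (!dir %in% current) {
    Sys.setenv(PATH = paste(c(current, dir), collapse = sep))
  }
  invisible(Sys.getenv("PATH"))
}

# Example (hypothetical location of a conda installation):
# add_to_path(path.expand("~/miniconda3/bin"))
```

The change only lasts for the current session, so it would need to be repeated, or placed in a startup file, each time R is restarted.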
There are excellent guides for installing PATHd8 and treePL. Here, I summarize two potentially relevant options.
First, you can use Brian O'Meara's approach for installing PATHd8 on macOS and Linux. I summarize the code in the following link. For Windows users, please use the compiled version of the software provided in the following link.
Second, you can use Homebrew to install treePL (Windows, macOS, and Linux), thanks to Jonathan Chang.
brew install brewsci/bio/treepl
Please check the following link if you're interested in running brew from Windows and Linux.
If you're interested in running phylogenetic analyses, please make sure you open RStudio using the following code from the terminal:
open /Applications/RStudio.app
My package is dedicated to my mom. I still have lots of things to learn from you. You will always have all my admiration. The logo features a Palenquera in Cartagena (Colombia). For many folks, Palenqueras are just the Black women who sell fruit in particular Colombian tourist areas. However, Palenqueras and Palenque are central to Black identity in Colombia, Latin America, and across the Americas: ["Palenque was the first free African town in the Americas"](https://en.wikipedia.org/wiki/San_Basilio_de_Palenque).
Fruta is the Spanish word for fruit. The English ph sounds the same as the Spanish f. In phruta, ph refers to phylogenetics. I pronounce phruta just as fruta in Spanish.
More details about the functions implemented in phruta can be found in the different vignettes associated with the package or on our website.
Similar functionality for assembling curated molecular datasets for phylogenetic analyses can be found in phylotaR and SuperCRUNCH. However, note that phylotaR is limited to downloading and curating sequences (e.g., it doesn't align sequences). Similarly, SuperCRUNCH only curates local sequences. phruta is closer to SUPERSMART and its "new" associated R workflow, SUPERSMARTR. However, most of the applications in the different packages that are part of SUPERSMARTR are simplified in phruta.
Please see our contributing guide.
Please see the package DESCRIPTION for package authors.
Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.