This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

Fix clean code (#64)
moshi4 authored Oct 28, 2021
1 parent 2886048 commit 75f80e2
Showing 13 changed files with 158 additions and 91 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -6,6 +6,9 @@ on:
pull_request:
branches: main
paths: ["src/**", "tests/**", ".github/workflows/**"]
schedule:
# Scheduled Daily CI
- cron: "0 0 * * *"

jobs:
CI_black-flake8-pytest:
48 changes: 25 additions & 23 deletions README.md
@@ -1,4 +1,4 @@
# FastDTLmapper: Fast genome-wide DTL event mapper
# FastDTLmapper: Fast genome-wide DTL event mapper

![Python3](https://img.shields.io/badge/Language-Python_3.7_|_3.8_|_3.9-steelblue)
![OS](https://img.shields.io/badge/OS-Linux-steelblue)
@@ -24,7 +24,7 @@ driving adaptive evolution, but it remains largely unexplored.
Therefore, to investigate the relationship between gene gain/loss and adaptive evolution
in the evolutionary process of organisms, I developed a software pipeline **FastDTLmapper**
which automatically estimates and maps genome-wide gene gain/loss.
FastDTLmapper takes two inputs, 1. *Species tree (Newick format)* 2. *Genomic Protein CDSs (Fasta|Genbank format)*,
FastDTLmapper takes two inputs, 1. *Species tree (Newick format)* & 2. *Genomic Protein CDSs (Fasta|Genbank format)*,
and performs genome-wide mapping of DTL(Duplication-Transfer-Loss) events by
DTL reconciliation of species tree and gene trees.
Additionally, FastDTLmapper can perform
@@ -162,25 +162,25 @@ This is brief description of analysis pipeline. See [wiki](https://github.com/mo
--dup_cost Duplication event cost (Default: 2)
--los_cost Loss event cost (Default: 1)
--trn_cost Transfer event cost (Default: 3)
--inflation OrthoFinder MCL inflation parameter (Default: 3.0)
--inflation OrthoFinder MCL inflation parameter (Default: 1.5)
--timetree Use species tree as timetree in AnGST (Default: off)
--rseed Number of random seed (Default: 0)

#### Timetree Option
- **Timetree Option**

If user set this option, input species tree must be ultrametric tree.
--timetree enable AnGST timetree option below (See [AnGST manual](<https://github.com/almlab/angst/blob/master/doc/manual.pdf>) for details).
> If the branch lengths on the provided species tree represent times,
> AnGST can restrict the set of possible inferred gene transfers to
> only those between contemporaneous lineages
If user set this option, input species tree must be ultrametric tree.
--timetree enable AnGST timetree option below (See [AnGST manual](<https://github.com/almlab/angst/blob/master/doc/manual.pdf>) for details).
> If the branch lengths on the provided species tree represent times,
> AnGST can restrict the set of possible inferred gene transfers to
> only those between contemporaneous lineages
#### Input Limitation
- **Input Limitation**

fasta or genbank files (--indir option)
>:warning: Following characters cannot be included in file name '_', '-', '|', '.'
fasta or genbank files (--indir option)
>:warning: Following characters cannot be included in file name '_', '-', '|', '.'
species tree file (--tree option)
>:warning: Species name in species tree must match fasta or genbank file name
species tree file (--tree option)
>:warning: Species name in species tree must match fasta or genbank file name
### Example Command

@@ -190,23 +190,23 @@ Download example dataset:

This dataset is identical to [example](https://github.com/moshi4/FastDTLmapper/tree/main/example) in this repository.

#### 1. Minimum test dataset
- **Minimum test dataset**

7 species, 100 CDS limited fasta dataset
7 species, 100 CDS limited fasta dataset

FastDTLmapper -i example/minimum_dataset/fasta/ -t example/minimum_dataset/species_tree.nwk -o output_minimum
FastDTLmapper -i example/minimum_dataset/fasta/ -t example/minimum_dataset/species_tree.nwk -o output_minimum

#### 2. Mycoplasma dataset (Input Format = Fasta)
- **Mycoplasma dataset (Input Format = Fasta)**

7 Mycoplasma species, 500 ~ 1000 CDS fasta dataset
7 Mycoplasma species, 500 ~ 1000 CDS fasta dataset

FastDTLmapper -i example/mycoplasma_dataset/fasta/ -t example/mycoplasma_dataset/species_tree.nwk -o output_mycoplasma_fasta
FastDTLmapper -i example/mycoplasma_dataset/fasta/ -t example/mycoplasma_dataset/species_tree.nwk -o output_mycoplasma_fasta

#### 3. Mycoplasma dataset (Input Format = Genbank)
- **Mycoplasma dataset (Input Format = Genbank)**

7 Mycoplasma species, 500 ~ 1000 CDS genbank dataset
7 Mycoplasma species, 500 ~ 1000 CDS genbank dataset

FastDTLmapper -i example/mycoplasma_dataset/genbank/ -t example/mycoplasma_dataset/species_tree.nwk -o output_mycoplasma_genbank
FastDTLmapper -i example/mycoplasma_dataset/genbank/ -t example/mycoplasma_dataset/species_tree.nwk -o output_mycoplasma_genbank

## Output Contents

@@ -256,6 +256,8 @@ This dataset is identical to [example](https://github.com/moshi4/FastDTLmapper/t
├── parallel_cmds/ -- Parallel run command log results
└── run_config.log -- Program run config log file

See [wiki](https://github.com/moshi4/FastDTLmapper/wiki/1.2.-Output-Contents-(FastDTLmapper)) for output files details.

## Further Analysis

### Plot Gain/Loss Map Figure
21 changes: 17 additions & 4 deletions src/fastdtlmapper/args.py
@@ -6,6 +6,7 @@
import sys
import time
from dataclasses import dataclass
from enum import IntEnum, auto
from pathlib import Path
from typing import List, Optional, Union

@@ -66,6 +67,18 @@ def unixtime_to_datestr(unixtime: float) -> str:
return log_text


class RestartFrom(IntEnum):
"""RestartFrom Enum Class"""

ORTHO_FINDER = auto()
MAFFT = auto()
TRIMAL = auto()
IQTREE = auto()
TREERECS = auto()
ANGST = auto()
AGG_MAP = auto()


def get_args(argv: Optional[List[str]] = None) -> Args:
"""Get arguments
@@ -135,7 +148,7 @@ def get_args(argv: Optional[List[str]] = None) -> Args:
default=default_trn_cost,
metavar="",
)
default_inflation = 3.0
default_inflation = 1.5
parser.add_argument(
"--inflation",
type=float,
@@ -145,7 +158,7 @@ def get_args(argv: Optional[List[str]] = None) -> Args:
)
parser.add_argument(
"--timetree",
help="Use species tree as timetree",
help="Use species tree as timetree in AnGST (Default: off)",
action="store_true",
)
default_rseed = 0
@@ -163,8 +176,8 @@ def get_args(argv: Optional[List[str]] = None) -> Args:
"--restart_from",
type=int,
help=argparse.SUPPRESS,
default=1,
choices=[1, 2, 3, 4, 5, 6, 7],
default=RestartFrom.ORTHO_FINDER,
choices=[rf.value for rf in RestartFrom],
)

args = parser.parse_args(argv)
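
Note on the `RestartFrom` change: the hunks above replace the hard-coded `choices=[1, 2, 3, 4, 5, 6, 7]` with values derived from an `IntEnum`. The following is a minimal, self-contained sketch of that pattern, not the project's actual `get_args`. The helper name `parse_restart_from` is hypothetical and exists only for illustration; the enum members are copied from the diff.

```python
import argparse
from enum import IntEnum, auto
from typing import List, Optional


class RestartFrom(IntEnum):
    """Pipeline steps; auto() numbers them 1..7 in declaration order."""

    ORTHO_FINDER = auto()
    MAFFT = auto()
    TRIMAL = auto()
    IQTREE = auto()
    TREERECS = auto()
    ANGST = auto()
    AGG_MAP = auto()


def parse_restart_from(argv: Optional[List[str]] = None) -> RestartFrom:
    """Hypothetical helper: parse only the hidden --restart_from option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--restart_from",
        type=int,
        help=argparse.SUPPRESS,  # keep the option out of --help output
        default=RestartFrom.ORTHO_FINDER,
        choices=[rf.value for rf in RestartFrom],  # stays in sync with the enum
    )
    args = parser.parse_args(argv)
    # Convert the plain int back to an enum member for readable comparisons
    return RestartFrom(args.restart_from)


if __name__ == "__main__":
    step = parse_restart_from(["--restart_from", "4"])
    print(step.name, step >= RestartFrom.MAFFT)
```

Because `IntEnum` members compare as integers, a check such as `step >= RestartFrom.MAFFT` can replace a comparison against a bare step number.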
71 changes: 33 additions & 38 deletions src/fastdtlmapper/goea/goea.py
@@ -58,41 +58,44 @@ def run(self, output_prefix: Path) -> List[Path]:
def plot(
self,
goea_result_file: Path,
output_prefix: Path,
plot_outfile: Path,
over_or_under: str,
title: str = "",
) -> None:
"""Plot GOEA significant GOterms
Args:
goea_result_file (Path): GOEA result file path
output_prefix (Path): Output files prefix path
plot_outfile (Path): Output file path
over_or_under (str): "over" or "under"
title (str): Plot title
"""
for goea_type in ("over", "under"):
# Extract goterm & pvalue
goterm2pvalue = self._extract_goterm2pvalue(goea_result_file, goea_type)
if len(goterm2pvalue) == 0:
continue
if over_or_under not in ("over", "under"):
raise ValueError("over_or_under must be 'over' or 'under'")

# Plot color setting
goterm2hexcolor = {}
if self.plot_color:
# Set specified plot color
for goterm in goterm2pvalue.keys():
goterm2hexcolor[goterm] = self.plot_color
else:
# Get hexcolor from pvalue for color plot
pvalue_abs_log10_list = [
abs(math.log10(v)) for v in goterm2pvalue.values()
]
pvalue_hexcolor_list = self._convert_hexcolor_gradient(
pvalue_abs_log10_list
)
# Set yellow to red gradient plot color
for goterm, hexcolor in zip(goterm2pvalue.keys(), pvalue_hexcolor_list):
goterm2hexcolor[goterm] = hexcolor

# Plot GOterm with gradient color
plot_outfile = Path(f"{output_prefix}_{goea_type}.{self.plot_format}")
self._color_plot(plot_outfile, goterm2hexcolor, goterm2pvalue)
# Extract goterm & pvalue
goterm2pvalue = self._extract_goterm2pvalue(goea_result_file, over_or_under)
if len(goterm2pvalue) == 0:
return

# Plot color setting
goterm2hexcolor = {}
if self.plot_color:
# Set specified plot color
for goterm in goterm2pvalue.keys():
goterm2hexcolor[goterm] = self.plot_color
else:
# Get hexcolor from pvalue for color plot
pvalue_abs_log10_list = [abs(math.log10(v)) for v in goterm2pvalue.values()]
pvalue_hexcolor_list = self._convert_hexcolor_gradient(
pvalue_abs_log10_list
)
# Set yellow to red gradient plot color
for goterm, hexcolor in zip(goterm2pvalue.keys(), pvalue_hexcolor_list):
goterm2hexcolor[goterm] = hexcolor

# Plot GOterm with gradient color
self._color_plot(plot_outfile, goterm2hexcolor, goterm2pvalue, title)
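
Note on the gradient coloring: the body of `_convert_hexcolor_gradient` is not part of this diff, so the sketch below is only one plausible way to turn `abs(log10(p))` scores into a yellow-to-red gradient, as the comments in `plot` describe. Both function names are hypothetical, and the linear interpolation of the green channel is an assumption rather than the project's implementation.

```python
import math
from typing import Dict, List


def yellow_to_red_hexcolors(values: List[float]) -> List[str]:
    """Map values to hex colors from yellow (smallest) to red (largest).

    Hypothetical helper: linearly interpolates the green channel only.
    """
    if not values:
        return []
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    colors = []
    for v in values:
        # Fraction in [0, 1]; if all values are equal, everything is yellow
        frac = (v - vmin) / span if span > 0 else 0.0
        green = round(255 * (1.0 - frac))
        colors.append(f"#ff{green:02x}00")
    return colors


def goterm_colors(goterm2pvalue: Dict[str, float]) -> Dict[str, str]:
    """Assign each GO term a color based on abs(log10(p-value)).

    Assumes every p-value is strictly positive (log10(0) is undefined).
    """
    scores = [abs(math.log10(p)) for p in goterm2pvalue.values()]
    return dict(zip(goterm2pvalue.keys(), yellow_to_red_hexcolors(scores)))


if __name__ == "__main__":
    # The GO IDs here are placeholders, not results from any analysis
    print(goterm_colors({"GO:0000001": 0.01, "GO:0000002": 0.0001}))
```

In this sketch the smallest p-value (largest score) gets pure red and the largest p-value gets pure yellow.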

def _extract_goterm2pvalue(
self,
@@ -181,13 +184,15 @@ def _color_plot(
plot_outfile: Union[str, Path],
goid2color: Dict[str, str],
goid2pvalue: Dict[str, float] = {},
title: str = "",
) -> None:
"""Plot GO DAG using self-defined GO color
Args:
plot_outfile (str): Output plot file path
goid2color (Dict[str, str]): go id and hexcolor dict
goid2pvalue (Dict[str, float], optional): go id and pvalue dict
title (str): Plot title
"""
# Get plot target GO DAG
obodag = GODag(self.obo_file)
@@ -214,16 +219,6 @@ def _color_plot(
godag_plg_vars = GODagPltVars()
godag_plg_vars.fmthdr = "{GO}"

# Plot title
title = Path(plot_outfile).with_suffix("").name.replace("_", " ")
title += " representation\n"
title += f"Top{self.plot_max_num} GOterm "
if self.use_adjusted_pvalue:
title += f"(BH adjusted P-value < {self.pvalue_thr})"
else:
title += f"(P-value < {self.pvalue_thr})"
title = f"\n{title}\n"

# Create plot obj & add plot color
godagplot = GODagSmallPlot(
godagsmall, abodag=obodag, GODagPltVars=godag_plg_vars, title=title
2 changes: 1 addition & 1 deletion src/fastdtlmapper/out_path.py
@@ -52,7 +52,7 @@ def __post_init__(self):
self.result_summary_dir = self.goea_dir / "result_summary"
self.result_summary_plot_dir = self.result_summary_dir / "significant_go_plot"

self.obo_file = self.goea_dir / "go-basic.obo"
self.obo_file = self.go_enrichment_dir / "go-basic.obo"
self.og2go_association_file = self.go_enrichment_dir / "og2go_association.txt"
self.significant_go_list_file = (
self.result_summary_dir / "significant_go_list.tsv"
16 changes: 15 additions & 1 deletion src/fastdtlmapper/scripts/FastDTLgoea.py
@@ -224,7 +224,21 @@ def run_goatools_goea(
goea_result_file_list = goea.run(output_prefix)
# Plot GOEA significant GOterms
for goea_result_file in goea_result_file_list:
goea.plot(goea_result_file, goea_result_file.with_suffix(""))
for over_or_under in ("over", "under"):
go_category = str(goea_result_file.with_suffix("")).split("_")[-1]
# Define title
title = f"{node_id} {gain_or_loss} {over_or_under} representation\n"
title += f"Top{plot_max_num} {go_category} GOterm "
if use_adjusted_pvalue:
title += f"(BH adjusted P-value < {pvalue_thr})"
else:
title += f"(P-value < {pvalue_thr})"
title = f"\n{title}\n"

plot_outfile = Path(
f"{output_prefix}_{over_or_under}_{go_category}.{plot_format}"
)
goea.plot(goea_result_file, plot_outfile, over_or_under, title)
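
Note on the title construction: this hunk moves title building out of `_color_plot` and into the caller. As a rough sketch of how that logic could be isolated and tested, the two helpers below reproduce the same string assembly. The names `go_category_from_path` and `build_plot_title` are hypothetical, the example file name is an assumption, and `Path.stem` is used here in place of the diff's `str(path.with_suffix("")).split("_")[-1]` (the two agree whenever the last underscore is in the file name).

```python
from pathlib import Path


def go_category_from_path(goea_result_file: Path) -> str:
    """Return the text after the last underscore of the file stem."""
    return goea_result_file.stem.split("_")[-1]


def build_plot_title(
    node_id: str,
    gain_or_loss: str,
    over_or_under: str,
    go_category: str,
    plot_max_num: int,
    pvalue_thr: float,
    use_adjusted_pvalue: bool,
) -> str:
    """Assemble the plot title in the same order as the loop in the diff."""
    title = f"{node_id} {gain_or_loss} {over_or_under} representation\n"
    title += f"Top{plot_max_num} {go_category} GOterm "
    if use_adjusted_pvalue:
        title += f"(BH adjusted P-value < {pvalue_thr})"
    else:
        title += f"(P-value < {pvalue_thr})"
    # Surrounding newlines pad the title, as in the original code
    return f"\n{title}\n"


if __name__ == "__main__":
    # Hypothetical result file name, used only to show the parsing step
    category = go_category_from_path(Path("out/N001_gain_BP.tsv"))
    print(build_plot_title("N001", "gain", "over", category, 10, 0.05, True))
```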


if __name__ == "__main__":