Dev 0.7.6 - Notebook updates #285

Merged
merged 23 commits into from
Oct 3, 2023
23 commits
8e7ec05
notebook: use heatmap to depict COG distribution
matinnuhamunada Sep 26, 2023
b068d5a
notebook: enrich deeptf with faa annotation
matinnuhamunada Sep 26, 2023
6f8117a
notebook: generate graphml file for cytoscape
matinnuhamunada Sep 27, 2023
3684896
fix: correct notebook links and display
matinnuhamunada Sep 27, 2023
403a349
feat: colorise bigscape class and add knownclusterblast
matinnuhamunada Sep 27, 2023
975ea78
fix: cleanup unused cell
matinnuhamunada Sep 27, 2023
e99f4d1
feat: extract ARTS 4 tables
matinnuhamunada Sep 29, 2023
07f7522
fix: correct new arts output format
matinnuhamunada Sep 29, 2023
0936781
fix: update rule for arts output and notebook
matinnuhamunada Sep 29, 2023
67a8dd9
test: update GTDB API result
matinnuhamunada Sep 29, 2023
75cdd38
test: update expected output for arts extract
matinnuhamunada Sep 29, 2023
d0c4cef
test: merge arts results
matinnuhamunada Sep 29, 2023
b261a0c
test: add missing expected duptable
matinnuhamunada Sep 29, 2023
cc24be6
test: add missing config and symlink
matinnuhamunada Sep 29, 2023
2989d76
test: add final step of arts
matinnuhamunada Sep 29, 2023
7bf99b8
test: add config
matinnuhamunada Sep 30, 2023
34ade74
feat: annotate bigfam models
matinnuhamunada Sep 30, 2023
c3e0474
fix: refrain using directory in params
matinnuhamunada Sep 30, 2023
aef96dc
fix: correct shell script
matinnuhamunada Sep 30, 2023
9e4de6d
chore: update java requirement for metabase
matinnuhamunada Oct 3, 2023
0c11c77
notebook: add instruction for cblaster-bgc
matinnuhamunada Oct 3, 2023
e057e48
chore: remove unused notebooks
matinnuhamunada Oct 3, 2023
d32336d
chore: bump version 0.7.6
matinnuhamunada Oct 3, 2023
64 changes: 64 additions & 0 deletions .tests/unit/arts_allhits_combine/data/config/config.yaml
@@ -0,0 +1,64 @@
# This file should contain everything to configure the workflow on a global scale.

#### PROJECT INFORMATION ####
# This section controls your project configuration.
# Each project is introduced by "-".
# A project can be defined as (1) a YAML object or (2) a Portable Encapsulated Project (PEP) file.
# (1) To define a project as a YAML object, it must contain the variables "name" and "samples".
#   - name: name of your project
#   - samples: a CSV file containing the list of genome ids for analysis; multiple sources may be mentioned. Genome ids must be unique.
#   - rules: a YAML file containing project rule configurations. This overrides the global rule configuration.
#   - prokka-db (optional): list of custom accessions to use as the Prokka reference database.
#   - gtdb-tax (optional): GTDB-Tk output summary file with at least the two columns "user_genome" and "classification".
# (2) To define a project using a PEP file, give only the variable "name", pointing to the location of the PEP YAML file.
#   - pep: path to the PEP .yaml file. See the project example_pep for details.
# PS: the variables "pep" and "name" are aliases.

projects:
  # Project 1 (yaml object)
  - name: config/lactobacillus_delbruecki/project_config.yaml

bgc_projects:
  - pep: config/lanthipeptide/project_config.yaml

#### GLOBAL RULE CONFIGURATION ####
# This section configures the rules to run globally.
# Use project-specific rule configurations if you want to run different rules for each project.
# pipelines or rules: set value to TRUE if you want to run the analysis or FALSE if you don't
pipelines:
  seqfu: FALSE
  mash: FALSE
  fastani: FALSE
  checkm: FALSE
  gtdbtk: FALSE
  prokka-gbk: FALSE
  antismash: TRUE
  query-bigslice: FALSE
  bigscape: FALSE
  bigslice: FALSE
  automlst-wrapper: FALSE
  arts: FALSE
  roary: FALSE
  eggnog: FALSE
  eggnog-roary: FALSE
  deeptfactor: FALSE
  deeptfactor-roary: FALSE
  cblaster-genome: FALSE
  cblaster-bgc: FALSE

#### RESOURCES CONFIGURATION ####
# resources : the location of the resources to run the rule.
# The default location is at "resources/{resource_name}".
resources_path:
  antismash_db: resources/antismash_db
  eggnog_db: resources/eggnog_db
  BiG-SCAPE: resources/BiG-SCAPE
  bigslice: resources/bigslice
  checkm: resources/checkm
  gtdbtk: resources/gtdbtk
  #RNAmmer: resources/RNAmmer # If specified, will override Barrnap in Prokka

rule_parameters:
  install_gtdbtk:
    release: 214
    release_version: 214
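
The comments in the config above describe two ways of declaring a project. A minimal sketch of how such entries could be told apart is shown here. The function name `classify_project`, the file-extension heuristic, and the third example entry are assumptions made for illustration; they are not taken from the workflow's own code.

```python
from typing import Any, Dict


def classify_project(entry: Dict[str, Any]) -> str:
    """Return "pep" or "yaml_object" for one item under `projects:`."""
    # The config comments state that `pep` and `name` are aliases.
    location = entry.get("pep", entry.get("name"))
    if location is None:
        raise ValueError("project entry needs a 'name' or 'pep' key")
    # Assumption: a value ending in .yaml/.yml is a path to a PEP file.
    if str(location).endswith((".yaml", ".yml")):
        return "pep"
    # Otherwise treat the entry as a plain YAML object, which must list samples.
    if "samples" not in entry:
        raise ValueError(f"project {location!r} has no 'samples' file")
    return "yaml_object"


# Entries as they would look after parsing the config above.
projects = [
    {"name": "config/lactobacillus_delbruecki/project_config.yaml"},
    {"pep": "config/lanthipeptide/project_config.yaml"},
    # Hypothetical YAML-object project, not part of this pull request.
    {"name": "my_project", "samples": "config/my_project/samples.csv"},
]
kinds = [classify_project(p) for p in projects]
print(kinds)
```

Both entries in this test config point at a `.yaml` file, so under the stated assumption they would be handled as PEP projects.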
@@ -0,0 +1,30 @@
name: Lactobacillus_delbrueckii

pep_version: 2.1.0

description: "Lactobacillus delbrueckii 27 01 2023"

sample_table: samples.csv

#### RULE CONFIGURATION ####
# rules: set value to TRUE if you want to run the analysis or FALSE if you don't
rules:
  seqfu: TRUE
  mash: TRUE
  fastani: TRUE
  checkm: FALSE
  gtdbtk: FALSE
  prokka-gbk: TRUE
  antismash: TRUE
  query-bigslice: TRUE
  bigscape: TRUE
  bigslice: TRUE
  automlst-wrapper: TRUE
  arts: TRUE
  roary: TRUE
  eggnog: TRUE
  eggnog-roary: TRUE
  deeptfactor: TRUE
  deeptfactor-roary: TRUE
  cblaster-genome: TRUE
  cblaster-bgc: TRUE
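
The global config says that project rule configurations override the global rule configuration. The sketch below illustrates one plausible reading of that statement, a per-key merge in which project values win. The helper `enabled_rules` is hypothetical, the dictionaries are a small excerpt of the two files, and the values are assumed to have already been parsed into booleans.

```python
from typing import Dict, List, Optional


def enabled_rules(
    global_rules: Dict[str, bool],
    project_rules: Optional[Dict[str, bool]] = None,
) -> List[str]:
    """Merge rule switches and return the names that end up enabled."""
    merged = dict(global_rules)
    if project_rules:
        # Assumption: project-level values replace global ones key by key.
        merged.update(project_rules)
    return sorted(name for name, is_on in merged.items() if is_on)


# Excerpt of the global `pipelines:` block.
global_rules = {"antismash": True, "arts": False, "checkm": False}
# Excerpt of the project `rules:` block.
project_rules = {"antismash": True, "arts": True, "checkm": False, "roary": True}

global_only = enabled_rules(global_rules)
with_project = enabled_rules(global_rules, project_rules)
print(global_only)
print(with_project)
```

With only the global block, antismash is the single enabled rule; with the project block applied, arts and roary are switched on as well.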
@@ -0,0 +1,5 @@
genome_id,source,organism,genus,species,strain,closest_placement_reference,input_file
GCA_000056065.1,ncbi,,,,,,
GCA_000182835.1,ncbi,,,,,,
GCA_000191165.1,ncbi,,,,,,
GCA_000014405.1,ncbi,,,,,,
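
The global config requires genome ids in the samples file to be unique. A minimal sketch of that check using only the Python standard library follows; the function name `read_samples` is an assumption for illustration, and the CSV text is copied from the file above.

```python
import csv
import io
from collections import Counter

SAMPLES_CSV = """\
genome_id,source,organism,genus,species,strain,closest_placement_reference,input_file
GCA_000056065.1,ncbi,,,,,,
GCA_000182835.1,ncbi,,,,,,
GCA_000191165.1,ncbi,,,,,,
GCA_000014405.1,ncbi,,,,,,
"""


def read_samples(text):
    """Parse the samples table and reject duplicated genome ids."""
    rows = list(csv.DictReader(io.StringIO(text)))
    counts = Counter(row["genome_id"] for row in rows)
    duplicates = sorted(gid for gid, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate genome ids: {duplicates}")
    return rows


samples = read_samples(SAMPLES_CSV)
genome_ids = [row["genome_id"] for row in samples]
print(genome_ids)
```

Empty columns such as `organism` come back as empty strings, so downstream code would need to treat them as missing values.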

Large diffs are not rendered by default.

@@ -0,0 +1,22 @@
{
"GCA_000014405.1": {
"CP000412.1.region001.gbk": {
"record_id": "CP000412.1",
"original_id": "CP000412.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000014405.1/CP000412.1.region001.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000014405.1/CP000412.1.region001.gbk"
},
"CP000412.1.region002.gbk": {
"record_id": "CP000412.1",
"original_id": "CP000412.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000014405.1/CP000412.1.region002.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000014405.1/CP000412.1.region002.gbk"
},
"GCA_000014405.1.gbk": {
"record_id": "CP000412.1",
"original_id": "CP000412.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000014405.1/GCA_000014405.1.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000014405.1/GCA_000014405.1.gbk"
}
}
}
@@ -0,0 +1,28 @@
{
"GCA_000056065.1": {
"CR954253.1.region001.gbk": {
"record_id": "CR954253.1",
"original_id": "CR954253.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000056065.1/CR954253.1.region001.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000056065.1/CR954253.1.region001.gbk"
},
"CR954253.1.region002.gbk": {
"record_id": "CR954253.1",
"original_id": "CR954253.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000056065.1/CR954253.1.region002.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000056065.1/CR954253.1.region002.gbk"
},
"CR954253.1.region003.gbk": {
"record_id": "CR954253.1",
"original_id": "CR954253.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000056065.1/CR954253.1.region003.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000056065.1/CR954253.1.region003.gbk"
},
"GCA_000056065.1.gbk": {
"record_id": "CR954253.1",
"original_id": "CR954253.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000056065.1/GCA_000056065.1.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000056065.1/GCA_000056065.1.gbk"
}
}
}
@@ -0,0 +1,10 @@
{
"GCA_000182835.1": {
"GCA_000182835.1.gbk": {
"record_id": "CP002342.1",
"original_id": "CP002342.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000182835.1/GCA_000182835.1.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000182835.1/GCA_000182835.1.gbk"
}
}
}
@@ -0,0 +1,22 @@
{
"GCA_000191165.1": {
"CP000156.1.region001.gbk": {
"record_id": "CP000156.1",
"original_id": "CP000156.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000191165.1/CP000156.1.region001.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000191165.1/CP000156.1.region001.gbk"
},
"CP000156.1.region002.gbk": {
"record_id": "CP000156.1",
"original_id": "CP000156.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000191165.1/CP000156.1.region002.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000191165.1/CP000156.1.region002.gbk"
},
"GCA_000191165.1.gbk": {
"record_id": "CP000156.1",
"original_id": "CP000156.1",
"target_path": "data/interim/antismash/7.0.0/GCA_000191165.1/GCA_000191165.1.gbk",
"symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000191165.1/GCA_000191165.1.gbk"
}
}
}
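
Each JSON file above maps a genome id to its antiSMASH GenBank files, with a `target_path` and a `symlink_path` per file. The sketch below shows one way such a mapping could be read to collect the (symlink, target) pairs for region files only. The function name `region_links` and the rule that region files are recognised by ".region" in the file name are assumptions; the JSON text is an excerpt of two of the files above.

```python
import json

MAPPING_JSON = """
{
  "GCA_000182835.1": {
    "GCA_000182835.1.gbk": {
      "record_id": "CP002342.1",
      "original_id": "CP002342.1",
      "target_path": "data/interim/antismash/7.0.0/GCA_000182835.1/GCA_000182835.1.gbk",
      "symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000182835.1/GCA_000182835.1.gbk"
    }
  },
  "GCA_000014405.1": {
    "CP000412.1.region001.gbk": {
      "record_id": "CP000412.1",
      "original_id": "CP000412.1",
      "target_path": "data/interim/antismash/7.0.0/GCA_000014405.1/CP000412.1.region001.gbk",
      "symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000014405.1/CP000412.1.region001.gbk"
    },
    "GCA_000014405.1.gbk": {
      "record_id": "CP000412.1",
      "original_id": "CP000412.1",
      "target_path": "data/interim/antismash/7.0.0/GCA_000014405.1/GCA_000014405.1.gbk",
      "symlink_path": "data/interim/bgcs/Lactobacillus_delbrueckii/7.0.0/GCA_000014405.1/GCA_000014405.1.gbk"
    }
  }
}
"""


def region_links(mapping):
    """Return {genome_id: [(symlink_path, target_path), ...]} for region files."""
    links = {}
    for genome_id, files in mapping.items():
        links[genome_id] = [
            (info["symlink_path"], info["target_path"])
            for filename, info in sorted(files.items())
            # Assumption: region files carry ".region" in their file name.
            if ".region" in filename
        ]
    return links


links = region_links(json.loads(MAPPING_JSON))
for genome_id, pairs in links.items():
    print(genome_id, len(pairs))
```

In this excerpt GCA_000182835.1 has only its whole-genome file, so it yields an empty list of region links.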
1 change: 1 addition & 0 deletions .tests/unit/arts_allhits_combine/data/workflow