Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest alignment to Pathogen repo guide and add frequency panel #88

Merged
merged 10 commits into from
Jan 4, 2025
1 change: 1 addition & 0 deletions phylogenetic/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ wildcard_constraints:
rule all:
input:
auspice_json = expand("auspice/dengue_{serotype}_{gene}.json", serotype=serotypes, gene=genes),
tip_frequencies_json= expand("auspice/dengue_{serotype}_{gene}_tip-frequencies.json", serotype=serotypes, gene=genes)

include: "rules/prepare_sequences.smk"
include: "rules/prepare_sequences_E.smk"
Expand Down
6 changes: 6 additions & 0 deletions phylogenetic/defaults/config_dengue.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,5 +38,11 @@ clades:
denv3: 'defaults/clades_genotypes.tsv'
denv4: 'defaults/clades_genotypes.tsv'

tip_frequencies:
min_date: "1980-01-01"
max_date: "6M"
narrow_bandwidth: 0.2
wide_bandwidth: 0.6

export:
description: "defaults/description.md"
35 changes: 35 additions & 0 deletions phylogenetic/rules/export.smk
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,12 @@ rule prepare_auspice_config:
"region",
"author"
],
"panels": [
"tree",
"map",
"entropy",
"frequencies"
],
"metadata_columns": [
"accession",
"url"
Expand Down Expand Up @@ -179,3 +185,32 @@ rule final_strain_name:
--display-strain-name {params.display_strain_field} \
--output {output.auspice_json}
"""

rule tip_frequencies:
"""
Estimating KDE frequencies for tips
"""
input:
tree = "results/{gene}/tree_{serotype}.nwk",
metadata = "data/metadata_{serotype}.tsv",
output:
tip_freq = "auspice/dengue_{serotype}_{gene}_tip-frequencies.json"
params:
strain_id = config["strain_id_field"],
Copy link
Contributor

@joverlee521 joverlee521 Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely certain what is causing the empty regions

I was curious what was going on here, so I took a look. There is mismatch in the strain id used in the main JSON and the frequencies JSON so Auspice cannot link the strains to their frequencies.

Since the main JSON is getting it's strain name set to the display_strain_field in final_strain_name, the frequencies JSON needs to use the same field here:

Suggested change
strain_id = config["strain_id_field"],
strain_id = config["display_strain_field"],

Copy link
Contributor

@joverlee521 joverlee521 Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the underlying issue was correct, but the suggestion caused CI to fail. This because the newick tree uses the accession as id so it cannot be matched with the metadata strain.

Separately discussed with @j23414 to remove set_final_strain_name.py as suggested in nextstrain/public#5 so that both use the accession as the strain id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just logging that @joverlee521 and I chatted that augur frequencies takes tree.json file, which is ID'd on accession to match against the metadata.tsv file. Ergo, it might be time to remove set_final_strain_name.py so that both the main JSON and the frequencies JSON use the accession as the strain id.

This falls inline with nextstrain/public#5, see nextstrain/zika#72 as an example

I'll make the changes and re-run tests, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice debugging! Whenever things look strange in Auspice I'd suggest first checking the console logs which often contain useful information (i.e. the browser's developer console when viewing Auspice). In this case it had hundreds of warnings logged - perhaps Jover did just this?

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good to know that Auspice warns about this! I didn't look at the console logs, I was just inspecting the frequencies JSON and noticed the strain names were all accessions.

min_date = config["tip_frequencies"]["min_date"],
max_date = config["tip_frequencies"]["max_date"],
narrow_bandwidth = config["tip_frequencies"]["narrow_bandwidth"],
wide_bandwidth = config["tip_frequencies"]["wide_bandwidth"]
shell:
r"""
augur frequencies \
--method kde \
--tree {input.tree} \
--metadata {input.metadata} \
--metadata-id-columns {params.strain_id} \
--min-date {params.min_date} \
--max-date {params.max_date} \
--narrow-bandwidth {params.narrow_bandwidth} \
--wide-bandwidth {params.wide_bandwidth} \
--output {output.tip_freq}
"""
Loading