Polytomy collapsing #3011

Closed
hyanwong opened this issue Oct 7, 2024 · 5 comments

@hyanwong
Member

hyanwong commented Oct 7, 2024

Here's a quick demo of the visual scheme I have come up with for condensing trees with polytomies, so we show only the lineages relating to a set of tracked samples (tips in cyan). Such samples might represent (say) a geographical region, or a covid Pango lineage. Here's an example, followed by the suggested scheme:

Screenshot 2024-10-07 at 19 07 59

Condensed:

Screenshot 2024-10-07 at 23 29 23

Two things are going on here:

  1. nodes at the top of a clade of entirely untracked samples (here node 36) are collapsed with a triangle showing the number of samples underneath as "+n" (as in Allow multi-sample tips for Tree.draw_svg() #3010).
  2. where there are 2 or more lineages containing entirely untracked nodes that are part of a polytomy, that polytomy is collapsed into a dotted line (followed by "+n/m" where n is the number of samples and m is the number of additional branches in the polytomy).
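
To make the two rules concrete, here is a minimal pure-Python sketch of how the "+n" and "+n/m" labels could be computed. It is not the tskit implementation: the tree is a plain dict of children, and `summarise`, `clade_labels` and `polytomy_labels` are hypothetical names. It assumes a polytomy means a node with more than two children, and that an untracked tip on its own is drawn as an ordinary tip rather than a triangle.

```python
def summarise(children, samples, tracked, root):
    """Return (clade_labels, polytomy_labels) for the two collapsing rules.

    children: dict mapping a node to a list of its child nodes
    samples:  set of sample nodes
    tracked:  subset of samples that we want to follow
    """
    counts = {}  # node -> (num_samples, num_tracked) beneath and including it

    def count(u):
        n = int(u in samples)
        t = int(u in tracked)
        for c in children.get(u, []):
            cn, ct = count(c)
            n += cn
            t += ct
        counts[u] = (n, t)
        return n, t

    count(root)
    clade_labels = {}     # rule 1: top node of an untracked clade -> "+n"
    polytomy_labels = {}  # rule 2: polytomy node -> "+n/m"

    def visit(u):
        kids = children.get(u, [])
        untracked = [c for c in kids if counts[c][1] == 0]
        if len(kids) > 2 and len(untracked) >= 2:
            # Rule 2: merge all untracked branches of the polytomy
            n = sum(counts[c][0] for c in untracked)
            polytomy_labels[u] = f"+{n}/{len(untracked)}"
        else:
            # Rule 1: collapse each untracked internal clade into a triangle
            for c in untracked:
                if children.get(c):
                    clade_labels[c] = f"+{counts[c][0]}"
        # Only descend into lineages that contain tracked samples
        for c in kids:
            if counts[c][1] > 0:
                visit(c)

    visit(root)
    return clade_labels, polytomy_labels


# Toy tree: node 10 is a polytomy, node 6 is an untracked clade below node 7
children = {10: [7, 8, 5], 7: [0, 6], 6: [1, 2], 8: [3, 4]}
samples = {0, 1, 2, 3, 4, 5}
tracked = {0}
clade_labels, polytomy_labels = summarise(children, samples, tracked, root=10)
print(clade_labels, polytomy_labels)  # {6: '+2'} {10: '+3/2'}
```

In tskit itself, the per-node counts used here correspond to `tree.num_samples(u)` and `tree.num_tracked_samples(u)`. The sketch is recursive, so it suits small illustrative trees only.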

Optionally (3rd plot), we can also collapse nodes that consist of entirely tracked samples (here node 39) into a triangle/trapezium:

Screenshot 2024-10-07 at 23 31 24

Does this look like a reasonable approach? I'm not sold on the "+n/m" notation but it was the most succinct/consistent that I could come up with.

@hyanwong
Member Author

hyanwong commented Oct 7, 2024

Here's the viz run on a random covid pangolin lineage:

Screenshot 2024-10-08 at 00 16 00

Once we have defined a postorder_minlex_tracked_node_traversal, this is produced using e.g.

pango = "B.1.1.70"
# Samples assigned to this Pango lineage: these are the tracked (focal) samples
tracked_nodes = ti.pango_lineage_samples[pango]
tree = ts.first(tracked_samples=tracked_nodes)
# Nodes to draw, in plotting order
order = list(postorder_minlex_tracked_node_traversal(tree, collapse_tracked=False))
print(len(order), f"nodes in subtree. Nodes in magenta are {pango}")
tree.draw_svg(
    time_scale="rank",
    order=order,
    size=(1000, 800),
    # Label only the untracked nodes, using their Pango designation from metadata
    node_labels={u: ts.node(u).metadata.get("Viridian_pangolin", "") for u in order if u not in tracked_nodes},
    mutation_labels={},
    symbol_size=4,
    summarise_untracked_polytomies=True,
    style=(
        # Colour the tracked nodes (plus node 39) magenta
        "".join(f".n{u} > .sym {{fill: magenta}}" for u in tracked_nodes + [39]) +
        ".lab.summary {font-size: 9px}" +
        ".polytomy {font-size: 10px}"
    ),
)
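
The traversal function itself is not shown in this thread. As a rough idea of what such a function might do, here is a pure-Python sketch on a dict-based tree. The name `tracked_postorder` and its behaviour are assumptions, not the actual `postorder_minlex_tracked_node_traversal`: it yields nodes in postorder, visits children in order of the smallest leaf ID beneath each (in the spirit of tskit's "minlex" ordering), and does not descend below the top node of a clade with no tracked samples.

```python
def tracked_postorder(children, tracked, root):
    """Yield nodes in postorder, skipping the interior of untracked clades."""
    info = {}  # node -> (number of tracked nodes beneath, smallest leaf ID beneath)

    def annotate(u):
        kids = children.get(u, [])
        if not kids:
            info[u] = (int(u in tracked), u)
        else:
            sub = [annotate(c) for c in kids]
            num_tracked = sum(t for t, _ in sub) + int(u in tracked)
            info[u] = (num_tracked, min(m for _, m in sub))
        return info[u]

    annotate(root)

    def visit(u):
        if info[u][0] > 0:
            # Descend only where there is something tracked to show
            for c in sorted(children.get(u, []), key=lambda c: info[c][1]):
                yield from visit(c)
        # An untracked clade is represented by its top node alone
        yield u

    yield from visit(root)


# Toy tree: children of node 10 are deliberately listed out of minlex order
children = {10: [8, 5, 7], 7: [0, 6], 6: [1, 2], 8: [3, 4]}
order = list(tracked_postorder(children, tracked={0}, root=10))
print(order)  # [0, 6, 7, 8, 5, 10]
```

Here nodes 1 to 4 are omitted because they sit inside untracked clades, while the clade tops (6, 8) and the untracked tip (5) stay in the order so that a drawing routine could summarise them.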

@hyanwong
Member Author

hyanwong commented Oct 7, 2024

And here's a path to a pango lineage represented by a single sample:

Screenshot 2024-10-08 at 00 39 57

@jeromekelleher
Member

Looks great Yan!

@hyanwong
Member Author

hyanwong commented Oct 8, 2024

> Looks great Yan!

Great, thanks. I'll work it into a PR.

@hyanwong
Member Author

hyanwong commented Oct 8, 2024

The main issue to which there is no easy solution is when we have a huge polytomy of (say) 1000 lineages, 999 of which are lineages containing entirely (or mostly) focal (tracked) samples, and one of which is not. We can't visually collapse parts of such a polytomy in a meaningful way: either we collapse the whole thing, or we have to show all the focal lineages, as we don't know how they relate to each other. For example, here's the top of the B.1.1.7 (alpha) lineage from a covid tree:
Screenshot 2024-10-08 at 15 51 27

I think this is an insoluble issue, so I'm happy to punt it down the line.
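
The arithmetic behind this can be sketched in a couple of lines. The function `branches_drawn` is purely illustrative and assumes the scheme described at the top of this issue: every child lineage containing focal samples is drawn separately, and all untracked child lineages together take up at most one branch (a single branch if there is only one, or one dotted line if there are two or more).

```python
def branches_drawn(num_focal, num_untracked):
    """Branches still drawn below a polytomy once untracked lineages are summarised."""
    # Focal lineages cannot be merged; untracked ones occupy at most one branch
    return num_focal + min(num_untracked, 1)


print(branches_drawn(999, 1))  # 1000: nothing is saved
print(branches_drawn(1, 999))  # 2: the opposite case collapses well
```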

@hyanwong hyanwong closed this as completed Oct 8, 2024