Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add num_children to Tree arrays #2274

Closed
jeromekelleher opened this issue May 15, 2022 · 5 comments
Closed

Add num_children to Tree arrays #2274

jeromekelleher opened this issue May 15, 2022 · 5 comments
Assignees
Labels
C API Issue is about the C API enhancement New feature or request Python API Issue is about the Python API
Milestone

Comments

@jeromekelleher
Copy link
Member

jeromekelleher commented May 15, 2022

There are lots of times when we want to ask questions about the arity (number of children) of the nodes in a tree. In particular, many of the operations discussed in #2245 required this as input. While it's easy enough to compute in a linear pass through the nodes in the tree, it's also something we can maintain during the tree transition process very cheaply. The basic algorithm just needs to increment/decrement a count for the parent node:

ts = msprime.sim_ancestry(10, random_seed=2, sequence_length=100, recombination_rate=0.1)

num_children = np.zeros(ts.num_nodes, dtype=int)

for tree, (_, edges_out, edges_in) in zip(ts.trees(), ts.edge_diffs()):
    for e in edges_out:
        num_children[e.parent] -= 1
    for e in edges_in:
        num_children[e.parent] += 1

    tree_num_children = [tree.num_children(u) for u in range(ts.num_nodes)]
    assert tree_num_children == list(num_children)

So, the computational cost would essentially be free, and it's just a question about whether we want to add the extra O(num_nodes) space cost to the Tree struct by default or not.

My feeling is that this is quite a basic thing and it'll be annoying to have to specify a Tree option if you want to do things like compute stats based on node arity, so it's probably worth the extra bit of memory.

If we make it a standard part of the tree building process, then we can export it as a numpy array to Python (like the other tree arrays) and then things like computing the arity histogram, is_binary etc can be done efficiently in Python.

@jeromekelleher jeromekelleher added enhancement New feature or request C API Issue is about the C API Python API Issue is about the Python API labels May 15, 2022
@jeromekelleher
Copy link
Member Author

Another advantage of this is that we'd be able to efficiently count the number of roots, since num_children[tree.virtual_root] should be tree.num_roots.

@hyanwong
Copy link
Member

I definitely think this is worth it for the storage cost.

@andrewkern
Copy link
Member

+1

i came across a use case for this just earlier today.

@benjeffery benjeffery added this to the Python 0.5.1 (paper) milestone May 27, 2022
@benjeffery
Copy link
Member

I've added this to the release after the imminent one, there are quite a few things on it already but we can prioritise after 0.5.0.

@jeromekelleher
Copy link
Member Author

Closed in #2316

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C API Issue is about the C API enhancement New feature or request Python API Issue is about the Python API
Projects
None yet
Development

No branches or pull requests

5 participants