Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add order argument to TreeSequence.nodes #2370

Closed
hyanwong opened this issue Jun 29, 2022 · 8 comments · Fixed by #2471
Closed

Add order argument to TreeSequence.nodes #2370

hyanwong opened this issue Jun 29, 2022 · 8 comments · Fixed by #2471
Labels
Python API Issue is about the Python API

Comments

@hyanwong
Copy link
Member

hyanwong commented Jun 29, 2022

I was just after getting the oldest node(s) in the tree sequence, which I did by node_times = ts.tables.nodes.time; oldest = np.flatnonzero(node_times == np.max(node_times)) but beginners might not know how to do this sort of thing, which involves knowing about tables. Is it worth having a few wrapper functions for things like this (although I don't know where they would belong)?

@hyanwong
Copy link
Member Author

Same for youngest nodes too, I guess. I don't know if returning an iterator over node objects would be nicer, e.g.

ts.nodes(restrict="oldest")
ts.nodes(restrict="youngest")
ts.nodes(restrict=array_of_node_ids)

or something like that.

@benjeffery
Copy link
Member

There's a trade-off here between having useful methods and having a gigantic unwieldy API. Do you think this is a very common operation? My gut instinct here is that this is simple enough that it is a user operation.

@hyanwong
Copy link
Member Author

hyanwong commented Jun 30, 2022

Yes, I see that this is a bit of an edge case. One issue is that people might do

np.argmax(ts.tables.nodes.time)

which would only return the first of the oldest nodes (which is what I did before realising my mistake). The recommended way is (I guess, as above) a two-liner, which people are less likely to use:

node_times = ts.tables.nodes.time
np.flatnonzero(node_times == np.max(node_times))

Having that automatically available somehow would stop people making that mistake.

Implementing it by restricting the node iterator, as I suggested above, would mean not adding another method, at the cost of increasing the calling parameters of ts.nodes(). I don't know what the best trade off is here, or indeed if (as you say), it's too much code bloat.

@hyanwong
Copy link
Member Author

hyanwong commented Jun 30, 2022

Another point is that mostly people expect the node table to have the nodes in time order (because that's what is output by msprime) but this is not guaranteed. So having a documented method to iterate over nodes in time order would be a way to ensure that this was enforced in a routine. Something like the following doesn't seem too bad an idea (although the exact semantics could probably be improved):

ts.nodes(time_asc=True)  # "None" for arbitrary (ts) order, False for descending

Then the oldest node is next(ts.nodes(time_asc=False)) and the youngest is next(ts.nodes(time_asc=True)). Or something like that anyway.

I have seen a fair bit of code that assumes that the samples are the first N nodes, and that the oldest node is the last one, so it might be useful to have a method that can be used in the examples without having to jump through numpy hoops.

@jeromekelleher
Copy link
Member

Adding an order argument similar to the Tree.nodes() method would solve the majority of this (ts.nodes(order="timedesc")) I think. The order defaults to increasing node ID.

Getting the k nodes that are exactly as old as each other is very niche I think and not worth a library function.

@hyanwong
Copy link
Member Author

Adding an order argument similar to the Tree.nodes() method would solve the majority of this (ts.nodes(order="timedesc")) I think. The order defaults to increasing node ID.

Yes, I think this is nice and general. 👍

@jeromekelleher jeromekelleher changed the title Helper functions for oldest nodes, etc. Add order argument to TreeSequence.nodes Jun 30, 2022
@jeromekelleher
Copy link
Member

Changed issue title.

def nodes(self, *, order=None):
     order = "id" if order is None else order
     if order not in ["id", "timedesc", "timeasc"]:
          raise ValueError(...)
     # do stuff

@jeromekelleher jeromekelleher added the Python API Issue is about the Python API label Jun 30, 2022
@benjeffery benjeffery added this to the Python upcoming milestone Jun 30, 2022
@hyanwong
Copy link
Member Author

hyanwong commented Aug 23, 2022

In #2470 I noted that we want to break ties by using the node ID (i.e. np.lexsort((np.arange(ts.tables.nodes.num_rows), ts.tables.nodes.time))). In particular, we should guarantee that order="timeasc" gives the same order of nodes as used in edges.parent (although some nodes, such as leaf nodes will be missing from edges.parent). This would then provide consistency for ARG traversal methods (see #2469)

hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 23, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 24, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 24, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 24, 2022
AdminBot-tskit pushed a commit to hyanwong/tskit that referenced this issue Aug 26, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 26, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 26, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 26, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Aug 26, 2022
hyanwong added a commit to hyanwong/tskit that referenced this issue Sep 5, 2022
AdminBot-tskit pushed a commit to hyanwong/tskit that referenced this issue Sep 5, 2022
@mergify mergify bot closed this as completed in #2471 Sep 5, 2022
mergify bot pushed a commit that referenced this issue Sep 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Python API Issue is about the Python API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants