Document orderings
Closes #1691
Closes #1706
jeromekelleher authored and mergify-bot committed Oct 18, 2021
1 parent edbaaca commit a882a5d
Showing 9 changed files with 237 additions and 185 deletions.
3 changes: 3 additions & 0 deletions c/CHANGELOG.rst
@@ -14,6 +14,7 @@

- The previously deprecated option ``TSK_SAMPLE_COUNTS`` has been removed. (:user:`benjeffery`, :issue:`1744`, :pr:`1761`).

- FIXME breaking changes for tree API and virtual root

**Features**

@@ -37,6 +38,8 @@
tree sequence. This is then used to generate an error if ``time_units`` is ``uncalibrated`` when
using the branch lengths in statistics. (:user:`benjeffery`, :issue:`1644`, :pr:`1760`)

- FIXME add features for virtual root, num_edges, stack allocation size etc

**Fixes**

----------------------
60 changes: 27 additions & 33 deletions docs/_static/different_time_samples.svg
85 changes: 42 additions & 43 deletions docs/_static/tree_structure1.svg
80 changes: 40 additions & 40 deletions docs/_static/tree_structure2.svg
48 changes: 35 additions & 13 deletions docs/data-model.rst
@@ -987,10 +987,8 @@ details of how to use the quintuply linked structure in the C API.

.. _sec_data_model_tree_roots:

Accessing roots
===============

.. todo:: Update this with a discussion of the virtual root
Roots
=====

The roots of a tree are defined as the unique endpoints of upward paths
starting from sample nodes (if no path leads upward from a sample node,
@@ -1003,6 +1001,10 @@ example, we get a tree with two roots:
:width: 200px
:alt: An example tree with multiple roots

We keep track of roots in tskit by using a special additional node
called the **virtual root**, whose children are the roots. In the
quintuply linked tree encoding this is an extra element at the end
of each of the tree arrays, as shown here:

=========== =========== =========== =========== =========== ===========
node        parent      left_child  right_child left_sib    right_sib
@@ -1013,17 +1015,37 @@ node parent left_child right_child left_sib right_sib
3           6           -1          -1          -1          4
4           6           -1          -1          3           -1
5           7           0           2           -1          -1
6           -1          3           4           7           -1
7           -1          5           5           -1          6
6           -1          3           4           -1          7
7           -1          5           5           6           -1
**8**       **-1**      **6**       **7**       **-1**      **-1**
=========== =========== =========== =========== =========== ===========

To gain efficient access to the roots in the quintuply linked encoding we keep
one extra piece of information: the ``left_root``. In this example
the leftmost root is ``7``. Roots are considered siblings, and so
once we have one root we can find all the other roots efficiently using
the ``left_sib`` and ``right_sib`` arrays. For example, we can see here
that the right sibling of ``7`` is ``6``, and the left sibling of ``6``
is ``7``.
In this example, node 8 is the virtual root; its left child is 6
and its right child is 7.
Importantly, though, this is an asymmetric
relationship, since the parent of the "real" roots 6 and 7 is null
(-1) and *not* the virtual root. To emphasise that this is not a "real"
node, we've shown the values for the virtual root here in bold.

The main function of the virtual root is to keep track of tree roots
efficiently in the internal library algorithms, and it is usually not
something we need to think about unless working directly with
the quintuply linked tree structure. However, the virtual root can be
useful in some algorithms, so it can optionally be returned in traversal
orders (see :meth:`.Tree.nodes`). The virtual root has the following
properties:

- Its ID is always equal to the number of nodes in the tree sequence (i.e.,
  the length of the node table). However, there is **no corresponding row**
  in the node table, and any attempt to access information about the
  virtual root via either the tree sequence or tables APIs will fail with
  an out-of-bounds error.
- The parent and siblings of the virtual root are null.
- The time of the virtual root is defined as positive infinity (if
  accessed via :meth:`.Tree.time`). This is useful in defining the
  time-based node traversal orderings.
- The virtual root is the parent of no other node---roots do **not**
  have parent pointers to the virtual root.
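
As a rough illustration of these properties, the following sketch walks the
roots of the example tree using plain Python lists rather than the tskit API.
Rows 3 to 8 are copied from the table above; rows 0 to 2 are not shown in the
table and are an assumption here (three leaf children of node 5). The helper
name ``iter_roots`` is hypothetical and not part of tskit.

```python
NULL = -1  # null node ID, as used in the table above

# One entry per node, plus a final entry for the virtual root (node 8).
# Rows 3-8 follow the table; rows 0-2 are ASSUMED (leaf children of node 5).
parent = [5, 5, 5, 6, 6, 7, NULL, NULL, NULL]
left_child = [NULL, NULL, NULL, NULL, NULL, 0, 3, 5, 6]
right_sib = [1, 2, NULL, 4, NULL, NULL, 7, NULL, NULL]


def iter_roots(left_child, right_sib):
    """Yield the roots, left to right, by visiting the virtual root's children."""
    virtual_root = len(left_child) - 1  # ID equals the number of real nodes
    u = left_child[virtual_root]
    while u != NULL:
        yield u
        u = right_sib[u]


roots = list(iter_roots(left_child, right_sib))
print(roots)  # [6, 7]

# The relationship is asymmetric: the roots do not point back to node 8.
print([parent[u] for u in roots])  # [-1, -1]
```

The traversal only ever reads downward from the virtual root, which is why
the null parent pointers of the real roots do not get in the way.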


.. _sec_data_model_missing_data:
8 changes: 4 additions & 4 deletions docs/examples.py
@@ -257,7 +257,7 @@ def stats():


def tree_structure():
def write_table(tree):
def write_table(tree, show_virtual_root=False):
fmt = "{:<12}"
heading = [
"node",
@@ -273,7 +273,7 @@ def write_table(tree):
print(line)
print(col_def)

for u in range(ts.num_nodes):
for u in range(ts.num_nodes + int(show_virtual_root)):
line = "".join(
fmt.format(v)
for v in [
@@ -325,7 +325,7 @@ def write_table(tree):
)
tree = ts.first()

write_table(tree)
write_table(tree, show_virtual_root=True)
print(tree.draw_text())
tree.draw_svg("_static/tree_structure2.svg", time_scale="rank")

@@ -404,6 +404,6 @@ def finding_nearest_neighbors():
# allele_frequency_spectra()
# missing_data()
# stats()
# tree_structure()
tree_structure()
tree_traversal()
finding_nearest_neighbors()
6 changes: 6 additions & 0 deletions docs/python-api.rst
@@ -6,6 +6,12 @@
compare arrays representing different trees along the sequence, you must
take **copies** of the arrays.

.. |virtual_root_array_note| replace:: The length of these arrays is
equal to the number of nodes in the tree sequence plus 1, with the
final element corresponding to the tree's :meth:`~.Tree.virtual_root`.
Please see the :ref:`tree roots <sec_data_model_tree_roots>` section
for more details.

.. currentmodule:: tskit
.. _sec_python_api:
