Virtual root in trees #1704
Codecov Report
    @@            Coverage Diff             @@
    ##             main    #1704      +/-   ##
    ==========================================
    + Coverage   93.38%   93.40%   +0.01%
    ==========================================
      Files          27       27
      Lines       24571    24672     +101
      Branches     1100     1090      -10
    ==========================================
    + Hits        22945    23044      +99
    - Misses       1591     1593       +2
      Partials       35       35
Some benchmarks:

    import collections
    import sys
    import time

    import msprime
    import numpy as np
    import pandas as pd
    import tskit


    def benchmark_tree_ops(name, ts):
        orders = [
            "preorder",
            "inorder",
            "postorder",
            "levelorder",
            "breadthfirst",
            "timeasc",
            "timedesc",
            "minlex_postorder",
        ]
        data = []
        max_trees = 100
        for ordering in orders:
            times = np.zeros(min(max_trees, ts.num_trees))
            for tree in ts.trees():
                if tree.index == max_trees:
                    break
                before = time.perf_counter()
                # consume the iterator as quickly as possible
                iterator = tree.nodes(order=ordering)
                collections.deque(iterator, maxlen=0)
                times[tree.index] = time.perf_counter() - before
            # mean traversal time per tree for this ordering
            data.append({"name": name, "order": ordering, "time": np.mean(times)})
        df = pd.DataFrame(data)
        print(df)


    ts = msprime.sim_ancestry(100000, random_seed=32)
    benchmark_tree_ops("big_tree", ts)
    ts = msprime.sim_ancestry(
        1000, sequence_length=1e11, recombination_rate=1e-8, random_seed=42
    )
    # print(ts.num_trees)
    benchmark_tree_ops("many_trees", ts)

With the current head of git, we get:

With this branch, we have:
So we have roughly an order-of-magnitude improvement in traversal performance for preorder, postorder, timeasc and timedesc. We could try to measure the extra memory used in creating the numpy arrays, but I'd be astonished if it were significant.
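Presumably the speedup comes from producing each traversal order as a whole array in one pass instead of yielding nodes one at a time from a generator. The pure-Python sketch below only illustrates that idea (it will not reproduce the speed of tskit's C code): it fills preorder and postorder arrays from the quintuply linked `left_child`, `right_child`, `left_sib` and `right_sib` arrays. The function names `preorder_array`/`postorder_array` and the example tree are hypothetical, not tskit's API.

```python
from array import array

NULL = -1  # sentinel for "no node", mirroring tskit's NULL


def preorder_array(root, right_child, left_sib):
    """Collect the subtree rooted at `root` in preorder into one int array."""
    out = array("i")
    stack = [root]
    while stack:
        u = stack.pop()
        out.append(u)
        # Push children from right to left so the leftmost child is popped next.
        v = right_child[u]
        while v != NULL:
            stack.append(v)
            v = left_sib[v]
    return out


def postorder_array(root, left_child, right_sib):
    """Collect the subtree rooted at `root` in postorder into one int array."""
    out = array("i")
    stack = [root]
    while stack:
        u = stack.pop()
        out.append(u)
        # Push children left to right, so the rightmost child is popped next:
        # each node is emitted before its children's subtrees, right to left.
        v = left_child[u]
        while v != NULL:
            stack.append(v)
            v = right_sib[v]
    # Reversing "node, then subtrees right-to-left" gives postorder.
    out.reverse()
    return out


# Hypothetical example: root 4 has children (3, 2); node 3 has children (0, 1).
left_child = [NULL, NULL, NULL, 0, 3]
right_child = [NULL, NULL, NULL, 1, 2]
left_sib = [NULL, 0, 3, NULL, NULL]
right_sib = [1, NULL, NULL, 2, NULL]

print(list(preorder_array(4, right_child, left_sib)))   # [4, 3, 0, 1, 2]
print(list(postorder_array(4, left_child, right_sib)))  # [0, 1, 3, 2, 4]
```

Each function does a single pass with an explicit stack and returns a compact array, so the caller gets the whole ordering at once rather than paying per-node iterator overhead.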
Notes:
This is ready for review and has some big changes, so it would be good to get some eyes on it. Hopefully the doc updates will explain what's going on. There's still some stuff that needs tidying up on the C side before that gets documented, but I'd like to merge this much first so it doesn't get any bigger. Pinging @benjeffery @petrelharp @molpopgen
Oh wow. Awesome! I'm in the weeds with vargen, but I'll take a look over this later today.
I am writing quiz questions on bacterial genetics today; back tomorrow! Ping me if I seem to have disappeared...
Bumping this one: @benjeffery, any chance you could take a look through, please? It would be good to get this diff pushed through.
Wow! This is amazing. Sorry this took a while to review and to check that I understand what is going on. Annoyingly, I couldn't find anything to comment on.
Closes #1670
Minimal C changes to implement virtual root support. Includes the minimal changes to get the Python tests passing as well. Skipping viz tests for now, as the rendering is arbitrary.
Add virtual roots to the quintuply linked tree arrays.
Test properties of the virtual_root.
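Adding a virtual root to the quintuply linked tree arrays means every tree, including one with several roots, hangs off a single extra node, so a traversal that starts there needs no special case for multiple roots. The sketch below is a minimal illustration under stated assumptions: the class `QuintuplyLinkedTree` and its methods are hypothetical, the virtual root occupies the extra slot at index `num_nodes` (so every array has `num_nodes + 1` entries), every parentless node counts as a root, and roots are linked as the virtual root's children while their own `parent` entries stay `NULL`.

```python
NULL = -1  # sentinel for "no node"


class QuintuplyLinkedTree:
    """Hypothetical sketch: a forest over `num_nodes` nodes plus a virtual root."""

    def __init__(self, num_nodes):
        # Assumption: the virtual root lives in the extra slot at index num_nodes.
        self.virtual_root = num_nodes
        n = num_nodes + 1
        self.parent = [NULL] * n
        self.left_child = [NULL] * n
        self.right_child = [NULL] * n
        self.left_sib = [NULL] * n
        self.right_sib = [NULL] * n

    def _append_child(self, p, c):
        """Link c as the new rightmost child of p (child/sibling links only)."""
        last = self.right_child[p]
        if last == NULL:
            self.left_child[p] = c
        else:
            self.right_sib[last] = c
        self.left_sib[c] = last
        self.right_sib[c] = NULL
        self.right_child[p] = c

    def insert_edge(self, p, c):
        self.parent[c] = p
        self._append_child(p, c)

    def link_roots(self):
        """Make every parentless real node a child of the virtual root."""
        for u in range(self.virtual_root):
            if self.parent[u] == NULL:
                # Roots keep parent NULL; only child/sibling links point at them.
                self._append_child(self.virtual_root, u)

    def roots(self):
        """The roots are simply the children of the virtual root."""
        out = []
        u = self.left_child[self.virtual_root]
        while u != NULL:
            out.append(u)
            u = self.right_sib[u]
        return out


# Two subtrees: 3 -> (0, 1) and 4 -> (2).
tree = QuintuplyLinkedTree(5)
for p, c in [(3, 0), (3, 1), (4, 2)]:
    tree.insert_edge(p, c)
tree.link_roots()
print(tree.virtual_root)  # 5
print(tree.roots())       # [3, 4]
print(tree.parent[3])     # -1: roots keep a NULL parent
```

In this sketch `link_roots` is called once after all edges are inserted; a real incremental implementation would have to update the virtual root's children whenever an edge insertion or removal changes which nodes are roots.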
Change timeasc and timedesc to keep the tree ordering within a timeslice rather than sorting by node ID (potentially breaking).
Closes tskit-dev#1776
Closes tskit-dev#1725
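Keeping the tree ordering within a timeslice amounts to sorting nodes by time with ties left in traversal order rather than broken by node ID. A minimal sketch, assuming the base order is a preorder and the tie-breaking comes from a stable sort; which traversal tskit actually uses as the base order is not stated here, and `time_ascending`/`time_descending` are hypothetical names.

```python
def time_ascending(traversal, node_time):
    """Youngest nodes first; nodes with equal times keep their traversal order."""
    # sorted() is stable, so ties are not re-sorted by node ID.
    return sorted(traversal, key=lambda u: node_time[u])


def time_descending(traversal, node_time):
    """Oldest nodes first; nodes with equal times keep their traversal order."""
    # reverse=True still preserves the original order of equal keys.
    return sorted(traversal, key=lambda u: node_time[u], reverse=True)


# Hypothetical tree: root 4 has children (2, 3); node 3 has children (0, 1).
# The leaves 0, 1 and 2 all sit at time 0.
node_time = [0.0, 0.0, 0.0, 1.0, 2.0]
preorder = [4, 2, 3, 0, 1]

print(time_ascending(preorder, node_time))   # [2, 0, 1, 3, 4]
print(time_descending(preorder, node_time))  # [4, 3, 2, 0, 1]
# Sorting by (time, node ID) instead would order the leaves 0, 1, 2.
```

The leaves come out as 2, 0, 1 because that is their preorder position; this is exactly the kind of difference that makes the change potentially breaking for code that relied on the old node-ID ordering.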
Keep track of the number of edges used to build the topology of the tree, and document this in the Python and C interfaces.
Closes tskit-dev#1691
Closes tskit-dev#1706
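The edge count can be kept incrementally: increment a counter whenever an edge is inserted while building the tree and decrement it when one is removed, so the number is always available without scanning the tree. A minimal sketch; the class `EdgeCountingTree` is hypothetical and tracks only parent pointers, not the full linked structure.

```python
NULL = -1  # sentinel for "no parent"


class EdgeCountingTree:
    """Hypothetical sketch: parent pointers plus an incrementally kept edge count."""

    def __init__(self, num_nodes):
        self.parent = [NULL] * num_nodes
        self.num_edges = 0

    def insert_edge(self, parent, child):
        if self.parent[child] != NULL:
            raise ValueError(f"node {child} already has a parent")
        self.parent[child] = parent
        self.num_edges += 1  # one more edge now contributes to the topology

    def remove_edge(self, parent, child):
        if self.parent[child] != parent:
            raise ValueError(f"edge {parent}->{child} is not in the tree")
        self.parent[child] = NULL
        self.num_edges -= 1


tree = EdgeCountingTree(5)
for p, c in [(3, 0), (3, 1), (4, 2), (4, 3)]:
    tree.insert_edge(p, c)
print(tree.num_edges)  # 4
# e.g. the edge 4->2 ends at the next breakpoint along the genome
tree.remove_edge(4, 2)
print(tree.num_edges)  # 3
```

Because every change to the topology goes through `insert_edge` or `remove_edge`, the counter always equals the number of nodes with a non-NULL parent, at O(1) cost per update.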
See #1691
Still a WIP, but it definitely works, so I think we should do it. The code is so much cleaner this way.