Initial framework for balance metrics and tests #2246

jeromekelleher · 2022-05-09T20:05:05Z

This is the first step for adding tree balance metrics as discussed in #2245.

If we agree with this approach, we can follow up with some specific issues to (a) add other metrics; (b) document them; and (c) implement them in C.

codecov · 2022-05-09T20:20:03Z

Codecov Report

Merging #2246 (cb057e7) into main (db7af18) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2246   +/-   ##
=======================================
  Coverage   93.31%   93.31%           
=======================================
  Files          27       27           
  Lines       26025    26035   +10     
  Branches     1166     1170    +4     
=======================================
+ Hits        24284    24294   +10     
  Misses       1711     1711           
  Partials       30       30

Flag	Coverage Δ
c-tests	`92.23% <ø> (ø)`
lwt-tests	`89.05% <ø> (ø)`
python-c-tests	`71.96% <10.00%> (-0.05%)`	⬇️
python-tests	`98.88% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
python/tskit/trees.py	`98.02% <100.00%> (+0.01%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db7af18...cb057e7. Read the comment docs.

jeromekelleher · 2022-05-10T10:53:11Z

@jeremyguez would you mind taking a quick look here to see what you think? Are there any other tree topology examples we should include here? The idea is that we'll add tests for each of the balance metrics to the classes as we add them, so we can see the what the values are for these simple cases, easily.

benjeffery

Looks good, I wonder if we should go for defining these topologies in an examples.py though.

benjeffery · 2022-05-10T10:51:05Z

python/tests/test_balance_metrics.py

+            assert tree.sackin_index() == sackin_index_definition(tree)
+
+
+class TestBalancedBinaryOdd:


There's quite a bit of repeated boilerplate here for what is looping over several different topologies. If we're doing #1804 then should these trees be defined in examples.py then this file is a test that checks a dict of tree_name->value pairs as a single parameterised test?

There's a tension here though between avoiding repetition and being able to see what you're testing. I find it really helpful to be able to just look at the example and see what it is your testing, and see what the result should be. It helps a lot to be able to fine-tune the examples for specific things that you're doing, too, rather than trying to come up with a universal set of examples that cover all possibilities.

Also, might not be obvious here but the idea is to put test functions for all the other metrics into these classes, so the boilder-plate fraction will go down quite a bit.

jeremyguez · 2022-05-10T14:02:08Z

@jeromekelleher
The tested topologies look good enough to me. (We could also have the case of mixed arities in one tree, but I'm not sure if it matters much.)

jeromekelleher · 2022-05-10T14:18:09Z

OK, great, thanks @jeremyguez. I don't think there's much need to have mixed arity in a tree, as these examples are really just to make it easy for us to see what the answer is in some representative situations. The systematic testing is done in the pytest.mark.parametrize where we compare the definition with the implementation over lots of examples.

OK, going to merge this then. Are you happy to pick this up from here @jeremyguez? (I'll do the C version of this now, so you have examples to follow there)

jeremyguez · 2022-05-10T14:31:54Z

@jeromekelleher Yes, I started to look at #2249.

jeromekelleher requested a review from benjeffery May 9, 2022 20:05

jeromekelleher force-pushed the balance-first-steps branch from c512e79 to 36735e4 Compare May 9, 2022 20:08

jeromekelleher force-pushed the balance-first-steps branch from 36735e4 to e7a9bd0 Compare May 10, 2022 10:00

This was referenced May 10, 2022

Tree path length #2249

Closed

Colless imbalance index (Python) #2250

Closed

B1 Balance index (Python) #2251

Closed

B2 Balance index (Python) #2252

Closed

C implementation of Sackin index #2253

Closed

benjeffery approved these changes May 10, 2022

View reviewed changes

jeromekelleher added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label May 10, 2022

Initial framework for balance metrics and tests

cb057e7

jeromekelleher force-pushed the balance-first-steps branch from e7a9bd0 to cb057e7 Compare May 10, 2022 14:18

mergify bot merged commit 5bff374 into tskit-dev:main May 10, 2022

mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label May 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial framework for balance metrics and tests #2246

Initial framework for balance metrics and tests #2246

jeromekelleher commented May 9, 2022

codecov bot commented May 9, 2022 •

edited

Loading

jeromekelleher commented May 10, 2022

benjeffery left a comment

benjeffery May 10, 2022

jeromekelleher May 10, 2022

jeromekelleher May 10, 2022

jeremyguez commented May 10, 2022

jeromekelleher commented May 10, 2022

jeremyguez commented May 10, 2022

		assert tree.sackin_index() == sackin_index_definition(tree)


		class TestBalancedBinaryOdd:

Initial framework for balance metrics and tests #2246

Initial framework for balance metrics and tests #2246

Conversation

jeromekelleher commented May 9, 2022

codecov bot commented May 9, 2022 • edited Loading

Codecov Report

jeromekelleher commented May 10, 2022

benjeffery left a comment

Choose a reason for hiding this comment

benjeffery May 10, 2022

Choose a reason for hiding this comment

jeromekelleher May 10, 2022

Choose a reason for hiding this comment

jeromekelleher May 10, 2022

Choose a reason for hiding this comment

jeremyguez commented May 10, 2022

jeromekelleher commented May 10, 2022

jeremyguez commented May 10, 2022

codecov bot commented May 9, 2022 •

edited

Loading