Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for counting coalescing pairs #2915

Merged
merged 1 commit into from
Apr 17, 2024
Merged

Conversation

nspope
Copy link
Contributor

@nspope nspope commented Apr 1, 2024

Core algorithm, tests and API that partially address #2904

Copy link

codecov bot commented Apr 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.61%. Comparing base (d1f81e2) to head (f7078da).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2915      +/-   ##
==========================================
- Coverage   89.68%   89.61%   -0.07%     
==========================================
  Files          29       29              
  Lines       30391    30176     -215     
  Branches     5907     5874      -33     
==========================================
- Hits        27255    27043     -212     
  Misses       1793     1793              
+ Partials     1343     1340       -3     
Flag Coverage Δ
c-tests 86.21% <ø> (ø)
lwt-tests 80.78% <ø> (ø)
python-c-tests 88.72% <ø> (ø)
python-tests 98.97% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
python/tskit/stats.py 100.00% <ø> (+0.77%) ⬆️
python/tskit/trees.py 98.84% <100.00%> (+0.07%) ⬆️

@nspope nspope force-pushed the pair-coal-python branch from 8219dc5 to bf642f5 Compare April 2, 2024 01:15
@nspope
Copy link
Contributor Author

nspope commented Apr 2, 2024

@jeromekelleher and @petrelharp this is ready for a look whenever you have time.

Shall I move the core algorithm into the C library in this PR or in a followup?

python/tskit/trees.py Outdated Show resolved Hide resolved
python/tskit/trees.py Outdated Show resolved Hide resolved
@petrelharp
Copy link
Contributor

The remainder method is very clever. However, it does involve a lot of adding & subtracting of close numbers: is floating point error liable to be an issue? If there are n samples and a sequence length L, then a node present for O(1) sequence length will have their coalescing pairs computed as n^2 * (L + O(1)) - n^2 * L; I suppose this is only a problem if 1 / L is machine epsilon? So, not a worry?

python/tskit/trees.py Outdated Show resolved Hide resolved
python/tskit/trees.py Outdated Show resolved Hide resolved
python/tskit/trees.py Outdated Show resolved Hide resolved
@petrelharp
Copy link
Contributor

This looks great - I've made some comments; and perhaps it needs another test case for which some samples might be parents to other samples? (e.g., produced by the nonWF simulator in the test suite, I think?)

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, I'll have a closer look when @petrelharp's comments are addressed.

I'd suggest getting this much merged first, then following up with a draft PR that implements the algorithm in Python (in the test file) using the TreePosition code. Once we're happy with that algorithm,translating to C should be straightforward.

python/tests/test_coalrate.py Outdated Show resolved Hide resolved
@nspope
Copy link
Contributor Author

nspope commented Apr 2, 2024

I suppose this is only a problem if 1 / L is machine epsilon

Right -- if you've got a node with span S in a sequence of length L, you'll lose bits if S / L < machine epsilon. In practice it seems that this won't happen, and could be detected easily enough. What do you think, @jeromekelleher?

@jeromekelleher
Copy link
Member

Seems pretty unlikely to me

@petrelharp
Copy link
Contributor

We discussed changing the name, but I've forgotten to what? And, to be clear, i think Jerome's suggesting modifying this python algorithm, not adding a new one.

@nspope
Copy link
Contributor Author

nspope commented Apr 3, 2024

This is ready for another look ...

  • Addressed @petrelharp's comments -- algorithm had to be modified slightly to work when internal nodes are flagged as samples.
  • Changed name to 'pair_coalescence_counts'. The idea is this'll have a twin, 'pair_coalescence_rates' that invokes 'pair_coalescence_counts' under the hood and converts counts to rates within time windows or quantile intervals.
  • Added a 'time_discretisation' argument that maps nodes to time intervals. When there's a ton of nodes / windows / sample sets the output array may get too big to store in memory. In which case this lets one get a more compact summary that can be used to e.g. calculate coalescence rates. By default this is a string "nodes"; otherwise an array of breakpoints ending at np.inf.

@nspope
Copy link
Contributor Author

nspope commented Apr 3, 2024

i think Jerome's suggesting modifying this python algorithm, not adding a new one.

But in a followup PR, right?

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - some minor points on naming the time windows argument and how we deal with negative times. If you like you can drop this argument from the initial version and make an issue to track?

python/tskit/trees.py Outdated Show resolved Hide resolved
python/tskit/trees.py Outdated Show resolved Hide resolved
@jeromekelleher
Copy link
Member

Any idea why codecov isn't working here @benjeffery? It's making the diff impossible to read here, which is worse than useless.

@nspope
Copy link
Contributor Author

nspope commented Apr 4, 2024

If you like you can drop this argument from the initial version and make an issue to track?

Thanks @jeromekelleher, it'd be great to sort this API out now as it's very close.

re: naming, @petrelharp brought up that the stats methods will eventually have a time_windows argument, but that the behavior will be a bit different because pair_coalescence_counts needs a (default) time discretisation option of "nodes" as well as an option to pass time window breakpoints. So we could either:

  1. Name this argument something other than time_windows -- I think time_bins is fine (definitely better than time_discretisation)
  2. Call this argument time_windows and document that it accepts an additional string option that won't be accepted by other stats methods

Either is fine by me. Do you have a preference @petrelharp?

@jeromekelleher
Copy link
Member

I think having this method accept an additional argument to time_windows here would be totally fine. It would be more confusing to use a different name just to avoid that minor bit of inconsistency I think.

@benjeffery
Copy link
Member

Any idea why codecov isn't working here @benjeffery? It's making the diff impossible to read here, which is worse than useless.

I'm seeing a "Commit YAML is invalid" error at codecov - checking it out.

@benjeffery
Copy link
Member

@Mergifyio rebase

Copy link
Contributor

mergify bot commented Apr 5, 2024

rebase

✅ Branch has been successfully rebased

@benjeffery
Copy link
Member

@nspope I've rebased here to fix CI so you won't be able to push changes without a reset. Not sure how much of a git ninja you are so let me know if you need to push changes.

@petrelharp
Copy link
Contributor

I think having this method accept an additional argument to time_windows here would be totally fine. It would be more confusing to use a different name just to avoid that minor bit of inconsistency I think.

Sounds good to me!

@nspope nspope force-pushed the pair-coal-python branch from 4265531 to 15fca84 Compare April 5, 2024 16:57
@nspope
Copy link
Contributor Author

nspope commented Apr 5, 2024

Thanks @benjeffery! Looks like codecov is failing lwt-tests and python-c-tests -- I don't think that's due to this PR?

@nspope
Copy link
Contributor Author

nspope commented Apr 5, 2024

Thanks @jeromekelleher -- I've modified so that time_windows argument accepts either "nodes" or a sorted array of breakpoints. Nodes that fall outside of time_windows aren't counted in the output.

@nspope
Copy link
Contributor Author

nspope commented Apr 5, 2024

Darn, it looks like codecov is still mangling the diff, though this is rebased onto 9e1bf0 ? Sorry Ben, maybe rebasing / pushing on my end screwed something up?

@benjeffery
Copy link
Member

So the comment in #2915 (comment) is correct - only one line is missing coverage. However, previous comments made by codecov are not removed :(
I think the only way to clear those would be to open a new PR.

@nspope nspope force-pushed the pair-coal-python branch from 272552b to b43d1b8 Compare April 10, 2024 03:37
@nspope
Copy link
Contributor Author

nspope commented Apr 10, 2024

Hmm, now the post-test Codecov upload is erroring out:

[2024-04-10T04:01:34.167Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

I'll go ahead and open a new PR ...

@benjeffery
Copy link
Member

Hmm, now the post-test Codecov upload is erroring out:

[2024-04-10T04:01:34.167Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

I'll go ahead and open a new PR ...

Yeah, we sometimes get that one. I spent a while investigating it and it seems to be transient on codecov's end.

@nspope
Copy link
Contributor Author

nspope commented Apr 10, 2024

@benjeffery it may be related to v3 codecov-actions deprecation, see comment. Happy to wait and see if it resolves itself though

@nspope nspope force-pushed the pair-coal-python branch 2 times, most recently from 7a3f7b7 to 60f9278 Compare April 11, 2024 00:58
@nspope
Copy link
Contributor Author

nspope commented Apr 11, 2024

Well-- bumping codecov-actions to v4 got rid of the upload errors, but the coverage reports don't seem to be making their way here.

@nspope nspope mentioned this pull request Apr 11, 2024
@nspope
Copy link
Contributor Author

nspope commented Apr 11, 2024

@benjeffery Sorry for the hassle ... but any chance you could clone and force-push this to re-trigger codecov? To see if the issue might be related to this PR coming from a fork? It seems like the coverage report is working fine in #2924

@mergify mergify bot mentioned this pull request Apr 15, 2024
@benjeffery
Copy link
Member

@Mergifyio rebase

…tion

Add test against worked example

Remove unused imports

More efficient windowing

Raise errors with invalid inputs

Misc. fixes and tests; add time discretisation argument

Test nonsuccinct case

Fix negative times for time windows

Delete accidental line copy

Remove unneeded if

Error message

Fix test for error message
Copy link
Contributor

mergify bot commented Apr 17, 2024

rebase

✅ Branch has been successfully rebased

@benjeffery
Copy link
Member

Codecov issue now seems fixed here - it has removed all it's comments about uncovered lines.

@jeromekelleher
Copy link
Member

Are we good to merge then @nspope?

@nspope
Copy link
Contributor Author

nspope commented Apr 17, 2024

Yes! Thanks for the fix @benjeffery

@benjeffery benjeffery added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Apr 17, 2024
@mergify mergify bot merged commit e6483fc into tskit-dev:main Apr 17, 2024
24 of 25 checks passed
@mergify mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants