Low level pair coalescence counts #2932
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2932      +/-   ##
==========================================
+ Coverage   89.62%   89.65%   +0.02%
==========================================
  Files          29       29
  Lines       30170    30393     +223
  Branches     5867     5901      +34
==========================================
+ Hits        27041    27249     +208
- Misses       1790     1796       +6
- Partials     1339     1348       +9
```
I'd like to generalize this algorithm slightly:
So, I think it'd be useful to take the current algorithm and have it apply a summary function at the end of each window. This would let us calculate any useful summary statistic without having to create a potentially humongous array (windows by nodes by indexes) as an intermediate. The API would stay the same -- later on, we could add named methods for various summary statistics, and potentially eventually expose a "general summary stat" interface, as is done for the other statistics.
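As a toy illustration of that design (hypothetical names, not the tskit API): a per-window buffer is filled, reduced immediately by the summary function, and reused, so the full (windows, nodes, indexes) intermediate never materialises.

```python
import numpy as np

def pair_counts_summarised(num_windows, num_nodes, fill_window, summary_func):
    """Apply summary_func to each window's per-node array as soon as the
    window is complete, so only one (num_nodes,) buffer ever exists."""
    out = []
    buf = np.zeros(num_nodes)
    for w in range(num_windows):
        buf[:] = 0.0
        fill_window(w, buf)            # accumulate per-node counts for window w
        out.append(summary_func(buf))  # reduce immediately, then reuse buffer
    return np.array(out)

def fill_window(w, buf):
    # hypothetical stand-in for the real per-window accumulation
    buf += w + 1

# identity summary recovers the full per-node array; np.sum gives one
# number per window without the (windows, nodes) intermediate
res = pair_counts_summarised(3, 5, fill_window, np.sum)
```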
Looks good to me - do you want to add the summary func stuff now before we start porting to C? Probably a good idea, if you want to do this in the short term.
Force-pushed from ff871bd to 7839a94.
The docs build is failing with numpy 2 issues.
This is probably a package cache issue @benjeffery?
I bumped the cache version to no avail -- however, it looks like the msprime version is the culprit.
Yup, pinning msprime to the latest version in requirements/* seems to have done the trick.
Alright, I think this is ready for a look @petrelharp and @jeromekelleher. The prototypes for the TreeSequence methods carry a prefix for now. I think the next steps are:
Does this seem about right? Alternatively, I suppose we could have the reduction functions for (2) be Python callbacks? But there might be a performance penalty with lots of windows.
This all looks great to me @nspope - the algorithm seems very clean.
Regarding Python callbacks for C code, although there's some existing infrastructure for doing this, I'd leave this bit until last. The Python C API stuff is very tricky and there are footguns everywhere. Also, performance will be terrible - there's no way you can use this approach for anything other than prototyping. The summary func gets called in the deepest inner loops, and getting the Python interpreter involved there just doesn't work performance-wise.
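A quick back-of-the-envelope illustration of that overhead (plain numpy, nothing tskit-specific): the same reduction done entirely in compiled code versus with the interpreter entering the inner loop per element.

```python
import time
import numpy as np

x = np.random.default_rng(1).random(1_000_000)

t0 = time.perf_counter()
s_vec = x.sum()                  # the whole reduction runs in compiled code
t_vec = time.perf_counter() - t0

t0 = time.perf_counter()
s_py = 0.0
for v in x:                      # the interpreter is invoked once per element
    s_py += v
t_py = time.perf_counter() - t0

print(f"python loop is roughly {t_py / max(t_vec, 1e-9):.0f}x slower")
```

The ratio varies by machine, but the per-call interpreter cost dominates by orders of magnitude, which is why a summary callback in the deepest loop is only viable for prototyping.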
Force-pushed from ec95c97 to 722a3e4.
Hmm, tests are passing on Ubuntu but evidently not on MacOS or Windows. ssh'ing onto the runner, I can reproduce the crash on MacOS, happening in one of the tsk_safe_free calls.
I would strongly advise writing some simple C tests that exercise all reachable code paths, and running them through valgrind. This is a much more effective way of tracking down these kinds of bugs. I can help with this, if it's not clear how to start?
Makes sense @jeromekelleher, I can take a crack at it. Just so I'm understanding the scope, is the goal to recreate the entire test suite from Python in C?
Something minimal. You're just looking to exercise all code paths so that valgrind can see them.
Force-pushed from 5ad6e09 to 1aab9b5.
That did the trick ... is there a good way to trigger OOM errors after the mallocs, to get patterns like

```c
if (whatever == NULL) {
    ret = TSK_ERR_NO_MEMORY;
    goto out;
}
```

covered? It looks like these aren't covered in the existing tests, so presumably not?
The only substantial coverage gap is in

```c
if (TreeSequence_check_state(self) != 0) {
    goto out;
}
```

however, this can't be the only execution path. Is there something else needed to get the CPython parts covered?
Ah, nevermind -- I see where the coverage for CPython is getting read from.
We've been ignoring these here, as they are very tricky to trigger. We did it in the kastore C tests by running with a patched malloc which fails after n allocs, but it's a lot of work for a pretty marginal benefit.
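The kastore trick can be sketched in miniature (a toy Python model, not the actual harness): wrap allocation in a counter that fails after n successes, then sweep n upward so each OOM error branch runs exactly once.

```python
class FailingAllocator:
    """Toy model of a patched malloc: succeeds for the first
    fail_after allocations, then returns None (simulated OOM)."""

    def __init__(self, fail_after):
        self.remaining = fail_after

    def malloc(self, size):
        if self.remaining == 0:
            return None  # simulate OOM
        self.remaining -= 1
        return bytearray(size)

def build_thing(alloc):
    """Example routine with two allocations; returns None on OOM,
    analogous to `ret = TSK_ERR_NO_MEMORY; goto out;` in C."""
    a = alloc.malloc(8)
    if a is None:
        return None
    b = alloc.malloc(8)
    if b is None:
        return None
    return (a, b)

# sweep n from 0 upward until the call succeeds: each failure path
# is exercised exactly once along the way
results = [build_thing(FailingAllocator(n)) for n in range(3)]
```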
Looks great! There's a lot going on in there, and I've made a few comments style-wise to make things a bit more idiomatic (mostly just changing loop conditions to <). There are a few bugs in there that'll only show up in error conditions though, so we do need some more testing of those.
I think a separate C test case where you take a simple case, and then try to provoke all the input errors around time windows etc., would be the best approach.
For the Python C API, we have to be super thorough here and make sure that any non-memory allocation failure cases are covered in test_low_level.py. I know it's a lot of boilerplate, but this is what's involved I'm afraid.
We also want to make sure that the reference counting stuff is working properly. The best way I've found of doing this is to run the code in a tight loop and watch top.

Can you do something like this and check please?

```python
ts = ...  # some intermediate size example, so that loops are quick
for _ in range(1000000):
    ts.pair_coalescence_counts(...)
```

The memory usage should stabilise after 30 seconds or so, and then remain constant. Then, try commenting out some of the XDECREFs, and see what happens to memory usage.
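The eyeball-top check can also be scripted (a stdlib-only sketch; `work()` is a hypothetical stand-in for the tskit call, and the `resource` module is Unix-only): compare the RSS high-water mark midway through and at the end of a tight loop; a reference-count leak shows up as the second reading still climbing.

```python
import gc
import resource

def work():
    # stand-in for ts.pair_coalescence_counts(...): allocate and drop a buffer
    return [0] * 10_000

def rss_watermarks(iterations=100_000):
    """Return (mid, end) RSS high-water marks around a tight loop of work().

    With no leak, the mid reading already sits at the plateau, so the end
    reading barely moves; a leak makes the end reading keep growing."""
    for _ in range(iterations // 2):
        work()
    gc.collect()
    mid = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for _ in range(iterations // 2):
        work()
    gc.collect()
    end = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return mid, end
```

Note that `ru_maxrss` is a monotone high-water mark, so "stable" means the end reading is barely above the mid reading, not that it decreases.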
Thanks @jeromekelleher! All is clear to me except how to handle the special case where time windows are nodes (which is an important one). I left some comments above describing the issue. Briefly, I'm detecting the "nodes" case via a zero-length time windows array. This can't be replaced by an array of unique times in the tree sequence, because we want to preserve node identity even if a node's age is not unique, and also the order of nodes in the output array; both of these get lost when we sort nodes into (necessarily unique) time windows. I suppose an alternative (to using a zero-length time windows array) is to use some other sentinel value.
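To make that design question concrete, here's a toy sketch (plain numpy, not the actual tskit implementation) of the zero-length-sentinel dispatch, and of why binning by age loses node identity:

```python
import numpy as np

NODES = np.empty(0)  # zero-length sentinel: "one output bin per node"

def output_bins(node_times, time_windows):
    """Map each node to an output bin. A zero-length time_windows array
    means one bin per node, preserving node identity and order; otherwise
    nodes are binned by age, and equal-aged nodes become indistinguishable."""
    if time_windows.size == 0:
        return np.arange(node_times.size)
    return np.digitize(node_times, time_windows) - 1

node_times = np.array([0.0, 1.0, 1.0, 5.0])
per_node = output_bins(node_times, NODES)  # identity: [0, 1, 2, 3]
binned = output_bins(node_times, np.array([0.0, 2.0, np.inf]))
# equal-aged nodes 1 and 2 land in the same bin, so identity is lost
```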
Force-pushed from 7a0e1b8 to 06661b1.
Getting another random windows crash in the CI, but valgrind isn't showing any issues. I traced it back to commit 6287b84, which switched the allocation approach.
I really would prefer if we could back out of doing the three different options here and focus on the simpler case of just a fixed set of required time windows first. There's a lot of complexity here already, and I want to make sure we catch all of the boring C level stuff before we have to think about that extra level of trickiness. So, can we leave that bit for a follow-up, and focus on getting 100% test coverage on the bits that should be covered? That means more tests in C, and breaking up the Python C tests like I suggested.
So I should pytest.mark.skip any tests using the "nodes" output for the time being? These are the bulk of the tests, which is why I tried to support that mode. I'm working on extending the test_lowlevel case to get the CPython covered, but I think all the C code paths are now covered aside from OOM.
Force-pushed from 338f3f6 to 6318fad.
Here's memory consumption over time:

[figure: VSZ and RSZ memory usage over wall time]

Looks to me like it's stabilizing -- is this what you were after @jeromekelleher? Code here (monitor mem usage with ps):

```bash
#!/usr/bin/env bash
echo '
import tskit
import numpy as np
import msprime
ts = msprime.sim_ancestry(
    1000, sequence_length=1e5, recombination_rate=1e-8, population_size=1e4
)
time_windows = np.array([0, np.max(ts.nodes_time)/2, np.inf])
for _ in range(100000):
    ts.pair_coalescence_counts(time_windows=time_windows)
' >del_memory_monitor.py
rm -f del_memory_monitor.log
python del_memory_monitor.py &
for i in {1..1000}; do
    ps -eo cmd,%mem,vsz,rsz | grep "python del_memory_monitor" >> del_memory_monitor.log
    sleep 0.0001
done
rm del_memory_monitor.py
echo '
import numpy as np
import matplotlib.pyplot as plt
vsz, rsz = np.loadtxt("del_memory_monitor.log", usecols=[2,3]).T
step = np.arange(vsz.size) * 0.0001
plt.plot(step, vsz / np.max(vsz), "-", label="VSZ")
plt.plot(step, rsz / np.max(rsz), "-", label="RSZ")
plt.ylabel("Memory usage / max(memory usage)")
plt.xlabel("Wall time (relative)")
plt.xscale("log")
plt.legend()
plt.savefig("del_memory_monitor.png")
' | python
rm del_memory_monitor.log
```
Nice - did you try it with some decrefs commented out?
Hmm, you need bigger examples then. You should be leaking memory heavily there.
Maybe the thing to do with time windows is to change the input handling.
Force-pushed from 93f6feb to 1a01087.
Nice, that sounds like a good idea. Do you want to push forward with that, or merge this much and iterate? (I haven't reviewed again, will need to take another good look) |
I gave it a shot, but it looks like I'm getting stochastic crashes on Windows. Totally perplexed as to why, because the semantics are very similar to the last commit, which worked. (It could be because I had skipped a bunch of tests in the last commit?)
Force-pushed from fde58bf to 9355d19.
Working now -- bad malloc (d'oh). This is ready for another look whenever you have time @jeromekelleher. Here's mem usage once more, with the monitoring script from above:
LGTM! Just needs a couple more tests on the low-level input, and we're good to merge I think.
Force-pushed from 31cdd29 to ca06cff.
Done! It occurs to me that the order of dimensions here -- (windows, time_windows, indexes) -- should be consistent with the (forthcoming) time windowed stats. @petrelharp what order did you guys settle on? |
Superb stuff @nspope - there's a lot of things to get up to speed on here and you've done a great job 👍
Same!
Low level extension of #2915