-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nice plot of inverse instantaneous coalescent rate to docs #399
Comments
Hmm, looks like you use |
Ah, I see there's a recent commit that changes the (undocumented) API: tskit-dev/tskit@e6483fc - description of the new interface at tskit-dev/tskit#2915 |
edit added the true rates calculated from the model w/ msprime. The API is only partially implemented at this point (e.g. calculating number of coalescing pairs per node). Until then, here's an example that does the rest manually: import stdpopsim
import numpy as np
import matplotlib.pyplot as plt
def rates_from_ecdf(weights, atoms, quantiles):
assert weights.size == atoms.size
assert np.all(weights > 0.0)
assert np.all(atoms > 0.0)
assert np.all(np.diff(atoms) >= 0.0)
assert quantiles[0] == 0.0 and quantiles[-1] == 1.0
assert quantiles.size > 2
# find interior quantiles
weights = np.append(0, weights)
atoms = np.append(0, atoms)
ecdf = np.cumsum(weights)
indices = np.searchsorted(ecdf, ecdf[-1] * quantiles[1:-1], side='left')
lower, upper = atoms[indices - 1], atoms[indices]
ecdfl, ecdfu = ecdf[indices - 1] / ecdf[-1], ecdf[indices] / ecdf[-1]
assert np.all(np.logical_and(quantiles[1:-1] > ecdfl, quantiles[1:-1] < ecdfu))
# interpolate ECDF
assert np.all(ecdfu - ecdfl > 0)
slope = (upper - lower) / (ecdfu - ecdfl)
breaks = np.append(0, lower + slope * (quantiles[1:-1] - ecdfl))
# calculate coalescence rates within intervals
# this uses a Kaplan-Meier-type censored estimator: https://github.com/tskit-dev/tskit/pull/2119
coalrate = np.full(quantiles.size - 1, np.nan)
propcoal = np.diff(quantiles[:-1]) / (1 - quantiles[:-2])
coalrate[:-1] = -np.log(1 - propcoal) / np.diff(breaks)
last = indices[-1]
coalrate[-1] = np.sum(weights[last:]) / np.dot(atoms[last:] - breaks[-1], weights[last:])
return coalrate, breaks
# simulate some data
engine = stdpopsim.get_engine("msprime")
homsap = stdpopsim.get_species("HomSap")
contig = homsap.get_contig("chr1", right=10e6)
demogr = homsap.get_demographic_model("Zigzag_1S14")
ts = engine.simulate(demographic_model=demogr, contig=contig, samples={"generic" : 100})
# calculate coalescence rate between equally-spaced quantiles of the pair coalescence time distribution
quantiles = np.linspace(0, 1, 100)
coal_pairs = ts.pair_coalescence_counts() # number of coalescing pairs per node
atoms = ts.nodes_time[coal_pairs > 0]
weights = coal_pairs[coal_pairs > 0]
order = np.argsort(atoms)
rates, breaks = rates_from_ecdf(weights[order], atoms[order], quantiles)
# expected coalescence rates from model, and true Ne (using msprime)
debugger = demogr.model.debug()
true_rates = debugger.coalescence_rate_trajectory(steps=breaks, lineages={"generic":2})[0]
finescale = np.linspace(0, max(breaks), 1000)
true_ne = 0.5 / debugger.coalescence_rate_trajectory(steps=finescale, lineages={"generic":2})[0]
plt.step(breaks, 0.5 / rates, label='empirical quantiles')
plt.step(breaks, 0.5 / true_rates, label='theoretical quantiles')
plt.step(finescale, true_ne, label='model')
plt.xlabel("Generations in past")
plt.ylabel("Ne (diploid)")
plt.xscale("log")
plt.legend()
# note that the quantile method, while having minimal variance per epoch, averages over recent
# time (as there's not many coalescence events) |
I should mention, I don't think this approach will give very usable estimates on tsinfer'd tree sequences. And, I'd rather not encourage people to use this API-- the variational gamma + mutation rescaling approach should be way more robust to population size changes, with no fiddling on the part of the user. So do we even need an example here? |
This is a fair point. I have commented out that bit of the docs and will close this. Nevertheless, some examples of utility for the dates of nodes (as well as / instead of mutations) might be nice to illustrate. |
I've left a space in the docs (four paragraphs down at https://tskit.dev/tsdate/docs/latest/popsize.html) to place a population-size-over-time plot. I think this will involve plotting the IICR over time and comparing it to the demography object using to create the simulation.
I think you know how to make plots like this, right @nspope ? Do you have any tips?
The text was updated successfully, but these errors were encountered: