Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nice plot of inverse instantaneous coalescent rate to docs #399

Closed
hyanwong opened this issue Jun 5, 2024 · 5 comments
Closed

Add nice plot of inverse instantaneous coalescent rate to docs #399

hyanwong opened this issue Jun 5, 2024 · 5 comments

Comments

@hyanwong
Copy link
Member

hyanwong commented Jun 5, 2024

I've left a space in the docs (four paragraphs down at https://tskit.dev/tsdate/docs/latest/popsize.html) to place a population-size-over-time plot. I think this will involve plotting the IICR over time and comparing it to the demography object using to create the simulation.

I think you know how to make plots like this, right @nspope ? Do you have any tips?

@hyanwong hyanwong changed the title Nice plot of inverse instantaneous coalescent rate Add nice plot of inverse instantaneous coalescent rate to docs Jun 5, 2024
@hyanwong
Copy link
Member Author

hyanwong commented Jun 8, 2024

Hmm, looks like you use ts.coalescence_time_distribution something like the implementation you describe here, but I'm not sure it's worth using the to make an example in the docs if the interface is going to change?

@hyanwong
Copy link
Member Author

hyanwong commented Jun 8, 2024

Ah, I see there's a recent commit that changes the (undocumented) API: tskit-dev/tskit@e6483fc - description of the new interface at tskit-dev/tskit#2915

@nspope
Copy link
Contributor

nspope commented Jun 8, 2024

edit added the true rates calculated from the model w/ msprime.

The API is only partially implemented at this point (e.g. calculating number of coalescing pairs per node). Until then, here's an example that does the rest manually:

import stdpopsim
import numpy as np
import matplotlib.pyplot as plt

def rates_from_ecdf(weights, atoms, quantiles):
    assert weights.size == atoms.size
    assert np.all(weights > 0.0)
    assert np.all(atoms > 0.0)
    assert np.all(np.diff(atoms) >= 0.0)
    assert quantiles[0] == 0.0 and quantiles[-1] == 1.0
    assert quantiles.size > 2

    # find interior quantiles
    weights = np.append(0, weights)
    atoms = np.append(0, atoms)
    ecdf = np.cumsum(weights)
    indices = np.searchsorted(ecdf, ecdf[-1] * quantiles[1:-1], side='left')
    lower, upper = atoms[indices - 1], atoms[indices]
    ecdfl, ecdfu = ecdf[indices - 1] / ecdf[-1], ecdf[indices] / ecdf[-1]
    assert np.all(np.logical_and(quantiles[1:-1] > ecdfl,  quantiles[1:-1] < ecdfu))

    # interpolate ECDF
    assert np.all(ecdfu - ecdfl > 0)
    slope = (upper - lower) / (ecdfu - ecdfl)
    breaks = np.append(0, lower + slope * (quantiles[1:-1] - ecdfl))

    # calculate coalescence rates within intervals
    # this uses a Kaplan-Meier-type censored estimator: https://github.com/tskit-dev/tskit/pull/2119
    coalrate = np.full(quantiles.size - 1, np.nan)
    propcoal = np.diff(quantiles[:-1]) / (1 - quantiles[:-2])
    coalrate[:-1] = -np.log(1 - propcoal) / np.diff(breaks)
    last = indices[-1]
    coalrate[-1] = np.sum(weights[last:]) / np.dot(atoms[last:] - breaks[-1], weights[last:]) 

    return coalrate, breaks


# simulate some data
engine = stdpopsim.get_engine("msprime")
homsap = stdpopsim.get_species("HomSap")
contig = homsap.get_contig("chr1", right=10e6)
demogr = homsap.get_demographic_model("Zigzag_1S14")
ts = engine.simulate(demographic_model=demogr, contig=contig, samples={"generic" : 100})

# calculate coalescence rate between equally-spaced quantiles of the pair coalescence time distribution
quantiles = np.linspace(0, 1, 100)
coal_pairs = ts.pair_coalescence_counts()  # number of coalescing pairs per node
atoms = ts.nodes_time[coal_pairs > 0]
weights = coal_pairs[coal_pairs > 0]
order = np.argsort(atoms)
rates, breaks = rates_from_ecdf(weights[order], atoms[order], quantiles)

# expected coalescence rates from model, and true Ne (using msprime)
debugger = demogr.model.debug()
true_rates = debugger.coalescence_rate_trajectory(steps=breaks, lineages={"generic":2})[0]
finescale = np.linspace(0, max(breaks), 1000)
true_ne = 0.5 / debugger.coalescence_rate_trajectory(steps=finescale, lineages={"generic":2})[0]

plt.step(breaks, 0.5 / rates, label='empirical quantiles')
plt.step(breaks, 0.5 / true_rates, label='theoretical quantiles')
plt.step(finescale, true_ne, label='model')
plt.xlabel("Generations in past")
plt.ylabel("Ne (diploid)")
plt.xscale("log")
plt.legend()

# note that the quantile method, while having minimal variance per epoch, averages over recent
# time (as there's not many coalescence events)

foo

@nspope
Copy link
Contributor

nspope commented Jun 8, 2024

I should mention, I don't think this approach will give very usable estimates on tsinfer'd tree sequences. And, I'd rather not encourage people to use this API-- the variational gamma + mutation rescaling approach should be way more robust to population size changes, with no fiddling on the part of the user. So do we even need an example here?

@hyanwong
Copy link
Member Author

This is a fair point. I have commented out that bit of the docs and will close this.

Nevertheless, some examples of utility for the dates of nodes (as well as / instead of mutations) might be nice to illustrate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants