Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] pseudo-p significance calculation #281

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
101 changes: 101 additions & 0 deletions esda/significance.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
import numpy as np
from scipy import stats

Check warning on line 2 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L1-L2

Added lines #L1 - L2 were not covered by tests


def calculate_significance(test_stat, reference_distribution, method="two-sided"):

Check warning on line 5 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L5

Added line #L5 was not covered by tests
"""
Calculate a pseudo p-value from a reference distribution.

Pseudo-p values are calculated using the formula (M + 1) / (R + 1). Where R is the number of simulations and M is the number of times that the simulated value was equal to, or more extreme than the observed test statistic.

Simulated test statistics are generated through a process of conditional permutation. Conditional permutation holds fixed the value of Xi and values of neighbors are randomly sampled from X removing Xi simulating spatial randomness. This process is repeated R times to generate a reference distribution from which the pseudo-p value is calculated.
ljwolf marked this conversation as resolved.
Show resolved Hide resolved

Parameters
----------
test_stat:
The observed test statistic, or a vector of observed test statistics
reference_distribution:
A numpy array containing simulated test statistics as a result of conditional permutation.
method:
One of 'two-sided', 'lesser', or 'greater'. Indicates the alternative hypothesis.
- 'two-sided': the observed test-statistic is more-extreme than expected under the assumption of complete spatial randomness.
- 'lesser': the observed test-statistic is less than the expected value under the assumption of complete spatial randomness.
- 'greater': the observed test-statistic is greater than the exepcted value under the assumption of complete spatial randomness.
- 'directed': run both lesser and greater tests, then pick the smaller p-value.
"""
reference_distribution = np.atleast_2d(reference_distribution)
n_samples, p_permutations = reference_distribution.shape
test_stat = np.atleast_2d(test_stat).reshape(n_samples, -1)

Check warning on line 28 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L26-L28

Added lines #L26 - L28 were not covered by tests
if method == "directed":
larger = (reference_distribution >= test_stat).sum(axis=1)
low_extreme = (p_permutations - larger) < larger
larger[low_extreme] = p_permutations - larger[low_extreme]
p_value = (larger + 1.0) / (p_permutations + 1.0)

Check warning on line 33 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L30-L33

Added lines #L30 - L33 were not covered by tests
elif method == "lesser":
p_value = (np.sum(reference_distribution >= test_stat, axis=1) + 1) / (

Check warning on line 35 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L35

Added line #L35 was not covered by tests
ljwolf marked this conversation as resolved.
Show resolved Hide resolved
p_permutations + 1
)
elif method == "greater":
p_value = (np.sum(reference_distribution <= test_stat, axis=1) + 1) / (

Check warning on line 39 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L39

Added line #L39 was not covered by tests
ljwolf marked this conversation as resolved.
Show resolved Hide resolved
p_permutations + 1
)
elif method == "two-sided":
percentile = (reference_distribution < test_stat).mean(axis=1)
bounds = np.column_stack((1 - percentile, percentile)) * 100
bounds.sort(axis=1)

Check warning on line 45 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L43-L45

Added lines #L43 - L45 were not covered by tests
lows, highs = np.row_stack(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be able to be vectorised, but I couldn't quickly figure that out. the following did not generate the same results as below:

stats.scoreatpercentile(reference_distribution, bounds, axis=1)

[
stats.scoreatpercentile(r, per=p)
for r, p in zip(reference_distribution, bounds)
]
).T
n_outside = (reference_distribution < lows[:, None]).sum(axis=1)
n_outside += (reference_distribution > highs[:, None]).sum(axis=1) + 1
p_value = (n_outside + 1) / (p_permutations + 1)

Check warning on line 54 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L52-L54

Added lines #L52 - L54 were not covered by tests
elif method == "folded":
means = reference_distribution.mean(axis=1, keepdims=True)
test_stat = np.abs(test_stat - means)
reference_distribution = np.abs(reference_distribution - means)
p_value = ((reference_distribution >= test_stat).sum(axis=1) + 1) / (

Check warning on line 59 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L56-L59

Added lines #L56 - L59 were not covered by tests
p_permutations + 1
)
else:
raise ValueError(

Check warning on line 63 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L63

Added line #L63 was not covered by tests
f"Unknown p-value method: {method}. Generally, 'two-sided' is a good default!"
)
return p_value

Check warning on line 66 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L66

Added line #L66 was not covered by tests


if __name__ == "__main__":
import numpy
import esda
import pandas
from libpysal.weights import Voronoi
from esda.significance import calculate_significance

Check warning on line 74 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L70-L74

Added lines #L70 - L74 were not covered by tests

coordinates = numpy.random.random(size=(2000, 2))
x = numpy.random.normal(size=(2000,))
w = Voronoi(coordinates, clip="bbox")
w.transform = "r"
stat = esda.Moran_Local(x, w)

Check warning on line 80 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L76-L80

Added lines #L76 - L80 were not covered by tests

ts = calculate_significance(stat.Is, stat.rlisas, method="two-sided")
di = calculate_significance(stat.Is, stat.rlisas, method="directed")
lt = calculate_significance(stat.Is, stat.rlisas, method="lesser")
gt = calculate_significance(stat.Is, stat.rlisas, method="greater")
fo = calculate_significance(stat.Is, stat.rlisas, method="folded")

Check warning on line 86 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L82-L86

Added lines #L82 - L86 were not covered by tests

numpy.testing.assert_array_equal(

Check warning on line 88 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L88

Added line #L88 was not covered by tests
numpy.minimum(lt, gt), di
) # di is just the minimum of the two tests

print(

Check warning on line 92 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L92

Added line #L92 was not covered by tests
f"directed * 2 is the same as two-sided {(di*2 == ts).mean()*100}% of the time"
)

print(

Check warning on line 96 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L96

Added line #L96 was not covered by tests
pandas.DataFrame(
numpy.column_stack((ts, di, fo, lt, gt)),
columns=["two-sided", "directed", "folded", "lt", "gt"],
).corr()
)