Hypothesis testing in the Bayesian framework #25
@mattansb out of curiosity, I was wondering what you think of the p-MAP, which Jeff Mills calls "the Bayesian p-value" in his talk. It seems to offer, in theory, the "best of both worlds": it can give evidence for the null (which I remember you mentioned as the main benefit of BFs), but it does not suffer from all the limitations of BFs. Moreover, it is straightforward to compute and understand, and seems (at least that's what Jeff suggests) mathematically grounded...
I have some thoughts:
First, I'm not sure why this is dubbed a "p-value": it is a ratio (because the denominator is the density at the MAP, it is by definition <= 1, but it is still not a probability), which makes it more like a BF than a p-value.
Second, I don't see how it lends itself to supporting the null any more than a p-value does: at best, when the MAP is the null, the p-MAP is 1. This is also true for p-values: when the estimate is the null, the p-value is 1. Since the latter cannot be used to support the null, I don't see how the former can. (I guess this is why it is the Bayesian p-value.)
Finally, it does not really deal with the problem of choosing a prior; it only deals with the problem of choosing a weak/non-informative prior. But when you have strong priors you get a reversed Jeffreys-Lindley-Bartlett paradox:
library(bayestestR)
library(rstanarm)
# fit quietly: swallow Stan's sampling output and return the model
stan_glm_shhh <- function(...){
  capture.output(fit <- stan_glm(...))
  fit
}
X <- rnorm(100)
Y <- X + rnorm(100, sd = 0.1)
cor(X,Y) # data points to a strong effect
#> [1] 0.9953305
fit <- stan_glm_shhh(Y ~ X,
prior = normal(0,0.001)) # strong priors for null effect
p_map(fit) # points to no effect!
#> # MAP-based p-value
#>
#> (Intercept): 0.978
#> X : 1.000
X <- rnorm(10000)
Y <- rnorm(10000)
cor(X,Y) # data points to no effect
#> [1] -0.0205174
fit <- stan_glm_shhh(Y ~ X,
prior = normal(1,0.001)) # strong priors against null effect
p_map(fit) # points to a true effect!
#> # MAP-based p-value
#>
#> (Intercept): 0.713
#> X : 0.000

Created on 2019-06-19 by the reprex package (v0.3.0)
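One way to see why the strong prior wins in the reprex above is the textbook conjugate-normal approximation. This is only a sketch under assumptions that are not in the original comment: the residual standard deviation is treated as known, $\hat\beta$ and $s$ denote the least-squares slope and its standard error, and $\mu_0$ and $\tau$ denote the prior mean and prior sd. rstanarm also estimates the residual sd and may rescale priors, so the numbers will not match exactly.

$$
\beta \mid D \sim \mathcal{N}\!\left(w\,\mu_0 + (1-w)\,\hat\beta,\ \left(\frac{1}{\tau^2}+\frac{1}{s^2}\right)^{-1}\right),
\qquad
w = \frac{1/\tau^2}{1/\tau^2 + 1/s^2}
$$

As $\tau \to 0$ the weight $w \to 1$, so the posterior sits on $\mu_0$ whatever the data say, and any index computed from the posterior alone inherits that.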
I agree, a "Bayesian p-value" refers IMO more to the pd than to this ratio.
good point.
Interesting, interesting. As Justin Bieber recently challenged Tom Cruise to an MMA fight in an octagon, I am thinking about organizing a tournament with Wagenmakers, Mills, Kruschke, the Stan people, you and Daniel. I will be taking the bets 💰 💰
Just like in the Bieber vs. Cruise case, I'm sure it's obvious who would be the ultimate MBA (Mixed Bayesian Arts) champion 😜
BTW, the BF performs as expected here:
#> Computation of Bayes factors: estimating marginal likelihood, please wait...
#> Bayes factor analysis
#> ---------------------
#> [2] X 1.01
#>
#> Against denominator:
#> [1] (Intercept only)
#> ---
#> Bayes factor type: marginal likelihoods (bridgesampling)
And for the second model, the priors of the point-null model are wayyyy more appropriate than those of the alternative model, so BF <<< 1:
#> Computation of Bayes factors: estimating marginal likelihood, please wait...
#> Bayes factor analysis
#> ---------------------
#> [2] X 6.46e-14
#>
#> Against denominator:
#> [1] (Intercept only)
#> ---
#> Bayes factor type: marginal likelihoods (bridgesampling)
(Note that I used the compare-models function and not the Savage-Dickey function because, for the second model, the prior and posterior samples were both so close together and so far from 0 that estimation failed.)
BTW, the reversed Jeffreys-Lindley-Bartlett paradox holds also for pd and ROPE, CI, Median... and any other measure that is only based on the posterior. To summarize:
FIN
From here..
Hypothesis testing framework
Probability of direction
pd is a measure of existence based only on the posterior: it is the larger of the two proportions of the posterior lying on either side of zero.
In a hypothesis testing framework, it tests
(?? The pd is a measure of certainty: how certain we are that theta is not 0, judged by how probable the most probable sign of theta is.)
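Written out, the description of pd above corresponds to the following. This is a sketch: $D$ for the data and $\theta^{(1)},\dots,\theta^{(S)}$ for the posterior draws are notation assumed here, not taken from the thread.

$$
pd = \max\{P(\theta > 0 \mid D),\ P(\theta < 0 \mid D)\}
\approx \max\left\{\frac{1}{S}\sum_{s=1}^{S}\mathbb{1}\big[\theta^{(s)} > 0\big],\ \frac{1}{S}\sum_{s=1}^{S}\mathbb{1}\big[\theta^{(s)} < 0\big]\right\}
$$

For a continuous posterior, pd therefore ranges from 0.5 (posterior split evenly around zero) to 1 (all of it on one side).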
ROPE
ROPE is a measure of significance that is based on the posterior and on some pre-conceived notion of a "small" effect.
In a hypothesis testing framework, it tests
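As a sketch of the quantity involved, let $\delta$ stand for the pre-specified bound on a "small" effect, $D$ for the data, and $\theta^{(1)},\dots,\theta^{(S)}$ for the posterior draws. All of these symbols are assumed here, and the region need not be symmetric around zero. The index is then the posterior mass inside the region:

$$
P(-\delta \le \theta \le \delta \mid D) \approx \frac{1}{S}\sum_{s=1}^{S}\mathbb{1}\big[\,|\theta^{(s)}| \le \delta\,\big]
$$

Implementations may compute this proportion over a credible interval rather than the full posterior, so the exact number depends on that choice.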
Bayes factors
BF is a relative measure of evidence. In the case where one model is the point null, it compares how probable the data are under each of the two models.
In a hypothesis testing framework, it tests
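In symbols, with $M_0$ for the point-null model, $M_1$ for the alternative and $D$ for the data (labels assumed here for illustration), the ratio of marginal likelihoods is:

$$
BF_{10} = \frac{p(D \mid M_1)}{p(D \mid M_0)},
\qquad
p(D \mid M_1) = \int p(D \mid \theta, M_1)\, p(\theta \mid M_1)\, d\theta
$$

The prior on $\theta$ enters the marginal likelihood directly, which is why the BF reacts to a badly chosen prior instead of simply inheriting it through the posterior. When $M_0$ fixes $\theta = 0$ inside $M_1$ and the priors on the remaining parameters match, the Savage-Dickey density ratio expresses the same comparison through the prior and posterior densities at the null:

$$
BF_{01} = \frac{p(\theta = 0 \mid D, M_1)}{p(\theta = 0 \mid M_1)}
$$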
p-MAP
There's also the p-MAP, which isn't getting much love from us... We are waiting for feedback from Prof. Jeff Mills, who inspired this index.
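To make the index concrete: as described earlier in the thread, the p-MAP is a ratio whose denominator is the posterior density at the MAP. The base-R sketch below estimates both densities with a kernel density estimate. The helper `p_map_sketch()` and the simulated draws are hypothetical and are not the bayestestR implementation; the two sets of draws only mimic posteriors pinned near 0 and near 1 by a very strong prior.

```r
# Hypothetical helper (NOT bayestestR::p_map): density at the null divided by
# the density at the posterior mode, both taken from a kernel density estimate.
p_map_sketch <- function(draws, null = 0) {
  dens <- density(draws, n = 2^10)  # kernel density estimate of the posterior
  d_map <- max(dens$y)              # density at the estimated MAP
  # Density at the null; treated as 0 when the null lies outside the estimated range
  d_null <- approx(dens$x, dens$y, xout = null, yleft = 0, yright = 0)$y
  d_null / d_map
}

set.seed(123)
# Simulated stand-ins for posterior draws (assumed values, for illustration only)
near_null <- rnorm(4000, mean = 0.0001, sd = 0.001)  # posterior held at 0 by the prior
far_null  <- rnorm(4000, mean = 1, sd = 0.001)       # posterior held at 1 by the prior

p_map_sketch(near_null)  # close to 1
p_map_sketch(far_null)   # 0: the null lies far outside the posterior
```

Because the ratio only looks at the posterior, it cannot tell whether the posterior landed there because of the data or because of the prior.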
What is common between the indices
I think the main vignette and guidelines should be along one or more of these ^ lines...