anova doesn't flag different sample sizes and instead returns a p-value #158

ngalanter · 2024-11-16T00:07:20Z

The anova function for regress models doesn't report an error when the full and reduced models were fit to different numbers of observations. This can happen when a variable that is in the full model but not in the reduced model has missing values. Instead, a (very low) p-value is returned. This differs from the behavior of anova and lmtest::lrtest for glm models, which both report errors.

Here is an example:

```r
library(rigr)
library(lmtest)

set.seed(236)

# x1 has 10 missing values, so the full model (y ~ x1 + x2) is fit to
# 90 observations while the reduced model (y ~ x2) is fit to all 100
x1 <- as.factor(rep(1:4, 25))
x1[1:10] <- NA
x2 <- rnorm(100)
y <- rbinom(n = 100, size = 1, prob = 1 / (1 + exp(0.5 * x2)))

dat <- data.frame(x1 = x1, x2 = x2, y = y)

# rigr: no error, a p-value is returned for both tests
mod_full_regress <- regress("odds", y ~ x1 + x2, data = dat)
mod_reduced_regress <- regress("odds", y ~ x2, data = dat)

mod_full_regress

anova(mod_reduced_regress, mod_full_regress, test = "Wald")
anova(mod_reduced_regress, mod_full_regress, test = "LRT")

# glm: both anova and lmtest::lrtest report an error
mod_full_glm <- glm(y ~ x1 + x2, data = dat, family = binomial)
mod_reduced_glm <- glm(y ~ x2, data = dat, family = binomial)

anova(mod_reduced_glm, mod_full_glm, test = "LRT")

lrtest(mod_reduced_glm, mod_full_glm)
```
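Until anova checks for this, one workaround that seems reasonable is to fit both models to the same complete-case subset, so the comparison is made on identical data. The sketch below uses glm so it runs without rigr; passing the same subset (dat_cc, a name introduced here for illustration) to regress() should presumably behave the same way, but that part is an assumption and is not exercised here.

```r
# Sketch of a workaround (assumption: complete-case analysis is acceptable
# for the question at hand). Data are regenerated as in the report.
set.seed(236)
x1 <- as.factor(rep(1:4, 25))
x1[1:10] <- NA
x2 <- rnorm(100)
y <- rbinom(n = 100, size = 1, prob = 1 / (1 + exp(0.5 * x2)))
dat <- data.frame(x1 = x1, x2 = x2, y = y)

# Keep only rows with no missing values in any variable of the full model
dat_cc <- dat[complete.cases(dat[, c("y", "x1", "x2")]), ]

# Fit both nested models to the same rows
mod_full_cc <- glm(y ~ x1 + x2, data = dat_cc, family = binomial)
mod_reduced_cc <- glm(y ~ x2, data = dat_cc, family = binomial)

# Both fits now use the same number of observations, so the
# likelihood ratio test compares nested models on identical data
c(reduced = nobs(mod_reduced_cc), full = nobs(mod_full_cc))
anova(mod_reduced_cc, mod_full_cc, test = "LRT")
```

Note that this changes the reduced model's estimates, since it no longer uses the 10 rows where only x1 is missing.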


adw96 · 2024-11-16T10:09:38Z

Thanks, @ngalanter. If you have a pull request that addresses this issue, I can review it, but I won't be able to address it myself for some time. Thanks for understanding.
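A pull request could add a guard of roughly this shape before any test statistic is computed. This is a minimal sketch only: check_same_n is a hypothetical name, not part of rigr, and it relies on stats::nobs(). That works for glm fits, but whether nobs() dispatches correctly for regress objects is an assumption; rigr's anova may need to read the sample size from the regress object in some other way.

```r
# Hypothetical guard (not part of rigr): stop if two fitted models were
# estimated on different numbers of observations.
# Assumption: stats::nobs() returns the effective sample size of each fit.
check_same_n <- function(reduced, full) {
  n_reduced <- stats::nobs(reduced)
  n_full <- stats::nobs(full)
  if (n_reduced != n_full) {
    stop(sprintf(
      "models were fit to different numbers of observations (%d vs %d)",
      n_reduced, n_full
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage with the glm fits from the report (10 missing values in x1)
set.seed(236)
x1 <- as.factor(rep(1:4, 25))
x1[1:10] <- NA
x2 <- rnorm(100)
y <- rbinom(n = 100, size = 1, prob = 1 / (1 + exp(0.5 * x2)))
dat <- data.frame(x1 = x1, x2 = x2, y = y)
mod_full_glm <- glm(y ~ x1 + x2, data = dat, family = binomial)
mod_reduced_glm <- glm(y ~ x2, data = dat, family = binomial)

# Capture the error message instead of halting
res <- tryCatch(check_same_n(mod_reduced_glm, mod_full_glm),
                error = function(e) conditionMessage(e))
res
```

Failing loudly here mirrors what anova.glm and lmtest::lrtest already do, rather than silently returning a p-value from models fit to different data.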

adw96 assigned gthopkins Nov 21, 2024
