Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Closed
Generalized opened this issue Jul 4, 2024 · 5 comments
Closed
Labels

Comments

@Generalized
Copy link

Generalized commented Jul 4, 2024

Hello,

I noticed a discrepancy between how geepack and glmtoolbox GEE engines are supported by emmeans regarding the degrees of freedom.

glmtoolbox::glmgee

When I look into the emm_basis.glmgee I can notice does a trick - assigns the lm and glm classes to the result. This enables the emm_basis.lm (or glm). Now, if the class is lm (or glm but conditional distribution is Gaussian (or gamma) like in my case), the df.residuals is returned, otherwise - Inf.

> emmeans:::emm_basis.glmgee
function (object, trms, xlev, grid, vcov.method = "robust", ...) 
{
....
    class(object) = c("glm", "lm")

-----------------

> emmeans:::emm_basis.lm
function (object, trms, xlev, grid, ...) 
{
......
  if (inherits(object, "glm")) {
        misc = .std.link.labels(object$family, misc)
        dffun = function(k, dfargs) dfargs$df
        dfargs = list(df = ifelse(object$family$family %in% c("gaussian",  "Gamma"), object$df.residual, Inf))  # <-----

geepack:geeglm

This returns Inf - and this is hardcoded.

I think it would be good to unify the way GEE is handled.
It's even simpler, because the geeglm already offers the df.residual component.
And the geeglm function assigns also the glm and lm classes: class(out) <- c("geeglm", "gee", "glm", "lm")

Either way (both return Inf or both return residual df) will be fine, but I think they should be consistent...

@rvlenth
Copy link
Owner

rvlenth commented Jul 4, 2024 via email

@Generalized
Copy link
Author

Generalized commented Jul 4, 2024

I'm sorry, I just updated my question, because the original part was irrelevant and I figured it out.

But the second part of my question holds - would it be possible to unify them both to return either Inf or residual.df? geepack (for which Inf is returned) has built in residual.df function, so this would save some work.

This requires no reproducible example. Just information that geepack uses Inf because it's hardcoded in your procedure, and glmtoolbox uses residual.df() from the lm(). So my question is if it's possible that either both returned the same for consistency or both returned residual.df() (geepack offers it natively).

@rvlenth
Copy link
Owner

rvlenth commented Jul 6, 2024

No, I need an example so I can see what it does.

@Generalized
Copy link
Author

d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"), 
    Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
    "V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
    "V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L, 
    5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L, 
    4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L, 
    4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")

geepack::geeglm.
df=Inf because it's hardcoded emmeans:::emm_basis.geeglm --> dffun = function(k, dfargs) Inf

> library(geepack)
> emmeans( m1 <- geeglm(Painscore ~ Visit * Arm,
                family = gaussian(link = "identity"),
                id = ID, 
                corstr = "unstructured",
                data=d), specs = ~Visit*Arm)

 Visit Arm emmean    SE  df asymp.LCL asymp.UCL
 V1    A     4.11 0.562 Inf      3.01      5.22
 V2    A     3.88 0.748 Inf      2.42      5.35
 V3    A     4.80 0.416 Inf      3.99      5.62
 V1    B     3.22 0.735 Inf      1.78      4.66
 V2    B     4.69 0.440 Inf      3.83      5.55
 V3    B     4.90 0.575 Inf      3.77      6.02

Covariance estimate used: vbeta 
Confidence level used: 0.95 

> m$df.residual  # geepack natively provides df.residual() out of the box.
[1] 30

glmtoolbox::glmgee
df=30 because it uses lm df.residual()

> library(glmtoolbox)

> emmeans( m2 <- glmgee(Painscore ~ Visit * Arm,
               family = gaussian(link = "identity"),
               id = ID, 
               corstr = "unstructured",
               data=d), specs = ~Visit*Arm)

 Visit Arm emmean    SE df lower.CL upper.CL
 V1    A     4.09 0.570 30     2.93     5.26
 V2    A     3.86 0.744 30     2.34     5.38
 V3    A     4.82 0.437 30     3.93     5.71
 V1    B     3.25 0.742 30     1.73     4.77
 V2    B     4.61 0.433 30     3.73     5.50
 V3    B     4.97 0.650 30     3.64     6.29

Covariance estimate used: robust 
Confidence level used: 0.95 
Warning message:
Estimate of correlation matrix is not positive definite 

> df.residual(m2)
[30]

Could handling both be unified in that they return same df?
Either both Inf or both df.residual()

It's not needed now, it can wait when you return in July. Just leaving it to not forget about it.

@rvlenth
Copy link
Owner

rvlenth commented Jul 6, 2024

OK, I made both use df.residual(), which I hope always works.

> ref_grid(m1) |> summary()
 Visit Arm prediction    SE df
 V1    A         4.11 0.562 30
 V2    A         3.88 0.748 30
 V3    A         4.80 0.416 30
 V1    B         3.22 0.735 30
 V2    B         4.69 0.440 30
 V3    B         4.90 0.575 30

Covariance estimate used: vbeta 

> ref_grid(m2) |> summary()
 Visit Arm prediction    SE df
 V1    A         4.09 0.570 30
 V2    A         3.86 0.744 30
 V3    A         4.82 0.437 30
 V1    B         3.25 0.742 30
 V2    B         4.61 0.433 30
 V3    B         4.97 0.650 30

Covariance estimate used: robust

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants