Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Generalized · 2024-07-04T12:09:20Z

Hello,

I noticed a discrepancy between how geepack and glmtoolbox GEE engines are supported by emmeans regarding the degrees of freedom.

glmtoolbox::glmgee

When I look into the emm_basis.glmgee I can notice does a trick - assigns the lm and glm classes to the result. This enables the emm_basis.lm (or glm). Now, if the class is lm (or glm but conditional distribution is Gaussian (or gamma) like in my case), the df.residuals is returned, otherwise - Inf.

> emmeans:::emm_basis.glmgee
function (object, trms, xlev, grid, vcov.method = "robust", ...) 
{
....
    class(object) = c("glm", "lm")

-----------------

> emmeans:::emm_basis.lm
function (object, trms, xlev, grid, ...) 
{
......
  if (inherits(object, "glm")) {
        misc = .std.link.labels(object$family, misc)
        dffun = function(k, dfargs) dfargs$df
        dfargs = list(df = ifelse(object$family$family %in% c("gaussian",  "Gamma"), object$df.residual, Inf))  # <-----

geepack:geeglm

This returns Inf - and this is hardcoded.

I think it would be good to unify the way GEE is handled.
It's even simpler, because the geeglm already offers the df.residual component.
And the geeglm function assigns also the glm and lm classes: class(out) <- c("geeglm", "gee", "glm", "lm")

Either way (both return Inf or both return residual df) will be fine, but I think they should be consistent...

The text was updated successfully, but these errors were encountered:

rvlenth · 2024-07-04T13:52:03Z

Give me reproducible example(s) please.

Generalized · 2024-07-04T14:44:53Z

I'm sorry, I just updated my question, because the original part was irrelevant and I figured it out.

But the second part of my question holds - would it be possible to unify them both to return either Inf or residual.df? geepack (for which Inf is returned) has built in residual.df function, so this would save some work.

This requires no reproducible example. Just information that geepack uses Inf because it's hardcoded in your procedure, and glmtoolbox uses residual.df() from the lm(). So my question is if it's possible that either both returned the same for consistency or both returned residual.df() (geepack offers it natively).

rvlenth · 2024-07-06T02:07:59Z

No, I need an example so I can see what it does.

Generalized · 2024-07-06T12:19:00Z

d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"), 
    Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
    "V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
    "V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L, 
    5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L, 
    4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L, 
    4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")

geepack::geeglm.
df=Inf because it's hardcoded emmeans:::emm_basis.geeglm --> dffun = function(k, dfargs) Inf

> library(geepack)
> emmeans( m1 <- geeglm(Painscore ~ Visit * Arm,
                family = gaussian(link = "identity"),
                id = ID, 
                corstr = "unstructured",
                data=d), specs = ~Visit*Arm)

 Visit Arm emmean    SE  df asymp.LCL asymp.UCL
 V1    A     4.11 0.562 Inf      3.01      5.22
 V2    A     3.88 0.748 Inf      2.42      5.35
 V3    A     4.80 0.416 Inf      3.99      5.62
 V1    B     3.22 0.735 Inf      1.78      4.66
 V2    B     4.69 0.440 Inf      3.83      5.55
 V3    B     4.90 0.575 Inf      3.77      6.02

Covariance estimate used: vbeta 
Confidence level used: 0.95 

> m$df.residual  # geepack natively provides df.residual() out of the box.
[1] 30

glmtoolbox::glmgee
df=30 because it uses lm df.residual()

> library(glmtoolbox)

> emmeans( m2 <- glmgee(Painscore ~ Visit * Arm,
               family = gaussian(link = "identity"),
               id = ID, 
               corstr = "unstructured",
               data=d), specs = ~Visit*Arm)

 Visit Arm emmean    SE df lower.CL upper.CL
 V1    A     4.09 0.570 30     2.93     5.26
 V2    A     3.86 0.744 30     2.34     5.38
 V3    A     4.82 0.437 30     3.93     5.71
 V1    B     3.25 0.742 30     1.73     4.77
 V2    B     4.61 0.433 30     3.73     5.50
 V3    B     4.97 0.650 30     3.64     6.29

Covariance estimate used: robust 
Confidence level used: 0.95 
Warning message:
Estimate of correlation matrix is not positive definite 

> df.residual(m2)
[30]

Could handling both be unified in that they return same df?
Either both Inf or both df.residual()

It's not needed now, it can wait when you return in July. Just leaving it to not forget about it.

rvlenth · 2024-07-06T18:12:06Z

OK, I made both use df.residual(), which I hope always works.

> ref_grid(m1) |> summary()
 Visit Arm prediction    SE df
 V1    A         4.11 0.562 30
 V2    A         3.88 0.748 30
 V3    A         4.80 0.416 30
 V1    B         3.22 0.735 30
 V2    B         4.69 0.440 30
 V3    B         4.90 0.575 30

Covariance estimate used: vbeta 

> ref_grid(m2) |> summary()
 Visit Arm prediction    SE df
 V1    A         4.09 0.570 30
 V2    A         3.86 0.744 30
 V3    A         4.82 0.437 30
 V1    B         3.25 0.742 30
 V2    B         4.61 0.433 30
 V3    B         4.97 0.650 30

Covariance estimate used: robust

Generalized added the question label Jul 4, 2024

rvlenth closed this as completed Jul 6, 2024

rvlenth added a commit that referenced this issue Jul 6, 2024

Tweaks #496, #497 + static offset repair

af54337

rvlenth mentioned this issue Aug 19, 2024

[Bug]: Change in emmeans package causes an error in tern.gee insightsengineering/tern.gee#67

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Generalized commented Jul 4, 2024 •

edited

Loading

rvlenth commented Jul 4, 2024 via email •

edited

Loading

Generalized commented Jul 4, 2024 •

edited

Loading

rvlenth commented Jul 6, 2024

Generalized commented Jul 6, 2024

rvlenth commented Jul 6, 2024

Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Comments

Generalized commented Jul 4, 2024 • edited Loading

glmtoolbox::glmgee

geepack:geeglm

rvlenth commented Jul 4, 2024 via email • edited Loading

Generalized commented Jul 4, 2024 • edited Loading

rvlenth commented Jul 6, 2024

Generalized commented Jul 6, 2024

rvlenth commented Jul 6, 2024

Generalized commented Jul 4, 2024 •

edited

Loading

rvlenth commented Jul 4, 2024 via email •

edited

Loading

Generalized commented Jul 4, 2024 •

edited

Loading