add normal and wald types #648

Merged
merged 69 commits into from
Sep 26, 2022


@strengejacke (Member) commented Sep 25, 2022

Usually, `parameters::degrees_of_freedom()` and `insight::get_df()` are expected to do the same thing. However, the two functions are currently designed differently: `parameters::degrees_of_freedom()` offers more options for extracting dfs. This PR aims to bring `insight::get_df()` on par with `parameters::degrees_of_freedom()`, so that the latter can be fully replaced by `insight::get_df()`. I think it doesn't make sense, and is rather confusing, to have two functions that do different things for certain models.

@easystats/core-team

@codecov-commenter commented Sep 25, 2022

Codecov Report

Merging #648 (404839f) into main (572e53a) will increase coverage by 0.65%.
The diff coverage is 70.49%.

❗ Current head 404839f differs from pull request most recent head 6abada1. Consider uploading reports for the commit 6abada1 to get more accurate results

@@            Coverage Diff             @@
##             main     #648      +/-   ##
==========================================
+ Coverage   54.32%   54.98%   +0.65%     
==========================================
  Files         119      124       +5     
  Lines       14101    14334     +233     
==========================================
+ Hits         7661     7881     +220     
- Misses       6440     6453      +13     
| Impacted Files | Coverage Δ |
|---|---|
| R/get_df_betwithin.R | 0.00% <0.00%> (ø) |
| R/get_predicted.R | 71.28% <ø> (ø) |
| R/get_predicted_args.R | 72.22% <0.00%> (-4.33%) ⬇️ |
| R/get_predicted_gam.R | 73.91% <0.00%> (+3.07%) ⬆️ |
| R/get_varcov.R | 23.29% <ø> (ø) |
| R/get_df_residual.r | 22.22% <22.22%> (ø) |
| R/get_predicted_se.R | 68.67% <33.33%> (+0.81%) ⬆️ |
| R/get_varcov_sandwich.R | 84.14% <33.33%> (ø) |
| R/get_df_satterthwaite.R | 54.54% <54.54%> (ø) |
| R/get_sigma.R | 28.28% <62.50%> (-1.63%) ⬇️ |
| ... and 7 more | |


Review comment on R/get_df.R (outdated):
#' - `"wald"` for models with z-statistic, returns `"Inf"`. Else, tries to
#' extract residual degrees of freedom. If residual degrees of freedom could
#' not be extracted, returns `"Inf"`.
#' - `"analytical"` returns analytical degrees of freedom, i.e. `n-k`
Contributor:

What's the difference between residual and analytical? They sound the same to me from this description.

Member Author:

`analytical` is always `n_obs() - n_param()` (n-k). Not sure in which cases this may differ from the residual df?

@strengejacke (Member Author)

@bwiernik @mattansb what should "analytical" df return for models with a z-statistic? Still n-k, or Inf? If the latter, "analytical" would be almost the same as "wald". See docs:

- `"residual"` tries to extract residual degrees of freedom. If residual
  degrees of freedom cannot be extracted, returns analytical degrees of
  freedom, i.e. `n-k` (number of observations minus number of parameters).
- `"wald"` for models with z-statistic, returns `"Inf"`. Else, tries to
  extract residual degrees of freedom. If residual degrees of freedom
  cannot be extracted, returns `"Inf"`.
- `"analytical"` for models with z-statistic, returns `"Inf"`. Else, returns
  analytical degrees of freedom, i.e. `n-k` (number of observations minus
  number of parameters).
- `"normal"` always returns `"Inf"`.
- `"model"` returns model-based degrees of freedom, i.e. the number of
  (estimated) parameters.
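To see how the five draft types relate, here is a minimal Python sketch of the rules listed in the docs. It is not insight's implementation (insight is written in R); the function `df_for_type()` and its arguments `statistic` and `residual_df` are hypothetical stand-ins for what `insight::get_df()` would read from a fitted model.

```python
import math
from typing import Optional


def df_for_type(df_type: str, n_obs: int, n_params: int,
                statistic: str, residual_df: Optional[float]) -> float:
    """Hypothetical sketch of the df types in the draft docs.

    statistic:   "t" or "z", the test statistic the model reports
    residual_df: the df reported by the model object, or None if unavailable
    """
    analytical = n_obs - n_params  # n - k
    if df_type == "residual":
        # model-reported df, falling back to n - k
        return residual_df if residual_df is not None else analytical
    if df_type == "wald":
        # z-statistic models get Inf; otherwise model-reported df, else Inf
        if statistic == "z":
            return math.inf
        return residual_df if residual_df is not None else math.inf
    if df_type == "analytical":
        # z-statistic models get Inf; otherwise always n - k
        return math.inf if statistic == "z" else analytical
    if df_type == "normal":
        return math.inf
    if df_type == "model":
        # number of (estimated) parameters
        return n_params
    raise ValueError(f"unknown df type: {df_type!r}")
```

In this sketch `"residual"` and `"analytical"` only give different answers when a model reports a df other than n - k.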

@bwiernik (Contributor)

I think the only issue is that residual is poorly described. It's whatever the model returns as the df, whereas analytical is always exactly n - params.

@strengejacke (Member Author)

Should we be more strict/consistent here?

@strengejacke (Member Author) commented Sep 25, 2022

I think analytical and residual df are actually the same, so maybe we should simplify this.

Then we would have:

  • "normal" returns Inf.
  • "wald" returns analytical (aka residual) df for models with a t-statistic, and Inf for all other models. It also returns Inf if analytical df cannot be extracted.
  • "analytical" (aka "residual") returns n-k (number of observations minus number of parameters), or Inf if analytical df cannot be extracted.
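The simplified proposal could be sketched in Python as follows. The function name `simplified_df()` is hypothetical, and `None` stands for a count that could not be extracted from the model.

```python
import math
from typing import Optional


def simplified_df(df_type: str, statistic: str,
                  n_obs: Optional[int], n_params: Optional[int]) -> float:
    """Hypothetical sketch of the three-type proposal."""
    # analytical (aka residual) df: n - k, or Inf if either count is missing
    if n_obs is None or n_params is None:
        analytical = math.inf
    else:
        analytical = float(n_obs - n_params)
    if df_type == "normal":
        return math.inf
    if df_type == "wald":
        # only t-statistic models get a finite df
        return analytical if statistic == "t" else math.inf
    if df_type in ("analytical", "residual"):
        return analytical
    raise ValueError(f"unknown df type: {df_type!r}")
```

Unlike the earlier draft of the docs, `"analytical"` here no longer returns Inf for z-statistic models; only `"wald"` and `"normal"` do.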

@strengejacke strengejacke merged commit 03e3978 into main Sep 26, 2022
@strengejacke strengejacke deleted the get_df_typey branch September 26, 2022 19:28
@mattansb (Member)

For some models it depends on how you define "number of parameters"; I'm thinking of mixed models. If I have 1000 observations from 10 subjects and fit `y ~ 1 + time + (1 + time | Subject)`, lme4 has 6 parameters (2 fixed, 2 variance, 1 random covariance, 1 sigma), but arguably there are more: the 2 * 10 individual random parameters.

Something to think about?
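To make the ambiguity concrete, a small Python calculation of the analytical df `n - k` for this example under two ways of counting parameters. The first count follows the breakdown in the comment; adding the 2 * 10 subject-level random effects on top of it is only one hypothetical convention, not something lme4 or insight is known to do.

```python
N_OBS = 1000
N_SUBJECTS = 10

# lme4's count for y ~ 1 + time + (1 + time | Subject), per the comment
fixed, variance, covariance, sigma = 2, 2, 1, 1
k_lme4 = fixed + variance + covariance + sigma  # 6

# Hypothetical alternative: also count each subject's random intercept and slope
k_with_ranef = k_lme4 + 2 * N_SUBJECTS  # 26


def analytical_df(n_obs: int, k: int) -> int:
    """Analytical degrees of freedom, n - k."""
    return n_obs - k


print(analytical_df(N_OBS, k_lme4))        # 994
print(analytical_df(N_OBS, k_with_ranef))  # 974
```

With 1000 observations the difference is small, but with few observations per subject the choice of k can change the df substantially.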

4 participants