behavioral consistency of calculate() #356

Merged: 31 commits from obs-stat into master on Jan 17, 2021
Conversation

@simonpcouch (Collaborator) commented Jan 4, 2021

This PR makes some changes to calculate() aiming to error more informatively and behave more consistently. Briefly, from NEWS:

  • supply a consistent error when the supplied stat argument isn't well-defined for the variables specified
  • supply a message and return the test statistic when the user supplies unneeded information to calculate an observed statistic, and
  • supply a warning, assume a reasonable null value, and return the test statistic when the user does not supply sufficient information to calculate an observed statistic

Error informatively with inappropriate stat

The most confusing errors I see while teaching with calculate() predominantly arise from within the dispatched calc_impl methods. As a result, errors (or incorrect output) look different for what is essentially the same type of mistake: supplying a stat that doesn't make sense given the variable types specify()ed. For instance, on the current develop branch:

> gss %>%
+   specify(college ~ sex, success = "degree") %>%
+   calculate(stat = "prop", order = c("female", "male"))
#> Error: Only strings can be converted to symbols
#> Run `rlang::last_error()` to see where the error occurred.

or

> gss %>%
+   specify(college ~ NULL, success = "degree") %>%
+   calculate(stat = "diff in props")
#> Error: Must extract column with a single valid subscript.
#> x Subscript `as.character(attr(x, "explanatory"))` has size 0 but must be size 1.
#> Run `rlang::last_error()` to see where the error occurred.

or

> gss %>%
+   specify(hours ~ NULL) %>%
+   calculate(stat = "slope")
#> # A tibble: 1 x 1
#>    stat
#>   <dbl>
#> 1    NA

We caught a few of the more cryptic errors with some conditional logic for special cases (the check_for_*_stat functions), though a good number still slip through. We ought to make sure that calculate()'s errors surface before dispatching to calc_impl().

The new generalized error takes the form:

> gss %>%
+     specify(hours ~ NULL) %>%
+     calculate(stat = "diff in means")
#> Error: A difference in means is not well-defined for a numeric response variable 
#> (hours) and no explanatory variable.

or

> gss %>%
+     specify(college ~ partyid, success = "degree") %>%
+     calculate(stat = "diff in props")
#> Error: A difference in proportions is not well-defined for a dichotomous categorical 
#> response variable (college) and a multinomial categorical explanatory variable (partyid).

I've tossed around a few different phrasings for this error, trying to convey "this doesn't make sense" in language that's precise but still approachable. Very much willing to discuss rewordings here. Specifically, are "dichotomous" and "multinomial" the most common terms here?

This implementation eliminates the need to run ad-hoc checks/transformations on the variable types/existence within the calc_impl methods that have caused issues in the past (see my two most recent PRs). Those checks now take place before dispatching. Further, errors will now be consistent for what is essentially the same mistake: specifying a stat that doesn't make sense given the variable(s) specified.

Its logic relies on a stat_types tibble in utils.R; I'd appreciate a close look at what it considers a valid test statistic given the variable types.
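To make that a bit more concrete, here's a minimal sketch of the kind of lookup I have in mind; the tibble name, columns, and rows below are illustrative only, not the actual contents of stat_types:

library(tibble)
library(dplyr)

# Hypothetical excerpt of a stat-to-variable-type lookup; the real stat_types
# tibble in utils.R may use different columns and encodings.
stat_types_sketch <- tribble(
  ~stat,            ~response_type, ~explanatory_type,
  "mean",           "numeric",      "none",
  "prop",           "dichotomous",  "none",
  "diff in means",  "numeric",      "dichotomous",
  "diff in props",  "dichotomous",  "dichotomous",
  "slope",          "numeric",      "numeric",
  "Chisq",          "multinomial",  "none"
)

# A supplied stat is well-defined only if a matching row exists for the
# specify()ed variable types.
is_well_defined <- function(stat_name, resp, expl = "none") {
  nrow(filter(
    stat_types_sketch,
    stat == stat_name,
    response_type == resp,
    explanatory_type == expl
  )) > 0
}

is_well_defined("slope", "numeric")            # FALSE: no explanatory variable
is_well_defined("slope", "numeric", "numeric") # TRUE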


Calculating observed statistics

The steps to calculate observed statistics vary unnecessarily across test statistics. Some test statistics couldn't be calculated without generate()ing first, statistics differed in their response to unneeded hypothesis information, statistics with nontrivial null hypotheses differed in how strictly they required a null, and we often failed to warn in pipelines closely resembling those needed to successfully calculate a test statistic. My two most recent PRs are relevant reference here as well, in addition to the following examples.

calculate() will now supply a message when the user supplies "too much" information to calculate the given ("untheorized") observed statistic:

> gss %>%
+   specify(response = hours) %>%
+   hypothesize(null = "point", mu = 40) %>%
+   calculate(stat = "mean")
#> Message: The point null hypothesis `mu = 40` does not inform calculation of 
#> the observed statistic (a mean) and will be ignored.
#> # A tibble: 1 x 1
#>    stat
#>   <dbl>
#> 1  41.4

In previous infer versions, depending on the statistic, we used to error out or return the original x argument here. Some of my more recent PRs adjusted the latter behavior to return the statistic, without a message, for z statistics.

On the other hand, not supplying enough information (for "theorized" statistics) results in a warning:

> gss %>%
+     specify(response = sex, success = "female") %>%
+     calculate(stat = "z")
#> # A tibble: 1 x 1
#>    stat
#>   <dbl>
#> 1 -1.16
#> Warning message:
#> A z statistic requires a null hypothesis to calculate the observed statistic. 
#> Output assumes the following null value: `p = .5`. 

or

> gss %>%
+   specify(response = partyid) %>%
+   calculate(stat = "Chisq")
#> # A tibble: 1 x 1
#>    stat
#>   <dbl>
#> 1  334.
#> Warning message:
#> A chi-square statistic requires a null hypothesis to calculate the observed statistic. 
#> Output assumes the following null values: `p = c(dem = 0.2, ind = 0.2, rep = 0.2, other = 0.2, DK = 0.2)`. 

In the above case, depending on the statistic, we used to error (chi-square GOF), message (one-sample t), or, for other theorized statistics, either silently report the statistic or error uninformatively. I could see the argument for erroring here, though this would be a breaking change for some statistics and wrappers.
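For reference, the assumed null in the chi-square example above amounts to equal probabilities across the levels of the response. A minimal sketch of that default value (not necessarily how it's constructed internally):

library(infer)

# The warning above assumes a uniform null across the levels of partyid;
# this is just that value, not infer's internal construction of it.
lvls <- levels(factor(gss$partyid))
p_default <- setNames(rep(1 / length(lvls), length(lvls)), lvls)
p_default
# each of the five levels gets probability 0.2, matching the warning's
# `p = c(dem = 0.2, ind = 0.2, rep = 0.2, other = 0.2, DK = 0.2)`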

calculate() will behave similarly after generate()ing.
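For instance, a pipeline analogous to the z example above, but with a generate() step, should presumably surface the same warning about an assumed null before returning the distribution of statistics:

library(infer)

# Analogous to the z statistic example above, now with a generate() step;
# per the description, the warning about the assumed `p = .5` null should
# presumably appear here too before the distribution is returned.
gss %>%
  specify(response = sex, success = "female") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "z")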


Some other changes:

  • deletes code comments justifying old PRs / mentioning historical behavior
  • adds some explanatory commenting
  • increases usage of helper functions
  • removes redundant checks & simplifies conditional logic
  • relocates some helpers that are only used in one core verb
  • accepts "two.sided" as an alias for "two-sided" in *_p_value() (Allow direction = "two.sided" in get_p_value() #355); see the usage sketch after this list
  • makes some style edits, mostly adding vertical space and ensuring spaces after commas and before/after operators
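As a usage note on the *_p_value() change above, a standard pipeline now works with either spelling of the direction argument:

library(infer)

# direction = "two.sided" is now accepted as an alias for "two-sided"
null_dist <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

obs_mean <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")

get_p_value(null_dist, obs_stat = obs_mean, direction = "two.sided")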

I don't believe this PR breaks any code that used to work... all previous unit tests pass except for rewordings and transitions between messages/warnings/errors. See 508eefa for needed adjustments.

I don't introduce it in this PR pending more discussion, but I think it's worth mentioning that this allows for a single function to calculate observed statistics (observe()?) a la the current *_stat() functions. Not sure if this goes against the grain of infer's pedagogical approach.

moves one-off helpers into the file they're used in
this should be useful in determining whether a supplied statistic is appropriate given variable types.
also fixes a bug in copying attrs to calculate results: generate -> generated

closes #355
adjust line break, use appropriate formatting for t statistic / null mu
some for new functionality, some making up for coverage that we let slip in recent PRs
one was indistinguishable, the other had to do with copying attributes.
Turns out previous behavior wasn't to error when calculating a _distribution_ of statistics without hypothesizing. This will allow calculate to be internally consistent with its handling of null hypotheses.
@ismayc (Collaborator) commented Jan 4, 2021

I kinda do like observe(), but I'd need to better understand what the layout would look like and see some examples.

Thanks much for all the other work here! It's amazing all the little pieces that needed to be put back together and finished up that have somehow trickled their way into the package. I'm glad we have some master tidiers 😀

@ismayc (Collaborator) commented Jan 4, 2021

What I'd really like is tying the infer verbs back to Allen Downey's There Is Only One Test diagram (http://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html?m=1). Adding observe() would get us closer to that.

@simonpcouch (Collaborator, Author)

Much appreciated!😄

If we decide that these changes are a go, I'll open up a separate PR/issue with some thoughts on observe(). The implementation of that function should be pretty straightforward, as we can rely on the internals of the existing core verbs for argument checking, but the integration into documentation and tests would be more significant.
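For the sake of discussion, one possible shape for that wrapper; the signature and argument handling here are purely speculative:

library(infer)

# Purely speculative sketch of an observe() wrapper around the existing core
# verbs; the eventual interface may look quite different.
observe <- function(x, formula, stat, success = NULL, order = NULL, ...) {
  x %>%
    specify(formula = formula, success = success) %>%
    calculate(stat = stat, order = order, ...)
}

# e.g., a one-liner for an observed difference in proportions:
gss %>%
  observe(college ~ sex, stat = "diff in props",
          success = "degree", order = c("female", "male"))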

@echasnovski (Collaborator)

Wow, that's a lot of good work! Thanks!
And a lot to review too :). Now I truly understand the importance of byte-sized PRs.

@echasnovski (Collaborator) left a review:


Thanks again for the great work!
After reviewing, it does look like splitting this PR into at least two parts (utility-function tidying and the calculate() update) would lead to a better review.

Review threads on R/calculate.R, R/hypothesize.R, and R/utils.R (resolved).
dispatch to helpers within calculate

avoid eval(parse( in favor of switch

some style improvements
@simonpcouch (Collaborator, Author)

@echasnovski Lots of helpful changes coming from your first round of review! Much appreciated. :-)

I hear you on keeping future PRs smaller--will do. Probably should have mentioned that switching out that one vdiffr .svg resulted in ~500 lines changed.

I’m not sure if you mean splitting up the PR itself or its review. Totally on board for the latter—feel free to take your time reviewing. We probably could/should nudge the merging of these changes until after the release of 0.5.4 (addressing the vdiffr issues) in the next few days. This PR will be pretty related to coming observe() and constructor PRs here soon, too, so maybe a 0.6.0 release in a few months along with those new features is a better home for these changes. If you meant splitting up the code in this PR into several PRs, the changes to utilities + helpers and the features in calculate() are pretty interdependent here; I think splitting them up would have the effect of decontextualizing the changes rather than easing review.

Further review threads on R/utils.R and R/calculate.R (resolved).
@echasnovski (Collaborator)

I’m not sure if you mean splitting up the PR itself or its review. Totally on board for the latter—feel free to take your time reviewing. We probably could/should nudge the merging of these changes until after the release of 0.5.4 (addressing the vdiffr issues) in the next few days. This PR will be pretty related to coming observe() and constructor PRs here soon, too, so maybe a 0.6.0 release in a few months along with those new features is a better home for these changes. If you meant splitting up the code in this PR into several PRs, the changes to utilities + helpers and the features in calculate() are pretty interdependent here; I think splitting them up would have the effect of decontextualizing the changes rather than easing review.

That was me thinking out loud, sorry. I meant that some changes (utility wrappers or renames) need a relatively small amount of targeted conversation but end up cluttering the whole PR. When looking at every change made in the PR, I had to determine whether it was an "attention needed" change or a simple move/rename. If those were in a separate PR, review would only require skimming through the changes after agreeing in a pre-conversation. I never fully internalized this until actually reviewing a rather big PR :)

Yes, making a CRAN release soon is a priority here. These changes are better left to wait until 0.6.0.

simonpcouch marked this pull request as draft on January 6, 2021 02:15
@simonpcouch (Collaborator, Author)

All for it, thanks for clarifying + dealing with the clutter in this PR.

I'll holler here and merge upstream once 0.5.4 is on CRAN! Starting that release process tonight.

@echasnovski (Collaborator)

Finally got an opportunity to write code. Made some updates myself, hope you don't mind.

echasnovski self-requested a review on January 7, 2021 13:34
@simonpcouch (Collaborator, Author)

Thanks for catching that negation for has_attr()--whoops. Those changes look great.

simonpcouch changed the base branch from develop to master on January 13, 2021 17:06
notably, fixes merge conflicts in NEWS

Merge branch 'master' into obs-stat

# Conflicts:
#	NEWS.md
simonpcouch marked this pull request as ready for review on January 13, 2021 17:13
simonpcouch merged commit e084945 into master on Jan 17, 2021
simonpcouch deleted the obs-stat branch on January 17, 2021 00:12
@github-actions (bot) commented Mar 8, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

The github-actions bot locked and limited the conversation to collaborators on Mar 8, 2021.