Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R: new formula & vegan breaks #299

Closed
jarioksa opened this issue Dec 28, 2018 · 5 comments
Closed

R: new formula & vegan breaks #299

jarioksa opened this issue Dec 28, 2018 · 5 comments

Comments

@jarioksa
Copy link
Contributor

I just got the following email message from Martin Maechler:

just a quick note before I embark on longish trip to Calgary Canada:
Following up the thread on the R-devel mailing list,
I've recently changed formula(<model.frame>) behavior such that it
returns the formula from the "terms" attribute of the model frame,
rather than a newly constructed formula from the model.frame's
variables (as it does for non-model.frame data frames).

You may have noticed on the CRAN checks for vegan that this breaks at
least one of vegan's examples on "R-devel" (R's development version).

I may take time on the long plane flight to see how to fix it... or
may not: vegan seems to be doing pretty "strange" contortions in
evaluating formulas,
as far as I could tell, quite a bit non-standard and different than
say lm(), glm() etc.

This may mean that we have to do something with the formula. Most likely this concerns cca.formula and its ilk. Other functions have more usual handling of formula, although we do have matrix-like right-hand-side in most functions.

@gavinsimpson
Copy link
Contributor

The proximal issue of the error is in ordiParseFormula() on L76. There we call

mf <- model.frame(formula(mf), mf, xlev = xlev,
                  na.action = na.action, drop.unused.levels = TRUE)

and in R-patched formula(mf) returns

> formula(mf)
Management ~ `poly(A1, 2)` + spno

whereas in R-devel it returns

> formula(mf)
~Management + poly(A1, 2) + spno

The current behaviour seems wrong — why is Management now the response? — but the direct issue is that currently we expect poly(A1, 2) to be quoted as we want it to find the variable (a matrix) of that same name in mf:

> str(mf)
'data.frame':   13 obs. of  3 variables:
 $ Management : Factor w/ 4 levels "BF","HF","NM",..: 1 4 4 NA 2 2 1 4 4 3 ...
 $ poly(A1, 2): num [1:13, 1:2] -0.1421 -0.0579 -0.0684 0.1526 -0.0579 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "1" "2"
....

instead of re-evaluating the formula — we don't store A1 or at least it's not available by the time we evaluate the model frame on L76. If it was R-devel would work.

@jarioksa
Copy link
Contributor Author

jarioksa commented Dec 29, 2018

There is a plenty of cruft (and nothing's plenty for me) in ordiParseFormula. Some of this is just to get around quirks in R formula and model.frame behaviour when we need to pass the structures to various functions. For instance, capscale was calling rda.default in previous versions of vegan so that the data was handled in different environment (depth) in rda.default when it was called from rda directly or via capscale. Things have been simplified a lot and the interface unified in the current vegan allowing for simplification (but the old code still worked and was not changed).

About the formula(mf) issue: this is a change in R behaviour, and I am not sure if it is important for our case. The current R just assumes that model frame has a response, and takes the first variable to be that response. We removed the response (community data), but current R (release) does not know about this possibility. The original formula without the lhs can be extracted with

## R 3.5.x
formula(terms(mf))

but stats:::formula.data.frame does not query this, but it reconstructs the formula from names(mf), always taking the first name as the lhs, and quoting the names like poly(A1, 2). So I guess that R-devel has switched to different way of extracting the formula.

jarioksa pushed a commit that referenced this issue Jan 3, 2019
R 3.6.0 changed the behaviour of stats:::formula.data.frame, but
provided a compatibility function DF2formula. We use this function
in R 3.6.0, but R CMD check will issue a NOTE for this function
missing in R version where we do not use it. Now we define this
function when it is never used to keep R happy (and it exists when
it is used and we need to do nothing).

Together with the previous commit, this provides a quick & dirty
fix to github.com/vegandeves/vegan issue #299. ordiParseFormula
is cruft upon cruft, and this adds still one layer. The real
solution is to completely re-write formula interpretation in
constrained ordination, but that will not be quick.
jarioksa pushed a commit that referenced this issue Jan 3, 2019
For scoping, variable names with functions must be quoted in the
formula (for instance, `poly(A1, 2)`, `log(A1)`) so that the variable
(A1) is later found. stats:::formula.data.frame() did this in
R < 3.6.0, but now it returns unquoted formula and we must be
explicit.

This is a fix to github.com/vegandevs/vegan issue #299, and a bit
cleaner than the previous version. Now we explitly re-construct the
formula with back-quotes when needed, instead of using different
stats functions in R < 3.6.0 (stats:::formula.data.frame) and
R >= 3.6.0 (stats::DF2formula), and need no code dependent on
R version.
@jarioksa
Copy link
Contributor Author

jarioksa commented Jan 3, 2019

This really seems to be an issue with changes in stats:::formula.data.frame behaviour. Earlier it reconstructed the formula from model.frame names and back-quoted names which contained functions, but now it just fetches the saved formula. With current scoping in vegan, the new behaviour fails. This is pretty easy to fix by adding still one kludge to vegan:::ordiParseFormula, and @mmaechler even provides compatibility function stats::DF2formula to keep the old behaviour. However, I now suggest a fix that explicitly takes care of having back-quotes when we need them, and we do not need to query R version.

The changes are in branch https://github.com/vegandevs/vegan/tree/issue-%23299-kludge

This is yet another kludge in a kludgy function and the real solution is to re-write (or scrap) vegan:::ordiParseFormula. This is not something that you can do while having your afternoon coffee break, but needs time and will break vegan for a long time before getting everything to work again. Someday, someone, somehow...

@gavinsimpson
Copy link
Contributor

This looks good @jarioksa I mean ideally we'd rewrite it but that is a longer-term project and if this maintains current behaviour then that's good.

jarioksa pushed a commit that referenced this issue Jan 4, 2019
This closes issue #299 with a (temporary) kludge. The issue was
caused by changes in R-devel rev 75891 (maechler Sun, 23 Dec 2018)
which changed the behaviour of stats:::formula.data.frame() with
model.frames.
@jarioksa
Copy link
Contributor Author

jarioksa commented Jan 4, 2019

Closed with commit 0e7b8be. This is a kluge, and in long term, whole ordiParseFormula should be re-designed and re-written.

@jarioksa jarioksa closed this as completed Jan 4, 2019
jarioksa pushed a commit that referenced this issue Jan 4, 2019
R 3.6.0 changed the behaviour of stats:::formula.data.frame, but
provided a compatibility function DF2formula. We use this function
in R 3.6.0, but R CMD check will issue a NOTE for this function
missing in R version where we do not use it. Now we define this
function when it is never used to keep R happy (and it exists when
it is used and we need to do nothing).

Together with the previous commit, this provides a quick & dirty
fix to github.com/vegandeves/vegan issue #299. ordiParseFormula
is cruft upon cruft, and this adds still one layer. The real
solution is to completely re-write formula interpretation in
constrained ordination, but that will not be quick.

(cherry picked from commit ef5cdfe)
jarioksa pushed a commit that referenced this issue Jan 4, 2019
For scoping, variable names with functions must be quoted in the
formula (for instance, `poly(A1, 2)`, `log(A1)`) so that the variable
(A1) is later found. stats:::formula.data.frame() did this in
R < 3.6.0, but now it returns unquoted formula and we must be
explicit.

This is a fix to github.com/vegandevs/vegan issue #299, and a bit
cleaner than the previous version. Now we explitly re-construct the
formula with back-quotes when needed, instead of using different
stats functions in R < 3.6.0 (stats:::formula.data.frame) and
R >= 3.6.0 (stats::DF2formula), and need no code dependent on
R version.

(cherry picked from commit 114b319)
jarioksa pushed a commit that referenced this issue Jan 6, 2019
This is triggered by github.com/vegandevs/vegan issue #299. The
purpose is to have much simpler, clearly faster and more robust
version of ordiParseFormula. The work starts from the scratch and
implements first only a part of the original functionality thus
breaking vegan, but we try to gain the original functionality with
time.
@jarioksa jarioksa added this to the 2.5-4 milestone Jan 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants