resurrect obsolete permatswap & permatfull #159

Open
jarioksa opened this issue Mar 30, 2016 · 4 comments

@jarioksa
Contributor

permatswap and permatfull provided the first quantitative null models in vegan, and they were used for simulation in adipart and multipart. Later, the quantitative null models were transferred to make.commsim & nullmodel, and adipart & multipart shifted away from permatswap and permatfull. Most of this work (both adding the functions and making them obsolete) was carried out by @psolymos. Currently no vegan function uses these functions, and their output is incompatible with simulate.nullmodel. However, these functions have some good properties:

  • they provide an alternative interface for defining a null model via desired properties instead of navigating through alternative models with cryptic names.
  • they also provide an option for stratified null models that we do not have in the simulate.nullmodel framework.
  • they provide a range of diagnostic functions that can be directly applied to simulated models — oecosimu has a set of similar functions that can be used for the test statistics, but it may be useful to have dissimilarity-based diagnostics for generated matrices.

To resurrect permatswap and permatfull, we should do the following:

  1. permatswap & permatfull should produce a “simmat” object. That is, a 3-D array with attributes.
  2. The current diagnostic tools should be adapted to simmat objects so that they can be directly used with any simulate.nullmodel output. For this, we may need to add an attribute “orig” to save the original data with the "simmat" array.
  3. I think the diagnostic tools (as.ts, as.mcmc) should probably be documented in a separate manual page (.Rd file) which also should contain the documentation of similar tools for the oecosimu results.
  4. I think we could also have a dissimilarity-against-orig style summary() method for all "simmat" objects, including non-sequential ones. Such a summary could report only the average properties of simulations against the “orig”. A summary() could also break these properties down by row and by column to see how each of these varied separately in simulations.
  5. We also have simulate.rda and simulate.cca: we should study how these could be better linked with null model simulations. These functions simulate data under an alternative model (= fit + randomized residuals) and provide an intriguing alternative to quantitative null models.

Points 1 & 2 are the most important. The others are “nice to have or perhaps not” and not so urgent.
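
As a rough illustration of points 1, 2 and 4, here is a minimal sketch in base R only. It does not call vegan: the 3-D array merely stands in for a "simmat" object, the cell shuffling stands in for a real null model, and the attribute name "orig" and the function `summarize_sims()` are hypothetical names taken from the proposal above, not existing API.

```r
# Sketch only: a plain 3-D array with an "orig" attribute stands in for "simmat".
set.seed(1)
orig <- matrix(c(3, 0, 2,
                 1, 4, 0), nrow = 2, byrow = TRUE,
               dimnames = list(c("site1", "site2"), c("spA", "spB", "spC")))

# Toy "null model": shuffle all cells of orig (not one of vegan's algorithms).
nsim <- 5
sims <- array(0, dim = c(nrow(orig), ncol(orig), nsim),
              dimnames = c(dimnames(orig), list(NULL)))
for (i in seq_len(nsim)) sims[, , i] <- sample(orig)

# Hypothetical attribute holding the original data (point 2).
attr(sims, "orig") <- orig

# Hypothetical summary (point 4): mean absolute difference against orig,
# overall and broken down by row and by column.
summarize_sims <- function(x) {
  orig <- attr(x, "orig")
  absdiff <- abs(sweep(x, c(1, 2), orig))  # |x - orig| for every cell and simulation
  n <- dim(x)[3]
  list(overall = mean(apply(absdiff, 3, sum)),
       by_row  = apply(absdiff, 1, sum) / n,
       by_col  = apply(absdiff, 2, sum) / n)
}

s <- summarize_sims(sims)
```

The row and column components each add up to the overall value, so they show where the simulated matrices deviate from the original rather than reporting a different quantity.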

An alternative is to remove these functions. However, I have received some reports which indicate that people use these -- and even use them instead of our preferred nullmodel framework.

@psolymos
Contributor

psolymos commented Apr 1, 2016

Nice outline. Note that the “orig” attribute exists under the name “data” (attr(*, "data")).

I'd add a 6th point to the list: add an option for stratified null models in simulate.nullmodel.

I would also think about the kinds of diagnostics one might want to have for null models. Some kind of distance metric with an option to choose which distance to use (arguments passed to vegdist, or a function returning a dist object) is the best option for getting a sense of convergence. What else might be useful? It can be tricky to implement effective sample size or autocorrelation metrics for multivariate (mostly discrete) distributions. Maybe we can calculate not only the distance between the original matrix and subsequent null matrices, but also “step lengths” between subsequent matrices.
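
A minimal sketch of the two diagnostics mentioned here (distance to the original matrix, and "step lengths" between consecutive matrices), in base R. The function `sim_distances()` and its `distfun` argument are hypothetical, and as a simplification `distfun` takes two matrices and returns a single number instead of going through vegdist or a dist object. The cell-swapping chain is only a toy stand-in for a sequential null model.

```r
# Hypothetical helper: distance of each simulated matrix to the original,
# and step lengths between consecutive simulated matrices.
sim_distances <- function(sims, orig,
                          distfun = function(a, b) sum(abs(a - b))) {
  n <- dim(sims)[3]
  to_orig <- vapply(seq_len(n),
                    function(i) distfun(sims[, , i], orig), numeric(1))
  step <- vapply(seq_len(n - 1),
                 function(i) distfun(sims[, , i], sims[, , i + 1]), numeric(1))
  list(to_orig = to_orig, step = step)
}

# Toy sequential chain: each step swaps two randomly chosen cells.
set.seed(42)
orig <- matrix(c(5, 0, 1, 2, 3, 0), nrow = 2)
sims <- array(0, dim = c(dim(orig), 4))
cur <- orig
for (i in 1:4) {
  idx <- sample(length(cur), 2)
  cur[idx] <- cur[rev(idx)]  # swap the two cells
  sims[, , i] <- cur
}

d_manhattan <- sim_distances(sims, orig)  # default: Manhattan
d_euclidean <- sim_distances(sims, orig,
                             function(a, b) sqrt(sum((a - b)^2)))
```

Both vectors grow linearly with the number of simulations, so they stay cheap even for long chains.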

@jarioksa
Contributor Author

jarioksa commented Apr 1, 2016

vegan does not have any proper and efficient way of handling dissimilarities between two data sets: we don't have vegdist(x, y) where each row of x is compared against the same row of y. Therefore we cannot use just any vegdist index, but are limited to indices that deal with elementwise differences between matrices (x-orig, abs(x-orig), (x-orig)^2). I agree that we need to calculate only the kind of dissimilarities based on these simple comparisons of two matrices. These would also be faster than doing pairwise vegdist or similar. We can have plenty of dissimilarities with these elements (Euclidean, Manhattan, cosine, Bray, Ochiai, Chi-square etc.). Adding "auto-dissimilarity" between lagged pairs would also be doable. However, having all pairwise dissimilarities between all simulation steps would lead to a catastrophe & memory exhaustion.
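
To make the elementwise idea concrete, here is a minimal base R sketch. All function names (`elementwise_diss()`, `diss_to_orig()`, `lagged_diss()`) are hypothetical, only three of the listed indices are shown, and the matrices are assumed to be non-empty so that the Bray denominator is not zero.

```r
# Whole-matrix dissimilarities built only from elementwise terms.
elementwise_diss <- function(x, y) {
  d <- x - y
  c(manhattan = sum(abs(d)),
    euclidean = sqrt(sum(d^2)),
    bray      = sum(abs(d)) / sum(x + y))
}

# Each simulated matrix against the original: one column per simulation.
diss_to_orig <- function(sims, orig) {
  apply(sims, 3, elementwise_diss, y = orig)
}

# "Auto-dissimilarity" between matrices `lag` steps apart: n - lag columns,
# instead of the n * (n - 1) / 2 values of a full pairwise comparison.
lagged_diss <- function(sims, lag = 1) {
  n <- dim(sims)[3]
  stopifnot(lag >= 1, lag < n)
  vapply(seq_len(n - lag),
         function(i) elementwise_diss(sims[, , i], sims[, , i + lag]),
         numeric(3))
}

orig <- matrix(c(2, 0, 1, 3), nrow = 2)
sims <- array(c(2, 0, 1, 3,   # first simulation: identical to orig
                0, 2, 3, 1),  # second: the two rows exchanged
              dim = c(2, 2, 2))

to_orig <- diss_to_orig(sims, orig)
lag1 <- lagged_diss(sims, lag = 1)
```

Storage grows linearly with the number of simulations for both functions, which avoids the memory problem of computing all pairwise dissimilarities.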

@jarioksa jarioksa mentioned this issue May 20, 2016
6 tasks
@jarioksa jarioksa added this to the 2.4-0 milestone May 20, 2016
@jarioksa jarioksa modified the milestones: 2.4-0, 2.5-0 Jun 14, 2016
@jarioksa
Contributor Author

@psolymos : I have started wrapping vegan up for the 2.5-0 release, and I noticed that this issue is still open. Is there a need to do something for this issue, or do we just let it be? Surely, this is not such a critical issue that it would delay the release.

@psolymos
Contributor

@jarioksa : let's leave them be for now.

@jarioksa jarioksa modified the milestones: 2.5-0, 2.6-0 Mar 25, 2018