add veganCovEllipse() as an option to plot.betadisper() #166

Closed
friendly opened this issue Apr 16, 2016 · 16 comments

@friendly
Contributor

Hulls are useful for one type of display, but covariance (data) ellipses provided by veganCovEllipse are also useful. It would not be hard to add this via an option ellipse=T/F, as an alternative to hull=T/F.

@gavinsimpson
Contributor

Sure, should be relatively easy to implement.

@gavinsimpson
Contributor

I've added an option to do this, though I'm not 100% convinced I have this right yet. If you have any comment or input, see PR #167

@gavinsimpson
Contributor

Closed via #167

@friendly
Contributor Author

The most recent version is now extremely nice. Thanks for the work on this.

@jarioksa
Contributor

@gavinsimpson: When I look at the graph at the bottom of the heap, it seems that the plot with ellipses uses centroids whereas the hulls (and the analysis) used spatial medians. It may be difficult to reconcile covariance ellipses (L2 norm) and median statistics (L1 norm).

@friendly
Contributor Author

When I look at the plots on that page, it seems to me that the centers (group labels) are in the same positions in both the plot with hulls and that with ellipsoids. If you're referring to the fact that the centers from which the segments are drawn are not at the ellipse centers (L2 norm), I don't find this a problem, as long as it is understood that the covariance ellipse is just a visual summary of the bivariate scatter of the points shown under normal theory in the space of the principal coordinates.

Ellipses may not be that useful for some distance metrics, but they certainly are for Euclidean distances.

@jarioksa
Contributor

It's not about the metric (Euclidean or non-Euclidean), but it is about the way betadisper makes its calculations: we have argument type = c("median", "centroid"), where the default is "median" which also is displayed as the centroid of hulls. However, ellipses will always show "centroid" even when the analysis is not based on them. It seems we could use the calculated centroids, as cov.wt takes a center argument which can be a user-supplied vector of centres used in calculating covariances.

@gavinsimpson
Contributor

@friendly the problem is that I didn't plot the ellipsoid centre at all; the code just plots the centroid component of the analysis, which by default is the cluster median (I'd forgotten we'd changed the default back when we were discussing bias adjustments in a previous update).

So two options:

  1. draw the ellipse centre if the user asks for ellipses when the method used is type = "median",
  2. refuse to draw ellipses if type = "median", with "refuse" indicated by a warning and no plotting.

At the very least this should go in the Rd file.

@jarioksa
Contributor

I think there are two major-ish issues that should be solved:

  1. The ellipses should be drawn and covariances calculated w.r.t. centres used in the analysis. With default type = "median" these would be spatial medians. This is easily solved because centres can be given as argument center in cov.wt().
  2. The standard error and confidence ellipses should be disabled: betadisper() is about multivariate dispersion and that is displayed by covariance ellipses. Standard error and confidence ellipses display the error of the location of the centre (for median this is a bit involved, though) which is not the subject of the analysis in betadisper(). There are other tools in vegan to display ellipses for centres (display with ordiellipse(), test with envfit() or with adonis() using only one class variable).
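The first point can be illustrated with a minimal base-R sketch. It is not the code in plot.betadisper(): cmdscale() stands in for the principal coordinates that betadisper() computes, and spatial_median() is a hypothetical helper (a plain Weiszfeld iteration) written only for this illustration. The one documented feature relied on is that cov.wt() accepts a numeric vector as its center argument.

```r
# Coordinates of one group on the first two PCoA axes (base R stand-in).
pco <- cmdscale(dist(iris[, 1:4]), k = 2)
pts <- pco[iris$Species == "setosa", ]

# Hypothetical helper: spatial (geometric) median by Weiszfeld iteration.
spatial_median <- function(x, tol = 1e-8, maxit = 500) {
  m <- colMeans(x)                           # start from the centroid
  for (i in seq_len(maxit)) {
    d <- sqrt(rowSums(sweep(x, 2, m)^2))     # distances to current centre
    d <- pmax(d, .Machine$double.eps)        # guard against division by zero
    w <- 1 / d
    m_new <- colSums(x * w) / sum(w)         # distance-weighted mean
    if (sqrt(sum((m_new - m)^2)) < tol) break
    m <- m_new
  }
  m_new
}

ctr_med  <- spatial_median(pts)
cov_mean <- cov.wt(pts)$cov                    # about the sample mean
cov_med  <- cov.wt(pts, center = ctr_med)$cov  # about the supplied centre
```

Because second moments are smallest when taken about the sample mean, the trace of cov_med can never be smaller than the trace of cov_mean, so an ellipse drawn from cov_med is at least as large.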

@jarioksa jarioksa reopened this Apr 21, 2016
@gavinsimpson
Contributor

I agree here, after a little thought, and have implemented both in my local branch.

One question arises; do we still want to adjust the size of the sd ellipse or just leave it drawn at 1sd? I presume we could allow this sort of user-control via (suitably renamed) ellipse.conf argument. Or should we just plot 1-sd ellipses without allowing user control of this?

For future changes, I am looking into drawing a bag plot for the groups as that seems quite aligned with the goals of assessing bivariate distributions.

@jarioksa
Contributor

I have no firm opinion for SD-multiplier. I could live without that and have a simpler UI and shorter man page with shorter list of arguments.

About bag plots: the plot is bivariate, but it is only a bivariate shadow (well, projection) of n-dimensional space. The actual statistics (we report distances of points to centres) and tests work in n dimensions, but we only plot two in hope that this would show much of the real thing. BTW, it could be possible to add a plot with 3D ellipses in vegan3d.

@friendly
Contributor Author

I'm sorry if I raised a can of worms here to cause you to debate all the possible options for plot.betadisper() and whether my request for an option to draw the data ellipses for the points in this space was useful or could be potentially misleading for vegan users, or didn't fit somehow with your philosophy for the package. This last, of course, is up to you, and I was happy to see how you @gavinsimpson and @jarioksa work jointly on this project, and respond to requests by users.

All I can say is that in my work on a current project concerning visualizing tests of covariance matrices in multivariate linear models, betadisper() was perfect for me to illustrate the dispersion approach stemming from Anderson, but the plot method, with convex hulls, was unsuitable.

Using a previous commit, #167, I could get what I wanted for one example via

library(vegan)

dst <- dist(iris[,1:4])
iris.bd <- betadisper(dst, iris$Species)
labs <- paste("Dimension", 1:4, "(", 
              round(100*iris.bd$eig / sum(iris.bd$eig), 2), "%)")
plot(iris.bd, cex=1, pch=15:17,
     main="Iris data: MDS coordinates", cex.lab=1.25,
     xlab=labs[1], ylab=labs[2],
     hull=FALSE, ellipse=TRUE, ellipse.conf=0.68, lwd=2)

This gave the plot below, that illustrated exactly what I wanted to show for this example.

[Image: iris-betadisp-plot-1]

Here, it was important to me to show the 68% (~ \pm 1 bivariate sd) data ellipses and I didn't care how precisely they were centered, because the main thing was the sizes and shapes of the covariance ellipses.
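For readers who want to see what such an ellipse amounts to, here is a minimal base-R sketch of a data ellipse at a given coverage level. The helper data_ellipse() is hypothetical and written for this illustration; it is not vegan's veganCovEllipse(), and cmdscale() is used as a stand-in for the principal coordinates. Under bivariate normal theory the radius for coverage level p is sqrt(qchisq(p, 2)), so a 68% ellipse has radius about 1.51, while a radius of exactly 1 covers pchisq(1, 2), about 39%.

```r
# Hypothetical helper: boundary points of a data ellipse for covariance S.
data_ellipse <- function(S, center, level = 0.68, npoints = 100) {
  theta  <- seq(0, 2 * pi, length.out = npoints + 1)
  circle <- cbind(cos(theta), sin(theta))      # unit circle
  radius <- sqrt(qchisq(level, df = 2))        # normal-theory scaling in 2D
  Q <- chol(S)                                 # t(Q) %*% Q equals S
  sweep(radius * (circle %*% Q), 2, center, "+")
}

pco <- cmdscale(dist(iris[, 1:4]), k = 2)
pts <- pco[iris$Species == "versicolor", ]
S   <- cov(pts)
ctr <- colMeans(pts)
ell <- data_ellipse(S, ctr, level = 0.68)

if (interactive()) {
  plot(pts, asp = 1)
  lines(ell)
}
```

Every point returned lies at squared Mahalanobis distance qchisq(level, 2) from the centre, which is what makes the curve a constant-density contour of the fitted normal distribution.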

If you decide to eliminate this feature from plot.betadisper(), I would appreciate it if you could tell me how to produce an equivalent display with other functions in vegan.

Thanks once again for being responsive to requests by users.

@jarioksa
Contributor

jarioksa commented Apr 25, 2016

The intention of betadisper is indeed to compare only sizes of ellipses. These sizes (covariances) will be calculated with respect to the centres used, and these centres default to spatial medians. These are somewhat wider than covariances calculated with respect to sample means. Therefore the centres are also important. We are maintaining the options to plot SD ellipses, but you must set type="centroid" to get the ellipses of correct sizes for the plot. Jury is still out for the modification of SD's that you seem to ask for.

It is purely incidental that betadisper gave results you wanted to have. It was not our intention. However, there is a direct way of doing the same thing on PCoA (or any ordination) with the support functions ordispider and ordiellipse:

library(vegan)
dst <- dist(iris[,1:4])
iris.bd <- wcmdscale(dst, eig = TRUE)  # PCoA
labs <- paste0("Dimension ", 1:4, " (", 
              round(100*iris.bd$eig / sum(iris.bd$eig), 2), "%)")
pl <- plot(iris.bd, main="Iris data: MDS coordinates", cex.lab=1.25,
     xlab=labs[1], ylab=labs[2], type = "n")
points(pl, "sites", pch=as.numeric(iris$Species)+14, col=as.numeric(iris$Species))
ordispider(iris.bd, iris$Species, col=1:3)    # col= usage needs vegan 2.4-0
ordiellipse(iris.bd, iris$Species, col=1:3, draw="poly", conf=0.68, label = TRUE)

@friendly
Contributor Author

That is very helpful; thanks.

BTW, ordiArgAbsorber() provides a nice way to handle ... in functions that use graphics parameters to several different graphics functions.
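The general idea can be sketched in a few lines of base R. This is an illustration of the pattern only, not vegan's definition of ordiArgAbsorber(): the function absorb_args() and the particular formals it names are hypothetical. Named formals placed after ... swallow the arguments a downstream function would reject, and only what remains in ... is passed on.

```r
# Hypothetical wrapper: `scaling`, `display` and `choices` are matched by
# name and dropped; everything left in `...` goes to FUN.
absorb_args <- function(..., scaling, display, choices, FUN) {
  match.fun(FUN)(...)
}

# A function that would fail with "unused argument" if given `scaling`.
add2 <- function(x, y) x + y

res <- absorb_args(1, 2, scaling = 3, display = "sites", FUN = add2)
```

Calling add2(1, 2, scaling = 3) directly raises an error, whereas the wrapped call returns the sum of the two positional arguments.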

@jarioksa
Contributor

You should thank @gavinsimpson for ordiArgAbsorber: it was his itch that he scratched. I didn't care about all those warnings of handling ..., but Gav wanted to have things right.

@jarioksa jarioksa mentioned this issue May 20, 2016
@jarioksa jarioksa added this to the 2.4-0 milestone May 20, 2016
@gavinsimpson
Contributor

I've just merged changes from my add-ellipse-betadisper branch to the master branch here: baea831

I think this addresses the concerns about focussing too much on tests for location shifts whilst maintaining the idea of using a data ellipse rather than a convex hull.
