-
Notifications
You must be signed in to change notification settings - Fork 44
Numerical problem with AU test #86
Comments
7-Sep-2017 I have attached a plot with the comparison of the AU p-values w/ and w/o consel. Since the p-values with iqtree/consel are comparable to raxml/consel it seems pretty clear that the issue is with the AU calculation w/in iqtree. I can only guess about what might be going wrong. There are some trees that have very large p-values under AU but that are small even under SH. I would guess that the BPs that AU uses for these trees can be very small or even 0. AU uses these BPs through Phi**(-1)(1-BP). If BP is 0, Phi**(-1)(1-BP) is infinite, so maybe some adjustment for BP=0 is what causes consel results to differ. The iqtree version of the AU test is much faster than iqtree/consel, so if you are able to make progress on this, I would be happy to hear about it. A student and I have been looking at procedures for calculating confidence sets of trees. In the course of doing so we seem to have come across a potential problem with the calculation of AU p-values in IQTREE. I have attached a plot for an example data set comparing p-values: AU RAXML/CONSEL vs AU IQTREE - many more large pvalues for IQTREE As you may know, SH should be a conservative test so the fact that AU IQTREE gives many more large pvalues than SH and gives many more large pvalues than AU RAXML/CONSEL suggests that it is the AU IQTREE values that are problematic rather than the RAXML/CONSEL ones I have attached the data file and tree file that I used. To obtain the p-values for RAXML/CONSEL I used $ raxmlHPC -s tmp.infile -n out -m GTRGAMMA -fg -T2 -z eight-taxon.trees > tmp.out To obtain the p-values for IQTREE, I used $ iqtree-omp -s tmp.infile -z eight-taxon.trees nt 1 -m GTR+G -n 0 -zb 10000 -au -wsl Note that this is a large example. We were comparing all 10395 eight-taxon trees. But it does seem that there is a discrepancy between the way that AU RAXML/CONSEL is calculating p-values and the way that IQTREE does. Further, in this case, AU RAXML/CONSEL seems more likely correct. So I thought you might want to know. Drop me a note if you make progress. The student will continue his work on the topic, so if a fix is available, it would be valuable to know about. |
After a careful investigation, the conclusion is that this problem is because the weighted least square approach to estimate AU p-values is NOT accurate when bp-RELL are zero or close to zero. So IQ-Tree now switches to the maximum likelihood estimates of AU p-values. This is included in the version v1.6.8. |
Personal comm. with Shimodaira:
July-Sep-2018
Minh & Heiko:
Dear Shimo,
I am writing to you regarding a question about some extreme cases in the AU test. (My colleague Minh has written about this also mid of July, but seemingly you have been too busy to answer.)
The question is about the following part in your AU paper:
Screen Shot 2018-09-10 at 10.05.25 am.png
The problem is that the quantile function of the standard normal distribution
\Phi^{-1}(0) = -infinity and
\Phi^{-1}(1) = +infinity
So the question is, how do you handle the cases in CONSEL (or in your R package) if BP is either 0% or 100%?
25-Sep-2018
Shimodaira:
Here is my quick answer based on my memory. If you think further issues, I may look at the code.
In Consel and pvclust, there are two estimation methods are implemented: weighted least square (WLS) and maximum likelihood estimator (MLE). The above formula is for WLS. WLS is used for computing the initial value for the iterative computation of MLE.
For WLE, the zero or one bp values are simply ignored.
For MLE, the zero and one values are properly included in the estimation (fully used). However, if all the scales has zero or one values, mle will not make sense, so I simply set AU = 0 or 1.
Probably, there are fine tuning in the code so that there is a threshold of the number of zero or one values to decide if AU = 0 or 1 used or try estimation.
The text was updated successfully, but these errors were encountered: